Skip to content (access key 's')
Logo of Technion
Logo of CS Department


Massively parallel query processing of large tree and graph structured databases
event speaker icon
Lila Shnaiderman, Ph.D. Thesis Seminar
event date icon
Wednesday, 10.9.2014, 10:00
event location icon
Taub 601
In light of the rapidly increasing amount of data and demand for fast query processing, increasing the efficiency of database operations continues to be a challenging and important task. XML is based on a tree-structured data model. Naturally, the most popular XML querying language (XPath) uses patterns of selection predicates on multiple hierarchically structured elements. These patterns are often abstracted by twig patterns. Finding all occurrences of such a (XML query) twig pattern in an XML document is a core operation for XML query processing. I present the Parallel Path Stack algorithm (PPS) and the Parallel Twig Stack algorithm (PTS). PPS and PTS are novel and efficient algorithms for matching (i.e., finding) XML query twig patterns over a parallel multi-threaded computing platform. PPS and PTS are based on the PathStack and TwigStack algorithms. These algorithms employ a sophisticated search technique for limiting the processing to specific subtrees. Another, recently popular, approach is to leverage the power of parallel hardware platforms. With the introduction of general purpose GPU (Graphics Processing Unit) computing, massively parallel hardware has become available as commodity hardware. I present a new algorithm, GPU-Twig, for matching twig patterns in large XML documents, using a GPU. The algorithm is based on parallel matching in a bottom-up fashion, followed by a consolidation phase. Fast query processing is important also in the Graph Databases field. I present, GGQ, algorithm that finds matches between TPQs (Tree Pattern Queries), and graph databases using the GPU platform. Its principle of operation is very different than that of GPU-Twig; it operates by relying on thread identifiers (IDs) to coordinate automated massive computing task distribution. It also addresses scale-up issues.
[Back to the index of events]