Fast in-memory XPath search using compressed indexes
Author
dc.contributor.author
Arroyuelo, Diego
Author
dc.contributor.author
Claude, Francisco
Author
dc.contributor.author
Maneth, Sebastian
Author
dc.contributor.author
Mäkinen, Veli
Author
dc.contributor.author
Navarro, Gonzalo
Author
dc.contributor.author
Nguyen, Kim
Author
dc.contributor.author
Sirén, Jouni
Author
dc.contributor.author
Välimäki, Niko
Admission date
dc.date.accessioned
2015-08-25T02:21:52Z
Available date
dc.date.available
2015-08-25T02:21:52Z
Publication date
dc.date.issued
2015
Cita de ítem
dc.identifier.citation
Software-Practice and Experience. Volumen: 45 Número: 3 Páginas: 399-434
en_US
Identifier
dc.identifier.other
DOI: 10.1002/spe.2227
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/133086
General note
dc.description
Artículo de publicación ISI
en_US
Abstract
dc.description.abstract
Extensible Markup Language (XML) documents consist of text data plus structured data (markup). XPath
allows to query both text and structure. Evaluating such hybrid queries is challenging. We present a system
for in-memory evaluation of XPath search queries, that is, queries with text and structure predicates,
yet without advanced features such as backward axes, arithmetics, and joins. We show that for this query
fragment, which contains Forward Core XPath, our system, dubbed Succinct XML Self-Index (‘SXSI’),
outperforms existing systems by 1–3 orders of magnitude. SXSI is based on state-of-the-art indexes for text
and structure data. It combines two novelties. On one hand, it represents the XML data in a compact indexed
form, which allows it to handle larger collections in main memory while supporting powerful search and
navigation operations over the text and the structure. On the other hand, it features an execution engine that
uses tree automata and cleverly chooses evaluation orders that leverage the speeds of the respective indexes.
SXSI is modular and allows seamless replacement of its indexes. This is demonstrated through experiments
with (1) a text index specialized for search of bio sequences, and (2) a word-based text index specialized for
natural language search.