Fast in-memory XPath search using compressed indexes

Arroyuelo, Diego; Claude, Francisco; Maneth, Sebastian; Mäkinen, Veli; Navarro, Gonzalo; Nguyen, Kim; Sirén, Jouni; Välimäki, Niko

Author	dc.contributor.author	Arroyuelo, Diego
Author	dc.contributor.author	Claude, Francisco
Author	dc.contributor.author	Maneth, Sebastian
Author	dc.contributor.author	Mäkinen, Veli
Author	dc.contributor.author	Navarro, Gonzalo
Author	dc.contributor.author	Nguyen, Kim
Author	dc.contributor.author	Sirén, Jouni
Author	dc.contributor.author	Välimäki, Niko
Admission date	dc.date.accessioned	2015-08-25T02:21:52Z
Available date	dc.date.available	2015-08-25T02:21:52Z
Publication date	dc.date.issued	2015
Cita de ítem	dc.identifier.citation	Software-Practice and Experience. Volumen: 45 Número: 3 Páginas: 399-434	en_US
Identifier	dc.identifier.other	DOI: 10.1002/spe.2227
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/133086
General note	dc.description	Artículo de publicación ISI	en_US
Abstract	dc.description.abstract	Extensible Markup Language (XML) documents consist of text data plus structured data (markup). XPath allows to query both text and structure. Evaluating such hybrid queries is challenging. We present a system for in-memory evaluation of XPath search queries, that is, queries with text and structure predicates, yet without advanced features such as backward axes, arithmetics, and joins. We show that for this query fragment, which contains Forward Core XPath, our system, dubbed Succinct XML Self-Index (‘SXSI’), outperforms existing systems by 1–3 orders of magnitude. SXSI is based on state-of-the-art indexes for text and structure data. It combines two novelties. On one hand, it represents the XML data in a compact indexed form, which allows it to handle larger collections in main memory while supporting powerful search and navigation operations over the text and the structure. On the other hand, it features an execution engine that uses tree automata and cleverly chooses evaluation orders that leverage the speeds of the respective indexes. SXSI is modular and allows seamless replacement of its indexes. This is demonstrated through experiments with (1) a text index specialized for search of bio sequences, and (2) a word-based text index specialized for natural language search.	en_US
Patrocinador	dc.description.sponsorship	Fondecyt, Chile 1-110066	en_US
Lenguage	dc.language.iso	en	en_US
Publisher	dc.publisher	Wiley-Blackwell	en_US
Type of license	dc.rights	Atribución-NoComercial-SinDerivadas 3.0 Chile	*
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/cl/	*
Keywords	dc.subject	XML	en_US
Keywords	dc.subject	Succinct data structures	en_US
Keywords	dc.subject	XPath	en_US
Keywords	dc.subject	Tree automata	en_US
Título	dc.title	Fast in-memory XPath search using compressed indexes	en_US
Document type	dc.type	Artículo de revista

Files in this item

Name:: Fast-in-memory-XPath-search-us ...
Size:: 3.503Mb
Format:: PDF

This item appears in the following Collection(s)

Artículos de revistas
Artículos de revistas

Show simple item record

Except where otherwise noted, this item's license is described as Atribución-NoComercial-SinDerivadas 3.0 Chile