Lempel–Ziv compressed structures for document retrieval

Ferrada, Héctor; Navarro, Gonzalo

Author	dc.contributor.author	Ferrada, Héctor
Author	dc.contributor.author	Navarro, Gonzalo
Admission date	dc.date.accessioned	2019-10-15T12:23:50Z
Available date	dc.date.available	2019-10-15T12:23:50Z
Publication date	dc.date.issued	2019
Cita de ítem	dc.identifier.citation	Information and Computation, Volumen 265,
Identifier	dc.identifier.issn	10902651
Identifier	dc.identifier.issn	08905401
Identifier	dc.identifier.other	10.1016/j.ic.2019.01.006
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/171625
Abstract	dc.description.abstract	Document retrieval structures index a collection of string documents, to retrieve those that are relevant to query strings p: document listing retrieves all documents where p appears; top-k retrieval retrieves the k most relevant of those. Classical structures use too much space in practice. Most current research uses compressed suffix arrays, but fast indices still use 17–21 bpc (bits per character), whereas small ones take milliseconds per returned answer. We present the first document retrieval structures based on Lempel–Ziv compression, precisely LZ78. Our structures use 7–10 bpc and dominate a large part of the space/time tradeoffs. They also enable more efficient partial or approximate answers: our document listing outputs the first 75%–80% of the answers at a rate of one per microsecond; for top-k retrieval we return a result of 90% quality at the same rate and using just 4–6 bpc. This outperforms current indices by a wide margin.
Lenguage	dc.language.iso	en
Publisher	dc.publisher	Elsevier Inc.
Type of license	dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Chile
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/cl/
Source	dc.source	Information and Computation
Keywords	dc.subject	Compressed data structures
Keywords	dc.subject	Document listing
Keywords	dc.subject	Document retrieval
Keywords	dc.subject	String databases
Keywords	dc.subject	Top-k queries
Título	dc.title	Lempel–Ziv compressed structures for document retrieval
Document type	dc.type	Artículo de revista
Cataloguer	uchile.catalogador	SCOPUS
Indexation	uchile.index	Artículo de publicación SCOPUS
uchile.cosecha	uchile.cosecha	SI

Files in this item

Name:: item_85061048293.pdf
Size:: 1.937Kb
Format:: PDF

This item appears in the following Collection(s)

Artículos de revistas
Artículos de revistas

Show simple item record

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile