New Space / Time Tradeo s for Top- k Document Retrieval on Sequences

Navarro, Gonzalo; Thankachan, Sharma V.

Author	dc.contributor.author	Navarro, Gonzalo
Author	dc.contributor.author	Thankachan, Sharma V.	es_CL
Admission date	dc.date.accessioned	2014-12-15T12:47:03Z
Available date	dc.date.available	2014-12-15T12:47:03Z
Publication date	dc.date.issued	2014
Cita de ítem	dc.identifier.citation	Theoretical Computer Science 542 (2014) 83–97	en_US
Identifier	dc.identifier.other	dx.doi.org/10.1016/j.tcs.2014.05.005
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/126562
General note	dc.description	Artículo de publicación ISI	en_US
Abstract	dc.description.abstract	We address the problem of indexing a collection D = f T 1 ; T 2 ;::: T D g of D string documents of total length n , so that we can e ciently answer top-k queries : retrieve k documents most relevant to a pattern P of length p given at query time. There exist linear-space data structures, that is, using O ( n ) words, that answer such queries in optimal O ( p + k ) time for an ample set of notions of relevance. However, using linear space is not su ciently good for large text collections. In this paper we explore how far the space / time tradeo for this problem can be pushed. We obtain three results: (1) When relevance is measured as term frequency (number of times P appears in a document T i ), an index occupying j CSA j + o ( n ) bits answers the query in time O ( t search ( p ) + k lg 2 k lg " n ), where CSA is a compressed su x array indexing D , t search is its time to find the su x array interval of P , and " > 0 is any constant. (2) With the same measure of relevance, an index occupying j CSA j + n lg D + o ( n lg + n lg D ) bits answers the query in time O ( t search ( p ) + k lg k ), where lg k is the iterated logarithm of k . (3) When the relevance depends only on the documents, an index occupying j CSA j + O ( n lg lg n ) bits answers the query in O ( t search ( p ) + k t SA ) time, where t SA is the time the CSA needs to retrieve a su x array cell. On our way, we obtain some other results of independent interest	en_US
Lenguage	dc.language.iso	en	en_US
Publisher	dc.publisher	Elsevier	en_US
Type of license	dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Chile	*
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/cl/	*
Keywords	dc.subject	Document retrieval	en_US
Título	dc.title	New Space / Time Tradeo s for Top- k Document Retrieval on Sequences	en_US
Document type	dc.type	Artículo de revista

Files in this item

Name:: New-space-time-tradeoffs-for-t ...
Size:: 406.8Kb
Format:: PDF

This item appears in the following Collection(s)

Artículos de revistas
Artículos de revistas

Show simple item record

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile