Show simple item record

Authordc.contributor.authorNavarro, Gonzalo 
Authordc.contributor.authorThankachan, Sharma V. es_CL
Admission datedc.date.accessioned2014-12-15T12:47:03Z
Available datedc.date.available2014-12-15T12:47:03Z
Publication datedc.date.issued2014
Cita de ítemdc.identifier.citationTheoretical Computer Science 542 (2014) 83–97en_US
Identifierdc.identifier.otherdx.doi.org/10.1016/j.tcs.2014.05.005
Identifierdc.identifier.urihttps://repositorio.uchile.cl/handle/2250/126562
General notedc.descriptionArtículo de publicación ISIen_US
Abstractdc.description.abstractWe address the problem of indexing a collection D = f T 1 ; T 2 ;::: T D g of D string documents of total length n , so that we can e ciently answer top-k queries : retrieve k documents most relevant to a pattern P of length p given at query time. There exist linear-space data structures, that is, using O ( n ) words, that answer such queries in optimal O ( p + k ) time for an ample set of notions of relevance. However, using linear space is not su ciently good for large text collections. In this paper we explore how far the space / time tradeo for this problem can be pushed. We obtain three results: (1) When relevance is measured as term frequency (number of times P appears in a document T i ), an index occupying j CSA j + o ( n ) bits answers the query in time O ( t search ( p ) + k lg 2 k lg " n ), where CSA is a compressed su x array indexing D , t search is its time to find the su x array interval of P , and " > 0 is any constant. (2) With the same measure of relevance, an index occupying j CSA j + n lg D + o ( n lg + n lg D ) bits answers the query in time O ( t search ( p ) + k lg k ), where lg k is the iterated logarithm of k . (3) When the relevance depends only on the documents, an index occupying j CSA j + O ( n lg lg n ) bits answers the query in O ( t search ( p ) + k t SA ) time, where t SA is the time the CSA needs to retrieve a su x array cell. On our way, we obtain some other results of independent interesten_US
Lenguagedc.language.isoenen_US
Publisherdc.publisherElsevieren_US
Type of licensedc.rightsAttribution-NonCommercial-NoDerivs 3.0 Chile*
Link to Licensedc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/3.0/cl/*
Keywordsdc.subjectDocument retrievalen_US
Títulodc.titleNew Space / Time Tradeo s for Top- k Document Retrieval on Sequencesen_US
Document typedc.typeArtículo de revista


Files in this item

Icon

This item appears in the following Collection(s)

Show simple item record

Attribution-NonCommercial-NoDerivs 3.0 Chile
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile