New Space / Time Tradeo s for Top- k Document Retrieval on Sequences

Navarro, Gonzalo; Thankachan, Sharma V.

Artículo

Open/Download

New-space-time-tradeoffs-for-top-k-document-retrieval-on-sequences.pdf (406.8Kb)

Publication date

2014

Metadata

Show full item record

Cómo citar

New Space / Time Tradeo s for Top- k Document Retrieval on SequencesFormato de cita

Copiar

Cerrar

Author

Abstract

We address the problem of indexing a collection D = f T 1 ; T 2 ;::: T D g of D string documents of total length n , so that we can e ciently answer top-k queries : retrieve k documents most relevant to a pattern P of length p given at query time. There exist linear-space data structures, that is, using O ( n ) words, that answer such queries in optimal O ( p + k ) time for an ample set of notions of relevance. However, using linear space is not su ciently good for large text collections. In this paper we explore how far the space / time tradeo for this problem can be pushed. We obtain three results: (1) When relevance is measured as term frequency (number of times P appears in a document T i ), an index occupying j CSA j + o ( n ) bits answers the query in time O ( t search ( p ) + k lg 2 k lg " n ), where CSA is a compressed su x array indexing D , t search is its time to find the su x array interval of P , and " > 0 is any constant. (2) With the same measure of relevance, an index occupying j CSA j + n lg D + o ( n lg + n lg D ) bits answers the query in time O ( t search ( p ) + k lg k ), where lg k is the iterated logarithm of k . (3) When the relevance depends only on the documents, an index occupying j CSA j + O ( n lg lg n ) bits answers the query in O ( t search ( p ) + k t SA ) time, where t SA is the time the CSA needs to retrieve a su x array cell. On our way, we obtain some other results of independent interest

General note

Artículo de publicación ISI

Identifier

URI: https://repositorio.uchile.cl/handle/2250/126562
DOI: dx.doi.org/10.1016/j.tcs.2014.05.005

Quote Item

Theoretical Computer Science 542 (2014) 83–97

Collections

Artículos de revistas

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile