Show simple item record

Authordc.contributor.authorNavarro, Gonzalo 
Authordc.contributor.authorNekrich, Yakov 
Admission datedc.date.accessioned2019-05-29T13:10:26Z
Available datedc.date.available2019-05-29T13:10:26Z
Publication datedc.date.issued2017
Cita de ítemdc.identifier.citationSIAM Journal on Computing, Volumen 46, Issue 1, 2017, Pages 80-113
Identifierdc.identifier.issn10957111
Identifierdc.identifier.issn00975397
Identifierdc.identifier.other10.1137/140998949
Identifierdc.identifier.urihttps://repositorio.uchile.cl/handle/2250/168813
Abstractdc.description.abstractLetDbe a collection ofDdocuments, which are strings over an alphabet of sizeσ,of total lengthn. We describe a data structure that uses linear space and reportskmost relevantdocuments that contain a query patternP, which is a string of lengthppacked inp/logσnwords, intimeO(p/logσn+k). This is optimal in the RAM model in the general case where logD= Θ(logn),and involves a novel RAM-optimal suffix tree search. Our construction supports an ample setof important relevance measures, such as the number of timesPappears in a document (calledterm frequency), a fixed document importance, and the minimal distance between two occurrencesofPin a document. When logD=o(logn), we show how to reduce the space of the data structurefromO(nlogn) toO(n(logσ+ logD+ log logn)) bits, and toO(n(logσ+ logD)) bits in the caseof the popular term frequency measure of relevance, at the price of an additive termO(logεσn)in the query time, for any constantε >0. We also consider the dynamic scenario, where doc-uments can be inserted and deleted from the collection. We obtain linear space and query timeO(p(log logn)2/logσn+ logn+klog logk), whereas insertions and deletions requireO(log1+εn)time per symbol, for any constantε >0. Finally, we consider an extended static scenario wherean extra parameterpar(P,d) is defined, and the query must retrieve only documentsdsuch thatpar(P,d)∈[τ1,τ2], where this range is specified at query time. We solve these queries using linearspace andO(p/logσn+ log1+εn+klogεn) time, for any constantε >0. Our technique is to trans-late these top-kproblems into multidimensional geometric search problems. As a bonus, we describesome improvements to those problems.
Lenguagedc.language.isoen
Publisherdc.publisherSociety for Industrial and Applied Mathematics
Type of licensedc.rightsAttribution-NonCommercial-NoDerivs 3.0 Chile
Link to Licensedc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/3.0/cl/
Sourcedc.sourceSIAM Journal on Computing
Keywordsdc.subjectGeometric data structures
Keywordsdc.subjectString collections
Keywordsdc.subjectSuffix trees
Keywordsdc.subjectText retrieval
Títulodc.titleTime-optimal top-k document retrieval
Document typedc.typeArtículo de revista
Catalogueruchile.catalogadorlaj
Indexationuchile.indexArtículo de publicación SCOPUS
uchile.cosechauchile.cosechaSI


Files in this item

Icon

This item appears in the following Collection(s)

Show simple item record

Attribution-NonCommercial-NoDerivs 3.0 Chile
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile