Show simple item record

Authordc.contributor.authorKonow, Roberto 
Authordc.contributor.authorNavarro, Gonzalo 
Authordc.contributor.authorClarke, Charles 
Authordc.contributor.authorLópez Ortiz, Alejandro 
Admission datedc.date.accessioned2019-05-29T13:10:40Z
Available datedc.date.available2019-05-29T13:10:40Z
Publication datedc.date.issued2017
Cita de ítemdc.identifier.citationACM Transactions on Information Systems, Volumen 35, Issue 3, 2017
Identifierdc.identifier.issn15582868
Identifierdc.identifier.issn10468188
Identifierdc.identifier.other10.1145/3007186
Identifierdc.identifier.urihttps://repositorio.uchile.cl/handle/2250/168850
Abstractdc.description.abstractWe introduce a new representation of the inverted index that performs faster ranked unions and intersections while using similar space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing methods. To achieve compression, we represent the treap topology using different alternative compact data structures. Further, the treap invariants allow us to elegantly encode differentially both document identifiers and frequencies. We also show how to extend this representation to support incremental updates over the index. Results show that, under the tf-idf scoring scheme, our index uses about the same space as state-of-the-art compact representations, while performing up to 2-20 times faster on ranked single-word, union, or intersection queries. Under the BM25 scoring scheme, our index may use up to 40% more space than the others and outperforms them less frequently but still reaches improvement factors of 2-20 in the best cases. The index supporting incremental updates poses an overhead of 50%-100% over the static variants in terms of space, construction, and query time.
Lenguagedc.language.isoen
Publisherdc.publisherAssociation for Computing Machinery
Type of licensedc.rightsAttribution-NonCommercial-NoDerivs 3.0 Chile
Link to Licensedc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/3.0/cl/
Sourcedc.sourceACM Transactions on Information Systems
Keywordsdc.subjectCompact data structure
Keywordsdc.subjectTop-k document retrieval
Títulodc.titleInverted treaps
Document typedc.typeArtículo de revista
Catalogueruchile.catalogadorlaj
Indexationuchile.indexArtículo de publicación SCOPUS
uchile.cosechauchile.cosechaSI


Files in this item

Icon

This item appears in the following Collection(s)

Show simple item record

Attribution-NonCommercial-NoDerivs 3.0 Chile
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile