Show simple item record

Authordc.contributor.authorMartínez Prieto, Miguel A. 
Authordc.contributor.authorBrisaboa, Nieves 
Authordc.contributor.authorCánovas, Rodrigo 
Authordc.contributor.authorClaude, Francisco 
Authordc.contributor.authorNavarro, Gonzalo 
Admission datedc.date.accessioned2016-05-14T21:51:00Z
Available datedc.date.available2016-05-14T21:51:00Z
Publication datedc.date.issued2016
Cita de ítemdc.identifier.citationInformation Systems 56( 2016) 73–108en_US
Identifierdc.identifier.otherDOI: 10.1016/j.is.2015.08.008
Identifierdc.identifier.urihttps://repositorio.uchile.cl/handle/2250/138288
General notedc.descriptionArtículo de publicación ISIen_US
Abstractdc.description.abstractThe need to store and query a set of strings - a string dictionary - arises in many kinds of applications. While classically these string dictionaries have accounted for a small share of the total space budget (e.g., in Natural Language Processing or when indexing text collections), recent applications in Web engines, Semantic Web (RDF) graphs, Bioinformatics, and many others handle very large string dictionaries, whose size is a significant fraction of the whole data. In these cases, string dictionary management is a scalability issue by itself. This paper focuses on the problem of managing large static string dictionaries in compressed main memory space. We revisit classical solutions for string dictionaries like hashing, tries, and front-coding, and improve them by using compression techniques. We also introduce some novel string dictionary representations built on top of recent advances in succinct data structures and full-text indexes. All these structures are empirically compared on a heterogeneous testbed formed by real-world string dictionaries. We show that the compressed representations may use as little as 5% of the original dictionary size, while supporting lookup operations within a few microseconds. These numbers outperform the state-of-the-art space/time tradeoffs in many cases. Furthermore, we enhance some representations to provide prefix- and substring-based searches, which also perform competitively. The results show that compressed string dictionaries are a useful building block for various data-intensive applications in different domains.en_US
Patrocinadordc.description.sponsorshipSpanish Ministry of Economy and Competitiveness TIN2013-46238-C4-3-R ICT COST Action KEYSTONE IC1302 Conicyt, Chile FB0001 Fondecyt Iniciacion 11130104en_US
Lenguagedc.language.isoenen_US
Publisherdc.publisherElsevieren_US
Type of licensedc.rightsAtribución-NoComercial-SinDerivadas 3.0 Chile*
Link to Licensedc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/3.0/cl/*
Keywordsdc.subjectCompressed string dictionariesen_US
Keywordsdc.subjectText processingen_US
Keywordsdc.subjectText databasesen_US
Keywordsdc.subjectCompressed data structuresen_US
Títulodc.titlePractical compressed string dictionariesen_US
Document typedc.typeArtículo de revista


Files in this item

Icon

This item appears in the following Collection(s)

Show simple item record

Atribución-NoComercial-SinDerivadas 3.0 Chile
Except where otherwise noted, this item's license is described as Atribución-NoComercial-SinDerivadas 3.0 Chile