Author (dc.contributor.author): Gagie, Travis
Author (dc.contributor.author): Navarro, Gonzalo
Author (dc.contributor.author): Prezza, Nicola
Accession date (dc.date.accessioned): 2021-03-15T19:17:41Z
Available date (dc.date.available): 2021-03-15T19:17:41Z
Publication date (dc.date.issued): 2020
Item citation (dc.identifier.citation): Journal of the ACM, Volume: 67, Number: 1, Apr 2020
Identifier (dc.identifier.other): 10.1145/3375890
Identifier (dc.identifier.uri): https://repositorio.uchile.cl/handle/2250/178684
Abstract (dc.description.abstract): Indexing highly repetitive texts, such as genomic databases, software repositories, and versioned text collections, has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive texts is r, the number of runs in their Burrows-Wheeler Transforms (BWTs). One of the earliest indexes for repetitive collections, the Run-Length FM-index, used O(r) space and was able to efficiently count the number of occurrences of a pattern of length m in a text of length n (in O(m log log n) time, with current techniques). However, it was unable to locate the positions of those occurrences efficiently within a space bounded in terms of r. In this article, we close this long-standing problem, showing how to extend the Run-Length FM-index so that it can locate the occ occurrences efficiently (in O(occ log log n) time) within O(r) space. By raising the space to O(r log log n), our index counts the occurrences in optimal time, O(m), and locates them in optimal time as well, O(m + occ). By further raising the space by an O(w / log σ) factor, where σ is the alphabet size and w = Ω(log n) is the RAM machine word size in bits, we support count and locate in O(⌈m log(σ)/w⌉) and O(⌈m log(σ)/w⌉ + occ) time, which is optimal in the packed setting and had not been obtained before in compressed space. We also describe a structure using O(r log(n/r)) space that replaces the text and extracts any text substring of length ℓ in the almost-optimal time O(log(n/r) + ℓ log(σ)/w). Within that space, we similarly provide access to arbitrary suffix array, inverse suffix array, and longest common prefix array cells in time O(log(n/r)), and extend these capabilities to full suffix tree functionality, typically in O(log(n/r)) time per operation. Our experiments show that our O(r)-space index outperforms the space-competitive alternatives by 1-2 orders of magnitude in time. Competitive implementations of the original FM-index are outperformed by 1-2 orders of magnitude in space and/or 2-3 in time.
Sponsor (dc.description.sponsorship): Basal Funds FB0001 Comision Nacional de Investigacion Cientifica y Tecnologica (CONICYT) CONICYT FONDECYT 1-170048 1-171058 project MIUR-SIR CMACBioSeq ("Combinatorial methods for analysis and compression of biological sequences") RBSI146R5L
Language (dc.language.iso): en
Publisher (dc.publisher): Assoc Computing Machinery, USA
Type of license (dc.rights): Attribution-NonCommercial-NoDerivs 3.0 Chile
Link to License (dc.rights.uri): http://creativecommons.org/licenses/by-nc-nd/3.0/cl/
Source (dc.source): Journal of the ACM
Keywords (dc.subject): Repetitive string collections
Keywords (dc.subject): Compressed text indexes
Keywords (dc.subject): Burrows-Wheeler transform
Keywords (dc.subject): Compressed suffix trees
Title (dc.title): Fully functional suffix trees and optimal text searching in BWT-Runs bounded space
Document type (dc.type): Journal article
Access rights (dcterms.accessRights): Open Access
Cataloguer (uchile.catalogador): cfr
Indexation (uchile.index): ISI publication article
Indexation (uchile.index): SCOPUS publication article
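
Illustrative note on the abstract: the measure r (number of runs in the BWT) and the "count" operation can be made concrete with a small sketch. This is not the authors' index. It is a naive, uncompressed illustration that builds the BWT by sorting rotations and counts pattern occurrences with plain FM-index backward search. The function names (`bwt`, `count_runs`, `count_occurrences`) and the `$` sentinel are assumptions chosen for this example, and the running times here are far from the bounds stated in the abstract.

```python
from itertools import groupby


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations.

    Quadratic in space and worse in time; for illustration only.
    Assumes the sentinel is absent from the text and sorts before
    every text character.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters (the measure r for a BWT)."""
    return sum(1 for _ in groupby(s))


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count pattern occurrences by backward search over the BWT.

    Rank queries are answered by scanning prefixes, so each step costs
    linear time; a real index would use compact rank structures instead.
    """
    # C[c] = number of characters in the text strictly smaller than c.
    freq = {}
    for ch in bwt_str:
        freq[ch] = freq.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(freq):
        C[ch] = total
        total += freq[ch]

    lo, hi = 0, len(bwt_str)  # half-open range of matching suffixes
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + bwt_str[:lo].count(ch)
        hi = C[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    b = bwt("banana")
    print(b, count_runs(b), count_occurrences(b, "ana"))
```

On a highly repetitive input such as `"ab" * 6`, the BWT groups equal characters together, so `count_runs` stays small even as the text grows; that is the intuition behind bounding index space in terms of r rather than n.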


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile