Approximate String Matching with Compressed Indexes
Author
dc.contributor.author
Russo, Luís M. S.
Author
dc.contributor.author
Navarro, Gonzalo
es_CL
Author
dc.contributor.author
Oliveira, Arlindo L.
es_CL
Author
dc.contributor.author
Morales, Pedro
es_CL
Admission date
dc.date.accessioned
2014-01-08T20:26:37Z
Available date
dc.date.available
2014-01-08T20:26:37Z
Publication date
dc.date.issued
2009
Cita de ítem
dc.identifier.citation
Algorithms 2009, 2, 1105-1136
en_US
Identifier
dc.identifier.issn
1999-4893
Identifier
dc.identifier.other
doi:10.3390/a2031105
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/126086
Abstract
dc.description.abstract
A compressed full-text self-index for a text T is a data structure requiring reduced
space and able to search for patterns P in T. It can also reproduce any substring of T, thus
actually replacing T. Despite the recent explosion of interest on compressed indexes, there
has not been much progress on functionalities beyond the basic exact search. In this paper
we focus on indexed approximate string matching (ASM), which is of great interest, say,
in bioinformatics. We study ASM algorithms for Lempel-Ziv compressed indexes and for
compressed suffix trees/arrays. Most compressed self-indexes belong to one of these classes.
We start by adapting the classical method of partitioning into exact search to self-indexes, and
optimize it over a representative of either class of self-index. Then, we show that a Lempel-
Ziv index can be seen as an extension of the classical q-samples index. We give new insights
on this type of index, which can be of independent interest, and then apply them to a Lempel-
Ziv index. Finally, we improve hierarchical verification, a successful technique for sequential
searching, so as to extend the matches of pattern pieces to the left or right. Most compressed
suffix trees/arrays support the required bidirectionality, thus enabling the implementation of
the improved technique. In turn, the improved verification largely reduces the accesses to the
text, which are expensive in self-indexes. We show experimentally that our algorithms are
competitive and provide useful space-time tradeoffs compared to classical indexes.