Approximate String Matching with Compressed Indexes

Russo, Luís M. S.; Navarro, Gonzalo; Oliveira, Arlindo L.; Morales, Pedro

Author	dc.contributor.author	Russo, Luís M. S.
Author	dc.contributor.author	Navarro, Gonzalo	es_CL
Author	dc.contributor.author	Oliveira, Arlindo L.	es_CL
Author	dc.contributor.author	Morales, Pedro	es_CL
Admission date	dc.date.accessioned	2014-01-08T20:26:37Z
Available date	dc.date.available	2014-01-08T20:26:37Z
Publication date	dc.date.issued	2009
Cita de ítem	dc.identifier.citation	Algorithms 2009, 2, 1105-1136	en_US
Identifier	dc.identifier.issn	1999-4893
Identifier	dc.identifier.other	doi:10.3390/a2031105
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/126086
Abstract	dc.description.abstract	A compressed full-text self-index for a text T is a data structure requiring reduced space and able to search for patterns P in T. It can also reproduce any substring of T, thus actually replacing T. Despite the recent explosion of interest on compressed indexes, there has not been much progress on functionalities beyond the basic exact search. In this paper we focus on indexed approximate string matching (ASM), which is of great interest, say, in bioinformatics. We study ASM algorithms for Lempel-Ziv compressed indexes and for compressed suffix trees/arrays. Most compressed self-indexes belong to one of these classes. We start by adapting the classical method of partitioning into exact search to self-indexes, and optimize it over a representative of either class of self-index. Then, we show that a Lempel- Ziv index can be seen as an extension of the classical q-samples index. We give new insights on this type of index, which can be of independent interest, and then apply them to a Lempel- Ziv index. Finally, we improve hierarchical verification, a successful technique for sequential searching, so as to extend the matches of pattern pieces to the left or right. Most compressed suffix trees/arrays support the required bidirectionality, thus enabling the implementation of the improved technique. In turn, the improved verification largely reduces the accesses to the text, which are expensive in self-indexes. We show experimentally that our algorithms are competitive and provide useful space-time tradeoffs compared to classical indexes.	en_US
Lenguage	dc.language.iso	en	en_US
Type of license	dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Chile	*
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/cl/	*
Keywords	dc.subject	compressed index	en_US
Título	dc.title	Approximate String Matching with Compressed Indexes	en_US
Document type	dc.type	Artículo de revista

Files in this item

Name:: Russo_Luis_Approximate string.pdf
Size:: 333.4Kb
Format:: PDF

This item appears in the following Collection(s)

Artículos de revistas
Artículos de revistas

Show simple item record

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile