Approximate String Matching with Lempel-Ziv Compressed Indexes
Russo, Luís M. S.
Oliveira, Arlindo L.
Cita de ítem
En: String Processing and Information Retrieval. 14th International Symposium, SPIRE 2007 Santiago, Chile, October 29-31, 2007 Proceedings. pp 264-275
A compressed full-text self-index for a text T is a data structure
requiring reduced space and able of searching for patterns P in T.
Furthermore, the structure can reproduce any substring of T, thus it
actually replaces T. Despite the explosion of interest on self-indexes in
recent years, there has not been much progress on search functionalities
beyond the basic exact search. In this paper we focus on indexed
approximate string matching (ASM), which is of great interest, say, in
computational biology applications. We present an ASM algorithm that
works on top of a Lempel-Ziv self-index.We consider the so-called hybrid
indexes, which are the best in practice for this problem. We show that a
Lempel-Ziv index can be seen as an extension of the classical q-samples
index. We give new insights on this type of index, which can be of independent
interest, and then apply them to the Lempel-Ziv index. We
show experimentally that our algorithm has a competitive performance
and provides a useful space-time tradeoff compared to classical indexes.