Compressed Text Indexes with Fast Locate
Abstract
Compressed text (self-)indexes have matured up to a point
where they can replace a text by a data structure that requires less
space and, in addition to giving access to arbitrary text passages, support
indexed text searches. At this point those indexes are competitive with
traditional text indexes (which are very large) for counting the number
of occurrences of a pattern in the text. Yet, they are still hundreds to
thousands of times slower when it comes to locating those occurrences in
the text. In this paper we introduce a new compression scheme for suffix
arrays which permits locating the occurrences extremely fast, while still
being much smaller than classical indexes. In addition, our index permits
a very efficient secondary memory implementation, where compression
permits reducing the amount of I/O needed to answer queries.
Patrocinador
Mecesup Grant UCH 0109, Chile
Quote Item
En: Combinatorial Pattern Matching 18th Annual Symposium, CPM 2007, London, Canada, July 9-11, 2007. Proceedings. pp. 216-227
Collections