An alphabet-friendly FM-index

Ferragina, Paolo; Manzini, Giovanni; Mäkinen, Veli; Navarro, Gonzalo

Article

Publication date

2004

Abstract

We show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index which scales well with the size of the input alphabet $\Sigma$. The size of the new index built on a string $T[1,n]$ is bounded by $nH_k(T) + O((n \log\log n)/\log_{|\Sigma|} n)$ bits, where $H_k(T)$ is the $k$-th order empirical entropy of $T$. The above bound holds simultaneously for all $k \le \alpha \log_{|\Sigma|} n$ and $0 < \alpha < 1$. Moreover, the index design does not depend on the parameter $k$, which plays a role only in analysis of the space occupancy. Using our index, the counting of the occurrences of an arbitrary pattern $P[1,p]$ as a substring of $T$ takes $O(p \log|\Sigma|)$ time. Locating each pattern occurrence takes $O(\log|\Sigma| \, (\log^2 n / \log\log n))$ time. Reporting a text substring of length $\ell$ takes $O((\ell + \log^2 n / \log\log n) \log|\Sigma|)$ time.
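To make the counting bound in the abstract concrete, the following is a minimal Python sketch of backward search over a Burrows-Wheeler transform whose rank queries are answered by a wavelet tree. It is not the authors' construction: it leaves out compression boosting and compressed bit vectors, so it does not reach the stated space bound, and it builds the transform by naively sorting rotations. The names `bwt`, `WaveletTree`, and `FMIndex` are hypothetical, and the sentinel `"\0"` is assumed not to occur in the text. The sketch only illustrates why counting a pattern of length p costs one rank pair per symbol, each descending about log |Σ| tree levels.

```python
def bwt(text, sentinel="\0"):
    """Burrows-Wheeler transform via sorted rotations (naive, for illustration).

    Assumes the sentinel does not occur in text and sorts before every symbol.
    """
    s = text + sentinel
    order = sorted(range(len(s)), key=lambda i: s[i:] + s[:i])
    # The last symbol of the rotation starting at i is s[i - 1].
    return "".join(s[i - 1] for i in order)


class WaveletTree:
    """Balanced wavelet tree with plain (uncompressed) prefix counts."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.left = self.right = None
        self.prefix = None
        if len(self.alphabet) <= 1:
            return  # leaf: every symbol routed here is the same
        mid = len(self.alphabet) // 2
        left_syms = set(self.alphabet[:mid])
        # prefix[i] = how many of seq[:i] are routed to the right child
        self.prefix = [0]
        for ch in seq:
            self.prefix.append(self.prefix[-1] + (ch not in left_syms))
        self.left = WaveletTree(
            [ch for ch in seq if ch in left_syms], self.alphabet[:mid])
        self.right = WaveletTree(
            [ch for ch in seq if ch not in left_syms], self.alphabet[mid:])

    def rank(self, c, i):
        """Number of occurrences of c in seq[:i]; one step per tree level."""
        node = self
        while len(node.alphabet) > 1:
            mid = len(node.alphabet) // 2
            if c < node.alphabet[mid]:
                i = i - node.prefix[i]
                node = node.left
            else:
                i = node.prefix[i]
                node = node.right
        return i if node.alphabet and node.alphabet[0] == c else 0


class FMIndex:
    def __init__(self, text):
        self.bwt = bwt(text)
        self.wt = WaveletTree(self.bwt)
        # C[c] = number of symbols in the transformed text smaller than c
        self.C = {}
        total = 0
        for c in self.wt.alphabet:
            self.C[c] = total
            total += self.wt.rank(c, len(self.bwt))

    def count(self, pattern):
        """Count occurrences of pattern by backward search."""
        lo, hi = 0, len(self.bwt)  # half-open range of sorted-rotation rows
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.wt.rank(c, lo)
            hi = self.C[c] + self.wt.rank(c, hi)
            if lo >= hi:
                return 0
        return hi - lo


index = FMIndex("mississippi")
print(index.count("ssi"))
```

The loop in `count` touches each pattern symbol once and performs two `rank` calls per symbol, which is where the factor p log |Σ| comes from in this simplified setting.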

Identifier

URI: https://repositorio.uchile.cl/handle/2250/124545
ISSN: 0302-9743

Quote Item

String Processing and Information Retrieval, Proceedings. Lecture Notes in Computer Science 3246: 150-160, 2004

Collections

Journal articles