First Huffman, then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index
Artículo
Open/ Download
Publication date
2014-01-08Metadata
Show full item record
Cómo citar
Grabowski, Szymon
Cómo citar
First Huffman, then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index
Author
Abstract
We design a succinct full-text index based on the idea of Huffman-compressing the text and
then applying the Burrows-Wheeler transform over it. The resulting structure can be searched as an
FM-index, with the benefit of removing the sharp dependence on the alphabet size, , present in that
structure. On a text of length n with zero-order entropy H0, our index needs O(n(H0+1)) bits of space,
without any dependence on . The average search time for a pattern of length m is O(m(H0 + 1)),
under reasonable assumptions. Each position of a text occurrence can be reported in worst case time
O((H0 + 1) log n), while any text substring of length L can be retrieved in O((H0 + 1)L) average time
in addition to the previous worst case time. By paying 2n additional bits of space, it is possible to
maintain the average complexities and ensure that H0 + 1 becomes log in the worst case for all the
time complexities. Our index provides a relevant space/time tradeoff between existing succinct data
structures, with the additional interest of being easy to implement.
Identifier
URI: https://repositorio.uchile.cl/handle/2250/126044
Collections