First Huffman, then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index
Author
dc.contributor.author
Grabowski, Szymon
Author
dc.contributor.author
Mäkinen, Veli
es_CL
Author
dc.contributor.author
Navarro, Gonzalo
es_CL
Admission date
dc.date.accessioned
2014-01-08T14:11:40Z
Available date
dc.date.available
2014-01-08T14:11:40Z
Publication date
dc.date.issued
2014-01-08
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/126044
Abstract
dc.description.abstract
We design a succinct full-text index based on the idea of Huffman-compressing the text and
then applying the Burrows-Wheeler transform over it. The resulting structure can be searched as an
FM-index, with the benefit of removing the sharp dependence on the alphabet size, , present in that
structure. On a text of length n with zero-order entropy H0, our index needs O(n(H0+1)) bits of space,
without any dependence on . The average search time for a pattern of length m is O(m(H0 + 1)),
under reasonable assumptions. Each position of a text occurrence can be reported in worst case time
O((H0 + 1) log n), while any text substring of length L can be retrieved in O((H0 + 1)L) average time
in addition to the previous worst case time. By paying 2n additional bits of space, it is possible to
maintain the average complexities and ensure that H0 + 1 becomes log in the worst case for all the
time complexities. Our index provides a relevant space/time tradeoff between existing succinct data
structures, with the additional interest of being easy to implement.