Reducing the space requirement of LZ-index
Abstract
The LZ-index is a compressed full-text self-index able to represent a text T-1...u, over an alphabet of size sigma = O(polylog(u)) and with k-th order empirical entropy H-k(T), using 4uH(k)(T) + o(u log sigma) bits for any k = o(log(sigma) u). It can report all the occ occurrences of a pattern P-1...m in T in O(m(3) log sigma + (m + occ) log u) worst case time. Its main drawback is the factor 4 in its space complexity, which makes it larger than other state-of-the-art alternatives. In this paper we present two different approaches to reduce the space requirement of LZ-index. In both cases we achieve (2 + e)uHk(T) + o(u logo,) bits of space, for any. constant is an element of > 0, and we simultaneously improve the search time to O(m(2) log m + (m + occ) log u). Both indexes support displaying any subtext of length l in optimal O(l/log(sigma) u) time. In addition, we show how the space can be squeezed to (1 + is an element of)uH(k)(T) + o(u log sigma) to obtain a structure with O(m(2)) average search time for m >= 2 log(sigma) u.
Quote Item
COMBINATORIAL PATTERN MATCHING, PROCEEDINGS Book Series: LECTURE NOTES IN COMPUTER SCIENCE Volume: 4009 Pages: 318-329 Published: 2006
Collections