On the approximation ratio of Lempel-Ziv parsing
Author
Abstract
Shannon’s entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based
compression. A plausible lower bound is b, the least number of phrases
of a general bidirectional parse of a text, where phrases can be copied
from anywhere else in the text. Since computing b is NP-complete, a
popular gold standard is z, the number of phrases in the Lempel-Ziv
parse of the text, where phrases can be copied only from the left. While
z can be computed in linear time, almost nothing has been known for
decades about its approximation ratio with respect to b. In this paper
we prove that z = O(b log(n/b)), where n is the text length. We also
show that the bound is tight as a function of n, by exhibiting a string
family where z = Ω(b log n). Our upper bound is obtained by building a
run-length context-free grammar based on a locally consistent parsing of
the text. Our lower bound is obtained by relating b with r, the number of
equal-letter runs in the Burrows-Wheeler transform of the text. On our
way, we prove other relevant bounds between compressibility measures.
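The greedy left-to-right Lempel-Ziv parse mentioned in the abstract (phrases copied only from earlier in the text) can be sketched naively as follows. This is an illustrative quadratic-time implementation, not the linear-time algorithm the paper refers to; the function name `lz_parse` and the single-character handling of fresh symbols are choices made here for clarity.

```python
def lz_parse(s):
    """Greedy LZ parse: each phrase is the longest prefix of the
    remaining suffix that also occurs starting at an earlier position
    (the occurrence may overlap the phrase itself), or a single fresh
    character. Returns the list of phrases; z = len(result).
    Naive search for illustration only -- real implementations run
    in linear time using suffix-based data structures."""
    phrases = []
    i, n = 0, len(s)
    while i < n:
        best = 0
        for j in range(i):            # candidate earlier source positions
            l = 0
            while i + l < n and s[j + l] == s[i + l]:
                l += 1                # extend the match as far as possible
            best = max(best, l)
        if best == 0:
            phrases.append(s[i])      # symbol not seen before: length-1 phrase
            i += 1
        else:
            phrases.append(s[i:i + best])
            i += best
    return phrases
```

For example, `lz_parse("abababab")` yields `['a', 'b', 'ababab']`, so z = 3: the third phrase copies from position 0 and overlaps itself, which the left-to-right parse permits.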
Indexation
SCOPUS publication article
Identifier
URI: https://repositorio.uchile.cl/handle/2250/169320
DOI: 10.1007/978-3-319-77404-6_36
ISSN: 1611-3349; 0302-9743
Quote Item
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Volume 10807 LNCS, 2018, Pages 490–503