Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Volume 10807 LNCS, 2018, Pages 490–503
Identifier (dc.identifier.issn): 1611-3349
Identifier (dc.identifier.issn): 0302-9743
Identifier (dc.identifier.other): 10.1007/978-3-319-77404-6_36
Identifier (dc.identifier.uri): https://repositorio.uchile.cl/handle/2250/169320
Abstract (dc.description.abstract)
Shannon’s entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based compression. A plausible lower bound is b, the least number of phrases of a general bidirectional parse of a text, where phrases can be copied from anywhere else in the text. Since computing b is NP-complete, a popular gold standard is z, the number of phrases in the Lempel-Ziv parse of the text, where phrases can be copied only from the left. While z can be computed in linear time, almost nothing has been known for decades about its approximation ratio with respect to b. In this paper we prove that z = O(b log(n/b)), where n is the text length. We also show that the bound is tight as a function of n, by exhibiting a string family where z = Ω(b log n). Our upper bound is obtained by building a run-length context-free grammar based on a locally consistent parsing of the text. Our lower bound is obtained by relating b with r, the number of equal-letter runs in the Burrows-Wheeler transform of the text. On our way, we prove other relevant bounds between compressibility measures.
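As a small illustration of two of the measures named in the abstract (this sketch is not part of the paper), the following Python code computes z with a naive greedy left-to-right Lempel-Ziv parse and r by building the Burrows-Wheeler transform from sorted rotations. The function names, the sentinel convention, and the example string are ours; both routines are deliberately simple and only suitable for short strings.

```python
def lz_phrase_count(s: str) -> int:
    """Number of phrases z in a greedy left-to-right Lempel-Ziv parse.

    Each phrase is either a fresh character or the longest prefix of the
    remaining suffix that also occurs starting strictly before the current
    position (overlapping sources allowed). Naive and quadratic; for
    illustration only, since z is computable in linear time.
    """
    i, z, n = 0, 0, len(s)
    while i < n:
        length, l = 0, 1
        # Extend the candidate phrase while s[i:i+l] occurs starting at
        # some position < i (the end bound i+l-1 enforces exactly that).
        while i + l <= n and s.find(s[i:i + l], 0, i + l - 1) != -1:
            length = l
            l += 1
        i += max(length, 1)  # a fresh character forms a phrase of length 1
        z += 1
    return z


def bwt_run_count(s: str, sentinel: str = "$") -> int:
    """Number of equal-letter runs r in the Burrows-Wheeler transform.

    Builds the BWT naively by sorting all rotations of s + sentinel; the
    sentinel is assumed absent from s and smaller than its characters.
    """
    t = s + sentinel
    rotations = sorted(t[k:] + t[:k] for k in range(len(t)))
    bwt = "".join(rot[-1] for rot in rotations)
    return 1 + sum(1 for a, b in zip(bwt, bwt[1:]) if a != b)


if __name__ == "__main__":
    text = "alabaralalabarda"
    print("z =", lz_phrase_count(text))
    print("r =", bwt_run_count(text))
```

The sketch only makes the definitions of z and r concrete on a toy input; it says nothing about b, whose computation is NP-complete, nor about the O(b log(n/b)) bound proved in the paper.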