New adaptive compressors for natural language text

Brisaboa, Nieves; Fariña, A.; Navarro, Gonzalo; Parama, J. R.

Article

Full text

Brisaboa_NR.pdf (929.9 KB)

Publication date

2008-11-10

Abstract

Semistatic byte-oriented word-based compression codes have been shown to be an attractive alternative to compress natural language text databases, because of the combination of speed, effectiveness, and direct searchability they offer. In particular, our recently proposed family of dense compression codes has been shown to be superior to the more traditional byte-oriented word-based Huffman codes in most aspects. In this paper, we focus on the problem of transmitting texts among peers that do not share the vocabulary. This is the typical scenario for adaptive compression methods. We design adaptive variants of our semistatic dense codes, showing that they are much simpler and faster than dynamic Huffman codes and reach almost the same compression effectiveness. We show that our variants have a very compelling trade-off between compression/decompression speed, compression ratio, and search speed compared with most of the state-of-the-art general compressors.
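The abstract gives no algorithmic detail, so the following is only a rough illustration of what an adaptive dense code can look like. It assumes End-Tagged Dense Code (ETDC), the simplest member of the authors' dense-code family: words are ranked by frequency, and rank i is written in base 128 with the high bit set only on the last byte. Both peers keep the same frequency-ordered vocabulary and update it after every word, so no vocabulary has to be shared in advance. The escape convention (first unused codeword followed by the word spelled out in UTF-8 and a 0x00 terminator), the whitespace tokenizer, and the names `etdc_encode`, `Model`, `compress` and `decompress` are hypothetical choices for this sketch, not the paper's exact design.

```python
# Hypothetical sketch of an adaptive ETDC-style coder; not the paper's code.
# Assumptions: whitespace tokenization (separators are dropped), new words
# are sent as an escape codeword + UTF-8 bytes + 0x00 terminator.

def etdc_encode(rank: int) -> bytes:
    """Map a 0-based rank to an ETDC codeword; only the last byte is >= 128."""
    out = [128 + rank % 128]          # final byte carries the end tag (high bit)
    rank //= 128
    while rank > 0:
        rank -= 1                     # use every k-byte combination (dense property)
        out.append(rank % 128)
        rank //= 128
    return bytes(reversed(out))


class Model:
    """Vocabulary kept in non-increasing frequency order, mirrored by both peers."""

    def __init__(self) -> None:
        self.words: list[str] = []
        self.freq: list[int] = []
        self.pos: dict[str, int] = {}

    def update(self, word: str) -> None:
        if word not in self.pos:      # new word enters at the end with frequency 0
            self.pos[word] = len(self.words)
            self.words.append(word)
            self.freq.append(0)
        p = self.pos[word]
        f = self.freq[p]
        q = p
        while q > 0 and self.freq[q - 1] == f:   # first word with frequency f
            q -= 1
        # Swap the word to the front of its frequency group, then bump its count;
        # this keeps the list sorted by non-increasing frequency.
        other = self.words[q]
        self.words[p], self.words[q] = other, word
        self.pos[other], self.pos[word] = p, q
        self.freq[q] = f + 1


def compress(text: str) -> bytes:
    model, out = Model(), bytearray()
    for word in text.split():
        if word in model.pos:
            out += etdc_encode(model.pos[word])
        else:                                    # escape = first unused codeword
            out += etdc_encode(len(model.words))
            out += word.encode("utf-8") + b"\x00"
        model.update(word)
    return bytes(out)


def decompress(data: bytes) -> list[str]:
    model, words, i = Model(), [], 0
    while i < len(data):
        rank = 0
        while data[i] < 128:                     # untagged (non-final) bytes
            rank = rank * 128 + data[i] + 1
            i += 1
        rank = rank * 128 + (data[i] - 128)      # tagged final byte
        i += 1
        if rank == len(model.words):             # escape: read the spelled-out word
            end = data.index(0, i)
            word = data[i:end].decode("utf-8")
            i = end + 1
        else:
            word = model.words[rank]
        words.append(word)
        model.update(word)
    return words
```

For example, in `compress("x y the the the")` the second occurrence of `the` is sent with rank 2, but after that update `the` has moved to rank 0, so its third occurrence costs the single byte `0x80`. The linear scan for the start of a frequency group keeps the sketch short; a practical implementation would instead track, for each frequency, the first position holding it, so each update takes constant time.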

Sponsor

Contract/grant sponsor: Funded in part (for the Spanish group) by MEC (TIN2006-15071-C03-03), Xunta de Galicia (PGIDIT05-SIN-10502PR) and (for the third author) by Millennium Nucleus Center for Web Research, grant (P04-067-F), Mideplan, Chile.

Identifier

URI: https://repositorio.uchile.cl/handle/2250/125077
DOI: 10.1002/spe.882
ISSN: 0038-0644

Published in

Software: Practice and Experience, Volume 38, Issue 13, Pages 1429-1450. Published 10 November 2008.

Collections

Journal articles