About
Contact
Help
Sending publications
How to publish
Advanced Search
View Item 
  •   Home
  • Facultad de Ciencias Físicas y Matemáticas
  • Artículos de revistas
  • View Item
  •   Home
  • Facultad de Ciencias Físicas y Matemáticas
  • Artículos de revistas
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Browse byCommunities and CollectionsDateAuthorsTitlesSubjectsThis CollectionDateAuthorsTitlesSubjects

My Account

Login to my accountRegister
Biblioteca Digital - Universidad de Chile
Revistas Chilenas
Repositorios Latinoamericanos
Tesis LatinoAmericanas
Tesis chilenas
Related linksRegistry of Open Access RepositoriesOpenDOARGoogle scholarCOREBASE
My Account
Login to my accountRegister

Stronger Lempel-Ziv Based Compressed Text Indexing

Artículo
Thumbnail
Open/Download
IconArroyuelo_Diego.pdf (1.488Mb)
Publication date
2012
Metadata
Show full item record
Cómo citar
Arroyuelo, Diego
Cómo citar
Stronger Lempel-Ziv Based Compressed Text Indexing
.
Copiar
Cerrar

Author
  • Arroyuelo, Diego;
  • Navarro, Gonzalo;
  • Sadakane, Kunihiko;
Abstract
Given a text T [1..u] over an alphabet of size σ, the full-text search problem consists in finding the occ occurrences of a given pattern P[1..m] in T . In indexed text searching we build an index on T to improve the search time, yet increasing the space requirement. The current trend in indexed text searching is that of compressed full-text self-indices, which replace the text with a more space-efficient representation of it, at the same time providing indexed access to the text. Thus, we can provide efficient access within compressed space. The Lempel-Ziv index (LZ-index) of Navarro is a compressed full-text self-index able to represent T using 4uHk(T ) + o(ulogσ) bits of space, where Hk(T ) denotes the k-th order empirical entropy of T , for any k = o(logσ u). This space is about four times the compressed text size. The index can locate all the occ occurrences of a pattern P in T in O(m3 log σ + (m + occ) log u) worst-case time. Although this index has proven very competitive in practice, the O(m3 logσ) term can be excessive for long patterns. Also, the factor 4 in its space complexity makes it larger than other state-of-the-art alternatives. In this paper we present stronger Lempel-Ziv based indices (LZ-indices), improving the overall performance of the original LZ-index. We achieve indices requiring (2 + )uHk(T ) + o(ulogσ) bits of space, for any constant > 0, which makes them the smallest existing LZ-indices. We simultaneously improve the search time to O(m2 + (m + occ) log u), which makes our indices very competitive with stateof- the-art alternatives. Our indices support displaying any text substring of length in optimal O( / logσ u) time. In addition, we show how the space can be squeezed to (1+ )uHk(T )+o(ulogσ) to obtain a structure with O(m2) average search time for m 2 logσ u. Alternatively, the search time of LZ-indices can be improved to O((m + occ) log u) with (3 + )uHk(T ) + o(ulogσ) bits of space, which is much less than the space needed by other Lempel-Ziv-based indices achieving the same search time. Overall our indices stand out as a very attractive alternative for spaceefficient indexed text searching.
Patrocinador
CONICYT PhD Fellowship Program, Fondecyt Grant 1-080019, Grant-in-Aid of the Ministry of Education, Science, Sports and Culture of Japan
Identifier
URI: https://repositorio.uchile.cl/handle/2250/125597
DOI: DOI 10.1007/s00453-010-9443-8
Quote Item
Algorithmica (2012) 62:54–101
Collections
  • Artículos de revistas
xmlui.footer.title
31 participating institutions
More than 73,000 publications
More than 110,000 topics
More than 75,000 authors
Published in the repository
  • How to publish
  • Definitions
  • Copyright
  • Frequent questions
Documents
  • Dating Guide
  • Thesis authorization
  • Document authorization
  • How to prepare a thesis (PDF)
Services
  • Digital library
  • Chilean academic journals portal
  • Latin American Repository Network
  • Latin American theses
  • Chilean theses
Dirección de Servicios de Información y Bibliotecas (SISIB)
Universidad de Chile

© 2020 DSpace
  • Access my account