Show simple item record

Authordc.contributor.authorFerragina, Paolo 
Authordc.contributor.authorManzini, Giovanni es_CL
Authordc.contributor.authorMäkinen, Veli es_CL
Authordc.contributor.authorNavarro, Gonzalo es_CL
Admission datedc.date.accessioned2007-05-04T19:24:21Z
Available datedc.date.available2007-05-04T19:24:21Z
Publication datedc.date.issued2004
Cita de ítemdc.identifier.citationSTRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS LECTURE NOTES IN COMPUTER SCIENCE 3246: 150-160 2004en
Identifierdc.identifier.issn0302-9743
Identifierdc.identifier.urihttps://repositorio.uchile.cl/handle/2250/124545
Abstractdc.description.abstractWe show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index which scales well with the size of the input alphabet Sigma. The size of the new index built on a string T[1, n] is bounded by nH(k) (T)+O ((n log log n) / log(\Sigma\) n) bits, where H-k(T) is the k-th order empirical entropy of T. The above bound holds simultaneously for all k less than or equal to alphalog(\Sigma\) n and 0 < alpha < 1. Moreover, the index design does not depend on the parameter k, which plays a role only in analysis of the space occupancy. Using our index, the counting of the occurrences of an arbitrary pattern P[1,p] as a substring of T takes O(p log \Sigma\) time. Locating each pattern occurrence takes O(log \Sigma\ (log(2) n / log log n)) time. Reporting a text substring of length 2 takes O((l + log(2) n/ log log n) log \Sigma\) time.en
Lenguagedc.language.isoenen
Publisherdc.publisherSPRINGER-VERLAG BERLINen
Títulodc.titleAn alphabet-friendly FM-indexen
Document typedc.typeArtículo de revista


Files in this item

Icon

This item appears in the following Collection(s)

Show simple item record