Position-Restricted Substring Searching
Author
Abstract
A full-text index is a data structure built over a text string
T[1, n]. The most basic functionality provided is (a) counting how many
times a pattern string P[1,m] appears in T and (b) locating all those
occ positions. There exist several indexes that solve (a) in O(m) time
and (b) in O(occ) time. In this paper we propose two new queries, (c)
counting how many times P[1,m] appears in T[l, r] and (d) locating all
those occl,r positions. These can be solved using (a) and (b) but this
requires O(occ) time. We present two solutions to (c) and (d) in this
paper. The first is an index that requires O(n log n) bits of space and
answers (c) in O(m + log n) time and (d) in O(log n) time per occurrence
(that is, O(occl,r log n) time overall). A variant of the first solution
answers (c) in O(m+log log n) time and (d) in constant time per occurrence,
but requires O(n log1+Ç« n) bits of space for any constant Ç« > 0.
The second solution requires O(nmlog σ) bits of space, solving (c) in
O(m⌈log σ/ log log n⌉) time and (d) in O(m⌈log σ/ log log n⌉) time per
occurrence, where σ is the alphabet size. This second structure takes
less space when the text is compressible.
Our solutions can be seen as a generalization of rank and select dictionaries,
which allow computing how many times a given character c appears
in a prefix T[1, i] and also locate the i-th occurrence of c in T. Our solution
to (c) extends character rank queries to substring rank queries, and
our solution to (d) extends character select to substring select queries.
As a byproduct, we show how rank queries can be used to implement
fractional cascading in little space, so as to obtain an alternative implementation
of a well-known two-dimensional range search data structure
by Chazelle. We also show how Grossi et al.’s wavelet trees are suitable
for two-dimensional range searching, and their connection with Chazelle’s
data structure.
Identifier
URI: https://repositorio.uchile.cl/handle/2250/124918
ISSN: 0302-9743 (Print) 1611-3349 (Online)
Quote Item
Lecture Notes in Computer Science, Volume 3887/2006
Collections