Near neighbor searching with K nearest references
Author
Abstract
Proximity searching is the problem of retrieving,from a given database, those objects
closest to a query.To avoid exhaustive searching,data structures called indexes are builton
the database prior to serving queries.The curse of dimensionality is a well-known problem
for indexes:in spaces with sufficiently concentrated distance histograms,no index
out performs an exhaustives can of the database.
In recent years,a number of index es for approximate proximity searching have been
proposed. These are able to cope with the curse of dimensionality inexchangefor
returning an answer that might beslightly different from the correctone.
In this paper we show that many of those recent indexes can be understood as variants
of a simple general model basedon K-nearest reference signatures.A set o freferences is
chosen from the database,and the signature of each object consists of the K references
nearest to the object. At query time,the signature of the query is computed and the search
examines only the objects whose signature is close enough to that of the query.
Many known and novelindexes are obtained by considering different ways to
determine how much detail the signature records(e.g.,justthesetofnearestreferences,
or also their proximity order to the object, oral so their distances to the object,and soon),
how the similarity between signatures is defined,and how the parameters are tuned.In
addition,we introduce a space-efficient representation for those families of indexes,
making it possible to search very large databases in main memory.Small indexes are
cache friendly,inducing faster queries.
We perform exhaustive experiments comparing several known and new indexes that
derive from our framework,evaluatingtheir time performance,memory usage,and
quality of approximation.The best indexes out perform the state of the art,offering an
attractive balance between all these aspects,and turn out to be excellent choices in many
scenarios.Our framework gives high flexibility to design new indexes.
General note
Artículo de publicación ISI
Patrocinador
FB0001,Conicyt,Chile and CONACyTGrant179795 Mexico
Identifier
URI: https://repositorio.uchile.cl/handle/2250/132859
DOI: DOI: 10.1016/j.is.2015.02.001
ISSN: 0306-4379
Quote Item
Information Systems51(2015)43–61
Collections
The following license files are associated with this item: