A simple, efficient, parallelizable algorithm for approximated nearest neighbors
Author
dc.contributor.author
Ferrada, Sebastián
Author
dc.contributor.author
Bustos, Benjamin
Author
dc.contributor.author
Reyes, Nora
Admission date
dc.date.accessioned
2019-05-31T15:20:02Z
Available date
dc.date.available
2019-05-31T15:20:02Z
Publication date
dc.date.issued
2018
Cita de ítem
dc.identifier.citation
CEUR Workshop Proceedings, Volumen 2100, 2018
Identifier
dc.identifier.issn
16130073
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/169430
Abstract
dc.description.abstract
The use of the join operator in metric spaces leads to what is
known as a similarity join, where objects of two datasets are paired if they
are somehow similar. We propose an heuristic that solves the 1-NN selfsimilarity join, that is, a similarity join of a dataset with itself, that brings
together each element with its nearest neighbor within the same dataset.
Solving the problem using a simple brute-force algorithm requires O(n
2
)
distance calculations, since it requires to compare every element against all
others. We propose a simple divide-and-conquer algorithm that gives an
approximated solution for the self-similarity join that computes only O(n
3
2 )
distances. We show how the algorithm can be easily modified in order to
improve the precision up to 31% (i.e., the percentage of correctly found
1-NNs) and such that 79% of the results are within the 10-NN, with no
significant extra distance computations. We present how the algorithm can
be executed in parallel and prove that using Θ(
√
n) processors, the total
execution takes linear time. We end discussing ways in which the algorithm
can be improved in the future.