A simple, efficient, parallelizable algorithm for approximated nearest neighbors
Artículo
Open/ Download
Publication date
2018Metadata
Show full item record
Cómo citar
Ferrada, Sebastián
Cómo citar
A simple, efficient, parallelizable algorithm for approximated nearest neighbors
Author
Abstract
The use of the join operator in metric spaces leads to what is
known as a similarity join, where objects of two datasets are paired if they
are somehow similar. We propose an heuristic that solves the 1-NN selfsimilarity join, that is, a similarity join of a dataset with itself, that brings
together each element with its nearest neighbor within the same dataset.
Solving the problem using a simple brute-force algorithm requires O(n
2
)
distance calculations, since it requires to compare every element against all
others. We propose a simple divide-and-conquer algorithm that gives an
approximated solution for the self-similarity join that computes only O(n
3
2 )
distances. We show how the algorithm can be easily modified in order to
improve the precision up to 31% (i.e., the percentage of correctly found
1-NNs) and such that 79% of the results are within the 10-NN, with no
significant extra distance computations. We present how the algorithm can
be executed in parallel and prove that using Θ(
√
n) processors, the total
execution takes linear time. We end discussing ways in which the algorithm
can be improved in the future.
Indexation
Artículo de publicación SCOPUS
Quote Item
CEUR Workshop Proceedings, Volumen 2100, 2018
Collections