A simple, efficient, parallelizable algorithm for approximated nearest neighbors

Ferrada, Sebastián; Bustos, Benjamin; Reyes, Nora

Author	dc.contributor.author	Ferrada, Sebastián
Author	dc.contributor.author	Bustos, Benjamin
Author	dc.contributor.author	Reyes, Nora
Admission date	dc.date.accessioned	2019-05-31T15:20:02Z
Available date	dc.date.available	2019-05-31T15:20:02Z
Publication date	dc.date.issued	2018
Item citation	dc.identifier.citation	CEUR Workshop Proceedings, Volume 2100, 2018
Identifier	dc.identifier.issn	16130073
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/169430
Abstract	dc.description.abstract	The use of the join operator in metric spaces leads to what is known as a similarity join, where objects of two datasets are paired if they are somehow similar. We propose a heuristic that solves the 1-NN self-similarity join, that is, a similarity join of a dataset with itself that pairs each element with its nearest neighbor within the same dataset. Solving the problem with a simple brute-force algorithm requires O(n^2) distance calculations, since every element must be compared against all others. We propose a simple divide-and-conquer algorithm that gives an approximated solution for the self-similarity join and computes only O(n^(3/2)) distances. We show how the algorithm can be easily modified to improve the precision (i.e., the percentage of correctly found 1-NNs) up to 31%, and such that 79% of the results are within the 10-NN, with no significant extra distance computations. We present how the algorithm can be executed in parallel and prove that, using Θ(√n) processors, the total execution takes linear time. We end by discussing ways in which the algorithm can be improved in the future.
Language	dc.language.iso	en
Publisher	dc.publisher	CEUR-WS
Type of license	dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Chile
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/cl/
Source	dc.source	CEUR Workshop Proceedings
Keywords	dc.subject	Computer science (all)
Title	dc.title	A simple, efficient, parallelizable algorithm for approximated nearest neighbors
Document type	dc.type	Journal article
Cataloguer	uchile.catalogador	jmm
Indexation	uchile.index	SCOPUS publication article
uchile.cosecha	uchile.cosecha	SI
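
The distance counts quoted in the abstract can be illustrated with a minimal sketch. It assumes a plain positional split into groups of about √n elements, chosen only to make the arithmetic visible. This record does not describe the authors' actual partitioning strategy, so the function names (`nn_within`, `exact_self_join`, `grouped_self_join`) and the grouping rule are hypothetical and are not the paper's method.

```python
import math
import random


def nn_within(indices, points, dist):
    """Brute-force 1-NN among the given indices; returns (neighbors, distance count)."""
    neighbors, calls = {}, 0
    for i in indices:
        best, best_d = None, math.inf
        for j in indices:
            if i == j:
                continue
            d = dist(points[i], points[j])
            calls += 1
            if d < best_d:
                best, best_d = j, d
        neighbors[i] = best
    return neighbors, calls


def exact_self_join(points, dist):
    """Exact baseline: every element is compared against all others, n(n-1) distances."""
    return nn_within(range(len(points)), points, dist)


def grouped_self_join(points, dist):
    """Approximation under an illustrative assumption: search only inside
    positional groups of about sqrt(n) elements (not the paper's partitioning)."""
    n = len(points)
    size = max(2, math.isqrt(n))
    starts = list(range(0, n, size))
    if len(starts) > 1 and n - starts[-1] < 2:
        starts.pop()  # fold a trailing single element into the previous group
    neighbors, calls = {}, 0
    for k, start in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else n
        found, used = nn_within(range(start, end), points, dist)
        neighbors.update(found)
        calls += used
    return neighbors, calls


random.seed(0)
pts = [(random.random(), random.random()) for _ in range(400)]
exact, exact_calls = exact_self_join(pts, math.dist)
approx, approx_calls = grouped_self_join(pts, math.dist)
print(exact_calls, approx_calls)  # prints "159600 7600"
```

For n = 400 the baseline computes 400 · 399 = 159600 distances, while the 20 groups of 20 elements compute 20 · (20 · 19) = 7600, which is below n^(3/2) = 8000. The groups share no data, so each could in principle be handed to a separate worker. An arbitrary split like this one will usually give poor precision, and the sketch makes no attempt to reproduce the precision figures reported in the abstract.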

Files in this item

Name: A_simple_efficient_paralleliza ...
Size: 578.0Kb
Format: PDF

This item appears in the following Collection(s)

Journal articles

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile