An efficient algorithm for approximated self-similarity joins in metric spaces
Author
dc.contributor.author
Ferrada Aliaga, Sebastián
Author
dc.contributor.author
Bustos Cárdenas, Benjamín
Author
dc.contributor.author
Reyes, Nora
Admission date
dc.date.accessioned
2020-05-29T19:05:08Z
Available date
dc.date.available
2020-05-29T19:05:08Z
Publication date
dc.date.issued
2020
Cita de ítem
dc.identifier.citation
Information Systems. 91: (2020): 101510
es_ES
Identifier
dc.identifier.other
10.1016/j.is.2020.101510
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/175109
Abstract
dc.description.abstract
Similarity join is a key operation in metric databases. It retrieves all pairs of elements that are similar. Solving such a problem usually requires comparing every pair of objects of the datasets, even when indexing and ad hoc algorithms are used. We propose a simple and efficient algorithm for the computation of the approximated k nearest neighbor self-similarity join. This algorithm computes Theta(n(3/2)) distances and it is empirically shown that it reaches an empirical precision of 46% in real-world datasets. We provide a comparison to other common techniques such as Quickjoin and Locality-Sensitive Hashing and argue that our proposal has a better execution time and average precision.
es_ES
Patrocinador
dc.description.sponsorship
Millennium Institute for Foundational Research on Data, Chile
CONICYT-PFCHA, Argentina 2017-21170616