An efficient algorithm for approximated self-similarity joins in metric spaces

Ferrada Aliaga, Sebastián; Bustos Cárdenas, Benjamín; Reyes, Nora

Article

Full text: An-efficient-algorithm.pdf (2.661 MB)

Access note

Open Access

Publication date

2020


Abstract

Similarity join is a key operation in metric databases: it retrieves all pairs of elements that are similar. Solving this problem usually requires comparing every pair of objects in the datasets, even when indexes and ad hoc algorithms are used. We propose a simple and efficient algorithm for computing the approximated k-nearest-neighbor self-similarity join. The algorithm computes $\Theta(n^{3/2})$ distances, where $n$ is the number of objects, and experiments show that it reaches a precision of 46% on real-world datasets. We compare it with other common techniques such as Quickjoin and Locality-Sensitive Hashing and argue that our proposal achieves better execution time and average precision.
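The abstract does not describe the algorithm itself. To make the gap between an all-pairs join and a $\Theta(n^{3/2})$ cost concrete, here is a minimal Python sketch of one hypothetical scheme with that cost when groups are balanced: pick about $\sqrt{n}$ random centers, assign each object to its nearest center ($n\sqrt{n}$ distances), then run an exact kNN search inside each group (about $\sqrt{n}$ groups of size $\sqrt{n}$, so roughly $n^{3/2}$ more distances). This is an illustrative assumption, not the authors' published algorithm. The names `exact_knn_self_join`, `approx_knn_self_join`, `precision`, and `CountingDistance` are hypothetical, and `precision` uses one common definition (fraction of true k nearest neighbors recovered), which may differ from the one used in the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


class CountingDistance:
    """Wraps a metric and counts how many distance evaluations are made."""

    def __init__(self, fn: Metric) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, a: Point, b: Point) -> float:
        self.calls += 1
        return self.fn(a, b)


def exact_knn_self_join(data: Sequence[Point], k: int, dist: Metric) -> List[List[int]]:
    """Brute-force baseline: evaluates all n(n-1)/2 pairwise distances."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(data[i], data[j])
    # For each object keep its k closest other objects (ties broken by index).
    return [sorted((j for j in range(n) if j != i), key=lambda j: (d[i][j], j))[:k]
            for i in range(n)]


def approx_knn_self_join(data: Sequence[Point], k: int, dist: Metric,
                         seed: int = 0) -> List[List[int]]:
    """Hypothetical sqrt(n)-centers scheme, NOT the paper's published method."""
    n = len(data)
    if n == 0:
        return []
    rng = random.Random(seed)
    m = max(1, math.isqrt(n))
    centers = rng.sample(range(n), m)
    # Step 1: assign every object to its nearest center (n * m distance evaluations).
    groups = {c: [] for c in centers}
    for i in range(n):
        nearest = min(centers, key=lambda c: dist(data[i], data[c]))
        groups[nearest].append(i)
    # Step 2: exact kNN restricted to each group (sum of |g|*(|g|-1) evaluations).
    result: List[List[int]] = [[] for _ in range(n)]
    for members in groups.values():
        for i in members:
            others = [j for j in members if j != i]
            result[i] = sorted(others, key=lambda j: (dist(data[i], data[j]), j))[:k]
    return result


def precision(approx: List[List[int]], exact: List[List[int]]) -> float:
    """Fraction of the true k nearest neighbors recovered by the approximate join."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    total = sum(len(e) for e in exact)
    return hits / total if total else 1.0


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.random(), rng.random()) for _ in range(400)]
    exact_d = CountingDistance(math.dist)
    approx_d = CountingDistance(math.dist)
    ex = exact_knn_self_join(pts, 5, exact_d)  # exact_d.calls == 400 * 399 // 2
    ap = approx_knn_self_join(pts, 5, approx_d)
    print(exact_d.calls, approx_d.calls, round(precision(ap, ex), 3))
```

Two limitations of this sketch are worth noting: objects that land in a group with fewer than k + 1 members get fewer than k neighbors, and unbalanced groups push the second step above $n^{3/2}$ distance evaluations. A practical algorithm would need to handle both; this sketch does not attempt to.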

Sponsor

Millennium Institute for Foundational Research on Data, Chile; CONICYT-PFCHA, Argentina 2017-21170616

Indexing

ISI-indexed article

Scopus-indexed article

Identifier

URI: https://repositorio.uchile.cl/handle/2250/175109
DOI: 10.1016/j.is.2020.101510

Citation

Information Systems, vol. 91 (2020), article 101510

Collections

Journal articles

The following license files are associated with this item:

Creative Commons

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile