On the reproducibility of experiments of indexing repetitive document collections
Author
dc.contributor.author
Fariña, Antonio
Author
dc.contributor.author
Martínez-Prieto, Miguel A.
Author
dc.contributor.author
Claude, Francisco
Author
dc.contributor.author
Navarro, Gonzalo
Author
dc.contributor.author
Lastra-Díaz, Juan J.
Author
dc.contributor.author
Prezza, Nicola
Author
dc.contributor.author
Seco, Diego
Accession date
dc.date.accessioned
2019-10-22T03:14:00Z
Available date
dc.date.available
2019-10-22T03:14:00Z
Publication date
dc.date.issued
2019
Item citation
dc.identifier.citation
Information Systems, Volume 83,
Identifier
dc.identifier.issn
03064379
Identifier
dc.identifier.other
10.1016/j.is.2019.03.007
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/172041
Abstract
dc.description.abstract
This work introduces a companion reproducible paper that aims to allow the exact replication of the methods, experiments, and results discussed in a previous work (Claude et al., 2016). In that parent paper, we proposed a variety of techniques for compressing indexes that exploit the fact that highly repetitive collections consist mostly of documents that are near-copies of others. More concretely, we describe a replication framework, called uiHRDC (universal indexes for Highly Repetitive Document Collections), that allows our original experimental setup to be easily replicated using various document collections. The corresponding experimentation is carefully explained, with precise details about the parameters that can be tuned for each indexing solution. Finally, we also provide uiHRDC as a reproducibility package.