Compressed vertical partitioning for efficient RDF management

Álvarez García, Sandra; Brisaboa, Nieves; Fernández, Javier D.; Martínez Prieto, Miguel A.; Navarro, Gonzalo

Author	dc.contributor.author	Álvarez García, Sandra
Author	dc.contributor.author	Brisaboa, Nieves
Author	dc.contributor.author	Fernández, Javier D.
Author	dc.contributor.author	Martínez Prieto, Miguel A.
Author	dc.contributor.author	Navarro, Gonzalo
Admission date	dc.date.accessioned	2015-10-05T19:09:17Z
Available date	dc.date.available	2015-10-05T19:09:17Z
Publication date	dc.date.issued	2015
Item citation	dc.identifier.citation	Knowl Inf Syst (2015) 44:439–474	en_US
Identifier	dc.identifier.other	DOI: 10.1007/s10115-014-0770-y
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/134120
General note	dc.description	Article published in an ISI-indexed journal	en_US
Abstract	dc.description.abstract	The Web of Data has been gaining momentum in recent years. This has led to the publication of more and more semi-structured datasets that, in many cases, follow the RDF (Resource Description Framework) data model, which is based on atomic triples of subject, predicate, and object. Although the model is very simple, specific compression methods become necessary because datasets keep growing and various scalability issues arise around their organization and storage. This requirement is even more restrictive in RDF stores, because efficient SPARQL query resolution on the compressed RDF datasets is also required. This article introduces a novel RDF indexing technique that supports efficient SPARQL resolution in compressed space. Our technique, called k2-triples, uses the predicate to vertically partition the dataset into disjoint subsets of (subject, object) pairs, one per predicate. These subsets are represented as binary matrices of subjects × objects in which 1-bits mean that the corresponding triple exists in the dataset. This model results in very sparse matrices, which are efficiently compressed using k2-trees. We enhance this model with two compact indexes listing the predicates related to each different subject and object in the dataset, in order to address the specific weaknesses of vertically partitioned representations. The resulting technique not only achieves by far the most compressed representations, but also achieves the best overall performance for RDF retrieval in our experimental setup. Our approach uses up to 10 times less space than a state-of-the-art baseline and outperforms its time performance by several orders of magnitude on the most basic query patterns. In addition, we optimize traditional join algorithms on k2-triples and define a novel one leveraging its specific features.
Our experimental results show that our technique also outperforms traditional vertical partitioning for join resolution: it reports the best numbers for joins in which the non-joined nodes are provided, and it is competitive in most of the remaining cases.	en_US
Sponsor	dc.description.sponsorship	Chilean Fondecyt 1-110066, 1-140796	en_US
Language	dc.language.iso	en	en_US
Publisher	dc.publisher	Springer	en_US
Type of license	dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Chile	*
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/cl/	*
Keywords	dc.subject	RDF	en_US
Keywords	dc.subject	Compressed index	en_US
Keywords	dc.subject	Vertical partitioning	en_US
Keywords	dc.subject	Memory-based SPARQL solution	en_US
Keywords	dc.subject	k2-tree	en_US
Title	dc.title	Compressed vertical partitioning for efficient RDF management	en_US
Document type	dc.type	Journal article
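The abstract describes k2-triples only at a high level. The Python sketch below illustrates the general idea under simplifying assumptions: triples are split by predicate, each predicate's (subject, object) pairs become a binary matrix stored as a basic k2-tree with k = 2, and two dictionaries (`sp`, `op`) list the predicates used by each subject and object, so that a pattern with an unbound predicate only probes candidate trees. All class and method names (`K2Tree`, `VerticalPartition`, `ask`, `predicates_between`, `sp`, `op`) are hypothetical, not the authors' code. Plain Python lists with prefix sums stand in for the compressed, rank-capable bitmaps a real implementation would use, and subjects and objects receive separate ID dictionaries for simplicity.

```python
from collections import defaultdict
from itertools import accumulate
from typing import Dict, List, Set, Tuple

K = 2  # k2-tree arity per dimension (assumed value; k is a tunable parameter)


class K2Tree:
    """Static k2-tree over an n x n binary matrix; n must be a power of K with n >= K."""

    def __init__(self, cells: Set[Tuple[int, int]], n: int):
        self.n = n
        internal: List[int] = []  # bitmap T: internal levels, in level order
        leaves: List[int] = []    # bitmap L: last level (individual cells)
        level = [(0, 0, cells)]   # (row offset, col offset, 1-cells in that submatrix)
        size = n
        while level:
            sub = size // K
            out = leaves if sub == 1 else internal
            next_level = []
            for r0, c0, pts in level:
                for i in range(K):
                    for j in range(K):
                        rlo, clo = r0 + i * sub, c0 + j * sub
                        child = {(r, c) for r, c in pts
                                 if rlo <= r < rlo + sub and clo <= c < clo + sub}
                        out.append(1 if child else 0)
                        if child and sub > 1:  # only non-empty submatrices are refined
                            next_level.append((rlo, clo, child))
            level, size = next_level, sub
        self.T, self.L = internal, leaves
        # prefix[x] = number of 1-bits in T[0..x-1], so rank1(T, x) inclusive = prefix[x + 1]
        self.prefix = [0] + list(accumulate(internal))

    def contains(self, r: int, c: int) -> bool:
        """Return True if cell (r, c) is a 1, descending one level per step."""
        size, base = self.n, 0  # base: start of the current node's children in T + L
        while True:
            sub = size // K
            pos = base + (r // sub) * K + (c // sub)
            r, c = r % sub, c % sub
            if sub == 1:
                return self.L[pos - len(self.T)] == 1
            if self.T[pos] == 0:
                return False  # empty submatrix: no 1-cells below
            base = self.prefix[pos + 1] * K * K  # children of the 1-bit at pos
            size = sub


class VerticalPartition:
    """Hypothetical k2-triples-style store: one k2-tree per predicate plus predicate lists."""

    def __init__(self, triples: List[Tuple[str, str, str]]):
        subjects = sorted({s for s, _, _ in triples})
        objects = sorted({o for _, _, o in triples})
        self.sid = {s: i for i, s in enumerate(subjects)}
        self.oid = {o: i for i, o in enumerate(objects)}
        n = K
        while n < max(len(subjects), len(objects)):
            n *= K  # smallest power of K covering all subject and object IDs
        pairs: Dict[str, Set[Tuple[int, int]]] = defaultdict(set)
        self.sp: Dict[str, Set[str]] = defaultdict(set)  # subject -> its predicates
        self.op: Dict[str, Set[str]] = defaultdict(set)  # object -> its predicates
        for s, p, o in triples:
            pairs[p].add((self.sid[s], self.oid[o]))
            self.sp[s].add(p)
            self.op[o].add(p)
        self.trees = {p: K2Tree(cells, n) for p, cells in pairs.items()}

    def ask(self, s: str, p: str, o: str) -> bool:
        """Pattern (s, p, o) with all terms bound: a single cell check."""
        if s not in self.sid or o not in self.oid or p not in self.trees:
            return False
        return self.trees[p].contains(self.sid[s], self.oid[o])

    def predicates_between(self, s: str, o: str) -> List[str]:
        """Pattern (s, ?p, o): probe only predicates listed for both s and o."""
        if s not in self.sid or o not in self.oid:
            return []
        candidates = self.sp[s] & self.op[o]
        return sorted(p for p in candidates
                      if self.trees[p].contains(self.sid[s], self.oid[o]))


triples = [
    ("alice", "knows", "bob"),
    ("alice", "likes", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "likes", "alice"),
]
store = VerticalPartition(triples)
print(store.ask("alice", "knows", "bob"))        # True
print(store.ask("bob", "knows", "alice"))        # False
print(store.predicates_between("alice", "bob"))  # ['knows', 'likes']
```

In this sketch, `predicates_between` answers a pattern of the form (s, ?p, o) by intersecting the predicate lists of the subject and the object before touching any k2-tree, rather than probing every predicate's matrix. That is one reading of the weakness of vertically partitioned representations (unbound predicates) that the abstract says the two extra indexes address.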

Files in this item

Name:: Compressed-vertical-partitioni ...
Size:: 3.117Mb
Format:: PDF

This item appears in the following Collection(s)

Journal articles


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile