Caching of SPARQL queries

The Semantic Web is a research area dedicated to studying how to make the structure of data on the Web understandable to machines, through the definition of a standard data model (RDF) and ontological properties (OWL). RDF structures data on the Web as graphs of triples, in which a subject and an object are linked through a property. SPARQL is the query language defined by the W3C (World Wide Web Consortium) as the standard for querying RDF data. There is currently high demand for services that use this language, which creates a need to reduce their average response times. This thesis proposes the design and implementation of a cache that keeps the results of the most frequent queries in memory so that they can be accessed more quickly. The work proposes several policies for inserting elements into and evicting elements from the cache, and experiments were designed to determine which policy is the most efficient. Among the policies studied is one devised in this work that takes into account the different sizes that query results may have, which the other policies considered do not necessarily do; the experiments show that this policy is the most effective at reducing the median query execution time. The work concludes that a caching system for SPARQL can reduce both the median and the upper-quartile execution times. Furthermore, we show that additional adjustments, such as selecting appropriate queries to store in the cache and using a canonicalisation algorithm devised by Salas and Hogan, yield even better query execution times.
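The abstract does not specify the size-aware policy devised in the thesis, so the sketch below is not that policy. It uses a well-known size-aware eviction scheme, GreedyDual-Size (Cao and Irani) with uniform cost, to show the general idea: each cached result gets a priority `L + 1/size`, so large results are evicted first unless they keep being requested. The class name `GreedyDualSizeCache`, the use of raw query strings as keys and the abstract `size` unit (bytes or rows) are all hypothetical. A canonicalisation step such as the one by Salas and Hogan would map equivalent queries to the same key before lookup; this sketch does not implement it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    result: list     # cached query result (e.g. a list of solution rows)
    size: int        # hypothetical size unit: bytes or rows of the result
    priority: float  # GreedyDual-Size priority H = L + 1/size


class GreedyDualSizeCache:
    """Hypothetical size-aware cache using GreedyDual-Size with unit cost."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.inflation = 0.0  # the "L" value, raised on every eviction
        self.entries: Dict[str, Entry] = {}

    def _priority(self, size: int) -> float:
        # Smaller results get a larger priority, so they survive longer.
        return self.inflation + 1.0 / size

    def get(self, query: str) -> Optional[list]:
        entry = self.entries.get(query)
        if entry is None:
            return None  # cache miss: caller must run the query on the endpoint
        entry.priority = self._priority(entry.size)  # refresh priority on a hit
        return entry.result

    def put(self, query: str, result: list, size: int) -> None:
        if size < 1:
            raise ValueError("size must be a positive integer")
        if size > self.capacity:
            return  # never cache a result larger than the whole cache
        if query in self.entries:  # replacing an existing entry frees its space
            self.used -= self.entries.pop(query).size
        while self.used + size > self.capacity:
            # Evict the entry with the lowest priority and inflate L to it,
            # which ages every entry that has not been accessed recently.
            victim_key = min(self.entries, key=lambda k: self.entries[k].priority)
            victim = self.entries.pop(victim_key)
            self.inflation = victim.priority
            self.used -= victim.size
        self.entries[query] = Entry(result, size, self._priority(size))
        self.used += size
```

For example, in a cache of capacity 10 holding a 2-unit and an 8-unit result, inserting a 1-unit result evicts the 8-unit one, because its priority `1/8` is the lowest. The inflation value then rises to `0.125`. A plain LRU policy would instead evict whichever result was used least recently, regardless of its size.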

Thesis submitted for the degree of Magíster en Ciencias de la Computación (Master of Science in Computer Science)

Undergraduate thesis (memoria) submitted for the professional title of Ingeniero Civil en Computación (Computer Engineering)

Identifier

URI: https://repositorio.uchile.cl/handle/2250/199426
DOI: 10.58011/9hcv-c206