On the expressiveness of LARA: a unified language for linear and relational algebra

Estudiamos el poder de expresividad del lenguaje Lara – un álgebra unificador recientemente propuesto para expresar operaciones del álgebra relacional y lineal – ambos en términos de lenguajes de consultas de bases de datos tradicionales y algunas tareas analíticas frecuentemente realizadas en tareas de machine learning. El modelo de datos de Lara es la tabla asociativa, una estructura similar a una tabla que tiene una cabecera y contiene datos en filas. Está dividida en dos: las llaves y los valores. La tabla puede verse como una función, donde toda llave está asociada a sus valores respectivos. El lenguaje tiene tres operaciones: join, union, and extension, las cuales están parametrizadas por funciones. Las primeras dos son parametrizadas por funciones de agregación (suma, promedio, desviación estándar, etc.) mientras que la última está parametrizada por una función definida por el usuario, la cual puede permitir transformar filas en tablas. Comenzamos mostrando como Lara es completamente expresivo con respecto a la lógica de primer orden con agregación. Dado que Lara está parametrizado por un conjunto de funciones definidas por el usuario, el poder expresivo exacto del lenguaje depende en cómo estas funciones están definidas. Distinguimos dos casos principales dependiendo del nivel de genericidad al que las consultas están impuestas a satisfacer. Bajo suposiciones de genericidad fuerte el lenguaje no puede expresar la convolución de matrices, una operación muy importante en las operaciones actuales de machine learning. El lenguaje es también local, y por lo tanto no puede expresar operaciones que exhiben un comportamiento recursivo, como la inversión de matrices. Para expresar la convolución, podemos relajar el requerimiento de genericidad añadiendo un orden lineal subyacente al dominio. Esto, sin embargo, destruye la localidad y convierte el poder expresivo del lenguaje en algo mucho más difícil de entender. En particular, aunque bajo supuestos de complejidad el lenguaje resultante aún no puede expresar la inversión de matrices, una prueba de este hecho sin estos supuestos parece ser difícil de obtener.

We study the expressive power of the Lara language – a recently proposed unifying algebra for expressing relational and linear algebra operations – both in terms of traditional database query languages and some analytic tasks often performed in machine learning pipelines. The data model of Lara is the associative table, a table-like structure which has a header and contains rows that follows the header. It is divided in two: the keys and the values. The table can be seen as a function, where each key is associated with their respective values. The lenguage has three operations: join, union and extension, where each is parameterized by a function. The first two are parameterized by aggregate functions (sum, average, standard deviation, etc.) while the latter is parameterized by an user-defined function, which can allow to transform rows into tables. We start by showing Lara to be expressive complete with respect to first-order logic with aggregation. Since Lara is parameterized by a set of user-defined functions, the exact expressive power of the language depends on how these functions are defined. We distinguish two main cases depending on the level of genericity queries are enforced to satisfy. Under strong genericity assumptions the language cannot express matrix convolution, a very important operation in current machine learning operations. This language is also local, and thus cannot express operations such as matrix inverse that exhibit a recursive behavior. For expressing convolution, one can relax the genericity requirement by adding an underlying linear order on the domain. This, however, destroys locality and turns the expressive power of the language much more difficult to understand. In particular, although under complexity assumptions the resulting language can still not express matrix inverse, a proof of this fact without such assumptions seems challenging to obtain.

General note

Tesis para optar al grado de Magíster en Ciencias, Mención Computación

Memoria para optar al título de Ingeniero Civil en Computación

Patrocinador

Instituto Milenio Fundamentos de los Datos (IMFD)

Identifier

URI: https://repositorio.uchile.cl/handle/2250/173906

Collections