Empirical study of the visual reasoning capabilities of the neural state machine

Chaperón Burgos, Gabriel Alejandro

Professor Advisor	dc.contributor.advisor	Pérez Rojas, Jorge
Professor Advisor	dc.contributor.advisor	Bravo Márquez, Felipe
Author	dc.contributor.author	Chaperón Burgos, Gabriel Alejandro
Associate professor	dc.contributor.other	Saavedra Rondo, José Manuel
Associate professor	dc.contributor.other	Bustos Cárdenas, Benjamín
Associate professor	dc.contributor.other	Rivas Echeverría, Francklin
Admission date	dc.date.accessioned	2024-07-31T19:33:22Z
Available date	dc.date.available	2024-07-31T19:33:22Z
Publication date	dc.date.issued	2023
Identifier	dc.identifier.other	10.58011/5h7p-q442
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/199836
Abstract	dc.description.abstract	Deep learning is an area within computer science, statistics and mathematics in which practitioners design deep neural networks to imitate abilities that are inherently human. In this area, tasks are used to evaluate a model's capacity to carry out a human skill, such as object recognition, text classification or speech recognition. In late 2019, a new architecture called Neural State Machine (NSM) was proposed for the task of visual question answering, in which a model is expected to answer questions based on an image. The architecture is strongly inspired by the traditional state machines of automata theory, and works by iteratively traversing a path over the objects in the image until it finds the answer to the question. In this work we empirically study the limitations of this new architecture. From automata theory we know that the lack of memory in traditional state machines limits the kinds of inputs they can process. Considering this observation and the NSM's state-machine-based design, we conjecture that the architecture will be unable to process some types of image-based questions. To test our hypothesis we use an experimental methodology. First we define question categories in which we expect the NSM to have problems. These questions come from previous efforts in the literature to establish benchmarks for multimodal text-and-vision systems. We then evaluate the architecture and compare the results with baseline results in which the NSM achieves practically perfect performance. Our findings show that the NSM does indeed have trouble answering the proposed questions. The decrease in performance varies from case to case, in some instances reaching random levels. Our results suggest that a comprehensive solution to the task of image-based question answering requires going beyond a neural network that represents a finite state machine.	es_ES
Abstract	dc.description.abstract	Deep learning is a subfield of computer science, statistics and mathematics in which practitioners try to build deep neural networks that mimic, to some extent, abilities inherent to human beings. In this field, tasks are used to evaluate the ability of a model to perform specific human skills, such as object recognition, text classification or speech recognition. In late 2019, a new architecture called Neural State Machine (NSM) was proposed for the task of visual question answering, where a model has to answer a question based on an image. The network is heavily inspired by traditional state machines from automata theory, and works by iteratively following a path over the objects in the image to find the answer to the question. In this work we empirically study the limitations of this new architecture. From automata theory we know that traditional state machines' lack of memory limits the kinds of inputs they can process. Considering this observation and the network's state-machine-inspired design, we hypothesize that the network will be unable to process certain kinds of image-based questions. We test our hypothesis using an experimental approach. First we define a number of question categories where we think the NSM will struggle. These questions come from previous efforts in the literature to establish benchmarks for multimodal visual-text systems. Next we evaluate the architecture and compare the results to a baseline in which the NSM performs almost perfectly. Our findings show that the NSM indeed struggles with the proposed questions, with varying degrees of decrease in performance, in some cases reaching random performance. Our results suggest that, in order to have a comprehensive solution for the visual question answering problem, one would need to go beyond a neural network representation of a finite state machine.	es_ES
Language	dc.language.iso	en	es_ES
Publisher	dc.publisher	Universidad de Chile	es_ES
Type of license	dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
Title	dc.title	Empirical study of the visual reasoning capabilities of the neural state machine	es_ES
Document type	dc.type	Thesis	es_ES
dc.description.version	dc.description.version	Author's original version	es_ES
dcterms.accessRights	dcterms.accessRights	Open access	es_ES
Cataloguer	uchile.catalogador	chb	es_ES
Department	uchile.departamento	Departamento de Ciencias de la Computación	es_ES
Faculty	uchile.facultad	Facultad de Ciencias Físicas y Matemáticas	es_ES
uchile.titulacion	uchile.titulacion	Double degree	es_ES
uchile.carrera	uchile.carrera	Ingeniería Civil en Computación	es_ES
uchile.gradoacademico	uchile.gradoacademico	Master's degree	es_ES
uchile.notadetesis	uchile.notadetesis	Thesis submitted for the degree of Magíster en Ciencias, Mención Computación	es_ES
uchile.notadetesis	uchile.notadetesis	Report submitted for the professional title of Ingeniero Civil en Computación

Files in this item

Name: Empirical-study-of-the-visual- ...
Size: 3.299Mb
Format: PDF

This item appears in the following Collection(s)

Tesis Postgrado

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States