Self-supervised sketch-based detection with application in historical document spotting

Stears Rojas, Christopher Andrés

Professor Advisor	dc.contributor.advisor	Saavedra Rondo, José
Author	dc.contributor.author	Stears Rojas, Christopher Andrés
Associate professor	dc.contributor.other	Bravo Márquez, Felipe
Associate professor	dc.contributor.other	Sipiran Mendoza, Iván
Admission date	dc.date.accessioned	2025-03-03T20:12:59Z
Available date	dc.date.available	2025-03-03T20:12:59Z
Publication date	dc.date.issued	2024
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/203288
Abstract	dc.description.abstract	Digitization is a fundamental tool for preserving books and other cultural heritage documents for posterity, which makes it vital to have a tool capable of searching for patterns and figures across documents. Current strategies compare images from the same domain (photo-photo) to detect patterns in documents, but their performance is limited, reaching a mean Average Precision (mAP) of 27.0% on the pattern spotting task on the DocExplore dataset. This work proposes a new approach that explores the use of a completely different domain, specifically sketches, to detect patterns in cultural heritage documents. One of the main challenges in using sketches is the lack of photo-sketch pairs for training, which hinders the development of generalizable models. To address this limitation, two models trained under a self-supervised regime are proposed: S3BIR-CLIP and S3BIR-DINOv2 (where S3BIR stands for Self-Supervised Sketch-based Image Retrieval). These models produce a bimodal photo-sketch feature space without the need for explicitly paired data, and they demonstrate outstanding performance on three public datasets. The models were integrated with the Segment Anything Model (SAM), a segmentation model used to extract regions of interest within documents, and evaluated on the DocExplore dataset under the pattern spotting task. The results showed that this approach is competitive in detecting patterns within documents, achieving an mAP of 21.0%. This finding offers new opportunities for experts dedicated to the preservation and analysis of historical documents, as it allows sketches to be used when searching for relevant information, thus facilitating interaction with digitized cultural heritage.	es_ES
Language	dc.language.iso	en	es_ES
Publisher	dc.publisher	Universidad de Chile	es_ES
Type of license	dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
Title	dc.title	Self-supervised sketch-based detection with application in historical document spotting	es_ES
Document type	dc.type	Thesis	es_ES
dc.description.version	dc.description.version	Author's original version	es_ES
dcterms.accessRights	dcterms.accessRights	Open access	es_ES
Cataloguer	uchile.catalogador	chb	es_ES
Department	uchile.departamento	Escuela de Postgrado y Educación Continua	es_ES
Faculty	uchile.facultad	Facultad de Ciencias Físicas y Matemáticas	es_ES
uchile.gradoacademico	uchile.gradoacademico	Master's degree	es_ES
uchile.notadetesis	uchile.notadetesis	Thesis submitted for the degree of Magíster en Ciencia de Datos (Master in Data Science)	es_ES

Files in this item

Name: Self-supervised-sketch-based-d ...
Size: 45.17Mb
Format: PDF

This item appears in the following Collection(s)

Tesis Postgrado
