
Professor advisor (dc.contributor.advisor): Bravo Márquez, Felipe
Author (dc.contributor.author): Iturra Bocaz, Gabriel Emerson
Associate professor (dc.contributor.other): Abeliuk Kimelman, Andrés
Associate professor (dc.contributor.other): Gutiérrez Gallardo, Claudio
Associate professor (dc.contributor.other): Scheihing García, Eliana
Accession date (dc.date.accessioned): 2023-11-27T18:20:54Z
Date available (dc.date.available): 2023-11-27T18:20:54Z
Publication date (dc.date.issued): 2023
Identifier (dc.identifier.uri): https://repositorio.uchile.cl/handle/2250/196539
Abstract (dc.description.abstract): Word embeddings have become indispensable tools in many natural language processing and information retrieval tasks, including document classification, ranking, and question answering. However, a major limitation of traditional word embedding models is their static nature, which prevents them from adapting to the constantly evolving language patterns that emerge in sources such as social media and the web (e.g., new hashtags or brand names). To address this challenge, incremental word embedding algorithms have been introduced. These algorithms update word representations dynamically as new language patterns appear in continuous data streams. This thesis presents RiverText, a comprehensive framework for training and evaluating incremental word embeddings from text data streams. The tool is a resource for the natural language processing community working with word embeddings in streaming scenarios, such as social media analysis. The library implements several incremental word embedding techniques in a standardized framework, including Skip-gram, Continuous Bag of Words, and Word Context Matrix. It uses PyTorch as the backend for neural network training, which makes training efficient and flexible. We also implemented a module that adapts intrinsic static word embedding evaluation tasks, such as word similarity and word categorization, to a streaming setting. Finally, we compare the performance of the framework under different hyperparameter settings and discuss the results. The open-source library is available at https://github.com/dccuchile/rivertext. It includes detailed documentation and examples to help users get started with the framework quickly. We believe that the framework will benefit researchers and practitioners in natural language processing, especially those working with large-scale streaming text data. [es_ES]
Sponsor (dc.description.sponsorship): ANID FONDECYT grant 1200290, National Center for Artificial Intelligence CENIA FB210017, and ANID-Millennium Science Initiative Program - Code ICN17 002 [es_ES]
Language (dc.language.iso): en [es_ES]
Publisher (dc.publisher): Universidad de Chile [es_ES]
Type of license (dc.rights): Attribution-NonCommercial-NoDerivs 3.0 United States [*]
Link to license (dc.rights.uri): http://creativecommons.org/licenses/by-nc-nd/3.0/us/ [*]
Keywords (dc.subject): Procesamiento de lenguaje natural (Ciencia de la computación) [es_ES]
Keywords (dc.subject): Natural language processing (Computer science) [es_ES]
Keywords (dc.subject): Word embeddings [es_ES]
Keywords (dc.subject): Data streams [es_ES]
Keywords (dc.subject): Incremental learning [es_ES]
Title (dc.title): RiverText: A framework for training and evaluating incremental word embeddings from text data streams [es_ES]
Document type (dc.type): Thesis [es_ES]
Version (dc.description.version): Author's original version [es_ES]
Access rights (dcterms.accessRights): Open access [es_ES]
Cataloguer (uchile.catalogador): gmm [es_ES]
Department (uchile.departamento): Departamento de Ciencias de la Computación [es_ES]
Faculty (uchile.facultad): Facultad de Ciencias Físicas y Matemáticas [es_ES]
Degree type (uchile.titulacion): Double degree [es_ES]
Degree program (uchile.carrera): Ingeniería Civil en Computación [es_ES]
Academic degree (uchile.gradoacademico): Master's (Magíster) [es_ES]
Thesis note (uchile.notadetesis): Thesis submitted for the degree of Master of Science, major in Computer Science (Magíster en Ciencias, Mención Computación) [es_ES]
Thesis note (uchile.notadetesis): Undergraduate thesis submitted for the professional title of Civil Engineer in Computer Science (Ingeniero Civil en Computación)




Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States