Divide and conquer: An extreme multi-label classification approach for coding diseases and procedures in Spanish
Thesis
Access note
Open access
Publication date
2023
Author
Dunstan Escudero, Jocelyn
Professor Advisor
Abstract
Clinical coding is the task of transforming medical documents into structured codes from a standard ontology. Since these terminologies contain thousands of codes and each document can be assigned several of them, the problem can be framed as an Extreme Multi-label Classification task. This thesis proposes a novel neural network-based architecture for clinical coding.
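To make the multi-label framing concrete, each document's target can be viewed as a 0/1 vector with one position per code in the terminology. The sketch below is illustrative only: the code strings are ICD-10-style examples chosen for readability (a real terminology has thousands of codes), and the helper names `build_label_index` and `multi_hot` are hypothetical, not taken from the thesis.

```python
from typing import Dict, List, Sequence


def build_label_index(vocabulary: Sequence[str]) -> Dict[str, int]:
    """Map each code in a (hypothetical) label vocabulary to a column index."""
    return {code: i for i, code in enumerate(vocabulary)}


def multi_hot(doc_codes: Sequence[str], index: Dict[str, int]) -> List[int]:
    """Encode one document's set of codes as a 0/1 vector over the whole vocabulary."""
    vec = [0] * len(index)
    for code in doc_codes:
        vec[index[code]] = 1
    return vec


# Illustrative ICD-10-style vocabulary; real terminologies are far larger.
vocab = ["E11.9", "I10", "J18.9", "R51.9"]
index = build_label_index(vocab)
print(multi_hot(["I10", "E11.9"], index))  # [1, 1, 0, 0]
```

With thousands of positions per vector and only a handful of ones per document, the label matrix is extremely sparse, which is what makes a flat one-classifier-per-code setup hard to train.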
First, we take full advantage of the hierarchical nature of the ontologies to create clusters of codes based on their semantic relations. Then, a Matcher module estimates the probability that a document belongs to each cluster. Finally, a Ranker computes the probability of each code in a cluster, considering only the documents assigned to that cluster. This division allows fine-grained differentiation among the codes within a cluster, which cannot be achieved with a single classifier.
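The cluster, match, and rank pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: clusters are approximated by grouping codes under their 3-character ICD-10 category (the thesis builds clusters from semantic relations in the hierarchy, which may differ); the Matcher and Ranker outputs are assumed to be given as logits rather than produced by neural networks; and the final score is assumed to be the product of the cluster probability and the within-cluster code probability. The names `cluster_by_parent`, `rank_codes`, and `top_k_clusters` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_by_parent(codes: List[str], prefix_len: int = 3) -> Dict[str, List[str]]:
    """Group codes by parent node, approximated (assumption) by the first
    `prefix_len` characters, i.e. the ICD-10 category ("E11" for "E11.9")."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for code in codes:
        clusters[code[:prefix_len]].append(code)
    return dict(clusters)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rank_codes(
    matcher_logits: Dict[str, float],
    ranker_logits: Dict[str, float],
    clusters: Dict[str, List[str]],
    top_k_clusters: int = 2,
) -> List[Tuple[str, float]]:
    """Stage 1 (Matcher): keep the top-k clusters by probability.
    Stage 2 (Ranker): score only the codes inside the shortlisted clusters.
    Assumed combination: P(cluster | doc) * P(code | doc, cluster)."""
    cluster_probs = {c: sigmoid(z) for c, z in matcher_logits.items()}
    shortlisted = sorted(cluster_probs, key=cluster_probs.get, reverse=True)[:top_k_clusters]
    scores: List[Tuple[str, float]] = []
    for c in shortlisted:
        for code in clusters[c]:
            scores.append((code, cluster_probs[c] * sigmoid(ranker_logits[code])))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy example with made-up logits for a single document.
codes = ["E11.9", "E11.65", "I10", "J18.9", "J18.1"]
clusters = cluster_by_parent(codes)
matcher_logits = {"E11": 2.0, "I10": 1.0, "J18": -3.0}
ranker_logits = {"E11.9": 1.5, "E11.65": -1.0, "I10": 0.5, "J18.9": 2.0, "J18.1": 0.0}
for code, score in rank_codes(matcher_logits, ranker_logits, clusters):
    print(code, round(score, 3))
```

In this toy run the "J18" cluster is discarded by the Matcher, so its codes are never scored even though "J18.9" has a high Ranker logit. Multiplying the two probabilities means a code ranks high only when both stages agree, while the Ranker only has to separate codes that share a parent.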
In addition, since most previous work has addressed this task in English, we conducted our experiments on four Spanish clinical coding corpora. The experimental results demonstrate the effectiveness of our model, which achieves state-of-the-art results on three of the four datasets. Specifically, we outperformed previous models on two subtasks of the CodiEsp shared task, CodiEsp-D and CodiEsp-P, and also obtained state-of-the-art results on the FALP corpus.
Thesis note
Thesis submitted for the degree of Master in Data Science (Magíster en Ciencia de Datos) and report (memoria) submitted for the professional title of Civil Engineer in Computer Science (Ingeniero Civil en Computación).
Identifier
URI: https://repositorio.uchile.cl/handle/2250/193936