Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas

Báez, Pablo; Villena, Fabián; Zúñiga, Karen; Jones, Natalia; Fernández, Gustavo; Durán, Manuel; Dunstan Escudero, Jocelyn Mariel

Author	dc.contributor.author	Báez, Pablo
Author	dc.contributor.author	Villena, Fabián
Author	dc.contributor.author	Zúñiga, Karen
Author	dc.contributor.author	Jones, Natalia
Author	dc.contributor.author	Fernández, Gustavo
Author	dc.contributor.author	Durán, Manuel
Author	dc.contributor.author	Dunstan Escudero, Jocelyn Mariel
Accession date	dc.date.accessioned	2022-05-03T16:35:55Z
Available date	dc.date.available	2022-05-03T16:35:55Z
Publication date	dc.date.issued	2021
Item citation	dc.identifier.citation	Rev Med Chile 2021; 149: 1014-1022	es_ES
Identifier	dc.identifier.issn	0034-9887
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/185227
Abstract	dc.description.abstract	A significant proportion of the clinical record is in free text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify medical entities of interest. Aim: To build a linguistic resource of annotated medical entities on texts produced in Chilean hospitals. Material and Methods: A clinical corpus was constructed using 150 referrals in public hospitals. Three annotators identified six medical entities: clinical findings, diagnoses, body parts, medications, abbreviations, and family members. An annotation scheme was designed, and an iterative approach to train the annotators was applied. The F1-Score metric was used to assess the progress of the annotators’ agreement during their training. Results: An average F1-Score of 0.73 was observed at the beginning of the project. After the training period, it increased to 0.87. Annotation of clinical findings and body parts showed significant discrepancy, while abbreviations, medications, and family members showed high agreement. Conclusions: A linguistic resource with annotated medical entities on texts produced in Chilean hospitals was built and made available, working with annotators related to medicine. The iterative annotation approach allowed us to improve performance metrics. The corpus and annotation protocols will be released to the research community.	es_ES
Language	dc.language.iso	es	es_ES
Publisher	dc.publisher	Soc Medica Santiago	es_ES
Type of license	dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
Source	dc.source	Revista Médica de Chile	es_ES
Keywords	dc.subject	Data curation	es_ES
Keywords	dc.subject	Data mining	es_ES
Keywords	dc.subject	Medical informatics	es_ES
Keywords	dc.subject	Natural language processing	es_ES
Keywords	dc.subject	Supervised machine learning	es_ES
Title	dc.title	Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas	es_ES
Title in another language	dc.title.alternative	Construction of text resources for automatic identification of clinical information in unstructured narratives	es_ES
Document type	dc.type	Artículo de revista	es_ES
dc.description.version	dc.description.version	Versión publicada - versión final del editor	es_ES
dcterms.accessRights	dcterms.accessRights	Acceso abierto	es_ES
Cataloguer	uchile.catalogador	cfr	es_ES
Indexation	uchile.index	Artículo de publicación WoS	es_ES
Indexation	uchile.index	Artículo de publicación SCOPUS	es_ES
Indexation	uchile.index	Artículo de publicación SCIELO	es_ES

Files in this item

Name: Construction-of-text-resources.pdf
Size: 797.3Kb
Format: PDF

This item appears in the following Collection(s)

Artículos de revistas

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States