Identificación del diagnóstico de patología crítica en los informes radiológicos mediante procesamiento de lenguaje natural : aplicación en Chile

Ortiz Calvo, Guillermo Javier

Thesis

Access note

Open access

Publication date

2016

Author

Ortiz Calvo, Guillermo Javier

Professor Advisor

Cerda, Mauricio

Abstract

Radiology reports are currently written in free text, with no dedicated field that categorizes them by diagnosis. As a result, diagnoses classified as critical results (critical pathology), a group characterized by a high risk of harm to the patient, must be identified manually, which leads to problems such as undersampling and a large time investment. This work proposes a tool that uses natural language processing (NLP) methods to analyze the texts at scale.

The hypothesis of this thesis is that NLP algorithms can identify more than 80% of the diagnoses that exist in SNOMED-CT (a clinical terminology) and appear in the impressions of radiology reports, and can identify critical results with more than 90% sensitivity. SNOMED-CT was used to classify the reports because of its broad coverage of medical concepts and synonyms. The task was carried out with three algorithms: 1) a search engine that finds the SNOMED-CT terms contained in the reports using reverse (inverted) indexing, 2) a negation detector based on regular expressions, and 3) a combination of both tools to identify critical results.

The proposed algorithms were evaluated on a representative sample (n = 219) of 1973 Computed Tomography Pulmonary Angiography (CTPA) reports, labeled by two physicians. Inter-rater agreement had a kappa value of 85.5% (95% CI [80.8-90.3%], p < 0.001). The search engine achieved an F-measure (F) of 0.94, a sensitivity (S) of 91.2% and a positive predictive value (PPV) of 98%. The negation detector obtained an F of 0.99, S of 98.7% and PPV of 99.3%. Performance in detecting critical results was measured using the diagnosis of pulmonary embolism (PE) as the reference, yielding an F of 0.94, S of 96.3% and PPV of 92.86%.

In conclusion, this thesis shows that it is possible to build an NLP-based tool to identify critical results by exploiting the regularity of expression patterns in radiology report text, which will allow future work to create decision support tools.
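
The three steps named in the abstract (term search over a reverse index, regular-expression negation detection, and their combination) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the term list, the `CRITICAL` set, the negation cues and all function names are hypothetical, and English phrases are used only for readability. The abstract does not give the actual SNOMED-CT subset or the negation patterns used.

```python
import re
from collections import defaultdict

# Hypothetical mini-terminology (concept -> synonyms); not real SNOMED-CT content.
TERMS = {
    "pulmonary embolism": ["pulmonary embolism", "pulmonary thromboembolism"],
    "pleural effusion": ["pleural effusion"],
}
# Hypothetical set of concepts treated as critical results.
CRITICAL = {"pulmonary embolism"}

# Illustrative negation cues; the patterns used in the thesis are not stated.
NEGATION = re.compile(r"\b(no evidence of|no signs of|negative for|without|no)\b")


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(terms):
    """Step 1a: map each token to the (concept, synonym) pairs that contain it."""
    index = defaultdict(set)
    for concept, synonyms in terms.items():
        for synonym in synonyms:
            normalized = " ".join(tokenize(synonym))
            for token in normalized.split():
                index[token].add((concept, normalized))
    return index


def affirmed_concepts(report, index):
    """Steps 1b and 2: find terms per sentence and drop the negated ones."""
    affirmed = set()
    for sentence in re.split(r"[.;\n]+", report):
        norm = " ".join(tokenize(sentence))
        # Candidate terms share at least one token with the sentence.
        candidates = {pair for tok in norm.split() for pair in index.get(tok, ())}
        for concept, synonym in candidates:
            match = re.search(rf"\b{re.escape(synonym)}\b", norm)
            # A term counts as negated if a cue precedes it in the same sentence.
            if match and not NEGATION.search(norm[: match.start()]):
                affirmed.add(concept)
    return affirmed


def critical_findings(report, index):
    """Step 3: keep only affirmed concepts that belong to the critical set."""
    return affirmed_concepts(report, index) & CRITICAL


index = build_index(TERMS)
positive = "Findings consistent with acute pulmonary thromboembolism. No pleural effusion."
negative = "No evidence of pulmonary embolism. Small pleural effusion."
print(critical_findings(positive, index))
print(critical_findings(negative, index))
```

In this sketch the index only narrows down which synonyms are worth checking in each sentence, and negation is decided by whether a cue appears before the term in the same sentence. A production system would need a richer cue list and scope rules.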

General note

Master's degree in Medical Informatics

Identifier

URI: https://repositorio.uchile.cl/handle/2250/147321

Collections

Postgraduate Theses

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States