Classification of CT Pulmonary Angiography Reports by Presence, Chronicity, and Location of Pulmonary Embolism with Natural Language Processing.

Sheng Yu, Kanako K. Kumamaru, Elizabeth George, Ruth M. Dunne, Arash Bedayat, Matey Neykov, Andetta R. Hunsaker, Karin E. Dill, Tianxi Cai, Frank J. Rybicki
DOI: https://doi.org/10.1016/j.jbi.2014.08.001
IF: 8
2014-01-01
Journal of Biomedical Informatics
Abstract: In this paper we describe an efficient tool based on natural language processing (NLP) for classifying the detailed state of pulmonary embolism (PE) recorded in CT pulmonary angiography reports. The classification tasks are: PE present vs. absent, acute PE vs. others, central PE vs. others, and subsegmental PE vs. others. Statistical learning algorithms were trained with features extracted using the NLP tool and gold standard labels obtained via chart review by two radiologists. The areas under the receiver operating characteristic curves (AUCs) for the four tasks were 0.998, 0.945, 0.987, and 0.986, respectively. We compared our classifiers with bag-of-words Naive Bayes classifiers, a standard text mining technique, which gave AUCs of 0.942, 0.765, 0.766, and 0.712, respectively.
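To make the comparison baseline and the evaluation metric concrete, the following is a minimal sketch of a bag-of-words Naive Bayes classifier scored by AUC. It is not the authors' NLP tool or their experimental setup: the function names (`tokenize`, `train_nb`, `score`, `auc`), the whitespace tokenizer, the add-one smoothing, and the four toy training sentences are all assumptions made purely for illustration, and the sentences are invented rather than taken from any real report.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real tokenization.
    return text.lower().split()


def train_nb(docs, labels):
    """Collect per-class word counts for a binary multinomial Naive Bayes model."""
    vocab = set()
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, y in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[y].update(tokens)
        vocab.update(tokens)
    return vocab, word_counts, class_counts


def score(model, text):
    """Log-odds of class 1 vs. class 0, using add-one (Laplace) smoothing."""
    vocab, word_counts, class_counts = model
    log_odds = math.log(class_counts[1] / class_counts[0])
    totals = {c: sum(word_counts[c].values()) for c in (0, 1)}
    for tok in tokenize(text):
        if tok not in vocab:
            continue  # words never seen in training carry no evidence
        p1 = (word_counts[1][tok] + 1) / (totals[1] + len(vocab))
        p0 = (word_counts[0][tok] + 1) / (totals[0] + len(vocab))
        log_odds += math.log(p1 / p0)
    return log_odds


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy sentences (label 1 = PE present, 0 = PE absent).
train_docs = [
    "filling defect in right main pulmonary artery",
    "no evidence of pulmonary embolism",
    "acute embolus in segmental branch",
    "no filling defect seen",
]
train_labels = [1, 0, 1, 0]

test_docs = ["embolus in left pulmonary artery", "no embolism"]
test_labels = [1, 0]

model = train_nb(train_docs, train_labels)
test_scores = [score(model, d) for d in test_docs]
test_auc = auc(test_scores, test_labels)
```

A bag-of-words model of this kind treats "no filling defect" and "filling defect" as nearly the same evidence, since it ignores word order and negation scope. That limitation is one plausible reason such a baseline could trail a classifier built on richer NLP features, although the abstract itself does not state the cause of the gap.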