A Natural Language Processing Tool to Extract Quantitative Smoking Status from Clinical Narratives.

Xi Yang, Hanyuan Yang, Tianchen Lyu, Shuang Yang, Yi Guo, Jiang Bian, Hua Xu, Yonghui Wu
DOI: https://doi.org/10.1109/ichi48887.2020.9374369
2020-01-01
Abstract: This study presents a natural language processing (NLP) tool that extracts quantitative smoking information (e.g., Pack-Year, Quit Year, Smoking Year, and Pack per Day) from clinical notes and standardizes it into Pack-Year units. We annotated a corpus of 200 clinical notes from patients who had low-dose CT imaging procedures for lung cancer screening and developed an NLP system using a two-layer rule-engine structure. We divided the 200 notes into a training set and a test set and developed the NLP system using only the training set. On the test set, the system achieved best F1 scores of 0.963 and 0.946 under lenient and strict evaluation, respectively.
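To make the standardization step concrete, the following is a minimal sketch of rule-based extraction followed by conversion to pack-years, using the standard definition (packs per day multiplied by years smoked). It is not the authors' system: the abstract does not describe the rules or the two layers, so the regular expressions and the function name `to_pack_years` are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; NOT the rules used in the paper.
NUM = r"(\d+(?:\.\d+)?)"
PACK_YEAR = re.compile(NUM + r"[\s-]*pack[\s-]*years?", re.IGNORECASE)
PACKS_PER_DAY = re.compile(
    NUM + r"\s*(?:packs?\s*(?:per|a|/)\s*day|ppd)", re.IGNORECASE
)
SMOKING_YEARS = re.compile(r"for\s+" + NUM + r"\s*(?:years?|yrs?)", re.IGNORECASE)


def to_pack_years(text: str) -> Optional[float]:
    """Return a pack-year value found in or derived from `text`, else None."""
    # Case 1: the note states pack-years directly.
    direct = PACK_YEAR.search(text)
    if direct:
        return float(direct.group(1))

    # Case 2: derive pack-years = packs per day * years smoked.
    ppd = PACKS_PER_DAY.search(text)
    years = SMOKING_YEARS.search(text)
    if ppd and years:
        return float(ppd.group(1)) * float(years.group(1))

    # Not enough quantitative information to standardize.
    return None


if __name__ == "__main__":
    for note in (
        "30 pack-year smoking history",
        "Smokes 1.5 packs per day for 20 years",
        "Never smoker",
    ):
        print(note, "->", to_pack_years(note))
```

A real system would also need to handle quit dates, ranges, negation, and unit variants, which this sketch deliberately ignores.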