Developing Customizable Cancer Information Extraction Modules for Pathology Reports Using CLAMP.

Ergin Soysal, Jeremy L Warner, Jingqi Wang, Min Jiang, Krysten Harvey, Sandeep Kumar Jain, Xiao Dong, Hsing-Yi Song, Harish Siddhanamatha, Liwei Wang, Qi Dai, Qingxia Chen, Xianglin Du, Cui Tao, Ping Yang, Joshua Charles Denny, Hongfang Liu, Hua Xu
DOI: https://doi.org/10.3233/SHTI190383
2019-01-01
Abstract: Natural language processing (NLP) technologies have been successfully applied to cancer research by enabling automated extraction of phenotypic information from narratives in electronic health records (EHRs), such as pathology reports; however, developing customized NLP solutions requires substantial effort. To facilitate the adoption of NLP in cancer research, we have developed a set of customizable modules for extracting a comprehensive range of cancer-related information from pathology reports (e.g., tumor size, tumor stage, and biomarkers). The modules build on the existing CLAMP system, which provides user-friendly interfaces for building NLP solutions customized to individual needs. Evaluation using annotated data at Vanderbilt University Medical Center showed that CLAMP-Cancer could extract diverse types of cancer information with good F-measures (0.80-0.98). We then applied CLAMP-Cancer to an information extraction task at Mayo Clinic and showed that we could quickly build a customized NLP system whose performance was comparable to that of an existing system at Mayo Clinic. CLAMP-Cancer is freely available for academic use.