EXTRACTING BI-RADS FEATURES FROM MAMMOGRAPHY REPORTS IN CHINESE BASED ON MACHINE LEARNING

Mate Zhou, Tinglong Tang, Ji Lu, Ziqing Deng, Zhenzhen Xiao, Shuifa Sun, Jun Zhang, Yirong Wu
DOI: https://doi.org/10.1615/jflowvisimageproc.2020035208
2021-01-01
Journal of Flow Visualization and Image Processing
Abstract: The Breast Imaging Reporting and Data System (BI-RADS) lexicon was developed by the American College of Radiology (ACR) to address a lack of standardization and uniformity in breast radiology reporting. In China, structured reports are not widely used in breast imaging, and BI-RADS features are stored in free-text reports. Manually extracting BI-RADS features from Chinese mammography reports for further analysis is time-consuming and error-prone. To extract these features automatically, we developed four natural language processing (NLP) methods: two conventional machine learning methods and two deep learning methods. The conventional methods are the hidden Markov model (HMM) and the conditional random field (CRF) model. The deep learning methods are bidirectional long short-term memory (BiLSTM) with a CRF layer (BiLSTM-CRF) and BiLSTM with both a CRF layer and an attention mechanism (BiLSTM-CRF-Attention). We compared the four methods in terms of precision, recall, and F1-score, and found that the CRF method achieved the best performance. This is the first study to apply machine-learning-based NLP methods to extract BI-RADS features from mammography reports in Chinese, and it demonstrates the potential of using these features to assist routine clinical care, quality improvement, and breast cancer research.
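The comparison by precision, recall, and F1-score can be illustrated with a minimal sketch. It assumes the extraction task is framed as BIO sequence labeling and that a predicted feature counts as correct only when its span and type exactly match the gold annotation; the abstract does not state the scoring scheme, so both are assumptions. The tag names `MASS` and `CALC` and the function names are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, feature type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect (start, end, type) spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, kind = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Span-level exact-match precision, recall, and F1 (assumed scoring scheme)."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical tags: the model finds the mass span but misses the calcification.
gold = ["B-MASS", "I-MASS", "O", "B-CALC", "O"]
pred = ["B-MASS", "I-MASS", "O", "O", "O"]
p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case the one predicted span is correct, so precision is 1.0, while only one of the two gold spans is recovered, so recall is 0.5. F1 is the harmonic mean of the two, which penalizes a method that scores well on only one of them.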