Abstract:Objective: Breast cancer is the most common malignant tumor among women. The diagnosis and treatment information of breast cancer patients is abundant in multiple types of clinical fields, including clinicopathological data, genotype and phenotype information, treatment information, and prognosis information. However, current studies are mainly focused on extracting information from one specific type of clinical field. This study defines a comprehensive information model to represent the whole-course clinical information of patients. Furthermore, deep learning approaches are used to extract the concepts and their attributes from clinical breast cancer documents by fine-tuning pretrained Bidirectional Encoder Representations from Transformers (BERT) language models. Materials and methods: The clinical corpus that was used in this study was from one 3A cancer hospital in China, consisting of the encounter notes, operation records, pathology notes, radiology notes, progress notes and discharge summaries of 100 breast cancer patients. Our system consists of two components: a named entity recognition (NER) component and a relation recognition component. For each component, we implemented deep learning-based approaches by fine-tuning BERT, which outperformed other state-of-the-art methods on multiple natural language processing (NLP) tasks. A clinical language model is first pretrained using BERT on a large-scale unlabeled corpus of Chinese clinical text. For NER, the context embeddings that were pretrained using BERT were used as the input features of the Bi-LSTM-CRF (Bidirectional long-short-memory-conditional random fields) model and were fine-tuned using the annotated breast cancer notes. Furthermore, we proposed an approach to fine-tune BERT for relation extraction. It was considered to be a classification problem in which the two entities that were mentioned in the input sentence were replaced with their semantic types. Results: Our best-performing system achieved F1 scores of 93.53% for the NER and 96.73% for the relation extraction. Additional evaluations showed that the deep learning-based approaches that fine-tuned BERT did outperform the traditional Bi-LSTM-CRF and CRF machine learning algorithms in NER and the attention-Bi-LSTM and SVM (support vector machines) algorithms in relation recognition. Conclusion: In this study, we developed a deep learning approach that fine-tuned BERT to extract the breast cancer concepts and their attributes. It demonstrated its superior performance compared to traditional machine learning algorithms, thus supporting its uses in broader NER and relation extraction tasks in the medical domain.

EXTRACTING BI-RADS FEATURES FROM MAMMOGRAPHY REPORTS IN CHINESE BASED ON MACHINE LEARNING

Automatic extraction of imaging observation and assessment categories from breast magnetic resonance imaging reports with natural language processing.

A Natural Language Processing Pipeline of Chinese Free-text Radiology Reports for Liver Cancer Diagnosis

Machine Learning Based on Automated Breast Volume Scanner (ABVS) Radiomics for Differential Diagnosis of Benign and Malignant BI‐RADS 4 Lesions

BI-RADS Reading of Non-Mass Lesions on DCE-MRI and Differential Diagnosis Performed by Radiomics and Deep Learning

Extracting comprehensive clinical information for breast cancer using deep learning methods

Use of BERT (Bidirectional Encoder Representations from Transformers)-Based Deep Learning Method for Extracting Evidences in Chinese Radiology Reports: Development of a Computer-Aided Liver Cancer Diagnosis Framework

The implementation of natural language processing to extract index lesions from breast magnetic resonance imaging reports

Harnessing Large Language Models for Structured Reporting in Breast Ultrasound: A Comparative Study of Open AI (GPT-4.0) and Microsoft Bing (GPT-4)

Application of natural language processing to post-structuring of rectal cancer MRI reports

A novel deep learning approach to extract Chinese clinical entities for lung cancer screening and staging

Deep BI-RADS Network for Improved Cancer Detection from Mammograms

Automatic RadLex coding of Chinese structured radiology reports based on text similarity ensemble

Deep Learning for Describing Breast Ultrasound Images with BI-RADS Terms

A High-Performance Deep Neural Network Model for BI-RADS Classification of Screening Mammography

An effective fine grading method of BI-RADS classification in mammography

Deep Learning Model Improves Radiologists' Performance in Detection and Classification of Breast Lesions

Bi-Rads-Net: An Explainable Multitask Learning Approach for Cancer Diagnosis in Breast Ultrasound Images

BI-RADS BERT and Using Section Segmentation to Understand Radiology Reports

A New Computer-Aided Diagnosis System with Modified Genetic Feature Selection for BI-RADS Classification of Breast Masses in Mammograms

Deep learning of mammogram images to reduce unnecessary breast biopsies: a preliminary study