Named Entity Recognition of Medical Examination Reports Based on BiLSTM+CRF Model

Fan Zhang, Ying Zhang
DOI: https://doi.org/10.1109/AINIT59027.2023.10212675
2023-06-16
Abstract: Medical examination reports are typical unstructured data written in natural language. Named entity recognition (NER) extracts key information from medical texts and serves as the foundation for further analysis of entity relationships and extraction of diagnostic knowledge. Used alone, deep learning models and earlier conditional random field (CRF) models each have drawbacks, such as heavy annotation workload, overfitting, and poor generalization. Additionally, Chinese medical text is more difficult for NER because it is specialized and its structure is not standardized. To address these issues, this paper proposes an integrated model for extracting structured information from medical examination reports, namely the BiLSTM and CRF ensemble model (BLC). BLC identifies medical entities in the reports: the BiLSTM model determines the probability of each label for each character, and CRF decoding ensures that the final label sequence conforms to the labeling constraints. Real gastrointestinal endoscopy reports provided by hospitals were annotated and used as experimental data, and the BiLSTM+CRF model was built and trained on this data using the TensorFlow framework. The effects of different parameters on entity recognition were compared. The results showed that the model's recognition performance under the BIOES annotation scheme was better than under the BIO annotation scheme. Entity categories whose features are well segmented were recognized more accurately.
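The two mechanisms named in the abstract, the BIOES annotation scheme and CRF decoding over per-character label scores, can be illustrated with a minimal sketch. This is not the authors' TensorFlow implementation: the entity label `SITE`, the function names, and all scores below are hypothetical, and the decoder is a plain Viterbi search over hand-written emission and transition scores rather than a trained model.

```python
from typing import Dict, List, Tuple


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert a BIO tag sequence to BIOES (adds E- for entity ends, S- for singletons)."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == f"I-{label}"  # does the same entity continue on the next character?
        if prefix == "B":
            out.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            out.append(f"I-{label}" if continues else f"E-{label}")
    return out


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label path.

    emissions[t][y] stands in for the BiLSTM score of label y at character t;
    transitions[(p, y)] stands in for the CRF score of moving from p to y
    (missing pairs default to 0.0).
    """
    best = [{y: emissions[0][y] for y in labels}]  # best path score ending in each label
    back = []  # back-pointers for positions 1..T-1
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical two-character example: the per-character scores alone favour
# "O" then "I-SITE", which is an invalid sequence; a strongly negative
# transition score for O -> I-SITE steers decoding to a valid path instead.
labels = ["O", "B-SITE", "I-SITE"]
emissions = [
    {"O": 2.0, "B-SITE": 1.0, "I-SITE": 0.0},
    {"O": 0.0, "B-SITE": 0.5, "I-SITE": 2.0},
]
transitions = {("O", "I-SITE"): -1e4}
decoded = viterbi_decode(emissions, transitions, labels)
converted = bio_to_bioes(["B-SITE", "I-SITE", "O", "B-SITE"])
```

In this sketch the transition scores play the role the abstract assigns to the CRF layer: they rule out label sequences that the tagging scheme does not allow, even when the per-character scores prefer them. BIOES gives the decoder explicit end (`E-`) and single-character (`S-`) tags in addition to the BIO tags.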
Medicine, Computer Science