MF-MNER: Multi-models Fusion for MNER in Chinese Clinical Electronic Medical Records

Haoze Du, Jiahao Xu, Zhiyong Du, Lihui Chen, Shaohui Ma, Dongqing Wei, Xianfang Wang
DOI: https://doi.org/10.1007/s12539-024-00624-z
2024-04-07
Interdisciplinary Sciences Computational Life Sciences
Abstract: To address the problem of poor entity recognition performance caused by the lack of Chinese annotation in clinical electronic medical records, this paper proposes a multi-medical entity recognition method, MF-MNER, using a fusion technique that combines BART, Bi-LSTM, and CRF. First, after cleaning, encoding, and segmenting the electronic medical records, the obtained semantic representations are dynamically fused using a bidirectional and auto-regressive transformer (BART) model. Then, sequential information is captured using a bidirectional long short-term memory (Bi-LSTM) network. Finally, a conditional random field (CRF) is used to decode and output the multi-task entity recognition results. Experiments on the CCKS2019 dataset give micro avg Precision, macro avg Recall, and weighted avg Precision of 0.880, 0.887, and 0.883, and micro avg F1-score, macro avg F1-score, and weighted avg F1-score of 0.875, 0.876, and 0.876 respectively. Under the same dataset conditions, our method outperforms models from the existing literature under all three averaging schemes (micro average, macro average, weighted average). Under weighted averaging, the Precision, Recall, and F1-score are 19.64%, 15.67%, and 17.58% higher than those of the existing BERT-BiLSTM-CRF model respectively. On an actual clinical dataset, MF-MNER achieves Precision, Recall, and F1-score of 0.638, 0.825, and 0.719 under micro averaging; 0.685, 0.800, and 0.733 under macro averaging; and 0.647, 0.825, and 0.722 under weighted averaging.
These results show that MF-MNER can integrate the advantages of the BART, Bi-LSTM, and CRF layers, significantly improving the performance of downstream named entity recognition tasks with a small amount of annotation and achieving particularly strong recall, which gives the method practical significance. Source code and datasets to reproduce the results in this paper are available at https://github.com/xfwang1969/MF-MNER.
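The final decoding step described in the abstract can be illustrated with a minimal sketch of Viterbi decoding for a linear-chain CRF. This is not the authors' implementation (see the repository linked above for that). The tag names, emission scores, and transition scores below are hypothetical toy values. In the described pipeline the emission scores would come from the BART and Bi-LSTM layers, and the transition scores would be learned by the CRF.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one (non-empty) sentence."""
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []  # back[t-1][tag] = best previous tag
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical BIO tags for a single entity type (assumption, not from the paper).
TAGS = ["O", "B-DIS", "I-DIS"]

# Toy transition scores: everything is neutral except O -> I-DIS,
# which is heavily penalised because an entity cannot start with an I- tag.
transitions = {p: {c: 0.0 for c in TAGS} for p in TAGS}
transitions["O"]["I-DIS"] = -10.0

# Toy emission scores for a three-token sentence.
emissions = [
    {"O": 1.0, "B-DIS": 0.8, "I-DIS": 0.0},
    {"O": 0.0, "B-DIS": 0.0, "I-DIS": 1.0},
    {"O": 1.0, "B-DIS": 0.0, "I-DIS": 0.0},
]

decoded = viterbi_decode(emissions, transitions, TAGS)
print(decoded)  # ['B-DIS', 'I-DIS', 'O']
```

Picking the top-scoring tag for each token independently would give `O, I-DIS, O` here, which is not a valid BIO sequence. Decoding over whole sequences lets the transition penalty steer the first token to `B-DIS`, which is the kind of label consistency a CRF layer is typically added to provide.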
mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses the poor performance of entity recognition in Chinese clinical electronic medical records caused by the lack of labeled data. Specifically, it proposes a multi-model fusion method (MF-MNER), which combines BART, Bi-LSTM and CRF to recognize medical entities in Chinese clinical electronic medical records more accurately with only a small amount of labeled data.

### Main contributions:

1. **Multi-model fusion**: By fusing the advantages of the BART, Bi-LSTM and CRF models, the performance of entity recognition is improved.
2. **Dynamically fusing semantic representations**: The BART model is used to dynamically fuse the semantic representations of input texts, enhancing the model's context-understanding ability.
3. **Capturing sequence information**: The Bi-LSTM network is used to capture sequence information, further improving the model's context-modeling ability.
4. **Multi-task entity recognition**: The CRF layer is used to decode and output multi-task entity recognition results, improving the overall recognition accuracy and recall rate.

### Experimental results:

- On the CCKS2019 dataset, the micro-avg precision, macro-avg recall, and weighted-avg precision reached 0.880, 0.887 and 0.883 respectively.
- The micro-avg F1-score, macro-avg F1-score, and weighted-avg F1-score were 0.875, 0.876 and 0.876 respectively.
- Compared with the existing BERT-BiLSTM-CRF model, under the weighted-avg condition, the precision, recall rate and F1-score were increased by 19.64%, 15.67% and 17.58% respectively.

### Practical application tests:

- Under the micro-avg condition on the actual clinical dataset, the precision, recall rate and F1-score were 0.638, 0.825 and 0.719 respectively.
- Under the macro-avg condition, the precision, recall rate and F1-score were 0.685, 0.800 and 0.733 respectively.
- Under the weighted-avg condition, the precision, recall rate and F1-score were 0.647, 0.825 and 0.722 respectively.

### Conclusion:

The MF-MNER method proposed in the paper effectively integrates the advantages of the BART, Bi-LSTM and CRF layers and significantly improves the performance of entity recognition in Chinese clinical electronic medical records with a small amount of labeled data. It performs especially well in terms of recall rate and has practical value.
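The micro, macro and weighted averages reported above differ only in how per-class scores are combined. The sketch below shows one common way to compute them from per-class counts of true positives, false positives and false negatives. It follows the convention used by scikit-learn style classification reports. Whether the paper counts at the entity level or the token level is not stated here, so that choice, the class names, and the counts are all hypothetical.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]      # (true positives, false positives, false negatives)
Scores = Tuple[float, float, float]  # (precision, recall, F1)


def prf(tp: int, fp: int, fn: int) -> Scores:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def averaged_scores(per_class: Dict[str, Counts]) -> Dict[str, Scores]:
    """Micro, macro and support-weighted averages over all classes."""
    class_scores = {c: prf(*counts) for c, counts in per_class.items()}

    # Micro: pool the counts of all classes, then score once.
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    micro = prf(tp, fp, fn)

    # Macro: unweighted mean of the per-class scores.
    n = len(per_class)
    macro = tuple(sum(s[i] for s in class_scores.values()) / n for i in range(3))

    # Weighted: mean of the per-class scores weighted by support (tp + fn).
    support = {c: counts[0] + counts[2] for c, counts in per_class.items()}
    total = sum(support.values()) or 1  # avoid dividing by zero on empty input
    weighted = tuple(
        sum(class_scores[c][i] * support[c] for c in per_class) / total
        for i in range(3)
    )
    return {"micro": micro, "macro": macro, "weighted": weighted}


# Hypothetical counts for two entity types (not taken from the paper).
toy_counts = {"disease": (8, 2, 2), "drug": (1, 1, 3)}
results = averaged_scores(toy_counts)
for name, (p, r, f) in results.items():
    print(f"{name:>8}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

With support defined as true positives plus false negatives, the weighted-average recall is always equal to the micro-average recall. This is consistent with the identical recall of 0.825 reported for both schemes on the clinical dataset.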