Combination of Loss-Based Active Learning and Semi-Supervised Learning for Recognizing Entities in Chinese Electronic Medical Records

Jinghui Yan, Chengqing Zong, Jinan Xu
DOI: https://doi.org/10.1145/3588314
IF: 1.471
2023-03-20
ACM Transactions on Asian and Low-Resource Language Information Processing
Abstract: The recognition of entities in electronic medical records (EMRs) is especially important to downstream tasks such as clinical entity normalization and medical dialogue understanding. However, in the medical domain, training a high-quality named entity recognition (NER) system always requires large-scale annotated datasets, which are expensive to obtain. In this paper, to lower the cost of data annotation and maximize the use of unlabeled data, we propose a hybrid approach to recognizing entities in Chinese electronic medical records that combines loss-based active learning with semi-supervised learning. Specifically, we adopt a dynamic balance strategy that balances the minimum loss predicted by a named entity recognition decoder and by a loss prediction module at different stages of the process. Experimental results demonstrate the effectiveness and efficiency of the proposed framework, which achieves higher performance than existing approaches on Chinese EMR entity recognition datasets under limited labeling resources.
computer science, artificial intelligence
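The abstract above does not give formulas, so the following is only a rough sketch of one way the described idea could look, not the authors' implementation. It assumes each unlabeled sentence has two loss estimates (one from the NER decoder, one from the loss prediction module), that they are mixed with a weight that shifts over training rounds, and that high-scoring sentences go to human annotation while low-scoring ones are pseudo-labeled. The function names, the linear schedule, and the selection rule are all hypothetical.

```python
from typing import List, Sequence, Tuple


def balance_weight(round_idx: int, total_rounds: int) -> float:
    """Hypothetical linear schedule: 0.0 in the first round, 1.0 in the last."""
    if total_rounds <= 1:
        return 1.0
    return round_idx / (total_rounds - 1)


def combined_scores(
    decoder_losses: Sequence[float],
    module_losses: Sequence[float],
    lam: float,
) -> List[float]:
    """Mix the two loss estimates per sentence (assumed convex combination)."""
    if len(decoder_losses) != len(module_losses):
        raise ValueError("both loss lists must cover the same sentences")
    return [(1.0 - lam) * d + lam * m for d, m in zip(decoder_losses, module_losses)]


def split_pool(
    scores: Sequence[float], n_annotate: int, n_pseudo: int
) -> Tuple[List[int], List[int]]:
    """Return (indices to annotate, indices to pseudo-label).

    Assumption: the highest-scoring sentences are the most informative for
    human annotation, and the lowest-scoring ones are safe to pseudo-label.
    """
    if n_annotate + n_pseudo > len(scores):
        raise ValueError("requested more sentences than the pool contains")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pseudo = order[:n_pseudo]
    annotate = order[::-1][:n_annotate]
    return annotate, pseudo


# Toy pool of four unlabeled sentences with made-up loss estimates.
decoder_losses = [0.2, 1.5, 0.9, 0.1]
module_losses = [0.4, 1.1, 1.3, 0.3]
lam = balance_weight(round_idx=1, total_rounds=3)
scores = combined_scores(decoder_losses, module_losses, lam)
to_annotate, to_pseudo_label = split_pool(scores, n_annotate=1, n_pseudo=1)
print(to_annotate, to_pseudo_label)
```

In this toy pool, sentence 1 has the largest combined score and would be sent to an annotator, while sentence 3 has the smallest and would receive a pseudo-label. A real system would need a confidence threshold or similar safeguard before trusting pseudo-labels.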