Noise Reduction Learning Based on XLNet-CRF for Biomedical Named Entity Recognition

Zhaoying Chai, Han Jin, Shenghui Shi, Siyan Zhan, Lin Zhuo, Yu Yang, Qi Lian
DOI: https://doi.org/10.1109/tcbb.2022.3157630
2023-01-01
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Abstract: In recent years, Biomedical Named Entity Recognition (BioNER) systems have mainly been based on deep neural networks, which are used to extract information from the rapidly expanding biomedical literature. Transformer-based autoencoding language models that capture long-distance context have recently been employed for BioNER with great success. However, noise interferes with both pre-training and fine-tuning, and these models lack an effective decoder for modeling label dependencies, leaving considerable room for improvement. We propose two noise reduction models, Shared Labels and Dynamic Splicing, which encode with XLNet, a permutation language pre-training model, and decode with a Conditional Random Field (CRF). Evaluated on 15 BioNER datasets, the two models improved the average F1-score by 1.504 and 1.48, respectively, and achieved state-of-the-art performance on 7 of the datasets. Further analysis demonstrates the effectiveness of the two models and the improvement in recognition that the CRF provides, and suggests the applicable scope of each model according to different data characteristics.
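To illustrate what decoding with a CRF adds over choosing each token's label independently, here is a minimal sketch of Viterbi decoding for a linear-chain CRF. It is not the authors' implementation: the function name `viterbi_decode`, the three-label O/B/I scheme, and all scores are assumptions chosen for illustration, and the emission scores stand in for whatever an encoder such as XLNet would produce.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][j]   : score of label j at token t (hypothetical encoder output).
    transitions[i][j] : score of moving from label i to label j.
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each label
    backpointers = []                   # best previous label, per step and label
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_labels):
            best_prev = max(range(num_labels),
                            key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[t][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(num_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative labels: 0 = O, 1 = B, 2 = I. The large negative score forbids O -> I.
emissions = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [0.0, 0.0, 2.0]]
transitions = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi_decode(emissions, transitions))  # [0, 1, 2], i.e. O B I
```

In this toy example, picking the highest emission score per token would yield O I I, which starts an entity with an inside tag. The transition scores steer the decoder to the consistent sequence O B I, which is the kind of label dependency a CRF layer is meant to capture.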