Unified Fine-Grained Biomedical Entity Recognition As a Combination of Boundary Detection and Sequence Generation.

Xue Li, Yang, Mingchen Ye, Yi Guan, Xuehui Yu, Jingchi Jiang
DOI: https://doi.org/10.1109/bibm55620.2022.9995683
2022-01-01
Abstract: Biomedical Named Entity Recognition (BioNER) is a critical component of biomedical information extraction. NER is more challenging in the biomedical domain because entity types are fine-grained and nested and discontinuous entity forms are more common. However, no existing BioNER dataset contains a large number of all three entity forms (flat, nested, and discontinuous), and there is no unified BioNER model that handles all three forms simultaneously. Publicly available methods focus only on identifying text spans and do not distinguish fine-grained entity types. To address these issues, we propose a unified framework based on our own BioNER dataset, CCNER, which models the BioNER task as a combination of boundary recognition and sequence generation. CCNER is a comprehensive, fine-grained BioNER dataset in which discontinuous, nested, and flat entities make up 8.9%, 52.6%, and 38.5% of the dataset, respectively, and it includes five fine-grained entity types. Our framework consists of two modules: boundary detection and entity generation. In the boundary detection module, we propose a sample-based span representation method to better determine fine-grained entity boundaries. Finally, we conduct experiments on four datasets and achieve competitive results. Code is available at https://github.com/lx-hit/BioNER.
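The three entity forms named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or its repository. The fragment-based representation, the function name `entity_form`, the containment-based definition of "nested", and the example sentence are all assumptions made for illustration; the paper's own annotation scheme may differ.

```python
from typing import List, Tuple

# Assumed representation: an entity is a list of (start, end) token
# fragments with exclusive end. One fragment = contiguous entity.
Entity = List[Tuple[int, int]]


def entity_form(entity: Entity, others: List[Entity]) -> str:
    """Classify an entity as 'discontinuous', 'nested', or 'flat'.

    Assumption: an entity counts as nested if its single span contains,
    or is contained in, a different contiguous span in `others`.
    """
    if len(entity) > 1:
        return "discontinuous"
    start, end = entity[0]
    for other in others:
        if len(other) > 1:
            continue  # only compare against contiguous entities
        o_start, o_end = other[0]
        if (start, end) == (o_start, o_end):
            continue  # identical span (e.g. the entity itself)
        inside_other = o_start <= start and end <= o_end
        contains_other = start <= o_start and o_end <= end
        if inside_other or contains_other:
            return "nested"
    return "flat"


# Hypothetical example sentence, not drawn from CCNER.
tokens = ["pain", "in", "left", "arm", "and", "shoulder", "with", "fever"]
entities: List[Entity] = [
    [(0, 4)],          # "pain in left arm"
    [(2, 4)],          # "left arm"
    [(2, 3), (5, 6)],  # "left ... shoulder"
    [(7, 8)],          # "fever"
]
forms = [entity_form(e, entities) for e in entities]
print(forms)
```

Under these assumptions, "pain in left arm" and "left arm" form a nested pair, "left ... shoulder" is discontinuous because it spans two fragments, and "fever" is flat.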