BERN2: an advanced neural biomedical named entity recognition and normalization tool

Mujeen Sung, Minbyul Jeong, Yonghwa Choi, Donghyeon Kim, Jinhyuk Lee, Jaewoo Kang
DOI: https://doi.org/10.1093/bioinformatics/btac598
2022-10-06
Abstract: In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e.g. diseases and drugs) from the ever-growing biomedical literature. In this article, we present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by employing a multi-task NER model and neural network-based NEN models to achieve much faster and more accurate inference. We hope that our tool can help annotate large-scale biomedical texts for various tasks such as biomedical knowledge graph construction.
Computation and Language
What problem does this paper attempt to address?
The paper aims to address the issues of Named Entity Recognition (NER) and Named Entity Normalization (NEN) in Biomedical Natural Language Processing (BNLP). Specifically, the paper introduces BERN2, an improved neural network tool capable of faster and more accurate inference.

### Main Objectives:

1. **Support for More Entity Types**: BERN2 supports 9 types of biomedical entities, more than other commonly used tools.
2. **Increase Annotation Speed**: By using a single multi-task NER model, annotation time is significantly reduced.
3. **Enhance Entity Normalization Quality**: Combining rule-based methods with neural network-based methods to improve the quality of entity normalization.

### Specific Contributions:

- **Multi-task NER Model**: A multi-task NER model is used to identify 8 types of entities (excluding mutations) such as genes/proteins, diseases, drugs/chemicals, species, etc., improving inference efficiency and allowing the use of larger pre-trained language models on a single GPU.
- **Hybrid NEN Model**: For genes/proteins, diseases, and drugs/chemicals, a hybrid approach (combining rule-based and neural network-based NEN models) is adopted to increase the number of correctly normalized entities.
- **Efficient Service Provision**: A web service interface is provided, and local installation is supported, making it convenient for users.

With these improvements, BERN2 not only performs excellently on various entity types but also surpasses existing biomedical text mining tools in terms of processing speed and normalization accuracy.
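
The hybrid normalization idea described above can be illustrated with a minimal Python sketch. It is not BERN2's implementation. It assumes that an exact, rule-style dictionary lookup is tried first and that a similarity-based fallback handles mentions the lookup misses; the summary itself only says the two approaches are combined. The dictionary, the concept IDs, the function names, and the threshold are all hypothetical, and a character n-gram cosine similarity stands in for the neural NEN model so that the example runs without any external dependency.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy dictionary: surface form -> concept ID.
# The IDs are made-up placeholders, not real ontology identifiers.
RULE_DICT = {
    "breast cancer": "DISEASE:0001",
    "aspirin": "DRUG:0001",
}


def normalize_text(mention):
    """Lowercase the mention and collapse runs of whitespace."""
    return " ".join(mention.lower().split())


def char_ngrams(text, n=3):
    """Count character n-grams of the text, padded with one space on each side."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rule_based_nen(mention):
    """Exact dictionary lookup after simple text normalization."""
    return RULE_DICT.get(normalize_text(mention))


def similarity_nen(mention, threshold=0.5):
    """Stand-in for a neural NEN model: pick the most similar dictionary name.

    Returns None when no dictionary name reaches the (assumed) threshold.
    """
    query = char_ngrams(normalize_text(mention))
    best_name, best_score = None, 0.0
    for name in RULE_DICT:
        score = cosine(query, char_ngrams(name))
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return RULE_DICT[best_name]
    return None


def hybrid_nen(mention):
    """Try the rule-based lookup first, then fall back to similarity matching."""
    concept_id = rule_based_nen(mention)
    if concept_id is not None:
        return concept_id, "rule"
    concept_id = similarity_nen(mention)
    if concept_id is not None:
        return concept_id, "similarity"
    return None, "unnormalized"
```

In this sketch, `hybrid_nen("Aspirin")` is resolved by the dictionary lookup, while a variant such as `"breast cancers"` misses the exact lookup and is caught by the fallback. That is the sense in which a hybrid approach can increase the number of normalized entities compared with rules alone.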