An effective undersampling method for biomedical named entity recognition using machine learning

S. M. Archana, Jay Prakash
DOI: https://doi.org/10.1007/s12530-024-09573-w
IF: 2.347
2024-04-05
Evolving Systems
Abstract: Biomedical named entity recognition (Bio-NER) is a key task in biomedical natural language processing, performed on biomedical documents including professional literature and electronic health records. It involves identifying and categorising the named entities present in a text document. Bio-NER plays a crucial role in diverse downstream tasks, such as text summarisation, relation extraction, and question answering. However, data imbalance is a significant challenge in Bio-NER and adversely impacts its performance. Many undersampling methods are employed to overcome this issue, but they eliminate relevant information during the process, which degrades Bio-NER performance. To deal with this problem, we propose an undersampling method based on preprocessing and sentence filtering (USPSF). The proposed method preserves relevant data effectively during the process. Its effectiveness is analysed on the NCBI disease, CHEMDNER, and CDR chemical datasets, and its performance is evaluated against existing methods for Bio-NER. Experimental results show that the proposed method outperforms the competing methods with respect to the F1 score, demonstrating its efficacy in improving Bio-NER performance.
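The abstract does not spell out the USPSF procedure, so the following is only a minimal sketch of generic sentence-level undersampling for BIO-tagged NER data, not the authors' method. It assumes each sentence is a list of (token, tag) pairs and that "O" marks non-entity tokens; the names `has_entity`, `filter_sentences`, and `keep_ratio` are hypothetical and chosen for illustration.

```python
import random
from typing import List, Tuple

# Assumed representation: a sentence is a list of (token, BIO tag) pairs.
Sentence = List[Tuple[str, str]]


def has_entity(sentence: Sentence) -> bool:
    """Return True if at least one token carries a non-"O" tag."""
    return any(tag != "O" for _, tag in sentence)


def filter_sentences(
    sentences: List[Sentence], keep_ratio: float = 0.0, seed: int = 0
) -> List[Sentence]:
    """Keep every sentence that contains an entity.

    Entity-free sentences are kept only with probability `keep_ratio`,
    which reduces the share of majority-class ("O") tokens.
    This is a generic baseline, not the USPSF method from the paper.
    """
    rng = random.Random(seed)
    return [s for s in sentences if has_entity(s) or rng.random() < keep_ratio]


# Toy data (invented for illustration only).
corpus: List[Sentence] = [
    [("Aspirin", "B-Chemical"), ("reduces", "O"), ("fever", "B-Disease")],
    [("The", "O"), ("study", "O"), ("was", "O"), ("repeated", "O")],
    [("Patients", "O"), ("with", "O"), ("cystic", "B-Disease"), ("fibrosis", "I-Disease")],
]

filtered = filter_sentences(corpus, keep_ratio=0.0)
```

With `keep_ratio=0.0`, only the two sentences containing an entity remain. A filter this blunt discards all context from entity-free sentences, which is the kind of information loss the abstract attributes to existing undersampling methods.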
computer science, artificial intelligence