Named Entity Recognition in Biology Literature Based on Unsupervised Domain Adaptation Method.

Xingjian Xu, Fang Liu, Fanjun Meng
DOI: https://doi.org/10.1007/978-3-031-10989-8_34
2022-01-01
Abstract: Careful reorganization and analysis of the published biological literature can yield a wide range of meaningful information that is valuable for follow-up research. Because publications accumulate rapidly, manual curation is too inefficient to cope with the volume of biological literature. Knowledge extraction methods from computer science can process this literature more efficiently, but most of them rely on supervised learning and are therefore limited by the annotation quality of the available species-specific corpora. Here we present biolitNER, a new named entity recognition algorithm that uses unsupervised domain adaptation and can therefore recognize named entities in biomedical literature across domains. Because well-annotated corpora are lacking for many species, biolitNER is useful for named entity recognition and for biomedical literature curation across species. Experiments show that, compared with traditional programs, biolitNER produces higher-quality results with satisfactory runtime performance.