The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition

Timofey V Ivanisenko, Pavel S Demenkov, Nikolay A Kolchanov, Vladimir A Ivanisenko
DOI: https://doi.org/10.3390/ijms232314934
2022-11-29
Abstract: The body of scientific literature continues to grow annually. Over 1.5 million abstracts of biomedical publications were added to the PubMed database in 2021. Therefore, developing cognitive systems that provide a specialized search for information in scientific publications based on subject area ontology and modern artificial intelligence methods is urgently needed. We previously developed a web-based information retrieval system, ANDDigest, designed to search and analyze information in the PubMed database using a customized domain ontology. This paper presents an improved ANDDigest version that uses fine-tuned PubMedBERT classifiers to enhance the quality of short name recognition for molecular-genetics entities in PubMed abstracts on eight biological object types: cell components, diseases, side effects, genes, proteins, pathways, drugs, and metabolites. This approach increased average short name recognition accuracy by 13%.