BactInt: A domain driven transfer learning approach for extracting inter-bacterial associations from biomedical text

Krishanu Das Baksi, Vatsala Pokhrel, Anand Eruvessi Pudavar, Sharmila S. Mande, Bhusan K. Kuntal
DOI: https://doi.org/10.1016/j.compbiolchem.2023.108012
IF: 3.737
2024-01-05
Computational Biology and Chemistry
Abstract: Background: The healthy as well as the dysbiotic state of an ecosystem such as the human body is known to be influenced not only by the presence of bacterial groups in it, but even more by the associations among them. Evidence reported in biomedical text serves as a reliable source for identifying and ascertaining such inter-bacterial associations. However, the complexity of the reported text, as well as the ever-increasing volume of information, necessitates the development of methods for automated and accurate extraction of such knowledge. Methods: A BioBERT (biomedical domain-specific language model) based information extraction model for bacterial associations is presented that utilizes patterns learned from other publicly available datasets. Additionally, a specialized sentence corpus has been developed to significantly improve the prediction accuracy of the 'transfer learned' model using a fine-tuning approach. Results: The final model was seen to outperform all other variations (non-transfer-learned and non-fine-tuned models) as well as models trained on BioGPT (a domain-trained Generative Pre-trained Transformer). To further demonstrate its utility, a case study was performed using bacterial association network data obtained from experimental studies. Conclusion: This study attempts to demonstrate the applicability of transfer learning in a niche field of life sciences where understanding inter-bacterial relationships is crucial for obtaining meaningful insights into microbial community structures across different ecosystems. The study also discusses how such a model can be further improved by fine-tuning with limited training data. The results presented and the datasets made available are expected to be a valuable addition to the fields of medical informatics and bioinformatics.
biology, computer science, interdisciplinary applications