Leveraging Generative AI to Accelerate Biocuration of Medical Actions for Rare Disease

Enock Niyonkuru, J Harry Caufield, Leigh Carmody, Michael Gargano, Sabrina Toro, Trish Whetzel, Hannah Blau, Mauricio Soto, Elena Casiraghi, Leonardo Chimirri, Justin T Reese, Giorgio Valentini, Melissa A Haendel, Christopher J Mungall, Peter N Robinson
DOI: https://doi.org/10.1101/2024.08.22.24310814
2024-08-22
medRxiv
Abstract:
Background: Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for the clinical management of patients. Currently, MAxO terms are assigned to rare diseases by manual biocuration, which allows the clinical management of rare diseases to be described computationally for use in clinical decision support and mechanism discovery. However, it is challenging to scale manual curation to comprehensively capture information about medical actions for the more than 10,000 rare diseases.
Methods: We present AutoMAxO, a semi-automated workflow that leverages Large Language Models (LLMs) to streamline MAxO biocuration for rare diseases. AutoMAxO first uses LLMs to retrieve candidate curations from abstracts of relevant publications. Next, the candidate curations are matched to ontology terms from MAxO, the Human Phenotype Ontology (HPO), and the MONDO disease ontology via a combination of LLMs and post-processing techniques. Finally, the matched terms are presented in a structured form to a human curator for approval.
Results: We used this approach to process 4,918 unique medical abstracts and identify annotations for 21 rare genetic diseases. We extracted 18,631 candidate disease-treatment curations, of which 538 were confirmed and transferred to the MAxO annotation dataset.
Conclusion: The results of this project underscore the potential of generative AI to accelerate precision medicine by enabling robust and comprehensive curation of the primary literature, representing information about diseases and procedures in a structured fashion. Although we focused on MAxO in this project, similar approaches could be taken for other biomedical curation tasks.
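The Methods describe three stages: LLM extraction of candidate curations, matching of those candidates to MAxO, HPO and MONDO terms, and curator approval. The sketch below illustrates only the matching and review-queue stages, under several loud assumptions. The LLM extraction step is assumed to have already produced records in a hypothetical `Candidate` schema. Matching here is a plain exact lookup after string normalization plus a synonym table, a simple rule-based stand-in for the paper's combination of LLMs and post-processing, not its actual method. Merging duplicate curations across abstracts is also an assumption. All term identifiers, labels and PMIDs are placeholders, not real ontology CURIEs or publications, and names such as `build_review_queue` and `ReviewItem` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Toy lexicons standing in for MAxO, HPO and MONDO. Every identifier here is a
# hypothetical placeholder, NOT a real ontology CURIE.
LEXICON: Dict[str, Dict[str, str]] = {
    "maxo": {"enzyme replacement therapy": "MAXO:EX1", "physical therapy": "MAXO:EX2"},
    "hpo": {"seizure": "HP:EX1", "muscle weakness": "HP:EX2"},
    "mondo": {"example disease a": "MONDO:EX1"},
}
# Hypothetical synonym table used by the rule-based post-processing step.
SYNONYMS: Dict[str, str] = {"ert": "enzyme replacement therapy", "seizures": "seizure"}


@dataclass
class Candidate:
    """One LLM-extracted candidate curation (hypothetical schema)."""
    pmid: str
    disease: str
    action: str
    relation: str                    # e.g. "treats" (illustrative value)
    phenotype: Optional[str] = None


@dataclass
class ReviewItem:
    """A grounded curation waiting for human approval."""
    disease_id: str
    action_id: str
    relation: str
    phenotype_id: Optional[str]
    pmids: List[str] = field(default_factory=list)
    status: str = "pending"          # the curator later sets "accepted" or "rejected"


def normalize(label: str) -> str:
    """Lowercase, collapse whitespace and drop trailing punctuation."""
    text = re.sub(r"\s+", " ", label.strip().lower())
    return text.rstrip(".;,").strip()


def ground(label: Optional[str], ontology: str) -> Optional[str]:
    """Map free text to a term ID by exact match after normalization and synonym lookup."""
    if label is None:
        return None
    key = normalize(label)
    key = SYNONYMS.get(key, key)
    return LEXICON[ontology].get(key)


def build_review_queue(candidates: List[Candidate]) -> List[ReviewItem]:
    """Ground candidates, drop ones lacking a disease or action term, merge duplicates."""
    queue: Dict[Tuple[str, str, str, Optional[str]], ReviewItem] = {}
    for c in candidates:
        disease_id = ground(c.disease, "mondo")
        action_id = ground(c.action, "maxo")
        if disease_id is None or action_id is None:
            continue  # a curator cannot approve a curation without both terms
        # An unmatched phenotype is dropped rather than blocking the curation.
        phenotype_id = ground(c.phenotype, "hpo")
        key = (disease_id, action_id, c.relation, phenotype_id)
        item = queue.setdefault(key, ReviewItem(*key))
        if c.pmid not in item.pmids:
            item.pmids.append(c.pmid)  # keep provenance for the curator
    return list(queue.values())


if __name__ == "__main__":
    demo = [
        Candidate("PMID_A", "Example disease A", "ERT", "treats", "Seizures."),
        Candidate("PMID_B", "example disease a", "enzyme replacement therapy", "treats", "seizure"),
        Candidate("PMID_C", "Unknown disease", "physical therapy", "treats"),
    ]
    for item in build_review_queue(demo):
        print(item)
```

In the demo, the first two candidates normalize to the same disease, action and phenotype terms and are merged into one review item citing both placeholder PMIDs, while the third is dropped because its disease label has no match. Keeping the human approval step matters because most candidates do not survive review: in the reported results, 538 of 18,631 candidates (about 2.9%) were confirmed.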