Phenomics Assistant: An Interface for LLM-based Biomedical Knowledge Graph Exploration

Shawn T O’Neil, Kevin Schaper, Glass Elsarboukh, Justin T Reese, Sierra A T Moxon, Nomi L Harris, Monica C Munoz-Torres, Peter N Robinson, Melissa A Haendel, Christopher J Mungall
DOI: https://doi.org/10.1101/2024.01.31.578275
2024-02-02
Abstract: We introduce Phenomics Assistant, a prototype chat-based interface for querying the Monarch knowledge graph (KG), a comprehensive biomedical database. While unaided large language models (LLMs) are prone to mistakes in factual recall, their strong abilities in summarization and tool use suggest new opportunities to help non-expert users query and interact with complex data, while drawing on the KG to improve the reliability of the answers. Leveraging the ability of LLMs to interpret queries in natural language, Phenomics Assistant enables a wide range of users to interactively discover relationships between diseases, genes, and phenotypes. To assess the reliability of our approach and compare the accuracy of different LLMs, we evaluated Phenomics Assistant answers on benchmark tasks for gene-disease association and gene alias queries. While comparisons across tested LLMs revealed differences in their ability to interpret KG-provided information, we found that even basic KG access markedly boosts the reliability of standalone LLMs. By enabling users to pose queries in natural language and summarizing results in familiar terms, Phenomics Assistant represents a new approach for navigating the Monarch KG.
Bioinformatics
What problem does this paper attempt to address?
The main objective of this paper is to introduce and evaluate Phenomics Assistant, a chat-based query interface for a biomedical knowledge graph built on large language models (LLMs). By combining the language understanding and generation capabilities of LLMs with the Monarch knowledge graph (a comprehensive biomedical database), Phenomics Assistant aims to address the following key issues:

1. **Enhancing query capabilities for non-expert users**: Enabling users without a professional background or specific domain knowledge to interact with complex biomedical data and ask questions in natural language.
2. **Improving the accuracy of answers**: Drawing on the Monarch knowledge graph to make LLM answers more reliable, especially for specialized biomedical knowledge where unaided LLMs are prone to factual errors.
3. **Simplifying access to complex data**: Giving users an intuitive way to explore relationships between diseases, genes, and phenotypes without needing to learn specialized query languages or tools.
4. **Comparing the performance of different LLMs**: Evaluating how various LLMs perform on gene alias and gene-disease association tasks with and without knowledge graph access, to determine the impact of knowledge graph integration on model performance.

In summary, this study aims to develop a user-friendly interface that lets a wide range of users combine LLMs with a comprehensive biomedical database to obtain accurate and useful information. In the benchmark evaluation, the authors found that even basic knowledge graph access markedly improves the reliability of standalone LLMs.
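The tool-use pattern described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not call the Monarch API: the in-memory `TOY_KG`, the function names `get_gene_diseases` and `get_gene_aliases`, and the shape of the tool-call dictionary are all hypothetical, chosen only to show how a structured request emitted by an LLM could be answered from a knowledge graph rather than from the model's own recall.

```python
from typing import Callable, Dict, List

# Hypothetical toy stand-in for a knowledge graph, keyed by (gene, relation).
# The real Monarch KG is far larger and is queried through its own API.
TOY_KG: Dict[tuple, List[str]] = {
    ("CFTR", "associated_with_disease"): ["cystic fibrosis"],
    ("HTT", "associated_with_disease"): ["Huntington disease"],
    ("CFTR", "has_alias"): ["ABCC7"],
    ("HTT", "has_alias"): ["IT15"],
}


def get_gene_diseases(gene: str) -> List[str]:
    """Return diseases linked to a gene symbol in the toy KG."""
    return TOY_KG.get((gene.upper(), "associated_with_disease"), [])


def get_gene_aliases(gene: str) -> List[str]:
    """Return known aliases for a gene symbol in the toy KG."""
    return TOY_KG.get((gene.upper(), "has_alias"), [])


# Registry of tools the (hypothetical) LLM is allowed to call by name.
TOOLS: Dict[str, Callable[[str], List[str]]] = {
    "get_gene_diseases": get_gene_diseases,
    "get_gene_aliases": get_gene_aliases,
}


def run_tool_call(call: dict) -> dict:
    """Execute one structured tool call and return the KG-backed result."""
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    gene = call["arguments"]["gene"]
    return {"tool": name, "gene": gene.upper(), "result": TOOLS[name](gene)}


def summarize(output: dict) -> str:
    """Turn a tool result into plain text, admitting when the KG has no entry."""
    if not output["result"]:
        return f"No entries found for {output['gene']} in the knowledge graph."
    return f"{output['gene']}: " + ", ".join(output["result"])


if __name__ == "__main__":
    # In a real system the LLM would produce this dictionary from a user question.
    call = {"name": "get_gene_aliases", "arguments": {"gene": "cftr"}}
    print(summarize(run_tool_call(call)))
```

In this sketch the language model's role is limited to choosing the tool and phrasing the final answer, while the facts themselves come from the lookup. That separation is the general idea behind using a knowledge graph to improve reliability.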