HeCiX: Integrating Knowledge Graphs and Large Language Models for Biomedical Research

Prerana Sanjay Kulkarni, Muskaan Jain, Disha Sheshanarayana, Srinivasan Parthiban
2024-07-19
Abstract: Despite advancements in drug development strategies, 90% of clinical trials fail. This suggests overlooked aspects in target validation and drug optimization. In order to address this, we introduce HeCiX-KG, Hetionet-Clinicaltrials neXus Knowledge Graph, a novel fusion of data from ClinicalTrials.gov and Hetionet in a single knowledge graph. HeCiX-KG combines data on previously conducted clinical trials from ClinicalTrials.gov, and domain expertise on diseases and genes from Hetionet. This offers a thorough resource for clinical researchers. Further, we introduce HeCiX, a system that uses LangChain to integrate HeCiX-KG with GPT-4, and increase its usability. HeCiX shows high performance during evaluation against a range of clinically relevant issues, proving this model to be promising for enhancing the effectiveness of clinical research. Thus, this approach provides a more holistic view of clinical trials and existing biological data.
Computation and Language, Artificial Intelligence, Information Retrieval, Machine Learning
What problem does this paper attempt to address?
The paper addresses the 90% failure rate of clinical trials, which the authors attribute to overlooked aspects of target validation and drug optimization. Their approach is to close these information gaps by integrating heterogeneous data sources. They propose HeCiX-KG, a knowledge graph that combines data on previously conducted clinical trials from ClinicalTrials.gov with biological knowledge on diseases and genes from Hetionet. The paper also introduces HeCiX, a system that uses LangChain to integrate HeCiX-KG with GPT-4 so that the knowledge graph becomes more usable for clinical researchers. The authors report high performance when HeCiX is evaluated against a range of clinically relevant issues, and present it as a promising way to make clinical research more effective. In summary, the study builds an integrated knowledge graph and pairs it with a large language model to give a more holistic view of clinical trials and existing biological data.
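To make the idea of fusing the two sources concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy triple-based graph; the records, identifiers, and relation names (`STUDIES`, `TESTS`, `ASSOCIATES_WITH`) are hypothetical and are not taken from ClinicalTrials.gov, Hetionet, or the paper. In HeCiX the query would be derived from the user's question with the help of the language model; here the traversal is written by hand.

```python
# Hypothetical trial records (stand-in for a clinical trial registry).
TRIALS = [
    {"id": "TRIAL-001", "disease": "disease A", "drug": "drug X"},
    {"id": "TRIAL-002", "disease": "disease B", "drug": "drug X"},
]

# Hypothetical gene-disease associations (stand-in for a biomedical graph).
GENE_ASSOCIATIONS = [
    ("gene G1", "disease A"),
    ("gene G2", "disease A"),
    ("gene G2", "disease B"),
]


def build_graph(trials, associations):
    """Merge both sources into one set of (subject, relation, object) triples."""
    triples = set()
    for trial in trials:
        triples.add((trial["id"], "STUDIES", trial["disease"]))
        triples.add((trial["id"], "TESTS", trial["drug"]))
    for gene, disease in associations:
        triples.add((gene, "ASSOCIATES_WITH", disease))
    return triples


def genes_linked_to_drug(triples, drug):
    """Traverse drug -> trials -> diseases -> genes across both sources."""
    trials = {s for s, r, o in triples if r == "TESTS" and o == drug}
    diseases = {o for s, r, o in triples if r == "STUDIES" and s in trials}
    genes = {s for s, r, o in triples if r == "ASSOCIATES_WITH" and o in diseases}
    return sorted(genes)


graph = build_graph(TRIALS, GENE_ASSOCIATIONS)
print(genes_linked_to_drug(graph, "drug X"))
```

The point of the sketch is that the final question (which genes are connected to a drug through the diseases it was trialled for) cannot be answered from either source alone; it needs the shared disease nodes that the merged graph provides.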