Abstract: A ChatGPT-like system for drug compounds could be a game-changer in pharmaceutical research, accelerating drug discovery, enhancing our understanding of structure-activity relationships, guiding lead optimization, aiding drug repurposing, reducing the failure rate, and streamlining clinical trials. In this work, we make an initial attempt towards enabling ChatGPT-like capabilities on drug molecule graphs, by developing a prototype system DrugChat. DrugChat works in a similar way as ChatGPT. Users upload a compound molecule graph and ask various questions about this compound. DrugChat will answer these questions in a multi-turn, interactive manner. The DrugChat system consists of a graph neural network (GNN), a large language model (LLM), and an adaptor. The GNN takes a compound molecule graph as input and learns a representation for this graph. The adaptor transforms the graph representation produced by the GNN into another representation that is acceptable to the LLM. The LLM takes the compound representation transformed by the adaptor and users' questions about this compound as inputs and generates answers. All these components are trained end-to-end. To train DrugChat, we collected instruction tuning datasets which contain 10,834 drug compounds and 143,517 question-answer pairs. The code and data are available at https://github.com/UCSD-AI4H/drugchat

What problem does this paper attempt to address?

This paper addresses the lack of a ChatGPT-like system for analyzing drug molecule graphs, and proposes a prototype, **DrugChat**. The system aims to accelerate drug discovery through instant, interactive analysis: enhancing the understanding of structure-activity relationships (SAR), guiding lead compound optimization, supporting drug repurposing, reducing failure rates, and simplifying clinical trials.

The paper points out that drug discovery and development is a time-consuming and costly process, often taking years and billions of dollars to bring a single drug to market. Traditional methods involve extensive iterative testing and have high late-stage failure rates. Recent advances in computational chemistry and cheminformatics have provided some relief, but there is still an urgent need for tools that can intuitively interpret complex molecule graphs and generate meaningful insights from them. DrugChat therefore aims to:

1. **Accelerate drug discovery**: Significantly shorten the early stages of drug discovery by providing instant insights into the potential therapeutic uses, side effects, and contraindications of drugs.
2. **Predict drug interactions**: By comparing the molecular structures of thousands of known substances, predict potential conflicts or synergistic effects between new candidate drugs and existing drugs, helping researchers anticipate how new drugs will perform in practice.
3. **Understand structure-activity relationships (SAR)**: Help researchers understand the relationship between the chemical structure of drugs and their biological activity, and predict which structural modifications can enhance their effects or reduce adverse side effects.
4. **Guide lead compound optimization**: Suggest structural modifications that improve efficacy, reduce toxicity, and enhance pharmacokinetic parameters, pointing researchers in the right direction and saving valuable time.
5. **Support drug repurposing**: By understanding the structural properties of existing drugs, identify candidates that may be effective for diseases they did not initially target, giving existing drugs new uses and providing faster pathways for treating challenging diseases.
6. **Reduce failure rates**: Help reduce late-stage failures caused by unforeseen toxicity and efficacy issues by providing more accurate predictions of drug properties and effects early in a project.
7. **Simplify clinical trials**: Support more effective trial design by predicting how a drug interacts with other drugs or conditions, so that researchers can target trials more precisely and recruit suitable patient populations.

To achieve these goals, the DrugChat system is composed of a graph neural network (GNN), a large language model (LLM), and an adaptor. The GNN learns a representation from the drug molecule graph, the adaptor converts that representation into a form acceptable to the LLM, and the LLM generates answers from the converted representation together with the user's questions. All these components are trained end-to-end, on an instruction tuning dataset comprising 10,834 drug compounds and 143,517 question-answer pairs.
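The GNN, adaptor, and LLM data flow described above can be illustrated with a small, untrained sketch. Everything in it is an assumption made for illustration and is not taken from the DrugChat code: the mean-aggregation message passing, the mean-pool readout, the single linear map used as the adaptor, the idea of prepending the projected graph vector to the question's token embeddings, and all function names and dimensions. The LLM itself is not modeled; the sketch stops at the sequence that would be handed to it.

```python
import random

def message_passing(node_feats, edges, rounds=2):
    """Toy GNN (assumed design): each round, every atom averages
    its own features with those of its bonded neighbours."""
    n = len(node_feats)
    neighbours = [[] for _ in range(n)]
    for u, v in edges:  # bonds are treated as undirected
        neighbours[u].append(v)
        neighbours[v].append(u)
    h = [list(f) for f in node_feats]
    for _ in range(rounds):
        new_h = []
        for i in range(n):
            group = [h[i]] + [h[j] for j in neighbours[i]]
            new_h.append([sum(col) / len(group) for col in zip(*group)])
        h = new_h
    return h

def readout(node_states):
    """Mean-pool the atom states into one graph-level vector."""
    return [sum(col) / len(node_states) for col in zip(*node_states)]

def make_adaptor(in_dim, out_dim, seed=0):
    """Hypothetical adaptor: one random (untrained) linear map from
    the GNN's output space to the LLM's embedding space."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def adaptor(vec):
        return [sum(w * x for w, x in zip(row, vec)) for row in weights]

    return adaptor

def build_llm_input(graph_token, question_embeddings):
    """Prepend the projected graph vector to the question's token embeddings."""
    return [graph_token] + question_embeddings

# Toy molecule: three atoms in a chain, four features per atom.
atoms = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
bonds = [(0, 1), (1, 2)]
LLM_DIM = 8  # assumed LLM embedding width

graph_vec = readout(message_passing(atoms, bonds))
graph_token = make_adaptor(in_dim=4, out_dim=LLM_DIM)(graph_vec)
question = [[0.0] * LLM_DIM for _ in range(5)]  # stand-in for 5 embedded question tokens
llm_input = build_llm_input(graph_token, question)
print(len(llm_input), len(llm_input[0]))  # 6 8
```

The point of the adaptor in this sketch is purely dimensional: the GNN emits a 4-dimensional graph vector, the stand-in LLM expects 8-dimensional embeddings, and the linear map bridges the two so the molecule can sit in the same input sequence as the question. In an end-to-end setup, the weights of such a map would be learned from the question-answer pairs rather than drawn at random.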

DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs

ChatGPT-powered Conversational Drug Editing Using Retrieval and Domain Feedback

The Future of ChatGPT in Medicinal Chemistry: Harnessing AI for Accelerated Drug Discovery

ChatMol: Interactive Molecular Discovery with Natural Language

An Adaptive Graph Learning Method for Automated Molecular Interactions and Properties Predictions

Multi-Modal Large Language Model Enables All-Purpose Prediction of Drug Mechanisms and Properties

Comprehensive evaluation of molecule property prediction with ChatGPT

ChatGPT Chemistry Assistant for Text Mining and Prediction of MOF Synthesis

QPowered Compound2DeNovoDrugPropMax – A Novel Programmatic Tool Incorporating Deep Learning and In Silico Methods for Automated In Silico Bio-Activity Discovery for any Compound of Interest

Interactive Molecular Discovery with Natural Language

Abstract 3524: Leveraging ChatGPT for literature-based inference of drug-gene relationships in cancer

Molecule Generation for Drug Design: a Graph Learning Perspective

Assessing the ability of ChatGPT to extract natural product bioactivity and biosynthesis data from publications

Learn molecular representations from large-scale unlabeled molecules for drug discovery

Graph Neural Networks for Drug Discovery: An Integrated Decision Support Pipeline

An effective self-supervised framework for learning expressive molecular global representations to drug discovery

Can ChatGPT pass Glycobiology?

DrugGen: Advancing Drug Discovery with Large Language Models and Reinforcement Learning Feedback

ChatMolData: a Multimodal Agent for Automatic Molecular Data Processing

HiGNN: Hierarchical Informative Graph Neural Networks for Molecular Property Prediction Equipped with Feature-Wise Attention

Unveiling the Potential of Knowledge-Prompted ChatGPT for Enhancing Drug Trafficking Detection on Social Media