CACTUS: Chemistry Agent Connecting Tool-Usage to Science

Andrew D. McNaughton, Gautham Ramalaxmi, Agustin Kruel, Carter R. Knutson, Rohith A. Varikoti, Neeraj Kumar
2024-05-02
Abstract: Large language models (LLMs) have shown remarkable potential in various domains, but they often lack the ability to access and reason over domain-specific knowledge and tools. In this paper, we introduce CACTUS (Chemistry Agent Connecting Tool-Usage to Science), an LLM-based agent that integrates cheminformatics tools to enable advanced reasoning and problem-solving in chemistry and molecular discovery. We evaluate the performance of CACTUS using a diverse set of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. Our results demonstrate that CACTUS significantly outperforms baseline LLMs, with the Gemma-7b and Mistral-7b models achieving the highest accuracy regardless of the prompting strategy used. Moreover, we explore the impact of domain-specific prompting and hardware configurations on model performance, highlighting the importance of prompt engineering and the potential for deploying smaller models on consumer-grade hardware without significant loss in accuracy. By combining the cognitive capabilities of open-source LLMs with domain-specific tools, CACTUS can assist researchers in tasks such as molecular property prediction, similarity searching, and drug-likeness assessment. Furthermore, CACTUS represents a significant milestone in the field of cheminformatics, offering an adaptable tool for researchers engaged in chemistry and molecular discovery. By integrating the strengths of open-source LLMs with domain-specific tools, CACTUS has the potential to accelerate scientific advancement and unlock new frontiers in the exploration of novel, effective, and safe therapeutic candidates, catalysts, and materials. Moreover, CACTUS's ability to integrate with automated experimentation platforms and make data-driven decisions in real time opens up new possibilities for autonomous discovery.
Computation and Language, Artificial Intelligence, Machine Learning, Chemical Physics, Quantitative Methods
What problem does this paper attempt to address?
The paper addresses the application of large language models (LLMs) to chemistry and molecular discovery. Although existing LLMs have shown considerable potential across many domains, they often lack the ability to access and reason over domain-specific knowledge and tools. To tackle this issue, the authors developed CACTUS (Chemistry Agent Connecting Tool-Usage to Science), an LLM-based agent that integrates cheminformatics tools to enable advanced reasoning and problem-solving in chemistry and molecular discovery. The main contributions of the paper include:
1. **Development of an intelligent cheminformatics agent**: CACTUS can work with multiple open-source language models and use cheminformatics tools for tasks such as molecular property prediction, similarity searching, and drug-likeness assessment.
2. **Performance evaluation**: CACTUS was evaluated with several open-source language models (Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b) on a benchmark of thousands of chemistry questions. Gemma-7b and Mistral-7b achieved the highest accuracy regardless of the prompting strategy used.
3. **Emphasis on the importance of prompt engineering**: The study explored the impact of different prompting strategies on model performance, finding that domain-specific prompts can significantly improve the accuracy of the model in answering chemistry questions.
4. **Impact of hardware configurations**: The paper investigated the effect of different hardware configurations (such as GPU types) on model performance, indicating that smaller models can be deployed on consumer-grade hardware without significant loss in accuracy, which widens access for researchers with limited resources.
5. **Potential for autonomous discovery**: By integrating with automated experimentation platforms, CACTUS can support real-time, data-driven decision-making: designing and prioritizing experiments, analyzing results, and iteratively refining hypotheses, thereby exploring chemical space more efficiently.

Through these methods, CACTUS not only improves research efficiency in cheminformatics but also opens new opportunities for drug design and materials science.
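
The summary names similarity searching and drug-likeness assessment among the tasks CACTUS hands off to cheminformatics tools, but gives no implementation details. The following is a minimal sketch of what two such tools and a name-based dispatch step might look like, assuming the inputs are precomputed descriptors and sets of "on" fingerprint bits. All names (`Descriptors`, `TOOLS`, `run_tool`) are hypothetical and not taken from CACTUS; Lipinski's rule of five and Tanimoto similarity are standard techniques chosen for illustration, not confirmed as the paper's exact tool set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float       # molecular weight in g/mol
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-five limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a molecule as drug-like if it breaks at most `max_violations` limits."""
    return lipinski_violations(d) <= max_violations


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical tool registry: the agent picks a tool by name,
# then calls it with the arguments it has extracted from the question.
TOOLS: Dict[str, Callable] = {
    "drug_likeness": is_drug_like,
    "similarity": tanimoto,
}


def run_tool(name: str, *args):
    """Dispatch to a registered tool, failing loudly on unknown names."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](*args)


if __name__ == "__main__":
    # Approximate descriptor values for aspirin, used only as an illustration.
    aspirin = Descriptors(mol_weight=180.16, logp=1.2,
                          h_bond_donors=1, h_bond_acceptors=4)
    print(run_tool("drug_likeness", aspirin))
    print(run_tool("similarity", {1, 2, 3}, {2, 3, 4}))
```

In a real agent the tool name and arguments would come from the LLM's output rather than being hard-coded, and the descriptors and fingerprints would be computed from a molecular representation such as SMILES by a cheminformatics library. Keeping the tools as plain functions behind a registry makes it straightforward to add or swap tools without changing the dispatch logic.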