Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving

Botao Yu, Frazier N. Baker, Ziru Chen, Garrett Herb, Boyu Gou, Daniel Adu-Ampratwum, Xia Ning, Huan Sun
2024-11-12
Abstract: To enhance large language models (LLMs) for chemistry problem solving, several LLM-based agents augmented with tools have been proposed, such as ChemCrow and Coscientist. However, their evaluations are narrow in scope, leaving a large gap in understanding the benefits of tools across diverse chemistry tasks. To bridge this gap, we develop ChemAgent, an enhanced chemistry agent over ChemCrow, and conduct a comprehensive evaluation of its performance on both specialized chemistry tasks and general chemistry questions. Surprisingly, ChemAgent does not consistently outperform its base LLMs without tools. Our error analysis with a chemistry expert suggests that: For specialized chemistry tasks, such as synthesis prediction, we should augment agents with specialized tools; however, for general chemistry questions like those in exams, agents' ability to reason correctly with chemistry knowledge matters more, and tool augmentation does not always help.
Artificial Intelligence; Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
This paper evaluates tool-augmented large language model (LLM) agents on chemistry problem solving, and in particular asks whether such agents consistently outperform their base LLMs across different types of chemistry tasks. Specifically:
1. **Narrow evaluation scope**: Existing tool-augmented agents such as ChemCrow and Coscientist have shown promise, but their evaluations concentrate on a few specific tasks, so it remains unclear how these agents perform across diverse chemistry tasks.
2. **Understanding the effect of tool augmentation**: The study examines how tool augmentation affects an LLM's ability to solve chemistry problems, especially how the effect differs between specialized tasks (such as synthesis prediction) and general questions (such as exam questions).
To answer these questions, the authors developed ChemAgent, an enhanced chemistry agent built on ChemCrow, and comprehensively evaluated it on both specialized chemistry tasks and general chemistry questions. The study found that:
- For specialized tasks, such as molecule- and reaction-centered tasks, tool augmentation significantly improves performance.
- For general questions, tool augmentation does not always improve performance, and the agent is sometimes even inferior to the base LLM.
Through detailed error analysis, the study points out that when dealing with general questions, agents are prone to minor errors in the reasoning process, which may be caused by the additional cognitive burden introduced by tool augmentation or by inconsistency between tool output and the model's internal knowledge.
In summary, the core objective of this paper is to reveal the advantages and limitations of tool-augmented agents across different chemistry tasks through systematic evaluation and error analysis, providing guidance for future research.