SciAgent: Tool-augmented Language Models for Scientific Reasoning

Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen
2024-02-21
Abstract: Scientific reasoning poses a formidable challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate research in this setting, we construct a tool-augmented training corpus named MathFunc, which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs of the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B substantially outperforms ChatGPT.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to improve the scientific reasoning abilities of Large Language Models (LLMs). Current LLMs struggle with complex STEM problems because these require both mathematical skill and domain-specific knowledge. The paper therefore proposes a new task setting, tool-augmented scientific reasoning, which supplies LLMs with an extensible toolset so that they act as proficient tool users rather than all-knowing problem solvers. To train LLMs to understand and use such tools, the authors construct a training corpus called MathFunc, comprising over 30,000 samples and approximately 6,000 tool functions. They also develop an agent model, SciAgent, which generates a high-level plan for a given problem, retrieves relevant tools, and uses them when necessary to solve it. To evaluate tool-assisted scientific reasoning, they create a benchmark, SciToolBench, spanning five scientific domains. Experiments on SciToolBench show that SciAgent outperforms other LLMs of similar scale: SciAgent-Mistral-7B improves absolute accuracy by 13.4% over LLMs of the same size, and SciAgent-DeepMath-7B clearly surpasses ChatGPT. The study also discusses the advantages and limitations of SciAgent, offering insights for future research. In short, the paper uses tool augmentation to help LLMs overcome the difficulties that existing methods face on scientific problems spanning multiple disciplines.
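The plan, retrieve, and use steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The three tool functions, the `TOOLSET` dictionary, and the helpers `tokenize`, `retrieve`, and `solve` are hypothetical names introduced here. Word overlap stands in for whatever retriever the paper actually uses, and the plan text and tool arguments are supplied by hand, whereas in SciAgent an LLM would produce them.

```python
import math
import re
from typing import Callable, Dict, List, Optional

# Hypothetical toolset: plain Python functions described by their docstrings.
def kinetic_energy(mass: float, velocity: float) -> float:
    """Kinetic energy of a moving body from mass and velocity."""
    return 0.5 * mass * velocity ** 2

def circle_area(radius: float) -> float:
    """Area of a circle from its radius."""
    return math.pi * radius ** 2

def ideal_gas_pressure(moles: float, temperature: float, volume: float) -> float:
    """Pressure of an ideal gas from moles, temperature and volume."""
    return moles * 8.314 * temperature / volume

TOOLSET: Dict[str, Callable[..., float]] = {
    f.__name__: f for f in (kinetic_energy, circle_area, ideal_gas_pressure)
}

def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(plan: str, toolset: Dict[str, Callable[..., float]], top_k: int = 1) -> List[str]:
    """Rank tools by word overlap between the plan and each tool's name and docstring.

    Assumption: this toy scorer replaces a learned retriever.
    """
    plan_words = tokenize(plan)

    def score(name: str) -> int:
        description = name.replace("_", " ") + " " + (toolset[name].__doc__ or "")
        return len(plan_words & tokenize(description))

    return sorted(toolset, key=score, reverse=True)[:top_k]

def solve(plan: str, arguments: Dict[str, float],
          toolset: Dict[str, Callable[..., float]]) -> Optional[float]:
    """Retrieve the best-matching tool for the plan and call it with the given arguments."""
    candidates = retrieve(plan, toolset)
    if not candidates:
        return None
    return toolset[candidates[0]](**arguments)

# Hand-written plan and arguments; an LLM would generate these in the real setting.
plan = "Compute the kinetic energy from mass and velocity"
answer = solve(plan, {"mass": 2.0, "velocity": 3.0}, TOOLSET)
```

Because the model's job is reduced to planning, choosing a tool, and filling in arguments, new domains can in principle be covered by adding functions to the toolset rather than retraining the solver.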