Structured Chemistry Reasoning with Large Language Models

Siru Ouyang, Zhuosheng Zhang, Bing Yan, Xuan Liu, Yejin Choi, Jiawei Han, Lianhui Qin

2024-02-10

Abstract: Large Language Models (LLMs) excel in diverse areas, yet struggle with complex scientific reasoning, especially in the field of chemistry. Different from the simple chemistry tasks (e.g., molecule classification) addressed in previous studies, complex chemistry problems require not only vast knowledge and precise calculation, but also compositional reasoning about rich dynamic interactions of different concepts (e.g., temperature changes). Our study shows that even advanced LLMs, like GPT-4, can fail easily in different ways. Interestingly, the errors often stem not from a lack of domain knowledge within the LLMs, but rather from the absence of an effective reasoning structure that guides the LLMs to elicit the right knowledge, incorporate the knowledge in step-by-step reasoning, and iteratively refine results for further improved quality. On this basis, we introduce StructChem, a simple yet effective prompting strategy that offers the desired guidance and substantially boosts the LLMs' chemical reasoning capability. Testing across four chemistry areas -- quantum chemistry, mechanics, physical chemistry, and kinetics -- StructChem substantially enhances GPT-4's performance, with up to 30% peak improvement. Our analysis also underscores the unique difficulties of precise grounded reasoning in science with LLMs, highlighting a need for more research in this area. Code is available at https://github.com/ozyyshr/StructChem.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the difficulty that large language models (LLMs) have with complex chemical reasoning. Whereas previous studies focused on simple chemistry tasks (e.g., molecule classification), even advanced LLMs like GPT-4 often fail on complex chemistry problems that require extensive knowledge, precise calculation, and multi-step compositional reasoning. The authors find that these failures stem less from missing domain knowledge than from the lack of an effective reasoning structure that guides the model to elicit the relevant knowledge and apply it step by step. They therefore propose StructChem, a simple prompting strategy with three stages: the model first generates the relevant chemical formulas, then reasons step by step on the basis of those formulas, and finally refines its results iteratively through confidence assessment. Experiments show that StructChem reduces error rates and improves the chemical reasoning of models such as GPT-3.5 and GPT-4. Fine-tuning smaller language models on the reasoning processes generated by StructChem also yields performance gains, which supports the method's effectiveness on complex chemistry problems.
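The three stages described above (formula generation, step-by-step reasoning grounded in those formulas, and confidence-based refinement) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `llm` callable, the prompt wording, the `Confidence: <0 to 1>` output convention, and the rule of keeping a revision only when its self-reported confidence increases are all assumptions made for illustration.

```python
import re
from typing import Callable, Tuple

# Hypothetical interface: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]

# Assumed output convention: every completion ends with "Confidence: <0 to 1>".
SUFFIX = "End with 'Confidence: <0 to 1>'."


def parse_confidence(text: str) -> float:
    """Return the self-reported confidence in [0, 1], or 0.0 if none is found."""
    match = re.search(r"Confidence:\s*(\d+(?:\.\d+)?)", text)
    return min(1.0, float(match.group(1))) if match else 0.0


def review(llm: LLM, problem: str, kind: str, draft: str, max_rounds: int) -> str:
    """Ask the model to revise a draft; keep a revision only if confidence rises."""
    best, best_conf = draft, parse_confidence(draft)
    for _ in range(max_rounds):
        revised = llm(
            f"Review the {kind} below for the problem and correct any errors.\n"
            f"Problem: {problem}\n{kind}:\n{best}\n{SUFFIX}"
        )
        conf = parse_confidence(revised)
        if conf > best_conf:  # assumed acceptance rule, not taken from the paper
            best, best_conf = revised, conf
    return best


def solve(llm: LLM, problem: str, max_rounds: int = 2) -> Tuple[str, str]:
    """Run the three stages and return (formulae, reasoning)."""
    # Stage 1: elicit the formulae, then refine them.
    formulae = llm(
        f"List the formulae needed to solve the problem.\nProblem: {problem}\n{SUFFIX}"
    )
    formulae = review(llm, problem, "formulae", formulae, max_rounds)
    # Stage 2: reason step by step, grounded in the refined formulae.
    reasoning = llm(
        "Solve the problem step by step using only these formulae.\n"
        f"Problem: {problem}\nFormulae:\n{formulae}\n{SUFFIX}"
    )
    # Stage 3: refine the reasoning with the same confidence check.
    reasoning = review(llm, problem, "reasoning", reasoning, max_rounds)
    return formulae, reasoning
```

To try the sketch, pass any function that sends a prompt to a model and returns its text as `llm`. Keeping the formulae and the reasoning as separate refinement targets means an error in a formula can be corrected before the reasoning is built on it.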

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks

Chain-of-Thoughts for Molecular Understanding

Are large language models superhuman chemists?

ChemDFM: A Large Language Foundation Model for Chemistry

Exploring the Potential of Large Language Models in Molecular Tasks: An Insightful Evaluation with GPT‐4

LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset

Fine-tuning Large Language Models for Chemical Text Mining

BatGPT-Chem: A Foundation Large Model For Chemical Engineering

Large Language Models are Catalyzing Chemistry Education

Augmenting large language models with chemistry tools

ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models

Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design

Leveraging large language models for predictive chemistry

Concise and Organized Perception Facilitates Reasoning in Large Language Models

ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

Coupling Large Language Models with Logic Programming for Robust and General Reasoning from Text