Structured Chemistry Reasoning with Large Language Models

Siru Ouyang, Zhuosheng Zhang, Bing Yan, Xuan Liu, Yejin Choi, Jiawei Han, Lianhui Qin
2024-02-10
Abstract: Large Language Models (LLMs) excel in diverse areas, yet struggle with complex scientific reasoning, especially in the field of chemistry. Different from the simple chemistry tasks (e.g., molecule classification) addressed in previous studies, complex chemistry problems require not only vast knowledge and precise calculation, but also compositional reasoning about rich dynamic interactions of different concepts (e.g., temperature changes). Our study shows that even advanced LLMs, like GPT-4, can fail easily in different ways. Interestingly, the errors often stem not from a lack of domain knowledge within the LLMs, but rather from the absence of an effective reasoning structure that guides the LLMs to elicit the right knowledge, incorporate the knowledge in step-by-step reasoning, and iteratively refine results for further improved quality. On this basis, we introduce StructChem, a simple yet effective prompting strategy that offers the desired guidance and substantially boosts the LLMs' chemical reasoning capability. Testing across four chemistry areas (quantum chemistry, mechanics, physical chemistry, and kinetics), StructChem substantially enhances GPT-4's performance, with up to 30% peak improvement. Our analysis also underscores the unique difficulties of precise grounded reasoning in science with LLMs, highlighting a need for more research in this area. Code is available at https://github.com/ozyyshr/StructChem.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the difficulty large language models (LLMs) have with complex chemical reasoning. Although advanced LLMs such as GPT-4 perform well on simple chemistry tasks (e.g., molecule classification), they often fail on complex problems that demand extensive knowledge, precise calculation, and compositional, multi-step reasoning. The authors find that these failures often stem not from a lack of domain knowledge but from the absence of an effective reasoning structure that guides the model to elicit the relevant knowledge and apply it step by step. To address this, the paper proposes StructChem, a simple prompting strategy that guides an LLM to first generate the relevant chemical formulae, then reason step by step grounded in those formulae, and finally refine both iteratively through confidence-based review. Experiments across four chemistry areas show that StructChem substantially improves the chemical reasoning of models such as GPT-3.5 and GPT-4, with up to a 30% peak improvement for GPT-4. In addition, fine-tuning smaller language models on reasoning traces generated by StructChem also yields significant gains. Together, these results support the effectiveness of StructChem and its potential for solving complex chemistry problems.
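The staged procedure summarized above (formulae generation, formula-grounded step-by-step reasoning, then confidence-based review and refinement) can be sketched as a small prompting loop. This is a minimal illustrative sketch, not the authors' released code: the `llm` callable, all prompt wording, the `Confidence: <score>` reply format, the `threshold`/`max_rounds` stopping rule, and combining the two review scores with `min` are assumptions made for the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class Candidate:
    formulae: str
    reasoning: str
    confidence: float


def parse_confidence(reply: str) -> float:
    """Read the last 'Confidence: <number>' line (an assumed format), clamped to [0, 1]."""
    scores = re.findall(r"Confidence:\s*(\d+(?:\.\d+)?)", reply)
    return min(max(float(scores[-1]), 0.0), 1.0) if scores else 0.0


def review(llm: LLM, instruction: str, question: str, draft: str) -> tuple[str, float]:
    """Ask the model to critique and revise a draft, returning (revised_text, confidence)."""
    reply = llm(
        f"{instruction}\nProblem: {question}\nDraft:\n{draft}\n"
        "Return the revised text, then a final line 'Confidence: <0-1>'."
    )
    # Keep only the revised content, dropping the trailing confidence line.
    revised = reply.rsplit("Confidence:", 1)[0].strip()
    return revised, parse_confidence(reply)


def struct_chem(question: str, llm: LLM, threshold: float = 0.9, max_rounds: int = 3) -> Candidate:
    # Stage 1: elicit the formulae the problem needs, with their variables explained.
    formulae = llm(
        "List the formulae needed for this chemistry problem and explain each variable.\n"
        f"Problem: {question}"
    )
    # Stage 2: reason step by step, grounded in the generated formulae.
    reasoning = llm(
        "Using these formulae, solve the problem step by step, showing every calculation.\n"
        f"Problem: {question}\nFormulae:\n{formulae}"
    )
    best = Candidate(formulae, reasoning, confidence=0.0)
    # Stage 3: confidence-guided review and refinement of the formulae, then the reasoning.
    for _ in range(max_rounds):
        formulae, f_conf = review(
            llm, "Check these formulae for errors and fix them.", question, formulae
        )
        reasoning, r_conf = review(
            llm,
            "Check this reasoning against the formulae and fix any errors.",
            question,
            f"Formulae:\n{formulae}\nReasoning:\n{reasoning}",
        )
        # Assumption: a solution is only as trustworthy as its least confident part.
        confidence = min(f_conf, r_conf)
        if confidence > best.confidence:
            best = Candidate(formulae, reasoning, confidence)
        if confidence >= threshold:
            break
    return best
```

To use it, pass a question and any prompt-to-text function, e.g. a hypothetical wrapper around a chat-completion client. Returning the highest-confidence candidate, rather than simply the last revision, is a design choice intended to keep a later review round from overwriting a better earlier answer.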