T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering

Lei Wang,Yi Hu,Jiabang He,Xing Xu,Ning Liu,Hui Liu,Heng Tao Shen

2023-12-18

Abstract: Large Language Models (LLMs) have recently demonstrated exceptional performance in various Natural Language Processing (NLP) tasks. They have also shown the ability to perform chain-of-thought (CoT) reasoning to solve complex problems. Recent studies have explored CoT reasoning in complex multimodal scenarios, such as the science question answering task, by fine-tuning multimodal models with high-quality human-annotated CoT rationales. However, collecting high-quality CoT rationales is usually time-consuming and costly. Besides, the annotated rationales are often inaccurate because essential external information is missing. To address these issues, we propose a novel method termed T-SciQ that aims at teaching science question answering with LLM signals. T-SciQ generates high-quality CoT rationales as teaching signals and uses them to train much smaller models to perform CoT reasoning in complex modalities. Additionally, we introduce a novel data mixing strategy to produce more effective teaching data samples for simple and complex science question answering problems. Extensive experimental results show that our T-SciQ method achieves new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%. Moreover, our approach outperforms the most powerful fine-tuned baseline by 4.5%. The code is publicly available at https://github.com/T-SciQ/T-SciQ.

Computation and Language

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses multimodal chain-of-thought (CoT) reasoning in science question answering. The researchers propose a method called T-SciQ that generates high-quality CoT rationales and improves on existing methods in three ways:

1. **Generation of High-Quality CoT Rationales**: Using large language models (LLMs) to generate high-quality CoT rationales, which avoids the time and cost of manual annotation and supplies the essential external information that human-annotated rationales often lack.
2. **Data Mixing Strategy**: Proposing a new data mixing strategy that combines teaching samples for simple and complex questions to create a more effective training dataset.
3. **Multimodal Problem Solving**: Solving science questions in multimodal scenarios, particularly those involving complex image information, while avoiding the information loss caused by relying on image captioning models.

With these methods, T-SciQ reaches 96.18% accuracy on the ScienceQA benchmark and outperforms the strongest fine-tuned baseline by 4.5%.
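The data mixing idea can be illustrated with a minimal sketch. This is not the authors' implementation: the summary above does not say how a question is judged simple or complex, or how the two kinds of rationale are produced. The `Sample` fields, the `is_complex` flag, and the `mix_teaching_data` helper are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    question: str
    is_complex: bool        # hypothetical flag; the source does not define the criterion
    direct_rationale: str   # hypothetical: rationale generated in a single LLM pass
    planned_rationale: str  # hypothetical: rationale generated after decomposing the problem


def mix_teaching_data(samples):
    """Pick one teaching rationale per question based on its complexity flag."""
    mixed = []
    for s in samples:
        rationale = s.planned_rationale if s.is_complex else s.direct_rationale
        mixed.append({"question": s.question, "rationale": rationale})
    return mixed


# Toy data, invented purely to exercise the function.
samples = [
    Sample("Which object is magnetic?", False, "Iron is magnetic.", "Step 1: list materials."),
    Sample("Which force diagram is balanced?", True, "Diagram B.", "Step 1: sum the forces."),
]
mixed = mix_teaching_data(samples)
```

In this sketch the simple question keeps its single-pass rationale and the complex one keeps the decomposed rationale, so the resulting training set contains exactly one teaching signal per question.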
