LogiCoT: Logical Chain-of-Thought Instruction-Tuning

Hanmeng Liu, Zhiyang Teng, Leyang Cui, Chaoli Zhang, Qiji Zhou, Yue Zhang
2023-10-28
Abstract: Generative Pre-trained Transformer 4 (GPT-4) demonstrates impressive chain-of-thought reasoning ability. Recent work on self-instruction tuning, such as Alpaca, has focused on enhancing the general proficiency of models. These instructions enable the model to achieve performance comparable to GPT-3.5 on general tasks like open-domain text generation and paraphrasing. However, they fall short of helping the model handle complex reasoning tasks. To bridge the gap, this paper presents LogiCoT, a new instruction-tuning dataset for Logical Chain-of-Thought reasoning with GPT-4. We elaborate on the process of harvesting instructions for prompting GPT-4 to generate chain-of-thought rationales. LogiCoT serves as an instruction set for teaching models of logical reasoning and elicits general reasoning skills.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the weak performance of current large language models (LLMs) on logical reasoning tasks, particularly their limited multi-step logical reasoning ability. Existing self-instruction tuning methods (such as Alpaca) improve performance on general tasks but remain insufficient for complex reasoning. To close this gap, the paper proposes LogiCoT, a new instruction-tuning dataset designed to strengthen a model's Chain-of-Thought (CoT) reasoning. The main contributions of the paper are:

1. **Constructing the LogiCoT dataset**: Logical reasoning instructions are harvested from existing logical reasoning datasets and used to prompt GPT-4, whose generated rationales form a Chain-of-Thought instruction-tuning dataset.
2. **Enhancing logical reasoning capabilities**: The dataset is validated by instruction-tuning the LLaMA-7b model. Experimental results show that the model fine-tuned with LogiCoT improves significantly on logical reasoning benchmarks.
3. **Expanding the application scope**: Beyond logical reasoning tasks, the fine-tuned model also performs well on general human-centric language model benchmarks, demonstrating its generalization ability.

In summary, the paper aims to remedy the logical reasoning shortcomings of existing models by constructing a specialized dataset, thereby advancing the use of large language models on complex reasoning tasks.
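
The dataset-construction step can be pictured with a minimal sketch. It is only an illustration under stated assumptions, not the authors' pipeline: the `LogicExample` fields, the prompt wording, and the Alpaca-style `instruction`/`input`/`output` record layout are all hypothetical choices, and the teacher model's rationale is passed in as a plain string rather than fetched from any API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogicExample:
    """Hypothetical multiple-choice item from a seed logical reasoning dataset."""
    passage: str
    question: str
    options: List[str]


def build_prompt(example: LogicExample) -> str:
    """Format a seed item as a prompt that asks for step-by-step reasoning."""
    letters = "ABCDEFGH"
    option_lines = [
        f"{letters[i]}. {option}" for i, option in enumerate(example.options)
    ]
    return (
        "Read the passage and answer the question. "
        "Explain your reasoning step by step before giving the final answer.\n\n"
        f"Passage: {example.passage}\n"
        f"Question: {example.question}\n"
        "Options:\n" + "\n".join(option_lines)
    )


def to_instruction_record(example: LogicExample, rationale: str) -> Dict[str, str]:
    """Pair the prompt with a teacher-written rationale (assumed record layout)."""
    return {
        "instruction": build_prompt(example),
        "input": "",
        "output": rationale.strip(),
    }
```

In this sketch, the rationale returned by the teacher model for each prompt would be stored as the `output` field, and the resulting records would serve as supervised examples for instruction-tuning a smaller model.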