Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in tasks requiring reasoning and multi-step problem-solving through the use of chain-of-thought (CoT) prompting. However, generating the full CoT process results in significantly longer output sequences, leading to increased computational costs and latency during inference. To address this challenge, we propose a novel approach to compress the CoT process through semantic alignment, enabling more efficient decoding while preserving the benefits of CoT reasoning. Our method introduces an auxiliary CoT model that learns to generate and compress the full thought process into a compact special token representation semantically aligned with the original CoT output. This compressed representation is then integrated into the input of the Hidden Chain-of-Thought (HCoT) model. The training process follows a two-stage procedure: first, the CoT model is optimized to generate the compressed token representations aligned with the ground-truth CoT outputs using a contrastive loss. Subsequently, with the CoT model parameters frozen, the HCoT model is fine-tuned to generate accurate subsequent predictions conditioned on the prefix instruction and the compressed CoT representations from the CoT model. Extensive experiments across three challenging domains (mathematical reasoning, agent invocation, and question answering) demonstrate that our semantic compression approach achieves competitive or improved performance compared to the full CoT baseline, while providing significant speedups of at least 1.5x in decoding time. Moreover, incorporating contrastive learning objectives further enhances the quality of the compressed representations, leading to better CoT prompting and improved task accuracy. Our work paves the way for more efficient exploitation of multi-step reasoning capabilities in LLMs across a wide range of applications.
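The abstract states that the first training stage aligns compressed token representations with the ground-truth CoT outputs using a contrastive loss, but it does not give the exact form of that loss. The following is a minimal sketch of one common choice, an InfoNCE-style objective over a batch, and is an assumption rather than the authors' formulation. The function name `contrastive_alignment_loss`, the `temperature` value, and the random vectors standing in for model embeddings are all hypothetical.

```python
import math
import random


def _normalize(vec):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def contrastive_alignment_loss(compressed, targets, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    Row i of `compressed` is treated as the positive match for row i of
    `targets`; every other row in the batch acts as a negative.
    """
    z = [_normalize(v) for v in compressed]
    t = [_normalize(v) for v in targets]
    total = 0.0
    for i, zi in enumerate(z):
        # Temperature-scaled cosine similarity to every target in the batch.
        logits = [sum(a * b for a, b in zip(zi, tj)) / temperature for tj in t]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching target.
        total += log_denom - logits[i]
    return total / len(z)


# Placeholder data: random vectors stand in for CoT-output embeddings.
random.seed(0)
targets = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
# "Aligned" compressed representations: targets plus a little noise.
aligned = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in targets]
# "Misaligned" case: the same vectors paired with the wrong targets.
misaligned = aligned[1:] + aligned[:1]

loss_aligned = contrastive_alignment_loss(aligned, targets)
loss_misaligned = contrastive_alignment_loss(misaligned, targets)
print(f"aligned loss:    {loss_aligned:.4f}")
print(f"misaligned loss: {loss_misaligned:.4f}")
```

In this sketch the loss is low when each compressed representation is closest to its own target and high when the pairing is wrong, which is the property a semantic-alignment objective of this kind relies on. In the second stage described in the abstract, the model producing these representations would be frozen while the HCoT model is fine-tuned on top of them.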

The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Stress Testing Chain-of-Thought Prompting for Large Language Models

Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse

Design of Chain-of-Thought in Math Problem Solving

Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning

Chain of Thoughtlessness? An Analysis of CoT in Planning

Markov Chain of Thought for Efficient Mathematical Reasoning

Structured Chain-of-Thought Prompting for Code Generation

Supervised Chain of Thought

Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters

A comparison of chain-of-thought reasoning strategies across datasets and models

An automatically discovered chain-of-thought prompt generalizes to novel models and datasets

Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning

Automatic Chain of Thought Prompting in Large Language Models

When do you need Chain-of-Thought Prompting for ChatGPT?

Self-Harmonized Chain of Thought

Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding

The Impact of Reasoning Step Length on Large Language Models