Abstract: Multi-step Chain of Thought (CoT) benefits from the logical structure of its reasoning steps and from task-specific actions, significantly enhancing the mathematical reasoning capabilities of large language models. As long CoT becomes more prevalent, the number of reasoning steps can exceed manageable token limits and lead to higher computational demands. Inspired by the fundamental logic of human cognition, "derive, then reduce", we conceptualize the standard multi-step CoT as a novel Markov Chain of Thought (MCoT). In this study, we consider the mathematical reasoning task, defining each reasoning step as text accompanied by a Python code snippet. To facilitate a longer reasoning path, self-correction is enabled through interactions with the code interpreter. Our MCoT aims to compress previous reasoning steps into a simplified question, enabling efficient next-step inference without relying on a lengthy KV cache. In our experiments, we curate the MCoTInstruct dataset, and the empirical results indicate that MCoT not only significantly enhances efficiency but also maintains comparable accuracy. While much remains to be explored, this work paves the way for exploring the long CoT reasoning abilities of LLMs.

What problem does this paper attempt to address?

This paper addresses the efficiency and accuracy problems that large language models (LLMs) face when performing complex mathematical reasoning. Existing multi-step reasoning (MSR) methods improve reasoning ability, but as the number of reasoning steps grows they consume more computing resources, take longer to run, and accumulate errors. These costs make multi-step reasoning inefficient in practical applications.

To address these problems, the authors propose a new reasoning framework, Markov Chain of Thought (MCoT). MCoT decomposes a complex reasoning process into a series of simplified sub-problems and uses the Markov property to model the transitions between them. The core idea is inspired by the "derive, then reduce" pattern in human cognition: each reasoning step depends only on the current state, not on the earlier history. According to the paper, this reduces memory and compute requirements and also improves the speed and accuracy of reasoning.

### Main contributions

1. **Propose an innovative framework**: Use the Markov property to treat the reasoning process as a sequence of transitions between states.
2. **Construct the MCoTInstruct dataset**: A dataset designed specifically for mathematical reasoning tasks, intended to support further work by the research community.
3. **Experimental verification**: Extensive experiments show that, with up to 8 reasoning steps, MCoT is 1.9 times faster than traditional multi-step reasoning while maintaining higher accuracy.
4. **Explore advanced reasoning abilities**: Provide a new way to explore more advanced reasoning abilities; the authors state that model checkpoints and code repositories will be released upon acceptance.
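
The memoryless property described above can be written down as a contrast between two factorizations. This is only a sketch in notation introduced here for illustration: $q_t$ stands for the question at step $t$ (with $q_1$ the original problem), $s_t$ for the derivation step (text plus code), and $T$ for the number of steps. The paper's own notation and exact formulation may differ.

```latex
\begin{align*}
% Multi-step reasoning (MSR): step t conditions on the full history.
p_{\mathrm{MSR}}(s_1, \dots, s_T \mid q_1)
  &= \prod_{t=1}^{T} p\left(s_t \mid q_1, s_1, \dots, s_{t-1}\right) \\
% MCoT (assumed form): step t and the reduced question q_{t+1} depend only on q_t.
p_{\mathrm{MCoT}}(s_1, q_2, \dots, q_T, s_T \mid q_1)
  &= \left[\prod_{t=1}^{T-1} p\left(s_t, q_{t+1} \mid q_t\right)\right] p\left(s_T \mid q_T\right)
\end{align*}
```

Under this reading, the conditioning context in the second product never grows with $t$, which is where the reduction in cache usage would come from.
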
### Specific methods

- **Markov Chain of Thought reasoning**: Assume that each successful derivation step reduces the original problem to a simpler one, so that a sequence of such steps eventually yields the answer. The framework defines a probability distribution over the newly generated problems and relies on the Markov property to keep the reasoning process memoryless.
- **MCoTInstruct dataset construction**: Extract seed data from existing multi-step reasoning datasets and expand it through self-distillation to improve data coverage and diversity.

### Experimental results

- **Accuracy**: The MCoT models outperform other open-source models for mathematical problem solving on multiple datasets. In particular, on the MATH dataset, MCoT-DeepSeek achieves an accuracy of 55.8%, exceeding all 34B and 70B models.
- **Efficiency**: Compared with MSR, MCoT significantly improves reasoning efficiency. The average GPU cache usage and decoding time of MCoT are significantly lower, especially when there are more reasoning steps.

In summary, by introducing the MCoT framework, this paper addresses the efficiency and accuracy bottlenecks that existing reasoning methods encounter on complex mathematical problems, and it offers a new direction for future research.
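
The derive-then-reduce loop and its effect on context size can be illustrated with a small toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the `Step` record, the `toy_step` stand-in for a model call, the `solve` driver, and the whitespace-based `n_tokens` counter are all hypothetical names, and the "add a list of numbers" task is not taken from the paper. The sketch only shows that the MCoT context is the current reduced question, whereas an MSR-style context accumulates every earlier step; it does not reproduce the reported speedup.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One reasoning step: text, a code snippet, and the reduced question."""
    text: str
    code: str
    next_question: Optional[str]  # None once the final answer is reached
    answer: Optional[str] = None


def n_tokens(s: str) -> int:
    # Crude whitespace count, a stand-in for a real tokenizer (assumption).
    return len(s.split())


def toy_step(question: str) -> Step:
    """Hypothetical stand-in for a model call on a toy 'add a b c ...' task."""
    nums = [int(x) for x in question.split()[1:]]
    if len(nums) == 1:
        code = str(nums[0])
        value = eval(code, {"__builtins__": {}})  # toy "interpreter" call
        return Step("Only one number remains.", code, None, str(value))
    a, b, rest = nums[0], nums[1], nums[2:]
    code = f"{a} + {b}"
    total = eval(code, {"__builtins__": {}})  # toy "interpreter" call
    reduced = "add " + " ".join(str(x) for x in [total] + rest)
    return Step(f"Combine {a} and {b}.", code, reduced)


def solve(
    question: str,
    propose_step: Callable[[str], Step],
    max_steps: int = 8,
) -> Tuple[Optional[str], List[int], List[int]]:
    """Run the memoryless loop and record context sizes for both schemes."""
    answer = None
    mcot_ctx: List[int] = []  # context = current reduced question only
    msr_ctx: List[int] = []   # context = original question + all prior steps
    history_tokens = n_tokens(question)
    for _ in range(max_steps):
        mcot_ctx.append(n_tokens(question))
        msr_ctx.append(history_tokens)
        step = propose_step(question)  # depends only on the current question
        history_tokens += n_tokens(step.text) + n_tokens(step.code)
        if step.next_question is None:
            answer = step.answer
            break
        question = step.next_question  # earlier steps are discarded here
    return answer, mcot_ctx, msr_ctx


if __name__ == "__main__":
    answer, mcot_ctx, msr_ctx = solve("add 3 4 5 6", toy_step)
    print(answer)    # 18
    print(mcot_ctx)  # [5, 4, 3, 2]
    print(msr_ctx)   # [5, 12, 19, 26]
```

In this toy run the MCoT context shrinks as the question is reduced, while the accumulated MSR-style context grows by a fixed amount per step. A real system would replace `toy_step` with a model call and a sandboxed code interpreter.
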

Markov Chain of Thought for Efficient Mathematical Reasoning
