Abstract: Mathematical reasoning problems are among the most challenging, as they typically require an understanding of fundamental laws to solve. The laws are universal, but the derivation of the final answer changes depending on how a problem is approached. When training large language models (LLMs), learning the capability of generating such multiple solutions is essential to accelerate their use in mathematical education. To this end, we train LLMs using a generative flow network (GFlowNet). Different from reward-maximizing reinforcement learning (RL), GFlowNet fine-tuning seeks to find diverse solutions by training the LLM whose distribution is proportional to a reward function. In numerical experiments, we evaluate GFlowNet fine-tuning and reward-maximizing RL in terms of accuracy and diversity. The results show that GFlowNet fine-tuning derives correct final answers from diverse intermediate reasoning steps, indicating the improvement of the capability of alternative solution generation.

What problem does this paper attempt to address?

The problem this paper addresses is how to train large language models (LLMs) to generate diverse, correct solutions in mathematical reasoning tasks. Specifically, the authors focus on:

1. **Increasing the diversity of solutions**: In mathematics education, encouraging students to find multiple solutions from different perspectives helps improve their problem-solving ability and creativity. The researchers therefore aim to train LLMs that generate diverse solutions rather than a single correct answer.
2. **Maintaining the correctness of the final answer**: Although the solutions can be diverse, the final answer must be correct. The model must therefore still reach the correct final result while generating different solutions.

To achieve this goal, the authors use the Generative Flow Network (GFlowNet) to fine-tune LLMs. Unlike traditional reinforcement learning (RL) methods based on reward maximization, GFlowNet fine-tuning trains the model so that its output distribution is proportional to the reward function, which lets it generate diverse high-reward sequences. This gives GFlowNet a potential advantage in mathematical reasoning tasks: it can produce diverse solutions while maintaining the correctness of the final answer.

### Research questions

The paper mainly explores the following two research questions:

- **Can GFlowNet generate diverse solutions in mathematical reasoning tasks?**
- **What are the differences in accuracy between GFlowNet and the RL method based on reward maximization?**

Through experiments, the authors evaluate GFlowNet fine-tuning in terms of accuracy and diversity and compare it with several RL methods based on reward maximization. The experimental results show that GFlowNet fine-tuning generates more diverse solutions while maintaining the correctness of the final answer.
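The contrast between reward-proportional sampling and reward maximization can be illustrated with a toy sketch. This is not the paper's training procedure: the candidate solutions, their reward values, and the helper names (`reward_proportional`, `reward_maximizing`, `entropy`) are all hypothetical, chosen only to show why a distribution proportional to the reward spreads probability over several high-reward solutions, while an idealized reward-maximizing policy collapses onto one.

```python
import math

# Hypothetical candidate solutions with illustrative rewards (not from the paper).
rewards = {
    "algebraic solution": 1.0,
    "geometric solution": 1.0,
    "guess-and-check solution": 0.5,
    "incorrect solution": 0.01,
}


def reward_proportional(rewards):
    """Target distribution of a GFlowNet-style sampler: p(x) = R(x) / Z."""
    z = sum(rewards.values())  # normalizing constant Z
    return {x: r / z for x, r in rewards.items()}


def reward_maximizing(rewards):
    """Idealized greedy policy: all mass on a single highest-reward solution."""
    best = max(rewards, key=rewards.get)  # first maximal key in dict order
    return {x: (1.0 if x == best else 0.0) for x in rewards}


def entropy(dist):
    """Shannon entropy in nats, used here as a simple stand-in for diversity."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


p_gfn = reward_proportional(rewards)
p_max = reward_maximizing(rewards)

for name in rewards:
    print(f"{name:28s} proportional={p_gfn[name]:.3f}  maximizing={p_max[name]:.3f}")
print("entropy (proportional):", round(entropy(p_gfn), 3))
print("entropy (maximizing):  ", round(entropy(p_max), 3))
```

In this toy setting both correct solutions receive equal probability under the reward-proportional distribution, and the incorrect solution receives very little, which mirrors the stated goal of diverse solutions with correct final answers. Entropy is only one possible proxy for diversity; the paper's own diversity metric may differ.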

GFlowNet Fine-tuning for Diverse Correct Solutions in Mathematical Reasoning Tasks

Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples

Proof Flow: Preliminary Study on Generative Flow Network Language Model Tuning for Formal Reasoning

Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking

Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

ReFT: Reasoning with Reinforced Fine-Tuning

HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows

Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions

Enhancing Solution Efficiency in Reinforcement Learning: Leveraging Sub-GFlowNet and Entropy Integration

MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning

GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets

Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards

Improving Large Language Model Fine-tuning for Solving Math Problems

Multi-tool Integration Application for Math Reasoning Using Large Language Model

Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

GFlowNet Training by Policy Gradients

Multi-Fidelity Active Learning with GFlowNets

Better Training of GFlowNets with Local Credit and Incomplete Trajectories

MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time

Neuro-Symbolic Data Generation for Math Reasoning