Abstract: Large Language Models (LLMs) have recently showcased remarkable
generalizability in various domains. Despite their extensive knowledge, LLMs
still face challenges in efficiently utilizing encoded knowledge to develop
accurate and logical reasoning processes. To mitigate this problem, we
introduced Hint-before-Solving Prompting (HSP), which guides the model to
generate hints (e.g., specific knowledge or key ideas) for solving the problem
and then generate solutions containing intermediate reasoning steps. Since HSP
is orthogonal to prompting methods (e.g., Chain-of-Thought (CoT)), we applied
HSP to CoT, Least-to-Most, Plan-and-Solve, and Standard promptings. The results
of extensive experiments on 6 reasoning benchmarks and 4 open-source LLMs
demonstrate that HSP can effectively improve the accuracy of reasoning tasks:
(1) By applying high-quality hint-enhanced HSP to CoT prompting,
Llama2-70B-Chat shows an improvement of 9.7. (2) Beyond exploring training-free
LLM capabilities, we built the HSPMATH dataset based on HSP and fine-tuned
Llemma-7B, reaching 64.3 accuracy, surpassing GPT-3.5 and WizardMath-13B. We
make our code and dataset publicly available at
https://github.com/jinlanfu/HSP.
What problem does this paper attempt to address?
This paper addresses how large language models (LLMs) can use their encoded knowledge more effectively to reason accurately and logically on complex tasks. Although LLMs possess extensive knowledge, they still struggle with reasoning tasks that require applying that knowledge precisely. The paper proposes **Hint-before-Solving Prompting (HSP)**, which has the model first generate hints (such as specific knowledge or key ideas) and then generate a solution that includes intermediate reasoning steps.
### Main problems and methods
1. **Problem background**:
- Although LLMs have shown excellent generalization ability in multiple fields, they still have difficulties in complex reasoning tasks, such as mathematical reasoning and commonsense reasoning.
- Existing methods, such as fine-tuning, prompt-based engineering methods, and methods of retrieving knowledge from external knowledge bases, all have limitations.
2. **Proposed method**:
- **Hint-before-Solving Prompting (HSP)**: Allows LLMs to automatically generate useful hints before solving problems. These hints can include the knowledge required to solve the problem, key ideas for analyzing the problem, etc.
- HSP can be combined with existing prompting methods (such as Chain-of-Thought (CoT), Least-to-Most (LtM), Plan-and-Solve (PS), etc.) to further improve performance.
3. **Research questions**:
- **Q1**: Can HSP effectively guide LLMs to generate useful hints independently?
- **Q2**: Is HSP still effective when handling tasks that are difficult for LLMs?
- **Q3**: How does an LLM perform after supervised fine-tuning on a large-scale dataset of HSP-style prompts?
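
The hint-then-solution idea described above can be sketched as a prompt template plus a small parser. This is a minimal illustration, not the authors' released code: the instruction wording, the `Hint:` and `Solution:` markers, and the function names `build_hsp_prompt` and `split_hint_and_solution` are all assumptions made for this example.

```python
from typing import Tuple


def build_hsp_prompt(question: str,
                     base_instruction: str = "Let's think step by step.") -> str:
    """Build a single-pass hint-before-solving prompt.

    The wording is hypothetical; `base_instruction` stands in for whatever
    base prompting method (e.g. CoT) HSP is combined with.
    """
    return (
        f"Question: {question}\n"
        "First write a hint (the knowledge or key idea needed), "
        "then write the solution with intermediate reasoning steps.\n"
        f"{base_instruction}\n"
        "Hint:"
    )


def split_hint_and_solution(completion: str) -> Tuple[str, str]:
    """Split a completion that continues after 'Hint:' into (hint, solution).

    If the model never emits the assumed 'Solution:' marker, the whole
    completion is treated as the solution and the hint is empty.
    """
    hint, marker, solution = completion.partition("Solution:")
    if not marker:
        return "", completion.strip()
    return hint.strip(), solution.strip()
```

The prompt ends with `Hint:` so that the model writes the hint first and the solution second within a single generation.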
### Experimental results
1. **Combination of HSP and existing prompt methods**:
- HSP yields a significant performance improvement with Standard and CoT prompting, but its effect is limited with PS and LtM prompting.
- Larger models usually show larger performance improvements.
2. **Two-stage HSP (HSP2)**:
- The performance of HSP and HSP2 is comparable, but HSP brings more stable improvements.
- High-quality hints (such as hints generated by GPT-4) can significantly improve the performance of open-source models, even surpassing ChatGPT.
3. **Performance of HSP on difficult tasks**:
- On the MATH dataset, only larger models (such as Mix-56B) show a significant performance improvement under CoT + HSP prompting.
- As the number of sampled paths (n) increases, the benefit of HSP becomes more pronounced on high-difficulty problems.
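
The two-stage variant and the sampling of n paths mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: `generate` is a hypothetical callable wrapping any LLM, the prompt wording is invented, and aggregating the n sampled answers by majority vote is an assumption, since this summary does not say how the paths are combined.

```python
from collections import Counter
from typing import Callable, List


def hsp_two_stage(question: str, generate: Callable[[str], str]) -> str:
    """Two-stage sketch: one call produces a hint, a second call solves with it.

    `generate` is a hypothetical function mapping a prompt to a completion.
    """
    hint_prompt = f"Question: {question}\nGive a short hint for solving it.\nHint:"
    hint = generate(hint_prompt).strip()
    solve_prompt = f"Question: {question}\nHint: {hint}\nSolution:"
    return generate(solve_prompt).strip()


def majority_answer(answers: List[str]) -> str:
    """Return the most frequent non-empty answer among the sampled paths.

    Majority voting is an assumed aggregation rule for this example.
    """
    counts = Counter(a.strip() for a in answers if a.strip())
    if not counts:
        return ""
    return counts.most_common(1)[0][0]
```

In use, `hsp_two_stage` would be called n times with a sampling-enabled `generate`, a final answer would be extracted from each solution, and `majority_answer` would pick the answer returned most often.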
### Main contributions
1. Providing hints enables LLMs to use their encoded knowledge more accurately and effectively: the accuracy of Llama2-70B-Chat across six datasets improves by nearly 10 points (9.7, as reported in the abstract).
2. The HSP prompting method is proposed, and its effectiveness is verified through extensive experiments.
3. The HSPMATH dataset, containing 75,000 samples, is constructed, and Llemma-7B is fine-tuned on it with supervised learning, achieving an accuracy of 64.3, which exceeds GPT-3.5 (57.1) and WizardMath-13B (63.9).
Through these methods and experiments, the paper demonstrates the potential of HSP in improving the reasoning ability and accuracy of LLMs.