Abstract: Large Language Models (LLMs) have recently showcased remarkable generalizability in various domains. Despite their extensive knowledge, LLMs still face challenges in efficiently utilizing encoded knowledge to develop accurate and logical reasoning processes. To mitigate this problem, we introduced Hint-before-Solving Prompting (HSP), which guides the model to generate hints (e.g., specific knowledge or key ideas) for solving the problem and then generate solutions containing intermediate reasoning steps. Since HSP is orthogonal to prompting methods (e.g., Chain-of-Thought (CoT)), we applied HSP to CoT, Least-to-Most, Plan-and-Solve, and Standard promptings. The results of extensive experiments on 6 reasoning benchmarks and 4 open-source LLMs demonstrate that HSP can effectively improve the accuracy of reasoning tasks: (1) By applying high-quality hint-enhanced HSP to CoT prompting, Llama2-70B-Chat shows an improvement of 9.7. (2) Beyond exploring training-free LLM capabilities, we built the HSPMATH dataset based on HSP and fine-tuned Llemma-7B, reaching 64.3 accuracy, surpassing GPT-3.5 and WizardMath-13B. We make our code and dataset publicly available at https://github.com/jinlanfu/HSP.
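The hint-then-solution format described in the abstract can be illustrated with a small prompt template. This is only a sketch: the wording of the instructions, the `Hint:` / `Solution:` markers, and the helper names `build_hsp_prompt` and `split_hint_and_solution` are assumptions made for illustration, not the prompts released by the authors.

```python
from typing import Tuple


def build_hsp_prompt(question: str,
                     base_instruction: str = "Let's think step by step.") -> str:
    """Build a hypothetical hint-before-solving prompt.

    The model is asked to write a hint first and a step-by-step solution
    second. `base_instruction` stands in for the underlying prompting method
    (here a CoT-style trigger phrase); all wording is illustrative.
    """
    return (
        f"Question: {question}\n"
        "First write 'Hint:' followed by the knowledge or key idea needed.\n"
        f"Then write 'Solution:' followed by your reasoning. {base_instruction}\n"
        "Hint:"
    )


def split_hint_and_solution(completion: str) -> Tuple[str, str]:
    """Split a completion of the assumed form '<hint> Solution: <solution>'.

    Returns (hint, solution); the solution is empty if the marker is missing.
    """
    hint, marker, solution = completion.partition("Solution:")
    if not marker:
        return completion.strip(), ""
    return hint.strip(), solution.strip()
```

Because the prompt ends with `Hint:`, the completion is expected to begin with the hint itself, which is why the parser only needs to look for the `Solution:` marker.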

What problem does this paper attempt to address?

This paper addresses how large language models (LLMs) can use their encoded knowledge more effectively to improve accuracy and logical consistency on complex reasoning tasks. Although LLMs possess extensive knowledge, they still struggle with tasks that require applying that knowledge precisely. The paper proposes **Hint-before-Solving Prompting (HSP)**, which has the model produce hints (such as specific knowledge or key ideas) before solving the problem and then generate a solution that includes intermediate reasoning steps.

### Main problems and methods

1. **Problem background**:
   - Although LLMs generalize well across many fields, they still have difficulty with complex reasoning tasks such as mathematical and commonsense reasoning.
   - Existing approaches, including fine-tuning, prompt engineering, and retrieval from external knowledge bases, all have limitations.
2. **Proposed method**:
   - **Hint-before-Solving Prompting (HSP)**: lets the LLM automatically generate useful hints before solving a problem. A hint can include the knowledge required to solve the problem or key ideas for analyzing it.
   - HSP can be combined with existing prompting methods, such as Chain-of-Thought (CoT), Least-to-Most (LtM), and Plan-and-Solve (PS), to further improve performance.
3. **Research questions**:
   - **Q1**: Can HSP effectively guide LLMs to generate useful hints on their own?
   - **Q2**: Is HSP still effective on tasks that are difficult for LLMs?
   - **Q3**: How does an LLM perform after supervised fine-tuning on a large-scale dataset built with HSP?

### Experimental results

1. **Combination of HSP and existing prompting methods**:
   - HSP yields a significant improvement with Standard and CoT prompting, but its effect is limited with PS and LtM prompting.
   - Larger models usually show larger improvements.
2. **Two-stage HSP (HSP2)**:
   - HSP and HSP2 perform comparably, but HSP brings more stable improvements.
   - High-quality hints (such as hints generated by GPT-4) can significantly improve the performance of open-source models, even surpassing ChatGPT.
3. **Performance of HSP on difficult tasks**:
   - On the MATH dataset, only larger models (such as Mix-56B) show a significant improvement with CoT + HSP prompting.
   - As the number of sampled paths (n) increases, the benefit of HSP becomes more visible on high-difficulty problems.

### Main contributions

1. The paper finds that providing hints enables LLMs to use their encoded knowledge more accurately and effectively. The accuracy of Llama-2-Chat-70B across six datasets increases by nearly 10%.
2. The HSP prompting method is proposed, and its effectiveness is verified through extensive experiments.
3. The HSPMATH dataset, containing 75,000 samples, is constructed and used for supervised fine-tuning of Llemma-7B, which reaches an accuracy of 64.3, exceeding GPT-3.5 (57.1) and WizardMath-13B (63.9).

Through these methods and experiments, the paper demonstrates the potential of HSP to improve the reasoning ability and accuracy of LLMs.
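The two-stage variant (HSP2) and the sampled reasoning paths (n) mentioned in the results can be sketched as follows. The summary does not say how the two stages are split or how sampled paths are aggregated, so this sketch assumes one model call for the hint, a second call for the solution, and a simple majority vote over final answers. The `generate` callable, the prompt wording, and the `Answer:` marker are all hypothetical.

```python
from collections import Counter
from typing import Callable, List


def hsp_two_stage(question: str, generate: Callable[[str], str]) -> str:
    """Assumed two-stage flow: call 1 produces a hint, call 2 solves with it.

    `generate` is any function mapping a prompt string to a completion string
    (for example, a thin wrapper around a model API of your choice).
    """
    hint = generate(
        f"Question: {question}\n"
        "Give a brief hint (relevant knowledge or key idea). Do not solve it.\n"
        "Hint:"
    ).strip()
    return generate(
        f"Question: {question}\n"
        f"Hint: {hint}\n"
        "Solve step by step and finish with 'Answer: <value>'.\n"
        "Solution:"
    )


def extract_answer(solution: str) -> str:
    """Return the text after the last 'Answer:' marker, or '' if absent."""
    _, marker, tail = solution.rpartition("Answer:")
    return tail.strip() if marker else ""


def majority_answer(solutions: List[str]) -> str:
    """Majority vote over the final answers of n sampled solutions.

    Solutions without an 'Answer:' marker are ignored; returns '' if none
    of the solutions contains a parsable answer.
    """
    answers = [a for a in map(extract_answer, solutions) if a]
    if not answers:
        return ""
    return Counter(answers).most_common(1)[0][0]
```

In use, `hsp_two_stage` would be called n times with a sampling-enabled `generate`, and the resulting solutions passed to `majority_answer`. Keeping the model behind a plain callable makes the sketch independent of any particular inference library.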

Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge
