Huan Ma, Changqing Zhang, Yatao Bian, Lemao Liu, Zhirui Zhang, Peilin Zhao, Shu Zhang, Huazhu Fu, Qinghua Hu, Bingzhe Wu
Abstract: Large language models have demonstrated a surprising ability to perform in-context learning, i.e., these models can be applied directly to numerous downstream tasks by conditioning on a prompt constructed from a few input-output examples. However, prior research has shown that in-context learning can be highly unstable due to variations in the training examples, example order, and prompt format. Therefore, constructing an appropriate prompt is essential for improving in-context learning performance. In this paper, we revisit this problem from the view of predictive bias. Specifically, we introduce a metric to evaluate the predictive bias of a fixed prompt against labels or a given attribute. We then empirically show that prompts with higher bias always lead to unsatisfactory predictive quality. Based on this observation, we propose a novel search strategy based on greedy search to identify a near-optimal prompt for improving in-context learning performance. We perform comprehensive experiments with state-of-the-art mainstream models such as GPT-3 on various downstream tasks. Our results indicate that our method can enhance the model's in-context learning performance in an effective and interpretable manner.
What problem does this paper attempt to address?
This paper addresses the high instability of few-shot prompting (in-context learning), which is caused by variation in the selection of training examples, their order, and the prompt format. The authors re-examine this problem from the perspective of predictive bias, propose a new method for evaluating prompt quality, and derive strategies for optimizing prompt construction.
### Core Problems of the Paper
1. **High Instability**: Existing few-shot prompting methods are unstable with respect to the choice of training examples, their order, and the prompt format.
2. **Importance of Prompt Construction**: Constructing an appropriate prompt is crucial for improving few-shot prompting performance.
3. **Impact of Predictive Bias**: The authors find that a prompt's intrinsic predictive bias has a significant impact on the model's prediction performance.
### Solutions
1. **Predictive Bias Evaluation**:
- A metric based on predictive bias was introduced to evaluate the quality of prompts.
- The prompt is applied to a "content-independent" input that carries no task information. An unbiased prompt should then yield a uniform prediction distribution over the labels, so the deviation from uniformity measures the bias of the prompt.
2. **Optimization Strategies**:
- **T-fair-Prompting**: First compute the bias of each example on its own, then select the top \( k \) examples with the smallest bias to construct the prompt. This method has a time complexity of \( O(N) \), but its performance may be unstable.
- **G-fair-Prompting**: Use a greedy search that, at each step, adds the example that most increases fairness, stopping when no remaining example improves fairness further. This method has a time complexity of \( O(N^2) \) but performs better.
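The T-fair strategy, together with the entropy-based fairness score given in the Formulas section, can be sketched as follows. This is a hedged illustration, not the authors' code: `label_probs_of` is a hypothetical model wrapper, and both renormalizing the probabilities over the label set and ordering the selected demonstrations fairest-first are assumptions made here.

```python
import math
from typing import Callable, List, Mapping, Sequence, Tuple

Example = Tuple[str, str]  # (input text x_i, label y_i)


def fairness(label_probs: Mapping[str, float]) -> float:
    """Entropy of p(y | rho ⊕ eta) over the label set Y (higher = less biased)."""
    total = sum(label_probs.values())
    if total <= 0:
        raise ValueError("label probabilities must have positive mass")
    # Assumption: renormalize over Y in case the model put mass on non-label tokens.
    probs = [p / total for p in label_probs.values()]
    return -sum(q * math.log(q) for q in probs if q > 0)  # 0 * log 0 treated as 0


def t_fair_prompting(
    candidates: Sequence[Example],
    k: int,
    label_probs_of: Callable[[List[Example]], Mapping[str, float]],
) -> List[Example]:
    """Score each candidate as a one-example prompt and keep the k fairest.

    label_probs_of is hypothetical: it should build a prompt from the given
    demonstrations, append a content-independent input, query the model, and
    return its probability for each label. One call per candidate, so O(N).
    """
    scored = [(fairness(label_probs_of([ex])), i) for i, ex in enumerate(candidates)]
    # Fairest (least biased) first; ties keep the original candidate order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [candidates[i] for _, i in scored[:k]]
```

Because each example is scored in isolation, the number of model calls grows linearly with the candidate pool, which is where the \( O(N) \) cost comes from.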
### Experimental Verification
- **Experimental Setup**: Experiments were carried out with multiple large language models (such as BLOOM and LLaMA) on a range of downstream tasks.
- **Results**: G-fair-Prompting significantly outperforms the other methods in most cases, most notably on the TREC dataset, where the improvement exceeds 10%.
### Main Contributions
1. **Introduce predictive bias as an indicator for evaluating prompt quality**, and demonstrate the effectiveness of this indicator experimentally.
2. **Propose two efficient prompt optimization strategies**, T-fair-Prompting and G-fair-Prompting, which can significantly improve the model's few-shot performance at a relatively low computational cost.
### Formulas
- **Predictive Bias Metric** (expressed as a fairness score, so higher values indicate lower bias):
\[
\text{fair}(\rho)=-\sum_{y \in Y} p(y | \rho \oplus \eta) \log p(y | \rho \oplus \eta)
\]
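where \( \rho \) is the prompt, \( \eta \) is the content-independent input, \( \oplus \) denotes concatenation, and \( p(y \mid \rho \oplus \eta) \) is the model's predicted probability of label \( y \in Y \) given the prompt followed by the content-independent input. Since this is the entropy of the label distribution, it reaches its maximum, \( \log |Y| \), when the prediction is uniform, i.e., when the prompt is least biased.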
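- **Insertion Condition for G-fair-Prompting** (here \( S' \) is the set of remaining candidate examples and \( \Gamma(x_i, y_i) \) renders example \( (x_i, y_i) \) as a demonstration that is placed in front of the current prompt \( \rho \); the fairest candidate is inserted only if it strictly improves on \( \text{fair}(\rho) \)):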
\[
\arg \max_{x_i \in S'}\text{fair}(\Gamma(x_i, y_i) \oplus \rho)\quad\text{s.t.}\quad\text{fair}(\Gamma(x_i, y_i) \oplus \rho)>\text{fair}(\rho)
\]
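The greedy search implied by the G-fair insertion condition can be sketched as below. This is a minimal sketch, not the authors' implementation: `fairness_of` is a hypothetical callable returning \( \text{fair}(\rho) \) for a list of demonstrations (for instance, by querying the model on the prompt followed by a content-independent input), and starting from an empty prompt is an assumption made here.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (input x_i, label y_i)


def g_fair_prompting(
    candidates: Sequence[Example],
    fairness_of: Callable[[List[Example]], float],
) -> List[Example]:
    """Greedily prepend the demonstration that most increases fairness.

    Each round evaluates fair(Gamma(x_i, y_i) ⊕ rho) for every remaining
    candidate and inserts the argmax only if it strictly beats fair(rho).
    """
    prompt: List[Example] = []         # rho, assumed to start empty
    remaining = list(candidates)       # S'
    current = fairness_of(prompt)      # fair(rho) for the current prompt
    while remaining:
        best_idx: Optional[int] = None
        best_score = current
        for i, ex in enumerate(remaining):
            score = fairness_of([ex] + prompt)  # new demonstration goes in front
            if score > best_score:              # strict improvement required
                best_idx, best_score = i, score
        if best_idx is None:  # no candidate satisfies fair(...) > fair(rho)
            break
        prompt = [remaining.pop(best_idx)] + prompt
        current = best_score
    return prompt
```

In the worst case the loop runs once per candidate over a shrinking pool, i.e. \( N + (N-1) + \dots + 1 = N(N+1)/2 \) fairness evaluations, which is consistent with the \( O(N^2) \) cost stated for G-fair-Prompting.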
Through these methods, the paper mitigates the high instability of few-shot prompting and offers a new perspective on constructing high-quality prompts.