Abstract: Large language models have demonstrated a surprising ability to perform in-context learning, i.e., these models can be directly applied to solve numerous downstream tasks by conditioning on a prompt constructed from a few input-output examples. However, prior research has shown that in-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats. Therefore, the construction of an appropriate prompt is essential for improving the performance of in-context learning. In this paper, we revisit this problem from the view of predictive bias. Specifically, we introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes. Then we empirically show that prompts with higher bias always lead to unsatisfactory predictive quality. Based on this observation, we propose a novel search strategy based on greedy search to identify the near-optimal prompt for improving the performance of in-context learning. We perform comprehensive experiments with state-of-the-art mainstream models such as GPT-3 on various downstream tasks. Our results indicate that our method can enhance the model's in-context learning performance in an effective and interpretable manner.

What problem does this paper attempt to address?

The problem this paper addresses is the high instability of few-shot prompting caused by variation in the selection of training examples, their order, and the prompt format. Specifically, the authors re-examine this problem from the perspective of predictive bias and propose a new metric for evaluating prompt quality together with a strategy for optimizing prompt construction.

### Core Problems of the Paper

1. **High Instability**: Existing few-shot prompting methods are unstable across different training examples, example orders, and prompt formats.
2. **Importance of Prompt Construction**: Constructing appropriate prompts is crucial for improving the performance of few-shot prompting.
3. **Impact of Predictive Bias**: The authors found that the intrinsic predictive bias of a prompt has a significant impact on the model's prediction performance.

### Solutions

1. **Predictive Bias Evaluation**:
   - A metric based on predictive bias is introduced to evaluate the quality of prompts.
   - The prompt is applied to a "content-independent" input, for which the model is ideally expected to output a uniform prediction distribution. The deviation from uniformity measures the bias of the prompt.
2. **Optimization Strategies**:
   - **T-fair-Prompting**: First calculate the bias of each individual example, then select the top \( k \) examples with the smallest bias to construct the prompt. This method has a time complexity of \( O(N) \), but its performance may be unstable.
   - **G-fair-Prompting**: Adopt a greedy search strategy that repeatedly selects the example that maximizes fairness, stopping when no further improvement in fairness can be achieved. This method has a time complexity of \( O(N^2) \), but performs better.

### Experimental Verification

- **Experimental Setup**: Experiments were carried out using multiple large language models (such as BLOOM and LLaMA) on different downstream tasks.
- **Results**: The experimental results show that G-fair-Prompting significantly outperforms other methods in most cases, especially on the TREC dataset, where the performance improvement exceeds 10%.

### Main Contributions

1. **Introduce predictive bias as an indicator for evaluating prompt quality**, and demonstrate the effectiveness of this indicator through experiments.
2. **Propose two efficient prompt optimization strategies**, namely T-fair-Prompting and G-fair-Prompting, which can significantly improve the few-shot learning performance of the model while maintaining a relatively low computational cost.

### Formulas

- **Predictive Bias Metric**:
  \[ \text{fair}(\rho) = -\sum_{y \in Y} p(y \mid \rho \oplus \eta) \log p(y \mid \rho \oplus \eta) \]
  where \( \rho \) is the prompt, \( \eta \) is the content-independent input, and \( p(y \mid \rho \oplus \eta) \) is the model's predicted probability of label \( y \) given the prompt and the content-independent input. This is the entropy of the prediction, so a larger value indicates a more uniform, less biased prompt.
- **Insertion Condition for G-fair-Prompting**:
  \[ \arg\max_{x_i \in S'} \text{fair}(\Gamma(x_i, y_i) \oplus \rho) \quad \text{s.t.} \quad \text{fair}(\Gamma(x_i, y_i) \oplus \rho) > \text{fair}(\rho) \]

Through these methods, the paper addresses the high-instability problem in few-shot prompting and provides new ideas for constructing high-quality prompts.
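The fairness metric and the two search strategies summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `fairness`, `t_fair`, `g_fair`, `toy_predict`, `LABELS`, and `CANDIDATES` are all hypothetical, and `toy_predict` is a stand-in that replaces a real language model with smoothed label counts so the example runs on its own. The sketch also assumes that the first greedy insertion is always accepted, which the summary does not specify.

```python
import math
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, label)
# Hypothetical interface: given the demonstrations in a prompt, return the label
# distribution the model assigns to a content-independent input.
Predictor = Callable[[Sequence[Example]], Sequence[float]]


def fairness(probs: Sequence[float]) -> float:
    """Entropy of the prediction on the content-independent input; higher is fairer."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def t_fair(candidates: Sequence[Example], predict: Predictor, k: int) -> List[Example]:
    """Score each example on its own and keep the k fairest: O(N) model calls."""
    ranked = sorted(candidates, key=lambda ex: fairness(predict([ex])), reverse=True)
    return ranked[:k]


def g_fair(candidates: Sequence[Example], predict: Predictor) -> List[Example]:
    """Greedily prepend the example that most improves fairness: O(N^2) model calls."""
    prompt: List[Example] = []
    remaining = list(candidates)
    best = float("-inf")  # assumption: the first insertion is always accepted
    while remaining:
        scores = [fairness(predict([ex] + prompt)) for ex in remaining]
        i = max(range(len(scores)), key=scores.__getitem__)
        if scores[i] <= best:  # insertion condition fails: no strict improvement
            break
        best = scores[i]
        prompt.insert(0, remaining.pop(i))
    return prompt


LABELS = ["pos", "neg"]  # hypothetical label set


def toy_predict(demos: Sequence[Example]) -> List[float]:
    """Stand-in for a language model: add-one smoothed label frequencies of the demos."""
    counts = [1 + sum(1 for _, label in demos if label == name) for name in LABELS]
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical candidate pool with a skewed label distribution.
CANDIDATES: List[Example] = [
    ("great movie", "pos"),
    ("loved it", "pos"),
    ("terrible", "neg"),
    ("fine, I guess", "pos"),
]

if __name__ == "__main__":
    print(g_fair(CANDIDATES, toy_predict))
    print(t_fair(CANDIDATES, toy_predict, k=2))
```

With this toy predictor the greedy search stops once the prompt holds one example of each label, because any further insertion would skew the predicted distribution again. A real predictor would instead query the language model with the candidate prompt followed by a content-independent input and read off the label probabilities.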

Fairness-guided Few-shot Prompting for Large Language Models
