Abstract: Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels. However, this paradigm is limited by the availability of gold labels, while in certain scenarios, LLMs may need to perform tasks that are too complex for humans to provide such labels. To tackle this challenge, this study explores whether solely utilizing unlabeled data can elicit strong model capabilities. We propose a new paradigm termed zero-to-strong generalization. We iteratively prompt LLMs to annotate unlabeled data and retain high-quality labels by filtering. Surprisingly, we observe that this iterative process gradually unlocks LLMs' potential on downstream tasks. Our experiments on extensive classification and reasoning tasks confirm the effectiveness of our proposed framework. Our analysis indicates that this paradigm is effective for both in-context learning and fine-tuning, and for various model sizes.

### What problem does this paper attempt to solve?

This paper explores how to elicit the capabilities of large language models (LLMs) on complex tasks without gold labels. Specifically, it proposes a new paradigm called "zero-to-strong generalization," which iteratively prompts the model to label data and filters the results, gradually unlocking the model's capabilities.

#### Main Issues:

1. **Limitations of existing paradigms**: Supervised fine-tuning and in-context learning both depend on gold labels, which may be difficult or impossible to obtain in some scenarios.
2. **Limitations of weak supervision**: Weak-to-strong generalization guides a strong model with supervision from a weaker model, but it is still bounded by the capabilities of the weak supervisor, and in some cases no suitable weak supervisor is available.

#### Solution:

- **Propose a zero-to-strong generalization framework**: This framework requires neither gold labels nor a weak supervisor. Instead, it starts from random or invalid demonstrations and then iteratively selects high-confidence samples as the new demonstrations, gradually improving performance.
- **Experimental validation**: The authors conducted extensive experiments on classification tasks, extreme-label classification tasks, and reasoning tasks to demonstrate the effectiveness of the framework. The method applies not only to in-context learning but also to fine-tuning, and it remains effective for larger models.

#### Core Contributions:

- Proposed a simple and effective zero-to-strong generalization framework.
- Demonstrated the effectiveness of this framework across various tasks.
- Analyzed why zero-to-strong generalization works and found that its benefits are most pronounced with stronger models and on more complex tasks.
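The iterative loop described under Solution (start from randomly labeled demonstrations, annotate the unlabeled pool, keep the most confident predictions as the next round's demonstrations) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `zero_to_strong` and `toy_annotate`, the top-`k` selection rule, and the confidence score are all assumptions made for illustration, and `toy_annotate` is a keyword counter standing in for a real LLM call.

```python
import random
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (text, label)
# Hypothetical annotator interface: (demonstrations, text) -> (label, confidence)
Annotator = Callable[[List[Demo], str], Tuple[str, float]]


def zero_to_strong(
    unlabeled: List[str],
    labels: List[str],
    annotate: Annotator,
    rounds: int = 3,
    k: int = 2,
    seed: int = 0,
) -> List[Demo]:
    """Iteratively refine a demonstration set without any gold labels."""
    rng = random.Random(seed)
    # Round 0: demonstrations carry random labels, since no gold labels exist.
    demos = [(text, rng.choice(labels)) for text in rng.sample(unlabeled, k)]
    for _ in range(rounds):
        # Annotate the whole unlabeled pool using the current demonstrations.
        scored = []
        for text in unlabeled:
            label, confidence = annotate(demos, text)
            scored.append((confidence, text, label))
        # Filtering step (assumed rule): keep the k most confident predictions.
        scored.sort(key=lambda item: item[0], reverse=True)
        demos = [(text, label) for _, text, label in scored[:k]]
    return demos


def toy_annotate(demos: List[Demo], text: str) -> Tuple[str, float]:
    """Stand-in for an LLM call; a real annotator would condition on `demos`."""
    words = text.split()
    score = words.count("great") - words.count("terrible")
    label = "positive" if score >= 0 else "negative"
    return label, abs(score) / len(words)


UNLABELED = [
    "great great movie",
    "terrible plot",
    "great acting",
    "a movie",
    "terrible terrible terrible",
    "it was fine",
]

if __name__ == "__main__":
    print(zero_to_strong(UNLABELED, ["positive", "negative"], toy_annotate))
```

In a real setting, `annotate` would build a prompt from `demos` and query the model, and the confidence could come from a signal such as token probabilities or agreement across samples. Which signal and which selection rule the paper actually uses is not stated in this summary.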

Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels

Unveiling the Generalization Power of Fine-Tuned Large Language Models

Supervised Knowledge Makes Large Language Models Better In-context Learners

Large Language Models are Strong Zero-Shot Retriever

Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost

Large Language Models are Zero-Shot Reasoners

Non-Vacuous Generalization Bounds for Large Language Models

Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM

Towards Zero-Label Language Learning

Agent Instructs Large Language Models to be General Zero-Shot Reasoners

Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models

Dial-insight: Fine-tuning Large Language Models with High-Quality Domain-Specific Data Preventing Capability Collapse

Large Language Models Are Zero-Shot Text Classifiers

A transfer learning framework for weak-to-strong generalization

LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models

Boosting Zero-Shot Crosslingual Performance using LLM-Based Augmentations with Effective Data Selection

Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning

Weak-to-Strong Reasoning

Law of the Weakest Link: Cross Capabilities of Large Language Models

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Large Language Models Can Self-Improve in Long-context Reasoning