Abstract: Classic supervised learning involves algorithms trained on $n$ labeled examples to produce a hypothesis $h \in \mathcal{H}$ aimed at performing well on unseen examples. Meta-learning extends this by training across $n$ tasks, with $m$ examples per task, producing a hypothesis class $\mathcal{H}$ within some meta-class $\mathbb{H}$. This setting applies to many modern problems such as in-context learning, hypernetworks, and learning-to-learn. A common method for evaluating the performance of supervised learning algorithms is through their learning curve, which depicts the expected error as a function of the number of training examples. In meta-learning, the learning curve becomes a two-dimensional learning surface, which evaluates the expected error on unseen domains for varying values of $n$ (number of tasks) and $m$ (number of training examples). Our findings characterize the distribution-free learning surfaces of meta-Empirical Risk Minimizers when either $m$ or $n$ tends to infinity: we show that the number of tasks must increase inversely with the desired error. In contrast, we show that the number of examples exhibits very different behavior: it satisfies a dichotomy where every meta-class conforms to one of the following conditions: (i) either $m$ must grow inversely with the error, or (ii) a \emph{finite} number of examples per task suffices for the error to vanish as $n$ goes to infinity. This finding illustrates and characterizes cases in which a small number of examples per task is sufficient for successful learning. We further refine this for positive values of $\varepsilon$ and identify for each $\varepsilon$ how many examples per task are needed to achieve an error of $\varepsilon$ in the limit as the number of tasks $n$ goes to infinity. We achieve this by developing a necessary and sufficient condition for meta-learnability using a bounded number of examples per domain.

What problem does this paper attempt to address?

This paper studies the Empirical Risk Minimization (ERM) principle in meta-learning, focusing on how the expected error behaves when either the number of tasks $n$ or the number of examples per task $m$ tends to infinity. Specifically, the authors analyze meta-ERM algorithms, which select a hypothesis class $\mathcal{H}$ from a meta-class $\mathbb{H}$, and characterize their performance.

#### Specific problem description

1. **The ERM principle in meta-learning**:
   - The paper examines how meta-ERM performs as the number of tasks and the number of examples per task vary.
   - The central object is the "learning surface", a two-dimensional function that gives the expected error of the algorithm on unseen tasks for each number of tasks $n$ and number of examples per task $m$.
2. **The influence of the number of tasks and the number of examples**:
   - The number of tasks $n$ must grow inversely with the desired error.
   - The number of examples per task $m$ satisfies a dichotomy; every meta-class falls into one of two cases:
     - Case 1: $m$ must grow inversely with the desired error.
     - Case 2: A finite number of examples per task suffices for the error to vanish as $n$ tends to infinity.
3. **Theoretical results**:
   - The paper provides upper and lower bounds for meta-ERM, in particular for the behavior of the learning surface when the number of tasks or the number of examples per task tends to infinity.
   - The authors also give a necessary and sufficient condition that determines which meta-classes can be meta-learned with a bounded number of examples per task.
4. **Practical applications**:
   - These theoretical results help in understanding how to design effective meta-learning algorithms in settings such as transfer learning and few-shot learning.
   - In particular, for classification problems in high-dimensional spaces, the authors show that in some cases a small number of examples per task is sufficient for successful meta-learning.

#### Summary of mathematical formulas

- **Learning surface**:
  \[ \varepsilon_{\text{ERM}}(n, m)=\sup_{Q \in \mathcal{RE}(\mathbb{H})} \mathbb{E}_{S \sim Q(n, m)}\left[\sup_{\mathcal{H} \in \mathbb{H}:\, L_S(\mathcal{H}) = 0} L_Q(\mathcal{H})\right] \]
- **Projection onto the number of tasks**:
  \[ \varepsilon_{\text{ERM}}^{\text{dom}}(n)=\lim_{m \to \infty} \varepsilon_{\text{ERM}}(n, m) \]
- **Projection onto the number of examples per task**:
  \[ \varepsilon_{\text{ERM}}^{\text{exp}}(m)=\lim_{n \to \infty} \varepsilon_{\text{ERM}}(n, m) \]
- **Optimal error function**:
  \[ \varepsilon_{\mathbb{H}}(m):=\inf \left\{\varepsilon \in [0, 1] \mid m_{\mathbb{H}}(\varepsilon) \leq m\right\} \]
  Here $m_{\mathbb{H}}(\varepsilon)$ is the number of examples per task needed to reach error $\varepsilon$ in the limit $n \to \infty$.

These formulas quantify how the number of tasks and the number of examples per task affect learning performance in meta-learning, providing a theoretical basis for designing more efficient meta-learning algorithms.
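To make the optimal error function more concrete, here is a minimal Python sketch that evaluates the infimum in its definition on a uniform grid of error values. The helper name `optimal_error` and the two sample-complexity functions `inverse_rate` and `bounded` are hypothetical; they are not taken from the paper. They only mimic the two cases of the dichotomy: a requirement that grows like $1/\varepsilon$, and a requirement that stays at a fixed finite value (3 is an arbitrary choice).

```python
import math
from typing import Callable


def optimal_error(m: int, m_H: Callable[[float], float], grid: int = 10_000) -> float:
    """Approximate inf{eps in [0, 1] : m_H(eps) <= m} on a uniform grid.

    Returns 1.0 when no grid point qualifies (a convention of this sketch).
    """
    for k in range(grid + 1):
        eps = k / grid
        if m_H(eps) <= m:
            return eps  # first (smallest) grid point that fits the budget m
    return 1.0


# Hypothetical case (i): examples per task must grow like 1/eps.
def inverse_rate(eps: float) -> float:
    return math.inf if eps == 0 else math.ceil(1 / eps)


# Hypothetical case (ii): a fixed number of examples suffices for every eps.
def bounded(eps: float) -> float:
    return 3


if __name__ == "__main__":
    for m in (1, 2, 3, 10, 100):
        print(m, optimal_error(m, inverse_rate), optimal_error(m, bounded))
```

Under the first assumed function the achievable error shrinks roughly like $1/m$ and never reaches zero for finite $m$. Under the second it drops to zero as soon as $m$ reaches the fixed threshold, which is the behavior described in Case 2.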

On the ERM Principle in Meta-Learning
