Abstract: A driving force behind the diverse applicability of modern machine learning is the ability to extract meaningful features across many sources. However, many practical domains involve data that are non-identically distributed across sources and statistically dependent within each source, violating vital assumptions in existing theoretical studies. Toward addressing these issues, we establish statistical guarantees for learning general $\textit{nonlinear}$ representations from multiple data sources that admit different input distributions and possibly dependent data. Specifically, we study the sample complexity of learning $T+1$ functions $f_\star^{(t)} \circ g_\star$ from a function class $\mathcal F \times \mathcal G$, where the $f_\star^{(t)}$ are task-specific linear functions and $g_\star$ is a shared nonlinear representation. A representation $\hat g$ is estimated using $N$ samples from each of $T$ source tasks, and a fine-tuning function $\hat f^{(0)}$ is fit using $N'$ samples from a target task passed through $\hat g$. We show that when $N \gtrsim C_{\mathrm{dep}} (\mathrm{dim}(\mathcal F) + \mathrm{C}(\mathcal G)/T)$, the excess risk of $\hat f^{(0)} \circ \hat g$ on the target task decays as $\nu_{\mathrm{div}} \big(\frac{\mathrm{dim}(\mathcal F)}{N'} + \frac{\mathrm{C}(\mathcal G)}{N T} \big)$, where $C_{\mathrm{dep}}$ denotes the effect of data dependency, $\nu_{\mathrm{div}}$ denotes an (estimable) measure of $\textit{task-diversity}$ between the source and target tasks, and $\mathrm{C}(\mathcal G)$ denotes the complexity of the representation class $\mathcal G$. In particular, our analysis reveals two things: as the number of tasks $T$ increases, both the sample requirement and the risk bound converge to those of $r$-dimensional regression, as if $g_\star$ had been given; and the effect of dependency enters only the sample requirement, leaving the risk bound matching the iid setting.
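The two-stage procedure described in the abstract (estimate $\hat g$ from $T$ source tasks, then fit $\hat f^{(0)}$ on the target task through $\hat g$) can be illustrated with a small NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the representation class is a single layer $g(x) = \tanh(Wx)$, the representation is fit by plain gradient descent with each task's linear head refit by least squares at every step, the data are iid Gaussian with a task-specific input scale (so covariates are non-identical but not dependent), and the names `features`, `fit_head`, `learn_representation`, and `run_demo` are hypothetical.

```python
import numpy as np

def features(W, X):
    """Hypothetical representation g(x) = tanh(W x), applied row-wise."""
    return np.tanh(X @ W.T)

def fit_head(Z, y, ridge=1e-6):
    """Least-squares linear head on features Z (tiny ridge for stability)."""
    r = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + ridge * np.eye(r), Z.T @ y)

def learn_representation(Xs, ys, r, steps=300, lr=0.02, seed=0):
    """Stage 1: fit a shared W from T source tasks by gradient descent,
    refitting each task's linear head exactly at every step."""
    rng = np.random.default_rng(seed)
    d = Xs[0].shape[1]
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(r, d))
    for _ in range(steps):
        grad = np.zeros_like(W)
        for X, y in zip(Xs, ys):
            Z = features(W, X)
            f = fit_head(Z, y)
            resid = Z @ f - y
            # Gradient of 0.5 * mean(resid**2) w.r.t. W with the head held fixed.
            G = resid[:, None] * f[None, :] * (1.0 - Z**2)
            grad += G.T @ X / len(y)
        W -= lr * grad / len(Xs)
    return W

def run_demo(T=10, N=200, N_target=30, d=5, r=2, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W_star = rng.normal(size=(r, d)) / np.sqrt(d)  # synthetic ground-truth g_star

    def draw(n, f_star, scale):
        # Inputs differ across tasks through `scale` (non-identical covariates).
        X = scale * rng.normal(size=(n, d))
        signal = features(W_star, X) @ f_star
        return X, signal, signal + noise * rng.normal(size=n)

    # Stage 1: N samples from each of T source tasks.
    Xs, ys = [], []
    for _ in range(T):
        X, _, y = draw(N, rng.normal(size=r), rng.uniform(0.5, 1.5))
        Xs.append(X)
        ys.append(y)
    W_hat = learn_representation(Xs, ys, r, seed=seed + 1)

    # Stage 2: N' target samples passed through the learned representation.
    f0_star, scale0 = rng.normal(size=r), 1.0
    X0, _, y0 = draw(N_target, f0_star, scale0)
    f0_hat = fit_head(features(W_hat, X0), y0)

    # Monte Carlo estimate of the target excess risk against the noiseless signal.
    X_test, signal_test, _ = draw(5000, f0_star, scale0)
    pred = features(W_hat, X_test) @ f0_hat
    return W_hat, f0_hat, float(np.mean((pred - signal_test) ** 2))

if __name__ == "__main__":
    W_hat, f0_hat, risk = run_demo()
    print("estimated target excess risk:", risk)
```

Varying `T`, `N`, and `N_target` in `run_demo` gives a rough empirical feel for how the two terms of the stated bound trade off, though this toy optimizer carries none of the paper's guarantees.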


Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples
