Abstract: Several statistical models exist for regressing a function $F$ on $\mathbb{R}^d$ without suffering the statistical and computational curse of dimensionality. They do so, for example, by imposing and exploiting geometric assumptions on the distribution of the data (e.g. that its support is low-dimensional), strong smoothness assumptions on $F$, or a special structure of $F$. Among the latter, compositional models, which assume $F=f\circ g$ with $g$ mapping to $\mathbb{R}^r$ with $r\ll d$, have been studied; they include classical single- and multi-index models as well as recent work on neural networks. While the case where $g$ is linear is rather well understood, much less is known when $g$ is nonlinear, and in particular for which $g$'s the curse of dimensionality in estimating $F$, or both $f$ and $g$, may be circumvented. In this paper, we consider a model $F(X):=f(\Pi_\gamma X)$, where $\Pi_\gamma:\mathbb{R}^d\to[0,\mathrm{len}_\gamma]$ is the closest-point projection onto the parameter of a regular curve $\gamma:[0,\mathrm{len}_\gamma]\to\mathbb{R}^d$, and $f:[0,\mathrm{len}_\gamma]\to\mathbb{R}^1$. The input data $X$ is not assumed to be low-dimensional and may lie far from $\gamma$, conditioned on $\Pi_\gamma(X)$ being well-defined. The distribution of the data, $\gamma$, and $f$ are unknown. This model is a natural nonlinear generalization of the single-index model, which corresponds to $\gamma$ being a line. We propose a nonparametric estimator based on conditional regression and show that, under suitable assumptions, the strongest of which is that $f$ is coarsely monotone, it can achieve the one-dimensional optimal minimax rate for nonparametric regression, up to the level of noise in the observations, and can be constructed in time $\mathcal{O}(d^2n\log n)$. All the constants in the learning bounds, in the minimal number of samples required for our bounds to hold, and in the computational complexity are at most low-order polynomials in $d$.
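To make the model $F(X)=f(\Pi_\gamma X)$ concrete, here is a minimal sketch that evaluates the closest-point projection $\Pi_\gamma$ and draws noisy observations from the model. It is not the authors' conditional-regression estimator. It assumes that $\gamma$ is approximated by a polyline through points sampled along the curve and that $\Pi_\gamma$ returns the arc-length parameter of the closest point. The function name `project_to_polyline`, the choice $f=\tanh$, and the Gaussian design and noise are hypothetical illustrations. Where two points of the curve are equally close, $\Pi_\gamma$ is not well-defined and `argmin` simply picks the first.

```python
import numpy as np


def project_to_polyline(x, vertices):
    """Arc-length parameter of the point on a polyline closest to x.

    Hypothetical helper: `vertices` is an (m, d) array of points
    gamma(t_0), ..., gamma(t_{m-1}) sampled along the curve, assumed to have
    distinct consecutive entries; the polyline through them stands in for gamma.
    """
    seg_start = vertices[:-1]                          # (m-1, d) segment starts
    seg_vec = vertices[1:] - vertices[:-1]             # (m-1, d) segment directions
    seg_len = np.linalg.norm(seg_vec, axis=1)          # (m-1,) segment lengths
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])  # arc length at each vertex
    # Position of the orthogonal projection along each segment, clipped to [0, 1]
    s = np.einsum("ij,ij->i", x - seg_start, seg_vec) / seg_len**2
    s = np.clip(s, 0.0, 1.0)
    closest = seg_start + s[:, None] * seg_vec         # closest point on each segment
    k = np.argmin(np.linalg.norm(x - closest, axis=1)) # nearest segment (first on ties)
    return cum_len[k] + s[k] * seg_len[k]


if __name__ == "__main__":
    # Single-index special case: gamma is the segment from 0 to 2*e_1 in R^3.
    verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))                        # illustrative design, far from gamma
    t = np.array([project_to_polyline(x, verts) for x in X])
    Y = np.tanh(t) + 0.1 * rng.normal(size=5)          # Y = f(Pi_gamma X) + noise, f = tanh
    print(np.round(t, 3), np.round(Y, 3))
```

Replacing `verts` with points sampled on a bent curve gives the nonlinear version of the same data-generating process. Clipping `s` to $[0,1]$ keeps the returned parameter in $[0,\mathrm{len}_\gamma]$, matching the codomain of $\Pi_\gamma$.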

Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples

Representation Learning Beyond Linear Prediction Functions

Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data

Nonlinear Meta-Learning Can Guarantee Faster Rates

Few-Shot Learning via Learning the Representation, Provably

Understanding Dynamics of Nonlinear Representation Learning and Its Application

Dependence Induced Representations

A Statistical Guarantee for Representation Transfer in Multitask Imitation Learning

Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness

A theory of representation learning gives a deep generalisation of kernel methods

Statistical Learning Guarantees for Compressive Clustering and Compressive Mixture Modeling

Conditional regression for the Nonlinear Single-Variable Model

A New Reliable & Parsimonious Learning Strategy Comprising Two Layers of Gaussian Processes, to Address Inhomogeneous Empirical Correlation Structures

Estimating Stochastic Linear Combination of Non-Linear Regressions Efficiently and Scalably

Continual Learning of Nonlinear Independent Representations

Fundamental computational limits of weak learnability in high-dimensional multi-index models

Estimating Stochastic Linear Combination of Non-Linear Regressions

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

Finite sample inference in nonlinear regression estimation

A generalization of regularized dual averaging and its dynamics

Variance-Covariance Regularization Improves Representation Learning