Abstract: Although there are many theories of empirical risk minimization (ERM) for supervised learning, the current theoretical understanding of ERM for a related problem, stochastic convex optimization (SCO), is limited. In this work, we strengthen the theory of ERM for SCO by exploiting smoothness and strong convexity conditions to improve the risk bounds. First, we establish an $\widetilde{O}(d/n + \sqrt{F_*/n})$ risk bound when the random function is nonnegative, convex and smooth, and the expected function is Lipschitz continuous, where $d$ is the dimensionality of the problem, $n$ is the number of samples, and $F_*$ is the minimal risk. Thus, when $F_*$ is small we obtain an $\widetilde{O}(d/n)$ risk bound, which is analogous to the $\widetilde{O}(1/n)$ optimistic rate of ERM for supervised learning. Second, if the objective function is also $\lambda$-strongly convex, we prove an $\widetilde{O}(d/n + \kappa F_*/n)$ risk bound, where $\kappa$ is the condition number, and improve it to $O(1/[\lambda n^2] + \kappa F_*/n)$ when $n=\widetilde{\Omega}(\kappa d)$. As a result, we obtain an $O(\kappa/n^2)$ risk bound when $n$ is large and $F_*$ is small, which, to the best of our knowledge, is the first $O(1/n^2)$-type risk bound for ERM. Third, we stress that the above results are established in a unified framework, which allows us to derive new risk bounds under weaker conditions, e.g., without convexity of the random function and Lipschitz continuity of the expected function. Finally, we demonstrate that to achieve an $O(1/[\lambda n^2] + \kappa F_*/n)$ risk bound for supervised learning, the $\widetilde{\Omega}(\kappa d)$ requirement on $n$ can be replaced with $\Omega(\kappa^2)$, which is dimensionality-independent.
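
The bounds stated in the abstract can be written out as follows. This is only a sketch of the setting: the symbols $w$, $\mathcal{W}$, $f_i$, $\widehat{w}$ and $L$ are assumed notation that does not appear in the abstract, "risk bound" is read here as a bound on the excess risk $F(\widehat{w}) - F_*$, and the condition number is assumed to be $\kappa = L/\lambda$ with the smoothness constant $L$ treated as fixed. The thresholds on $F_*$ are simple arithmetic on the stated rates, not claims taken from the paper.

```latex
% Assumed setting: expected risk, minimal risk, and the ERM solution
F(w) = \mathbb{E}_{f}\,[f(w)], \qquad
F_* = \min_{w \in \mathcal{W}} F(w), \qquad
\widehat{w} \in \operatorname*{argmin}_{w \in \mathcal{W}} \frac{1}{n} \sum_{i=1}^{n} f_i(w).

% Smooth case: the slow term is dominated once F_* = O(d^2/n)
F(\widehat{w}) - F_* = \widetilde{O}\!\left(\frac{d}{n} + \sqrt{\frac{F_*}{n}}\right),
\qquad
\sqrt{\frac{F_*}{n}} \le \frac{d}{n} \iff F_* \le \frac{d^2}{n}.

% Strongly convex case with n = \widetilde{\Omega}(\kappa d),
% assuming \kappa = L/\lambda with L constant, so 1/\lambda = O(\kappa)
F(\widehat{w}) - F_* = O\!\left(\frac{1}{\lambda n^2} + \frac{\kappa F_*}{n}\right)
= O\!\left(\frac{\kappa}{n^2}\right)
\quad \text{when } F_* = O\!\left(\frac{1}{n}\right).
```

Read this way, "small $F_*$" means roughly $F_* = O(d^2/n)$ for the $\widetilde{O}(d/n)$ rate and $F_* = O(1/n)$ for the $O(\kappa/n^2)$ rate, with logarithmic factors and constants ignored.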

ERM and RERM are optimal estimators for regression problems when malicious outliers corrupt the labels

On the robustness of minimum norm interpolators and regularized empirical risk minimizers

Empirical Risk Minimization with Relative Entropy Regularization

Universal Robust Regression via Maximum Mean Discrepancy

Optimal Excess Risk Bounds for Empirical Risk Minimization on $p$-Norm Linear Regression

Outlier-Bias Removal with Alpha Divergence: A Robust Non-Convex Estimator for Linear Regression

A Robust Learning Approach for Regression Models Based on Distributionally Robust Optimization

Optimal Robust Estimation under Local and Global Corruptions: Stronger Adversary and Smaller Error

Outlier-robust sparse/low-rank least-squares regression and robust matrix completion

A Robust Learning Algorithm for Regression Models Using Distributionally Robust Optimization under the Wasserstein Metric

On the Performance of Empirical Risk Minimization with Smoothed Data

Error Risk Minimization

Optimal Rates for Robust Stochastic Convex Optimization

Robustness of Maximum Correntropy Estimation Against Large Outliers

Outlier-robust additive matrix decomposition

Error Density-dependent Empirical Risk Minimization

Outlier-Robust Training of Machine Learning Models

Large Dimensional Analysis of Robust M-Estimators of Covariance with Outliers

Trustworthy Regularized Huber Regression for Outlier Detection

Empirical Risk Minimization for Stochastic Convex Optimization: $O(1/n)$- and $O(1/n^2)$-Type of Risk Bounds.