Abstract: We develop machinery to design efficiently computable and consistent estimators, achieving estimation error approaching zero as the number of observations grows, when facing an oblivious adversary that may corrupt responses in all but an $\alpha$ fraction of the samples. As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA). For sparse regression, we achieve consistency for optimal sample size $n\gtrsim (k\log d)/\alpha^2$ and optimal error rate $O(\sqrt{(k\log d)/(n\cdot \alpha^2)})$, where $n$ is the number of observations, $d$ is the number of dimensions, and $k$ is the sparsity of the parameter vector, allowing the fraction of inliers to be inverse-polynomial in the number of samples. Prior to this work, no estimator was known to be consistent when the fraction of inliers $\alpha$ is $o(1/\log \log n)$, even for (non-spherical) Gaussian design matrices. Results holding under weak design assumptions and in the presence of such general noise have been shown only in the dense setting (i.e., general linear regression), very recently by d'Orsi et al. [dNS21]. In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix (usually used in matrix completion). Previous works could obtain non-trivial guarantees only under the assumption that the measurement noise corresponding to the inliers is polynomially small in $n$ (e.g., Gaussian with variance $1/n^2$). To devise our estimators, we equip the Huber loss with non-smooth regularizers such as the $\ell_1$ norm or the nuclear norm, and extend d'Orsi et al.'s approach [dNS21] in a novel way to analyze the loss function. Our machinery appears to be easily applicable to a wide range of estimation problems.
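To make the idea of equipping the Huber loss with an $\ell_1$ regularizer concrete, the following is a minimal sketch in Python, not the estimator or analysis from the paper. It minimizes $\frac{1}{n}\sum_i h_\delta(y_i - \langle x_i, \beta\rangle) + \lambda\|\beta\|_1$ by proximal gradient descent. The solver choice, the function names (`huber_grad`, `soft_threshold`, `huber_lasso`), the values of `lam` and `delta`, and the synthetic data with half of the responses corrupted by heavy-tailed noise drawn independently of the design are all illustrative assumptions.

```python
import numpy as np

def huber_grad(r, delta):
    # Derivative of the Huber penalty: r where |r| <= delta, delta * sign(r) otherwise.
    return np.clip(r, -delta, delta)

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (entrywise shrinkage toward zero).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def huber_lasso(X, y, lam, delta=1.0, n_iter=500):
    # Hypothetical helper: proximal gradient descent on
    # (1/n) * sum_i huber_delta(y_i - <x_i, beta>) + lam * ||beta||_1.
    n, d = X.shape
    # The Huber derivative is 1-Lipschitz, so ||X||_2^2 / n bounds the
    # Lipschitz constant of the gradient of the smooth part.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(d)
    for _ in range(n_iter):
        residual = y - X @ beta
        grad = -X.T @ huber_grad(residual, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Synthetic illustration (all values below are assumptions, not from the paper).
rng = np.random.default_rng(0)
n, d = 400, 50
beta_true = np.zeros(d)
beta_true[:3] = [2.0, -3.0, 1.5]          # k = 3 nonzero coordinates
X = rng.standard_normal((n, d))

# Oblivious corruption: the noise is drawn independently of X.
# Half of the responses receive heavy-tailed (Cauchy) noise.
is_outlier = rng.random(n) < 0.5
noise = np.where(is_outlier,
                 20.0 * rng.standard_cauchy(n),
                 0.1 * rng.standard_normal(n))
y = X @ beta_true + noise

beta_hat = huber_lasso(X, y, lam=0.1, delta=1.0)
print("l2 estimation error:", np.linalg.norm(beta_hat - beta_true))
```

The same template would apply to the PCA setting by swapping the soft-thresholding step for singular-value thresholding, the proximal operator of the nuclear norm; that variant is not shown here.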
