Robust Principal Component Analysis: A Median of Means Approach

Debolina Paul, Saptarshi Chakraborty, Swagatam Das

2023-07-20

Abstract: Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Following the Median of Means (MoM) philosophy, recent supervised learning methods have shown great success in dealing with outlying observations without much compromise to their large-sample theoretical properties. This paper proposes a PCA procedure based on the MoM principle. Called Median of Means Principal Component Analysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we explore the non-asymptotic error bounds of the obtained solution with the aid of Rademacher complexities while making no assumptions on the outlying observations. The derived concentration results do not depend on the dimension because the analysis is conducted in a separable Hilbert space, and the results depend only on the fourth moment of the underlying distribution in the corresponding norm. The proposal's efficacy is also thoroughly showcased through simulations and real data applications.

Machine Learning, Statistics Theory

What problem does this paper attempt to address?

The main problem this paper attempts to address is the poor performance of Principal Component Analysis (PCA) in the presence of outliers. Specifically, traditional PCA methods are susceptible to the influence of outliers, which prevents them from detecting the true low-dimensional structure in the dataset. To solve this problem, the authors propose a PCA method based on the Median of Means (MoM), called MoMPCA. This method is not only computationally efficient but also achieves optimal convergence rates under minimal assumptions.

The core contributions of the paper include:
1. Proposing a simple yet efficient framework for robust PCA under the paradigm of the Median of Means.
2. Providing strong theoretical support for finite sample error rates, requiring only the assumption that the data distribution has finite fourth moments.
3. Deriving generalization bounds that are dimension-independent, meaning these error rates are equally applicable to infinite-dimensional Hilbert spaces.
4. Requiring relatively loose conditions on the number of outliers, assuming only that their number is o(N), and making no assumptions about the distribution of outliers, allowing them to be correlated, unbounded, or heavy-tailed.
5. Validating the effectiveness of MoMPCA through experiments on simulated and real datasets, demonstrating its superior performance under various experimental settings.
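To make the Median of Means idea concrete, the following is a minimal sketch of how a MoM version of the PCA reconstruction loss could be evaluated for a fixed candidate subspace. It is an illustration under stated assumptions, not the authors' MoMPCA algorithm: the function names (`reconstruction_errors`, `mom_loss`), the random partition into blocks, the number of blocks, and the toy data are all hypothetical choices made here, and the sketch only evaluates the loss rather than optimizing it.

```python
import numpy as np


def reconstruction_errors(X, V):
    """Squared reconstruction error ||x - V V^T x||^2 for each row x of X.

    V is a (d, k) matrix assumed to have orthonormal columns.
    """
    residual = X - (X @ V) @ V.T
    return np.sum(residual ** 2, axis=1)


def mom_loss(X, V, n_blocks, rng):
    """Median over blocks of the per-block mean reconstruction error.

    The random partition is an assumption of this sketch.
    """
    perm = rng.permutation(X.shape[0])
    blocks = np.array_split(perm, n_blocks)
    errors = reconstruction_errors(X, V)
    block_means = np.array([errors[b].mean() for b in blocks])
    return float(np.median(block_means))


rng = np.random.default_rng(0)

# Hypothetical toy data: 300 points in R^5 lying close to the first axis.
n, d = 300, 5
X = np.zeros((n, d))
X[:, 0] = rng.normal(scale=3.0, size=n)
X += rng.normal(scale=0.1, size=(n, d))

# Overwrite 5 rows with gross outliers of very large magnitude.
X[:5] = 1000.0 * rng.normal(size=(5, d))

# Candidate subspace: the first coordinate axis (the inlier direction).
V = np.eye(d)[:, :1]

mean_loss = float(reconstruction_errors(X, V).mean())
robust_loss = mom_loss(X, V, n_blocks=21, rng=rng)

print("plain mean loss:", mean_loss)
print("median-of-means loss:", robust_loss)
```

With 21 blocks and only 5 outliers, at most 5 blocks can be contaminated, so the median is taken over a majority of clean blocks. The plain mean loss is dominated by the outliers, while the MoM loss stays close to the inlier noise level for the correct subspace.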
