Modal Principal Component Analysis

Keishi Sando, Hideitsu Hino
DOI: https://doi.org/10.48550/arXiv.2008.03400
2020-08-08
Abstract: Principal component analysis (PCA) is a widely used method for data processing, such as for dimension reduction and visualization. Standard PCA is known to be sensitive to outliers, and thus, various robust PCA methods have been proposed. It has been shown that the robustness of many statistical methods can be improved using mode estimation instead of mean estimation, because mode estimation is not significantly affected by the presence of outliers. Thus, this study proposes a modal principal component analysis (MPCA), which is a robust PCA method based on mode estimation. The proposed method finds the minor component by estimating the mode of the projected data points. As theoretical contribution, probabilistic convergence property, influence function, finite-sample breakdown point and its lower bound for the proposed MPCA are derived. The experimental results show that the proposed method has advantages over the conventional methods.
Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is the sensitivity of standard principal component analysis (PCA) to outliers. Standard PCA finds a low-dimensional subspace by minimizing the squared residuals, so a few outlying points can dominate the fit. To address this, the paper proposes modal principal component analysis (MPCA), a robust PCA method based on mode estimation.

### Main contributions of the paper:

1. **Proposing the MPCA algorithm**:
   - MPCA is a robust PCA method based on mode estimation. It finds the minor component by estimating the mode of the projected data points.
   - Mode estimation is not significantly affected by outliers, so MPCA behaves more stably on data that contain outliers.
2. **Theoretical contributions**:
   - **Probabilistic convergence property**: The objective function of MPCA is shown to converge uniformly in probability to the true probability density function (PDF) under standard regularity conditions.
   - **Influence function**: The influence function of MPCA is derived. It quantifies the effect of a single outlier on the estimate and shows that this effect is smaller than for standard PCA.
   - **Finite-sample breakdown point**: A finite-sample breakdown point suited to principal component estimation is introduced, and its lower bound (LBBP) is derived to evaluate the estimator's tolerance to outliers.
3. **Experimental verification**:
   - Experiments on synthetic and real data sets evaluate MPCA under different noise levels and sample sizes.
   - The results show that MPCA maintains good performance at high noise levels, especially on Laplace-distributed data.

### Specific problem analysis:

- **Outlier sensitivity**: Standard PCA uses squared residuals as its objective function, which makes it very sensitive to outliers. MPCA improves robustness to outliers by using mode estimation instead of mean estimation.
- **Theoretical analysis**: The paper analyzes the probabilistic convergence property, the influence function, and the finite-sample breakdown point. These results give the robustness of MPCA a theoretical basis.
- **Experimental results**: The experiments compare MPCA with standard PCA and other robust PCA methods on multiple data sets and report advantages for MPCA.

### Conclusion:

The paper proposes modal principal component analysis (MPCA), a PCA method based on mode estimation that addresses the sensitivity of standard PCA to outliers. Its theoretical analysis and experiments indicate that MPCA is more robust and stable than standard PCA on data that contain outliers.
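
The idea summarized above, finding a minor component by estimating the mode of the projected data, can be illustrated with a minimal sketch. This is not the authors' algorithm: the Gaussian kernel, the bandwidth `h`, the random restarts, and the iteratively reweighted eigen-decomposition are all assumptions chosen for illustration, and the function names are hypothetical. The sketch searches for a unit vector `w` and offset `b` that maximize a kernel density estimate of the projections `X @ w` at `b`, then compares the result with the minor component of standard PCA on synthetic data with one-sided outliers.

```python
import numpy as np


def kde_objective(X, w, b, h):
    """Gaussian kernel density estimate (up to a constant) of X @ w, evaluated at b."""
    r = X @ w - b
    return np.exp(-r**2 / (2.0 * h**2)).mean()


def modal_minor_component(X, h=0.5, n_restarts=20, n_iter=100, tol=1e-10, seed=0):
    """Illustrative sketch (not the paper's procedure): maximize kde_objective
    over unit vectors w and offsets b by iteratively reweighted PCA."""
    rng = np.random.default_rng(seed)
    best_obj, best_w, best_b = -np.inf, None, None
    for _ in range(n_restarts):
        # Random unit-vector start; offset starts at the median projection.
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        b = np.median(X @ w)
        prev = -np.inf
        for _ in range(n_iter):
            # Kernel weights: points whose projection is near b count most.
            a = np.exp(-((X @ w - b) ** 2) / (2.0 * h**2))
            if a.mean() - prev < tol:
                break
            prev = a.mean()
            # Weighted mean and weighted scatter matrix.
            mu = a @ X / a.sum()
            Xc = X - mu
            S = (Xc * a[:, None]).T @ Xc
            # eigh sorts eigenvalues in ascending order, so column 0 is the
            # direction of smallest weighted variance.
            w = np.linalg.eigh(S)[1][:, 0]
            b = mu @ w
        obj = kde_objective(X, w, b, h)
        if obj > best_obj:
            best_obj, best_w, best_b = obj, w, b
    return best_w, best_b


# Synthetic data (assumed for illustration): inliers lie close to the line
# spanned by u, so the true minor component is v; outliers sit far along v.
rng = np.random.default_rng(42)
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
v = np.array([1.0, -1.0]) / np.sqrt(2.0)
inliers = np.outer(rng.normal(0.0, 3.0, 200), u) + np.outer(rng.normal(0.0, 0.1, 200), v)
outliers = np.outer(rng.normal(0.0, 1.0, 40), u) + np.outer(rng.uniform(10.0, 20.0, 40), v)
X = np.vstack([inliers, outliers])

w_modal, b_modal = modal_minor_component(X)
w_pca = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, 0]  # standard PCA minor component

# Alignment with the true minor direction v (1 = aligned, 0 = orthogonal).
print("modal sketch alignment:", abs(w_modal @ v))
print("standard PCA alignment:", abs(w_pca @ v))
```

Each inner step maximizes a quadratic lower bound of the kernel objective, so the objective does not decrease between iterations. The result still depends on the starting point, which is why the sketch uses several restarts and keeps the best one. The bandwidth `h` controls how strongly distant points are down-weighted and would need tuning on real data.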