Robust Covariance Estimation for Distributed Principal Component Analysis

Li, Kangqiang; Bao, Han; Zhang, Lixin
DOI: https://doi.org/10.1007/s00184-021-00848-9
IF: 0.96
2022-01-01
Metrika
Abstract: Fan et al. (Ann Stat 47(6):3009–3031, 2019) constructed a distributed principal component analysis (PCA) algorithm that significantly reduces the communication cost between multiple servers. However, their algorithm's guarantee holds only for sub-Gaussian data. Motivated by this limitation, this paper extends their distributed PCA algorithm to heavy-tailed data by using the robust covariance matrix estimators of Minsker (Ann Stat 46(6A):2871–2903, 2018) and Ke et al. (Stat Sci 34(3):454–471, 2019). The theoretical results show that when the sampling distribution is symmetric innovation with a bounded fourth moment, or asymmetric with a finite sixth moment, the statistical error rate of the final estimator produced by the robust algorithm is similar to the rate obtained under sub-Gaussian tails. Extensive numerical experiments support the theoretical analysis and indicate that our algorithm is robust to heavy-tailed data and outliers.
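The overall pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the aggregation step averages the local projection matrices and takes the leading eigenvectors of that average, and it uses a simple norm-based shrinkage of each sample's outer product as a stand-in for the robust covariance estimators cited above. The function names, the zero-mean assumption, and the threshold `tau` are hypothetical and chosen only for illustration.

```python
import numpy as np


def shrunk_covariance(X, tau):
    """Norm-shrunk covariance for rows of X (assumed zero-mean).

    Each outer product x x^T is scaled by min(1, tau / ||x||^2), so a
    single extreme sample contributes at most tau to the trace.
    This is an illustrative stand-in, not the paper's exact estimator.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    weights = np.minimum(1.0, tau / np.maximum(sq_norms, 1e-12))
    return (X * weights[:, None]).T @ X / X.shape[0]


def top_k_eigvecs(S, k):
    """Return the k leading eigenvectors of a symmetric matrix S."""
    _, vecs = np.linalg.eigh(S)   # eigenvalues come back in ascending order
    return vecs[:, -k:][:, ::-1]  # keep the last k columns, largest first


def distributed_robust_pca(shards, k, tau):
    """One-round distributed PCA over a list of (n_i x d) data shards.

    Each shard (server) computes its local top-k eigenvectors from a
    shrunk covariance and sends only the d x k matrix; the center
    averages the projection matrices V V^T and extracts the top-k
    eigenvectors of the average.
    """
    d = shards[0].shape[1]
    avg_proj = np.zeros((d, d))
    for X in shards:
        V = top_k_eigvecs(shrunk_covariance(X, tau), k)
        avg_proj += V @ V.T
    avg_proj /= len(shards)
    return top_k_eigvecs(avg_proj, k)
```

In this sketch each server communicates only a d x k matrix rather than its raw data or a full d x d covariance, which is where the communication saving comes from. The choice of `tau` trades bias against robustness and would need to be tuned to the sample size and moment assumptions.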