Sparse principal component analysis with missing observations

Seyoung Park, Hongyu Zhao
DOI: https://doi.org/10.1214/18-aoas1220
2019-06-01
The Annals of Applied Statistics
Abstract: Principal component analysis (PCA) is a commonly used statistical method in a wide range of applications. However, it does not work well when the number of features is larger than the sample size. Motivated by the analysis of single-cell RNA sequence data, we consider the estimation of the sparse principal subspace in the high-dimensional setting with missing data. We propose a two-step estimation procedure and establish the rates of convergence for estimating the principal subspace. Simulated examples with various missing-data mechanisms show its competitive performance compared to existing sparse PCA methods. We apply the method to single-cell data and show that it can better distinguish cell types than other PCA methods.