Abstract: In the era of big data, reducing data dimensionality is critical in many areas of science. The widely used Principal Component Analysis (PCA) addresses this problem by computing a low-dimensional data embedding that maximally explains the variance of the data. However, PCA has two major weaknesses. First, it only considers linear correlations among variables (features), and second, it is not suitable for categorical data. We resolve these issues by proposing Maximally Correlated Principal Component Analysis (MCPCA). MCPCA computes transformations of variables whose covariance matrix has the largest Ky Fan norm. The variable transformations are unknown, can be nonlinear, and are computed by solving an optimization problem. MCPCA can also be viewed as a multivariate extension of Maximal Correlation. For jointly Gaussian variables we show that the covariance matrix corresponding to the identity (or the negative of the identity) transformations majorizes covariance matrices of non-identity functions. Using this result we characterize global MCPCA optimizers for nonlinear functions of jointly Gaussian variables for every rank constraint. For categorical variables we characterize global MCPCA optimizers for the rank-one constraint based on the leading eigenvector of a matrix computed using pairwise joint distributions. For a general rank constraint we propose a block coordinate descent algorithm and show its convergence to stationary points of the MCPCA optimization. We compare MCPCA with PCA and other state-of-the-art dimensionality reduction methods including Isomap, LLE, multilayer autoencoders (neural networks), kernel PCA, probabilistic PCA and diffusion maps on several synthetic and real datasets. We show that MCPCA consistently provides improved performance compared to other methods.
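To make the objective in the abstract concrete, the sketch below computes the Ky Fan q-norm (the sum of the q largest singular values) of a correlation matrix before and after a nonlinear transformation of one variable. It is only an illustration under stated assumptions: the synthetic data, the hand-picked transformation `x**2`, and the helper name `ky_fan_norm` are hypothetical and are not taken from the paper, and no optimization over transformations is performed here.

```python
import numpy as np


def ky_fan_norm(K, q):
    """Ky Fan q-norm: sum of the q largest singular values of K."""
    s = np.linalg.svd(K, compute_uv=False)  # returned in descending order
    return float(s[:q].sum())


# Hypothetical synthetic data: y depends on x only through x**2,
# so the linear correlation between x and y is close to zero.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=5000)
y = x**2 + 0.05 * rng.normal(size=5000)

# Correlation matrix of the raw variables (what standard PCA on
# standardized data would see).
K_raw = np.corrcoef(np.column_stack([x, y]), rowvar=False)

# Correlation matrix after a hand-picked (not optimized) nonlinear
# transformation of the first variable. Using the correlation matrix
# is equivalent to giving each transformed variable zero mean and
# unit variance before taking the covariance.
K_tr = np.corrcoef(np.column_stack([x**2, y]), rowvar=False)

norm_raw = ky_fan_norm(K_raw, q=1)
norm_tr = ky_fan_norm(K_tr, q=1)
print(f"Ky Fan 1-norm, raw variables:         {norm_raw:.3f}")
print(f"Ky Fan 1-norm, transformed variables: {norm_tr:.3f}")
```

With q equal to the number of variables, the norm is the trace of the correlation matrix and is therefore the same for both matrices; the transformation only matters under a rank constraint q smaller than the number of variables, where a larger norm means the leading q components capture more of the total variance.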

Single-Pass PCA of Large High-Dimensional Data

Fast Randomized PCA for Sparse Data

A Covariance-Free Iterative Principal Component Analysis for High Dimensional and Large Scale Data

A Fast Data-Oriented Algorithm for Principal Component Analysis

An Exploration of the Application of Principal Component Analysis in Big Data Processing

Efficient Sparse PCA via Block-Diagonalization

Dynamic Principal Subspaces in High Dimensions

A Selective Overview of Sparse Principal Component Analysis

Sparse principal component analysis via regularized low rank matrix approximation

Improved Algorithms for High-Dimensional Robust PCA

Self-paced Principal Component Analysis

Dynamic Principal Component Analysis in High Dimensions

Maximally Correlated Principal Component Analysis

Sparse Principal Component Analysis

Large-Scale Sparse Principal Component Analysis with Application to Text Data

Diagonally-Dominant Principal Component Analysis

Sparse Functional Principal Component Analysis in High Dimensions

Sparse Principal Component Analysis via Variable Projection

Dynamic Principal Subspaces with Sparsity in High Dimensions