Adaptive gPCA: A method for structured dimensionality reduction

Julia Fukuyama

DOI: https://doi.org/10.48550/arXiv.1702.00501

2017-02-02

Abstract: When working with large biological data sets, exploratory analysis is an important first step for understanding the latent structure and for generating hypotheses to be tested in subsequent analyses. However, when the number of variables is large compared to the number of samples, standard methods such as principal components analysis give results which are unstable and difficult to interpret. To mitigate these problems, we have developed a method which allows the analyst to incorporate side information about the relationships between the variables in a way that encourages similar variables to have similar loadings on the principal axes. This leads to a low-dimensional representation of the samples which both describes the latent structure and which has axes which are interpretable in terms of groups of closely related variables. The method is derived by putting a prior encoding the relationships between the variables on the data and following through the analysis on the posterior distributions of the samples. We show that our method does well at reconstructing true latent structure in simulated data and we also demonstrate the method on a dataset investigating the effects of antibiotics on the composition of bacteria in the human gut.

Methodology, Applications

What problem does this paper attempt to address?

This paper addresses the difficulties of exploratory analysis on large biological data sets. When the number of variables is much larger than the number of samples, standard principal components analysis (PCA) gives results that are unstable and difficult to interpret. To mitigate these problems, the paper develops adaptive generalized principal components analysis (adaptive gPCA), which lets the analyst incorporate side information about the relationships between the variables in a way that encourages similar variables to have similar loadings on the principal axes. The result is a low-dimensional representation of the samples that describes the latent structure and whose axes can be interpreted in terms of groups of closely related variables. The method is derived by placing a prior that encodes the relationships between the variables on the data and carrying the analysis through on the posterior distributions of the samples. It reconstructs the true latent structure well in simulated data, and it is demonstrated on a data set investigating the effects of antibiotics on the composition of bacteria in the human gut. Adaptive gPCA is thus a flexible method that can adjust how coarse or fine the analysis is and that yields more interpretable principal axes, which helps in generating hypotheses and in understanding the biology underlying the data.
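The idea of "putting a prior on the data and analyzing the posterior" can be illustrated with a small sketch. It is not the paper's estimator and it does not choose the prior strength adaptively. It assumes one simple Gaussian model: each sample has a latent mean with prior covariance `sigma1_sq * Q`, where `Q` is a variable-by-variable similarity matrix, and is observed with isotropic noise of variance `sigma2_sq`. The function name `posterior_mean_pca` and all parameter names are hypothetical.

```python
import numpy as np

def posterior_mean_pca(X, Q, sigma1_sq=1.0, sigma2_sq=1.0, k=2):
    """PCA on posterior means under an assumed Gaussian model (illustrative only).

    Assumed model, per sample i:
        mu_i ~ N(0, sigma1_sq * Q),   x_i | mu_i ~ N(mu_i, sigma2_sq * I)
    X : (n, p) data matrix, Q : (p, p) symmetric PSD similarity between variables.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # column-center the data
    p = Xc.shape[1]
    S = sigma1_sq * np.asarray(Q, dtype=float)

    # Posterior mean of mu_i is S (S + sigma2_sq I)^{-1} x_i.
    # S and (S + c I)^{-1} commute, so this matrix is symmetric and can be
    # computed as the solution of (S + c I) A = S.
    shrink = np.linalg.solve(S + sigma2_sq * np.eye(p), S)
    M = Xc @ shrink                      # rows are posterior means

    # Ordinary PCA of the posterior means via the SVD.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    scores = U[:, :k] * s[:k]            # sample coordinates
    axes = Vt[:k].T                      # loadings of the variables
    return scores, axes
```

With `Q` equal to the identity the shrinkage is the same for every variable, so the sketch reduces to ordinary PCA up to a uniform rescaling of the scores. A `Q` with large entries between related variables pulls their loadings toward each other, which is the behaviour the summary describes.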

Adaptive Functional Principal Component Analysis

Probabilistic PCA of Censored Data: Accounting for Uncertainties in the Visualization of High-Throughput Single-Cell qPCR Data

PLPCA: Persistent Laplacian-Enhanced PCA for Microarray Data Analysis

Identifying patterns differing between high-dimensional datasets with generalized contrastive PCA

Generalized probabilistic principal component analysis of correlated data

Robust generalized PCA for enhancing discriminability and recoverability

GraphPCA: a fast and interpretable dimension reduction algorithm for spatial transcriptomics data

Integrated Principal Components Analysis

Principal component analysis: a review and recent developments

Functional PCA With Covariate-Dependent Mean and Covariance Structure

Sparse and Functional Principal Components Analysis

Normalized Robust PCA With Adaptive Reconstruction Error Minimization

Ensemble Principal Component Analysis

Diagonally-Dominant Principal Component Analysis

Robust PCA for High Dimensional Data based on Characteristic Transformation

PCA-KL: a parametric dimensionality reduction approach for unsupervised metric learning

Deterministic parallel analysis: An improved method for selecting factors and principal components

Hierarchical disjoint principal component analysis

Sparse Functional Principal Component Analysis in High Dimensions