Incorporating graph information in Bayesian factor analysis with robust and adaptive shrinkage priors

Qiyiwen Zhang, Changgee Chang, Li Shen, Qi Long
DOI: https://doi.org/10.1093/biomtc/ujad014
IF: 1.701
2024-01-29
Biometrics
Abstract: There has been an increasing interest in decomposing high-dimensional multi-omics data into a product of low-rank and sparse matrices for the purpose of dimension reduction and feature engineering. Bayesian factor models achieve such a low-dimensional representation of the original data through different sparsity-inducing priors. However, few of these models can efficiently incorporate the information encoded by biological graphs, which has already proven useful in many analysis tasks. In this work, we propose a Bayesian factor model with novel hierarchical priors, which incorporate biological graph knowledge as a tool for identifying groups of genes that function collaboratively. The proposed model therefore enables sparsity within networks by allowing each factor loading to be shrunk adaptively and by adding further prior layers that relate the individual shrinkage parameters to the underlying graph information; both features yield more accurate recovery of the structure of the factor loadings. Further, in contrast to existing graph-incorporated approaches, these new priors overcome the phase transition phenomenon, so the model is robust to noisy edges that are inconsistent with the actual sparsity structure of the factor loadings. Finally, our model can handle both continuous and discrete data types. The proposed method is shown to outperform several existing factor analysis methods in simulation experiments and real data analyses.
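To make the idea of graph-linked adaptive shrinkage more concrete, the sketch below simulates from a generic sparse factor model X = W Z + E. Each loading w_jk has its own variance exp(mu + eta_jk), and for each factor the log-variances eta_jk share a Gaussian Markov random field prior with precision tau * (L + eps * I), where L is the graph Laplacian, so that genes connected in the graph receive similar amounts of shrinkage. This is an assumed, illustrative construction, not the hierarchical prior proposed in the paper: the abstract does not give that prior's exact form or the mechanism behind its robustness to noisy edges. All function names and parameters (`sample_loadings`, `tau`, `eps`, `mu`, `noise_sd`, and so on) are hypothetical.

```python
import math
import random


def laplacian(p, edges):
    """Graph Laplacian L = D - A of an undirected, unweighted graph on p nodes."""
    L = [[0.0] * p for _ in range(p)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def cholesky(Q):
    """Lower-triangular R with Q = R R^T (Q must be symmetric positive definite)."""
    n = len(Q)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(R[i][k] * R[j][k] for k in range(j))
            if i == j:
                R[i][i] = math.sqrt(Q[i][i] - s)
            else:
                R[i][j] = (Q[i][j] - s) / R[j][j]
    return R


def sample_gmrf(Q, rng):
    """Draw eta ~ N(0, Q^-1) by solving R^T eta = z, where Q = R R^T and z ~ N(0, I)."""
    R = cholesky(Q)
    n = len(Q)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eta = [0.0] * n
    for i in reversed(range(n)):  # back-substitution: R^T is upper triangular
        s = sum(R[k][i] * eta[k] for k in range(i + 1, n))
        eta[i] = (z[i] - s) / R[i][i]
    return eta


def sample_loadings(p, K, edges, tau=2.0, eps=0.1, mu=-2.0, seed=0):
    """Hypothetical graph-linked shrinkage prior for a p x K loading matrix W.

    For each factor k, the log-variances eta[:, k] share a GMRF prior with
    precision tau * (L + eps * I), so genes adjacent in the graph receive similar
    amounts of shrinkage; then w_jk ~ N(0, exp(mu + eta_jk)).
    """
    rng = random.Random(seed)
    L = laplacian(p, edges)
    Q = [[tau * (L[i][j] + (eps if i == j else 0.0)) for j in range(p)] for i in range(p)]
    W = [[0.0] * K for _ in range(p)]
    for k in range(K):
        eta = sample_gmrf(Q, rng)
        for j in range(p):
            # random.gauss takes a standard deviation, hence exp(0.5 * log-variance)
            W[j][k] = rng.gauss(0.0, math.exp(0.5 * (mu + eta[j])))
    return W


def simulate_data(W, n, noise_sd=0.5, seed=1):
    """Continuous data X = W Z + E with Z_ki ~ N(0, 1) and E_ji ~ N(0, noise_sd^2)."""
    rng = random.Random(seed)
    p, K = len(W), len(W[0])
    Z = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(K)]
    return [[sum(W[j][k] * Z[k][i] for k in range(K)) + rng.gauss(0.0, noise_sd)
             for i in range(n)] for j in range(p)]


# Toy gene graph: a 4-gene pathway (0-1-2-3) and a separate 2-gene module (4-5).
edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
W = sample_loadings(p=6, K=2, edges=edges)
X = simulate_data(W, n=20)
```

Because connected genes share a smoothed log-variance, genes in the same module tend to be shrunk toward zero, or left large, together. This is the intuition behind letting the graph inform shrinkage. Posterior inference for W, Z and the shrinkage parameters is not shown.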
statistics & probability, mathematical & computational biology, biology
What problem does this paper attempt to address?
The paper addresses how to integrate biological graph (network) information into factor analysis of high-dimensional multi-omics data, so as to improve dimension reduction and feature engineering. Existing Bayesian factor models obtain low-dimensional representations through various sparsity-inducing priors, but few of them can efficiently exploit known biological networks, even though such network information has proven useful in many analysis tasks, in particular for identifying groups of genes that function together. The paper therefore proposes a new Bayesian factor model with novel hierarchical priors. These priors use biological graph knowledge to identify groups of collaborating genes and allow each factor loading to be shrunk adaptively, while additional prior layers relate the individual shrinkage parameters to the underlying graph, yielding more accurate recovery of the structure of the factor loadings. The model also overcomes the phase transition phenomenon that affects existing graph-incorporated approaches, which makes it robust to noisy edges that are inconsistent with the true sparsity structure of the loadings. Finally, it can handle both continuous and discrete data types.
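The summary states that the model handles discrete as well as continuous data but does not say how. One common approach for binary data, assumed here purely for illustration, keeps the same low-rank structure and treats w_j . z_i as a logit, so that x_ji ~ Bernoulli(sigmoid(w_j . z_i)). The sketch below simulates such data and evaluates the Bernoulli-logit log-likelihood in a numerically stable way. The function names are hypothetical, and this is not claimed to be the likelihood the authors use.

```python
import math
import random


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def linear_predictor(W, Z, j, i):
    """Logit for gene j and sample i: sum_k W[j][k] * Z[k][i]."""
    return sum(W[j][k] * Z[k][i] for k in range(len(Z)))


def simulate_binary(W, Z, seed=0):
    """Draw x_ji ~ Bernoulli(sigmoid(w_j . z_i)) for every gene j and sample i."""
    rng = random.Random(seed)
    n = len(Z[0])
    return [[1 if rng.random() < sigmoid(linear_predictor(W, Z, j, i)) else 0
             for i in range(n)] for j in range(len(W))]


def bernoulli_loglik(X, W, Z):
    """Bernoulli-logit log-likelihood: sum over entries of x * eta - log(1 + exp(eta))."""
    total = 0.0
    for j, row in enumerate(X):
        for i, x in enumerate(row):
            eta = linear_predictor(W, Z, j, i)
            total += x * eta - softplus(eta)
    return total


# Toy example: 3 genes, 2 factors, 5 samples (values chosen arbitrarily).
W = [[1.5, 0.0], [-1.0, 0.5], [0.0, 2.0]]
Z = [[0.2, -1.0, 0.7, 1.1, -0.3], [1.0, 0.4, -0.8, 0.0, 0.6]]
X = simulate_binary(W, Z)
ll = bernoulli_loglik(X, W, Z)
```

Only the observation model changes between the continuous and binary cases; the loadings W, and hence any graph-informed prior placed on them, can stay the same.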