Accounting for network noise in graph-guided Bayesian modeling of structured high-dimensional data

Wenrui Li, Changgee Chang, Suprateek Kundu, Qi Long
DOI: https://doi.org/10.1093/biomtc/ujae012
IF: 1.701
2024-01-29
Biometrics
Abstract: There is a growing body of literature on knowledge-guided statistical learning methods for analysis of structured high-dimensional data (such as genomic and transcriptomic data) that can incorporate knowledge of underlying networks derived from functional genomics and functional proteomics. These methods have been shown to improve variable selection and prediction accuracy and yield more interpretable results. However, these methods typically use graphs extracted from existing databases or rely on subject matter expertise, which are known to be incomplete and may contain false edges. To address this gap, we propose a graph-guided Bayesian modeling framework to account for network noise in regression models involving structured high-dimensional predictors. Specifically, we use 2 sources of network information, including the noisy graph extracted from existing databases and the estimated graph from observed predictors in the dataset at hand, to inform the model for the true underlying network via a latent scale modeling framework. This model is coupled with the Bayesian regression model with structured high-dimensional predictors involving an adaptive structured shrinkage prior. We develop an efficient Markov chain Monte Carlo algorithm for posterior sampling. We demonstrate the advantages of our method over existing methods in simulations, and through analyses of a genomics dataset and another proteomics dataset for Alzheimer’s disease.
statistics & probability, mathematical & computational biology, biology
What problem does this paper attempt to address?
The paper addresses network noise in regression models with structured high-dimensional predictors, such as genomic and transcriptomic data. Existing knowledge-guided statistical learning methods typically take their graphs from existing databases or from subject matter expertise, and such graphs are known to be incomplete and may contain false edges. The authors propose a graph-guided Bayesian modeling framework that informs a model for the true underlying network from two sources of network information: the noisy graph extracted from existing databases and a graph estimated from the predictors observed in the dataset at hand. The two sources are linked to the true network through a latent scale modeling framework, and this network model is coupled with a Bayesian regression model that uses an adaptive structured shrinkage prior. An efficient Markov chain Monte Carlo algorithm is developed for posterior sampling. Simulations and analyses of a genomics dataset and a proteomics dataset for Alzheimer's disease demonstrate the method's advantages over existing methods.
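To make the idea of two graph sources concrete, here is a minimal Python sketch. It is not the paper's latent scale model or its MCMC sampler. It is a deliberately simple stand-in that assumes a thresholded partial-correlation graph as the "estimated graph" and a fixed-weight average as the way to combine it with the database graph. The function names, the threshold, the mixing weight, and the simulated chain graph are all hypothetical choices made for illustration.

```python
import numpy as np


def partial_correlations(X):
    """Sample partial correlations from the inverse sample covariance (requires n > p)."""
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    precision = (precision + precision.T) / 2.0  # remove tiny numerical asymmetry
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def combine_graphs(database_adj, X, threshold=0.2, weight=0.5):
    """Hypothetical heuristic (not the paper's model): mix two graph sources per edge.

    Returns a score matrix in [0, 1] and the data-estimated adjacency matrix.
    """
    estimated_adj = (np.abs(partial_correlations(X)) > threshold).astype(float)
    np.fill_diagonal(estimated_adj, 0.0)
    score = weight * database_adj + (1.0 - weight) * estimated_adj
    return score, estimated_adj


# Simulated predictors whose true graph is the chain 0-1-2-3-4-5,
# encoded as a tridiagonal (positive definite) precision matrix.
rng = np.random.default_rng(0)
p, n = 6, 500
true_precision = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(true_precision), size=n)

# A noisy "database" graph: the true edge 4-5 is missing and 0-3 is a false edge.
database_adj = np.zeros((p, p))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 3)]:
    database_adj[i, j] = database_adj[j, i] = 1.0

score, estimated_adj = combine_graphs(database_adj, X)
print(score)
```

In this toy setup, an edge supported by both sources scores 1, an edge supported by only one scores 0.5, and an edge supported by neither scores 0. The paper instead treats the true network as latent and infers it jointly with the regression coefficients, so uncertainty about each edge propagates into the shrinkage prior, which a fixed-weight average cannot do.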