Abstract: In clinical research, DNA microarrays are widely used to identify oncogenes that are differentially expressed between two clinical states and are considered predictors of cancer prognosis. Because clinical samples are heterogeneous, the differentially expressed genes (DEGs) discovered by current statistical methods or machine learning algorithms include a number of genes unrelated to the phenotypic differences between the compared samples, which reduces the reliability of predictive models for cancer prognosis. In this study, we propose a Bayesian nonparametric variable selection algorithm, a stochastic hierarchical search method, to separate the cancer-related genes from the DEG lists. The importance of each gene in a DEG list is inferred from the posterior distribution of the predicted clinical endpoints, which is simulated with a Markov chain Monte Carlo (MCMC) algorithm. The cancer-related genes were identified according to their importance and used to construct models for predicting three clinical endpoints: the estrogen receptor status (ER status) of breast cancer patients, the response of breast cancer to preoperative treatment, and the overall survival milestone outcome of acute myeloma leukemia (OS of AML). The prediction accuracies for preoperative treatment response, ER status and OS of AML were 86%, 89% and 58%, respectively, with Matthews correlation coefficients of 0.42, 0.77 and 0.33; these values are higher than those reported in previous studies. Furthermore, most of the genes identified by our method have been reported as oncogenes in the previous literature. Our results demonstrate that the proposed Bayesian nonparametric variable selection algorithm can efficiently identify oncogenes for cancer prognosis and improve the performance of the predictive models.
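The abstract ranks genes by an importance measure computed from MCMC draws but does not describe the sampler itself. Purely as a point of reference, the sketch below uses a different, well-known technique: stochastic search variable selection (SSVS, George and McCulloch) with a probit likelihood and the Albert and Chib latent-variable augmentation. It is a parametric spike-and-slab model, not the nonparametric algorithm of this study. Each gene's importance is taken as its posterior inclusion probability, i.e. the fraction of post-burn-in draws in which the gene is in the model. The function names, the prior settings `tau0`, `tau1` and `prior_incl`, and the simulated data are all hypothetical assumptions; NumPy is required.

```python
import math
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal N(0, 1), used for its inverse CDF


def _phi(x):
    """Standard normal CDF via erfc, which stays accurate far into the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def _sample_latent(mu, y, rng):
    """Albert-Chib step: z_i ~ N(mu_i, 1) truncated to z_i > 0 if y_i == 1, else z_i < 0."""
    z = np.empty_like(mu)
    u = rng.uniform(size=len(mu))
    for i, (m, yi) in enumerate(zip(mu, y)):
        s = 1.0 if yi == 1 else -1.0
        q = _phi(s * m)  # probability that N(m, 1) already satisfies the sign constraint
        if q < 1e-280:
            # Extreme tail: the truncated mass sits at the boundary, so use a point just inside it.
            z[i] = s * 1e-12
            continue
        # Draw W ~ N(0, 1) truncated to W < s*m by inverse CDF, then map back with z = m - s*W.
        p = min(max(u[i] * q, 1e-300), 1.0 - 1e-15)
        z[i] = m - s * _STD_NORMAL.inv_cdf(p)
    return z


def _sample_beta(X, z, tau2, rng):
    """Draw beta | z, gamma ~ N(A^{-1} X^T z, A^{-1}) with A = X^T X + diag(1 / tau2)."""
    A = X.T @ X + np.diag(1.0 / tau2)
    L = np.linalg.cholesky(A)  # A = L L^T
    mean = np.linalg.solve(A, X.T @ z)
    # If e ~ N(0, I), then L^{-T} e has covariance (L L^T)^{-1} = A^{-1}.
    return mean + np.linalg.solve(L.T, rng.standard_normal(len(tau2)))


def _log_normal_pdf(x, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + x ** 2 / var)


def ssvs_probit(X, y, n_iter=1500, burn_in=500, tau0=0.05, tau1=3.0,
                prior_incl=0.5, seed=0):
    """Return posterior inclusion probabilities P(gamma_j = 1 | data), one per column of X."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    beta = np.zeros(p)
    gamma = np.ones(p, dtype=bool)  # start with every gene in the model
    incl_counts = np.zeros(p)
    for it in range(n_iter):
        z = _sample_latent(X @ beta, y, rng)
        # Spike-and-slab prior: variance tau1^2 if gene j is included, tau0^2 otherwise.
        tau2 = np.where(gamma, tau1 ** 2, tau0 ** 2)
        beta = _sample_beta(X, z, tau2, rng)
        # gamma_j | beta_j: compare prior-weighted slab and spike densities at beta_j.
        log_slab = np.log(prior_incl) + _log_normal_pdf(beta, tau1 ** 2)
        log_spike = np.log(1.0 - prior_incl) + _log_normal_pdf(beta, tau0 ** 2)
        p_incl = 1.0 / (1.0 + np.exp(log_spike - log_slab))
        gamma = rng.uniform(size=p) < p_incl
        if it >= burn_in:
            incl_counts += gamma
    return incl_counts / (n_iter - burn_in)


# Hypothetical toy data: 8 "genes", only the first two affect the binary endpoint (no intercept).
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((200, 8))
true_beta = np.array([1.5, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = (X @ true_beta + data_rng.standard_normal(200) > 0).astype(int)

pip = ssvs_probit(X, y)
print(np.round(pip, 2))
```

Sorting genes by `pip` gives a ranking analogous to the importance ordering described above. The block update of the coefficients factorizes a p-by-p matrix every iteration, so for DEG lists with thousands of genes one would likely switch to component-wise updates or prefilter the list first.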
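The abstract reports both prediction accuracy and the Matthews correlation coefficient (MCC). For a binary endpoint with confusion-matrix counts TP, TN, FP and FN, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); it ranges from -1 to 1, with 0 meaning no better than chance, and unlike accuracy it is not inflated by class imbalance. A minimal sketch computing both from 0/1 label lists follows; the function name is hypothetical, and returning 0 when a row or column of the confusion matrix is empty is a common convention assumed here.

```python
import math


def accuracy_and_mcc(y_true, y_pred):
    """Return (accuracy, Matthews correlation coefficient) for binary labels coded 0/1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC = 0 when any marginal count of the confusion matrix is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return accuracy, mcc


# On a 90/10 class split, predicting the majority class for everyone looks accurate
# but carries no information, which MCC exposes.
y_true = [1] * 90 + [0] * 10
print(accuracy_and_mcc(y_true, [1] * 100))  # -> (0.9, 0.0)
```

This is why an endpoint can show modest accuracy yet a meaningful MCC, or high accuracy yet an MCC near zero, depending on how balanced its classes are.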

Bayesian nonparametric variable selection as an exploratory tool for discovering differentially expressed genes

Bayesian Nonparametric Variable Selection as an Exploratory Tool for Finding Genes that Matter

Rank-based Bayesian variable selection for genome-wide transcriptomic analyses

Bayesian Variable Selection for Probit Mixed Models Applied to Gene Selection

Bayesian variable selection for disease classification using gene expression data

Bayesian Variable Selection in Multinomial Probit Model for Classifying High-Dimensional Data

Bayesian variable selection in linear regression models with instrumental variables

Genome-wide search algorithms for identifying dynamic gene co-expression via Bayesian variable selection

Identifying Oncogenes As Features for Clinical Cancer Prognosis by Bayesian Nonparametric Variable Selection Algorithm

Bayesian Variable Selection with Sparse and Correlation Priors for High-Dimensional Data Analysis

Bayesian variable selection and data integration for biological regulatory networks

Spatial Knockoff Bayesian Variable Selection in Genome-Wide Association Studies

Multivariate Bayesian variable selection with application to multi-trait genetic fine mapping

Non-parametric Bayesian modelling of digital gene expression data

A Bayesian Semiparametric Approach to Learning About Gene-Gene Interactions in Case-Control Studies

NetDiff – Bayesian model selection for differential gene regulatory network inference

Variable selection in Bayesian generalized linear-mixed models: an illustration using candidate gene case-control association studies

Robust Bayesian variable selection for gene-environment interactions

Sparse Bayesian Variable Selection in Multinomial Probit Regression Model with Application to High-Dimensional Data Classification

Is Seeing Believing? A Practitioner's Perspective on High-Dimensional Statistical Inference in Cancer Genomics Studies

Bayesian inference with historical data-based informative priors improves detection of differentially expressed genes