Partitioning gene expression data by data-driven Markov chain Monte Carlo

E.F. Saraiva, A.K. Suzuki, F. Louzada, L.A. Milan
DOI: https://doi.org/10.1080/02664763.2015.1092113
IF: 1.416
2015-10-09
Journal of Applied Statistics
Abstract: In this paper we introduce a Bayesian mixture model with an unknown number of components for partitioning gene expression data. Inferences about all the unknown parameters are made using the proposed data-driven Markov chain Monte Carlo algorithm, which is essentially Metropolis–Hastings within Gibbs sampling. The Metropolis–Hastings step changes the number of partitions k to a neighboring value using a pair of split-merge moves. Our splitting strategy is data-driven: allocation probabilities are calculated from the marginal likelihood function of the previously allocated observations. Conditional on k, the partition labels are updated via Gibbs sampling. The two main advantages of the proposed algorithm are that it is easy to implement and that the acceptance probability for split-merge moves depends only on the observed data. We examine the performance of the proposed algorithm on simulated data and then analyze two publicly available gene expression data sets.
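To make the "conditional on k, labels are updated via Gibbs sampling" step concrete, here is a minimal sketch of one label sweep. It is not the authors' implementation: it assumes a one-dimensional Gaussian mixture with a known common standard deviation and with the component weights and means held fixed, whereas the paper works with marginal likelihoods. The function name `gibbs_update_labels` and all of its parameters are hypothetical.

```python
import numpy as np

def gibbs_update_labels(y, weights, means, sigma, rng):
    """One Gibbs sweep over allocation labels for a 1-D Gaussian mixture.

    Assumption (not from the paper): k is fixed and the weights, means and
    common standard deviation sigma are treated as known for this sweep.
    """
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)

    # log(weight_j * N(y_i | mean_j, sigma^2)), dropping terms shared by all j
    log_p = np.log(weights)[None, :] - 0.5 * ((y[:, None] - means[None, :]) / sigma) ** 2
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    p = np.exp(log_p)
    p /= p.sum(axis=1, keepdims=True)  # full-conditional allocation probabilities

    # Draw each label from its categorical full conditional (inverse-CDF method)
    u = rng.random(len(y))
    labels = (p.cumsum(axis=1) < u[:, None]).sum(axis=1)
    return np.minimum(labels, len(weights) - 1)  # guard against rounding in cumsum

rng = np.random.default_rng(0)
labels = gibbs_update_labels([-10.1, -9.9, 10.0, 10.2], [0.5, 0.5], [-10.0, 10.0], 1.0, rng)
```

Given the component parameters, the labels are conditionally independent, so the whole sweep can be vectorised. A sampler in the spirit of the abstract would alternate such sweeps with a Metropolis–Hastings split-merge proposal that changes k.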
statistics & probability