Bayesian model-based method for clustering gene expression time series with multiple replicates

Elio Nushi,François P. Douillard,Katja Selby,Miia Lindström,Antti Honkela
DOI: https://doi.org/10.1101/2024.05.23.595463
2024-05-29
Abstract:In this study, we introduce a Bayesian model-based method for clustering transcriptomics time series data with multiple replicates. This technique is based on sampling Gaussian processes (GPs) within an infinite mixture model from a Dirichlet process (DP). Our method uses multiple GP models to accommodate for multiple differently behaving experimental replicates within each cluster. We call it multiple models Dirichlet process Gaussian process (MMDPGP). We compare our method with state-of-the-art model-based clustering approaches for handling gene expression time series with multiple replicates. We present a case study where all methods are applied for clustering RNA-Seq time series of with three different experimental replicates. The results obtained from the gene enrichment analysis showed that the number of significantly enriched sets of genes is larger in the clusters produced by MMDPGP. To demonstrate the accuracy of our method we use it to cluster synthetically generated data sets. The clusters produced by our method on the synthetic data had a significantly higher purity score compared to the state-of-the-art approaches. By modelling each replicate with a separate GP, our method can use the natural variability between experimental replicates to learn more about the underlying biology.
Bioinformatics
What problem does this paper attempt to address?