A family of mixture models for beta valued DNA methylation data

Koyel Majumdar, Romina Silva, Antoinette Sabrina Perry, Ronald William Watson, Andrea Rau, Florence Jaffrezic, Thomas Brendan Murphy, Isobel Claire Gormley
2022-11-04
Abstract: As hypermethylation of promoter cytosine-guanine dinucleotide (CpG) islands has been shown to silence tumour suppressor genes, identifying differentially methylated CpG sites between different samples can assist in understanding disease. Differentially methylated CpG sites (DMCs) can be identified using moderated t-tests or nonparametric tests, but this typically requires data transformations, owing to a lack of statistical methods that adequately account for the bounded nature of DNA methylation data. We propose a family of beta mixture models (BMMs) which use a model-based approach to cluster CpG sites given their original beta-valued methylation data, with no need for transformations. The BMMs allow (i) objective inference of methylation state thresholds and (ii) identification of DMCs between different sample types. The BMMs employ different parameter constraints, facilitating application to different study settings. Parameter estimation proceeds via an expectation-maximisation algorithm, with a novel approximation in the maximisation step providing tractability and computational feasibility. The performance of the BMMs is assessed through thorough simulation studies, and the BMMs are used to analyse a prostate cancer dataset. The BMMs objectively infer intuitive and biologically interpretable methylation state thresholds, and identify DMCs that are related to genes implicated in carcinogenesis and involved in cancer-related pathways. The R package betaclust facilitates widespread use of the BMMs.
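To make the threshold-inference idea concrete, here is a minimal sketch of a beta mixture fitted by EM to one sample of beta values, assuming three components as an illustrative choice. It is not the betaclust API and not the authors' algorithm: the function names (`fit_beta_mixture`, `weighted_moments_to_beta`, `thresholds`) are hypothetical, the exact Beta M-step (which has no closed form) is replaced by a weighted method-of-moments update rather than the paper's own approximation, and a "threshold" is assumed here to be a point where the most probable component changes. The multi-sample constraints used for DMC identification are not covered.

```python
import math
import random


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at a point x strictly inside (0, 1)."""
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def weighted_moments_to_beta(xs, ws):
    """Weighted method-of-moments estimate of Beta shapes (a, b).

    Assumption: a stand-in for the exact M-step, NOT the paper's approximation.
    """
    total = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
    # A Beta variance must lie in (0, mean * (1 - mean)); clamp into that range
    var = min(max(var, 1e-10), 0.999 * mean * (1 - mean))
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def fit_beta_mixture(xs, k=3, max_iter=200, tol=1e-9, eps=1e-6):
    """EM for a k-component Beta mixture on one sample of beta values."""
    xs = [min(max(x, eps), 1 - eps) for x in xs]  # keep log(x), log(1-x) finite
    n = len(xs)
    srt = sorted(xs)
    # Initialise each component from one quantile block of the sorted data
    params = []
    for j in range(k):
        block = srt[j * n // k:(j + 1) * n // k]
        params.append(weighted_moments_to_beta(block, [1.0] * len(block)))
    pis = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior membership probabilities via log-sum-exp
        resp, ll = [], 0.0
        for x in xs:
            logs = [math.log(pis[j]) + beta_logpdf(x, *params[j]) for j in range(k)]
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            s = sum(w)
            ll += m + math.log(s)
            resp.append([v / s for v in w])
        # M-step: mixing weights in closed form, Beta shapes by weighted moments
        for j in range(k):
            wj = [r[j] for r in resp]
            pis[j] = sum(wj) / n
            params[j] = weighted_moments_to_beta(xs, wj)
        # Stop once the relative change in log-likelihood is negligible
        if abs(ll - prev_ll) < tol * abs(ll):
            break
        prev_ll = ll
    return pis, params


def thresholds(pis, params, grid=1000):
    """Grid points on (0, 1) where the most probable component changes."""
    cuts, prev = [], None
    for i in range(1, grid):
        x = i / grid
        label = max(range(len(pis)),
                    key=lambda j: math.log(pis[j]) + beta_logpdf(x, *params[j]))
        if prev is not None and label != prev:
            cuts.append(x)
        prev = label
    return cuts


if __name__ == "__main__":
    # Synthetic, well-separated low / middle / high methylation groups
    rng = random.Random(1)
    data = ([rng.betavariate(2, 30) for _ in range(200)]
            + [rng.betavariate(20, 20) for _ in range(200)]
            + [rng.betavariate(30, 2) for _ in range(200)])
    pis, params = fit_beta_mixture(data, k=3)
    print(thresholds(pis, params))
```

Working on the original (0, 1) scale means the fitted densities, and hence the cut points, are read off directly from beta values with no logit or M-value transformation, which is the property the abstract emphasises.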
Methodology