Addressing dispersion in mis‐measured multivariate binomial outcomes: A novel statistical approach for detecting differentially methylated regions in bisulfite sequencing data

Kaiqiong Zhao,Karim Oualkacha,Yixiao Zeng,Cathy Shen,Kathleen Klein,Lajmi Lakhal‐Chaieb,Aurélie Labbe,Tomi Pastinen,Marie Hudson,Inés Colmegna,Sasha Bernatsky,Celia M. T. Greenwood
DOI: https://doi.org/10.1002/sim.10149
2024-06-27
Statistics in Medicine
Abstract:Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra‐parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non‐constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi‐binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace‐approximated quasi‐likelihood of our model, we further develop a specialized two‐stage expectation‐maximization (EM) algorithm, where a plug‐in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non‐zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti‐citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA‐related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS."
public, environmental & occupational health,medicine, research & experimental,medical informatics,mathematical & computational biology,statistics & probability
What problem does this paper attempt to address?