Discovery and optimization of cell-type-specific DNA methylation markers for in silico deconvolution

Aleksa Krsmanovic
2023-04-12
Abstract: DNA methylation is a significant driver of cell-type heterogeneity and has been implicated in various regulatory processes ranging from cell differentiation to imprinting. As the methyl group is embedded in the DNA molecule, assessing DNA methylation is particularly promising in liquid-biopsy-based approaches, as cell-free DNA retains information related to its cell of origin. In this work, I leverage a recently profiled collection of cell-sorted whole genome bisulfite profiles of 44 healthy cell types. The high quality and purity of such data provide an ideal basis for discovering and characterizing discriminative DNA methylation regions that could serve as a reference for computational deconvolution. First, I characterize differentially methylated regions between every pair of cell types, obtaining a meaningful measure of divergence. Pairwise differences were then aggregated to identify a set of uniquely (de)methylated regions (UMRs) for each cell type. Identified UMRs are predominantly hypomethylated and their numbers vary significantly across cell types. They are mostly located in enhancer regions and strongly support cell-type-specific characteristics. As mapping onto UMRs has proven unsuitable for deconvolution, I developed a novel approach utilizing the set cover algorithm to select discriminative regions for this purpose. Based on these regions, deconvolution was performed in two distinct approaches: a beta-value-based and a read-level one. Both approaches outperform an existing deconvolution software modeled on the same data 3-fold in terms of total deconvolution error. Surprisingly, the beta-based approach slightly outperformed the read-level one. Overall, I present an adaptable, end-to-end software solution (methylcover) for obtaining accurate cell type deconvolution, with possible future applications to non-invasive assays for disease detection and monitoring.
Quantitative Methods, Genomics
What problem does this paper attempt to address?
This paper addresses cell-type deconvolution: estimating the relative abundances of different cell types in a mixed sample from its DNA methylation profile. The work has two main objectives:

1. **Identifying cell-type-specific DNA methylation markers**: Using whole-genome bisulfite sequencing data from 44 sorted healthy cell types, the author first calls differentially methylated regions (DMRs) between every pair of cell types, then aggregates these pairwise differences into uniquely (de)methylated regions (UMRs) for each cell type. The UMRs are predominantly hypomethylated, lie mostly in enhancer regions, and vary considerably in number across cell types.
2. **Optimizing the marker set to improve deconvolution accuracy**: Because mapping onto UMRs proved unsuitable for deconvolution, the author developed a selection approach based on the set cover algorithm to choose discriminative regions. Deconvolution on these regions was performed in two ways: a beta-value-based approach and a read-level approach. Both outperform an existing deconvolution tool modeled on the same data about 3-fold in total deconvolution error, and the beta-value-based approach slightly outperforms the read-level one.

Together, these steps yield an adaptable, end-to-end software solution (methylcover) for accurate cell-type deconvolution, with possible future applications to non-invasive disease detection and monitoring.
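
To make the set cover idea concrete, here is a minimal sketch of one plausible formulation; it is not taken from methylcover, whose actual formulation is not described in the abstract. The assumption is that the "universe" to cover is every pair of cell types, and that a candidate region "covers" a pair when the mean beta values of the two cell types differ by at least a threshold. The function names, the `min_delta` parameter, the region names, and the beta values are all hypothetical.

```python
from itertools import combinations


def discriminated_pairs(betas, min_delta=0.5):
    """Return the cell-type pairs this region separates.

    `betas` maps cell type -> mean methylation (beta value) in the region.
    A pair counts as separated when the absolute beta difference is at
    least `min_delta` (an illustrative threshold, not from the paper).
    """
    return {
        (a, b)
        for a, b in combinations(sorted(betas), 2)
        if abs(betas[a] - betas[b]) >= min_delta
    }


def greedy_set_cover(regions, min_delta=0.5):
    """Greedily pick regions until every cell-type pair is separated.

    Returns the selected region names (in pick order) and any pairs that
    no candidate region can separate.
    """
    cell_types = sorted({ct for betas in regions.values() for ct in betas})
    uncovered = set(combinations(cell_types, 2))
    covers = {
        name: discriminated_pairs(betas, min_delta)
        for name, betas in regions.items()
    }
    selected = []
    while uncovered:
        # Pick the region that separates the most still-uncovered pairs;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(covers), key=lambda name: len(covers[name] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            break  # the remaining pairs cannot be separated by any region
        selected.append(best)
        uncovered -= gained
    return selected, uncovered


# Toy input: hypothetical mean beta values for three cell types.
regions = {
    "region_1": {"b_cell": 0.1, "t_cell": 0.9, "neuron": 0.9},
    "region_2": {"b_cell": 0.9, "t_cell": 0.1, "neuron": 0.9},
    "region_3": {"b_cell": 0.8, "t_cell": 0.85, "neuron": 0.2},
    "region_4": {"b_cell": 0.5, "t_cell": 0.5, "neuron": 0.5},
}

selected, uncovered = greedy_set_cover(regions)
print(selected)   # ['region_1', 'region_2']
print(uncovered)  # set()
```

In this toy input, two regions suffice to separate all three pairs, and the uninformative `region_4` is never chosen. The greedy heuristic does not guarantee a minimum-size cover, but it is a standard approximation for set cover and keeps the selected marker panel small.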