scMD facilitates cell type deconvolution using single-cell DNA methylation references

Manqi Cai, Jingtian Zhou, Chris McKennan, Jiebiao Wang
DOI: https://doi.org/10.1038/s42003-023-05690-5
IF: 6.548
2024-01-02
Communications Biology
Abstract: The proliferation of single-cell RNA-sequencing data has led to the widespread use of cellular deconvolution, aiding the extraction of cell-type-specific information from extensive bulk data. However, those advances have been mostly limited to transcriptomic data. With recent developments in single-cell DNA methylation (scDNAm), there are emerging opportunities for deconvolving bulk DNAm data, particularly for solid tissues like brain that lack cell-type references. Due to technical limitations, current scDNAm sequences represent a small proportion of the whole genome for each single cell, and those detected regions differ across cells. This makes scDNAm data ultra-high dimensional and ultra-sparse. To deal with these challenges, we introduce scMD (single cell Methylation Deconvolution), a cellular deconvolution framework to reliably estimate cell type fractions from tissue-level DNAm data. To analyze large-scale complex scDNAm data, scMD employs a statistical approach to aggregate scDNAm data at the cell cluster level, identify cell-type marker DNAm sites, and create precise cell-type signature matrixes that surpass state-of-the-art sorted-cell or RNA-derived references. Through thorough benchmarking in several datasets, we demonstrate scMD’s superior performance in estimating cellular fractions from bulk DNAm data. With scMD-estimated cellular fractions, we identify cell type fractions and cell type-specific differentially methylated cytosines associated with Alzheimer’s disease.
biology
What problem does this paper attempt to address?
This paper addresses the problem of using single-cell DNA methylation (scDNAm) data as a reference for deconvolving tissue-level DNA methylation (DNAm) data. Specifically, it focuses on the following issues:

1. **Technical limitations**: Current scDNAm data typically cover only a small part of each single cell's genome (about 5% of CpG sites), and the covered regions vary greatly between cells. This makes scDNAm data extremely high-dimensional and sparse, which poses major challenges for data analysis.
2. **Limitations of existing methods**: Existing DNAm deconvolution methods mainly rely on coarse cell-type references (such as neurons versus non-neurons) or on RNA-based references. These approaches fall short in accuracy and resolution, especially for complex tissues such as the brain.
3. **Requirement for high resolution and accuracy**: Estimating cell-type proportions in tissue-level DNAm data more accurately, and inferring cell-type-specific changes related to diseases such as Alzheimer's disease, requires a method that can turn scDNAm data into high-quality DNAm references.

To solve these problems, the authors developed scMD (single cell Methylation Deconvolution), a cellular deconvolution framework that reliably estimates cell-type proportions from tissue-level DNAm data. scMD aggregates scDNAm data at the cell-cluster level using a statistical approach, identifies cell-type marker CpG sites, and constructs an accurate cell-type signature matrix that surpasses existing sorted-cell or RNA-derived references. Through benchmarking on multiple datasets, the authors demonstrate the superior performance of scMD in estimating cell-type proportions. Using the scMD-estimated proportions, they identify cell-type fractions and cell-type-specific differentially methylated cytosines associated with Alzheimer's disease.
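
The three steps described above (aggregate sparse single-cell data per cluster, select marker sites, deconvolve bulk data against the resulting signature matrix) can be illustrated with a minimal sketch. This is not scMD's actual implementation: the pooling rule, the gap-based marker score, the use of non-negative least squares, all function names, and the simulated data (three cell types, about 5% coverage per cell) are assumptions chosen only to make the idea concrete.

```python
import numpy as np
from scipy.optimize import nnls


def aggregate_by_cluster(meth, total, labels, n_types):
    """Pool sparse per-cell read counts into per-cluster methylation levels.

    meth, total: (cells x sites) methylated and total read counts;
    total == 0 means the site was not covered in that cell.
    Returns a (sites x n_types) matrix; NaN where a cluster has no coverage.
    """
    ref = np.full((meth.shape[1], n_types), np.nan)
    for k in range(n_types):
        m = meth[labels == k].sum(axis=0)
        t = total[labels == k].sum(axis=0)
        covered = t > 0
        ref[covered, k] = m[covered] / t[covered]
    return ref


def select_markers(ref, n_per_type):
    """For each cell type, keep the sites that differ most from all other types."""
    complete = np.where(~np.isnan(ref).any(axis=1))[0]
    chosen = set()
    for k in range(ref.shape[1]):
        others = np.delete(ref[complete], k, axis=1)
        # Gap to the closest other cell type: a large gap marks a specific site.
        gap = np.abs(ref[complete, k][:, None] - others).min(axis=1)
        chosen.update(complete[np.argsort(gap)[::-1][:n_per_type]].tolist())
    return np.array(sorted(chosen))


def estimate_fractions(bulk, signature):
    """Non-negative least squares, rescaled so the fractions sum to one."""
    coef, _ = nnls(signature, bulk)
    if coef.sum() == 0:
        raise ValueError("all coefficients are zero; cannot normalize")
    return coef / coef.sum()


# Simulated example (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n_types, n_sites, cells_per_type = 3, 2000, 400
true_levels = rng.uniform(0.05, 0.95, size=(n_sites, n_types))
labels = np.repeat(np.arange(n_types), cells_per_type)

# Each cell covers roughly 5% of sites, with one read per covered site.
total = rng.binomial(1, 0.05, size=(labels.size, n_sites))
meth = rng.binomial(total, true_levels[:, labels].T)

ref = aggregate_by_cluster(meth, total, labels, n_types)
markers = select_markers(ref, n_per_type=100)

# Bulk tissue as a noisy mixture of the true cell-type profiles.
true_frac = np.array([0.6, 0.3, 0.1])
bulk = true_levels @ true_frac + rng.normal(0, 0.01, n_sites)

est = estimate_fractions(bulk[markers], ref[markers])
print("estimated fractions:", np.round(est, 3))
```

Pooling reads within a cluster is what makes the sparse data usable here: no single cell covers enough sites to serve as a reference, but a few hundred cells together cover nearly every site. The pooled reference is still noisy, so the estimated fractions are only approximate in this toy setting.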