Multi-distance based spectral embedding fusion for clustering single-cell methylation data

Jianxiao Zou, Qi Tian, Shicai Fan, Jianxiong Tang
DOI: https://doi.org/10.1109/CIBCB49929.2021.9562895
2021-10-13
Abstract: Advances in high-throughput sequencing have enabled DNA methylation profiling at single-cell resolution. The resulting single-cell methylation sequencing (scM-Seq) data provide unprecedented opportunities for a comprehensive dissection of epigenetic heterogeneity. An important step in exploring epigenetic heterogeneity is clustering cells according to their single-cell methylation profiles. However, the inherent sparsity and stochastic measurement characteristics of the data make this challenging. To this end, we introduce SINCEF, which uses spectral embedding fusion to reconstruct cell-to-cell pairwise distances for clustering single-cell methylation data. SINCEF first calculates multiple basic distance matrices to capture cell-to-cell methylation dissimilarity according to the global methylation status. It then applies spectral embedding to transform these basic distance matrices into latent representations, pooling information from the basic distance measures. Finally, it reconstructs a new distance matrix and applies hierarchical clustering to yield cell partitions. Assessments on several public scM-Seq datasets demonstrated that SINCEF could generate a more appropriate distance matrix for measuring the methylation distance between cells, which considerably improved clustering performance. As an additional benefit, the reconstructed distance matrix could help to visually assess heterogeneity across cell populations through the block structures in hierarchical clustering heat maps. SINCEF is freely available on GitHub at https://github.com/TQBio/SINCEF.
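The three stages described in the abstract (several basic distance matrices, spectral embedding of each, and hierarchical clustering on a reconstructed distance) can be illustrated with a minimal sketch. This is not the SINCEF implementation: the abstract does not name the basic distances, the kernel, or the fusion rule, so the choices below (Hamming, Euclidean, and correlation distances over co-covered sites, a Gaussian kernel with a median bandwidth, concatenation of embeddings as the fusion step, Ward linkage) and all function names are assumptions made for illustration. Cells are rows of a binary matrix, with NaN marking sites that were not covered.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

KINDS = ("hamming", "euclidean", "correlation")  # assumed basic distances


def basic_distance(X, kind):
    """Cell-to-cell distance using only the sites covered in both cells."""
    n = X.shape[0]
    D = np.zeros((n, n))
    covered = ~np.isnan(X)
    for i in range(n):
        for j in range(i + 1, n):
            shared = covered[i] & covered[j]
            d = 1.0  # fallback when the cells share too few usable sites
            if shared.sum() >= 2:
                a, b = X[i, shared], X[j, shared]
                if kind == "hamming":
                    d = np.mean(a != b)
                elif kind == "euclidean":
                    d = np.sqrt(np.mean((a - b) ** 2))
                elif a.std() > 0 and b.std() > 0:  # correlation distance
                    d = 1.0 - np.corrcoef(a, b)[0, 1]
            D[i, j] = D[j, i] = d
    return D


def spectral_embedding(D, n_components):
    """Embed cells with eigenvectors of the normalized graph Laplacian."""
    sigma = np.median(D[D > 0])  # assumed bandwidth: median nonzero distance
    W = np.exp(-(D ** 2) / (2 * sigma ** 2))  # distances -> affinities
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(D)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]  # skip the trivial first eigenvector


def cluster_cells(X, n_clusters, n_components=2):
    """Fuse per-distance embeddings, rebuild a distance matrix, cut a tree."""
    fused = np.hstack(
        [spectral_embedding(basic_distance(X, k), n_components) for k in KINDS]
    )
    condensed = pdist(fused, metric="euclidean")  # reconstructed distances
    tree = linkage(condensed, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels, squareform(condensed)


# Toy data (simulated, not from the paper): two groups of 20 cells whose
# methylation profiles differ on half of 400 sites, with 5% random flips
# and roughly half of all entries missing.
rng = np.random.default_rng(0)
n_sites, n_per_group = 400, 20
profile_a = (rng.random(n_sites) < 0.5).astype(float)
profile_b = profile_a.copy()
profile_b[: n_sites // 2] = 1.0 - profile_b[: n_sites // 2]
cells = np.vstack([np.tile(profile_a, (n_per_group, 1)),
                   np.tile(profile_b, (n_per_group, 1))])
flip = rng.random(cells.shape) < 0.05
cells[flip] = 1.0 - cells[flip]
cells[rng.random(cells.shape) < 0.5] = np.nan

labels, fused_distance = cluster_cells(cells, n_clusters=2, n_components=1)
```

One embedding component per distance is enough for this two-group toy example; real data with more cell types would need more components, and the returned `fused_distance` matrix is what one would pass to a clustered heat map to look for block structure.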
Computer Science, Biology