A downsampling method enables robust clustering and integration of single-cell transcriptome data

Jun Ren, Quan Zhang, Ying Zhou, Yudi Hu, Xuejing Lyu, Hongkun Fang, Jing Yang, Rongshan Yu, Xiaodong Shi, Qiyuan Li
DOI: https://doi.org/10.1016/j.jbi.2022.104093
IF: 8
2022-06-01
Journal of Biomedical Informatics
Abstract: Random noise, sampling biases, and batch effects often confound true biological variation in single-cell RNA-sequencing (scRNA-seq) data. Adjusting for such biases is key to robust discoveries in downstream analyses such as cell clustering, gene selection, and data integration. Here we propose a model-based downsampling algorithm based on minimal unbiased representative points (MURPXMBD). MURPXMBD is designed to retrieve a set of representative points by reducing gene-wise random independent errors while retaining the covariance structure of biological origin, and hence to provide an unbiased representation of the cell population. Subsequent validation using benchmark datasets shows that MURPXMBD can improve the quality and accuracy of clustering algorithms and thus facilitate the discovery of new cell types. MURPXMBD also improves the performance of dataset integration algorithms. In summary, MURPXMBD serves as a useful noise-reduction method for single-cell sequencing analysis in biomedical studies.
medical informatics, computer science, interdisciplinary applications