Explainable t-SNE for single-cell RNA-seq data analysis

Henry Han, Tianyu Zhang, Chun Li, Mary Lauren Benton, Juan Wang, Junyi Li
DOI: https://doi.org/10.1101/2022.01.12.476084
2022-01-13
Abstract:
Background Single-cell RNA sequencing (scRNA-seq) technologies enable the study of gene expression in individual cells and reveal the diversity within cell populations. To measure cell-to-cell similarity based on transcription and gene expression, many dimension reduction methods are used to compute low-dimensional embeddings of the input scRNA-seq data for clustering. However, these methods lack explainability and may not perform well on scRNA-seq data because they are not customized for high-dimensional, sparse data.
Results In this study, we propose an explainable t-SNE, cell-driven t-SNE (c-TSNE), that fuses cell differences captured by biologically meaningful distance metrics on the input data. Our study shows that the proposed method not only enhances the interpretation of the original t-SNE visualization but also achieves favorable single-cell segregation performance on benchmark datasets compared with state-of-the-art peers. The robustness analysis shows that the proposed cell-driven t-SNE is robust to dropout and noise in clustering. It provides a novel and practical way to investigate the interpretability of t-SNE in scRNA-seq data analysis. Contrary to the common assumption that explainability in a machine learning method comes at the cost of learning efficiency, the proposed explainable t-SNE improves both clustering efficiency and explainability. More importantly, our work suggests that the widely used t-SNE can easily be misused in existing scRNA-seq analysis, because its default Euclidean distance can introduce bias or produce meaningless results when evaluating cell differences in high-dimensional, sparse scRNA-seq data. To the best of our knowledge, this is the first explainable t-SNE proposed for scRNA-seq analysis, and it will inspire the development of other explainable machine learning methods in the field.
Conclusion The proposed explainable t-SNE outperforms classic t-SNE and its peers in meaningful visualization and segregation. The poor performance of classic t-SNE highlights the importance of developing explainable machine learning methods for scRNA-seq analysis. The explainable t-SNE is a data-centric, customized machine learning method that enhances efficiency in data analysis by bringing more biological insights and interpretations.
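The point about the default Euclidean distance can be illustrated with a small, self-contained sketch. It is not the paper's c-TSNE: the abstract does not name the distance metrics it fuses, so the correlation distance used here is only an assumed stand-in for a "biologically meaningful" metric, and the three toy cells and all function names are hypothetical.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def correlation_distance(u, v):
    """1 - Pearson correlation; insensitive to overall scaling of a profile."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 - num / den

def distance_matrix(cells, metric):
    """Pairwise distance matrix under the given metric."""
    return [[metric(u, v) for v in cells] for u in cells]

# Hypothetical sparse count vectors (one entry per gene).
cell_a = [0, 0, 10, 0, 4, 0, 0, 2]
cell_b = [0, 0, 50, 0, 20, 0, 0, 10]  # same profile as cell_a at 5x depth
cell_c = [3, 0, 0, 5, 0, 0, 6, 0]     # different genes expressed

cells = [cell_a, cell_b, cell_c]
d_euc = distance_matrix(cells, euclidean)
d_cor = distance_matrix(cells, correlation_distance)

# Euclidean distance places cell_a nearer to cell_c than to cell_b,
# although cell_a and cell_b share the same expression pattern.
print("Euclidean   a-b:", d_euc[0][1], " a-c:", d_euc[0][2])
# Correlation distance places cell_a nearer to cell_b than to cell_c.
print("Correlation a-b:", d_cor[0][1], " a-c:", d_cor[0][2])
```

A matrix built this way could be passed to a t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE` with `metric="precomputed"`. Whether this matches how c-TSNE combines its metrics is not stated in the abstract.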