IsoCell: an Approach to Enhance Single Cell Clustering by Integrating Isoform-level Expression Through Orthogonal Projection

Yingyi Liu, Hong-Dong Li, Yunpei Xu, Yi-Wei Liu, Xiaoqing Peng, Jianxin Wang
DOI: https://doi.org/10.1109/tcbb.2022.3147193
2022-01-01
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Abstract: Single cell RNA sequencing (scRNA-seq) provides a powerful approach for profiling transcriptomes at single cell resolution. An essential application of scRNA-seq is the discovery of cell types with the aid of clustering analysis. Existing single cell clustering methods rely exclusively on gene-level expression data and do not consider alternative splicing information. Alternative splicing has been shown to have an important influence on biological processes such as cell differentiation and the cell cycle. We therefore hypothesize that adding information about alternative splicing may help enhance single cell clustering, which motivates us to develop a way to integrate isoform-level and gene-level expression. We report an approach to enhance single cell clustering by integrating isoform-level expression through orthogonal projection. First, we construct an orthogonal projection matrix based on gene expression data. Second, isoforms are projected onto the gene space to remove the information that is redundant between isoform-level and gene-level expression. Third, isoform selection is performed based on the residual of the projected expression, and the selected isoforms are combined with gene expression data for subsequent clustering. We applied our method to sixteen scRNA-seq datasets. We find that alternative splicing carries information that differs among cell types and can be integrated to enhance single cell clustering. Compared with using only gene-level expression data, integrating isoform-level expression leads to better clustering performance for most of the datasets. The integration of isoform-level expression also has potential for detecting novel cell subgroups. Our study shows that integrating isoform-level and gene-level expression is a promising way to improve single cell clustering. The IsoCell R package is freely available on both GitHub ( https://github.com/genemine/IsoCell ) and Zenodo ( https://zenodo.org/record/4395707 ).
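The three steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the IsoCell implementation (which is an R package) and it rests on several assumptions the abstract does not state: the gene space is taken to be the span of the top `rank` left singular vectors of the centered gene expression matrix, isoforms are scored by the Euclidean norm of their residual, and the highest-scoring `n_select` isoforms are kept. The function name `select_isoforms_by_residual` and its parameters are hypothetical.

```python
import numpy as np


def select_isoforms_by_residual(gene_expr, iso_expr, rank=10, n_select=50):
    """Illustrative sketch only; not the IsoCell package's API.

    gene_expr: array of shape (cells, genes)
    iso_expr:  array of shape (cells, isoforms)
    """
    # Center each feature so that projections compare variation, not means.
    g = gene_expr - gene_expr.mean(axis=0)
    x = iso_expr - iso_expr.mean(axis=0)

    # Step 1 (assumption): build an orthonormal basis for the dominant
    # gene-expression subspace from the top left singular vectors.
    u, _, _ = np.linalg.svd(g, full_matrices=False)
    rank = min(rank, u.shape[1])
    basis = u[:, :rank]

    # Step 2: orthogonally project every isoform onto that subspace.
    # basis @ basis.T is the projection matrix; it is applied implicitly.
    projected = basis @ (basis.T @ x)
    residual = x - projected

    # Step 3 (assumption): score isoforms by residual norm, keep the largest,
    # and append the selected isoforms to the gene-level matrix.
    scores = np.linalg.norm(residual, axis=0)
    n_select = min(n_select, x.shape[1])
    selected = np.argsort(scores)[::-1][:n_select]
    combined = np.hstack([gene_expr, iso_expr[:, selected]])
    return selected, residual, combined
```

An isoform whose expression is a linear combination of gene-level signals has a residual close to zero and is unlikely to be selected, while an isoform with variation not captured by the gene space keeps a large residual. The `combined` matrix would then be passed to whichever clustering method is in use.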