inClust: a general framework for clustering that integrates data from multiple sources

Lifei Wang, Rui Nie, Zhang Zhang, Weiwei Gu, Shuo Wang, Anqi Wang, Jiang Zhang, Jun Cai
DOI: https://doi.org/10.1101/2022.05.27.493706
2022-01-01
Abstract: Clustering is one of the most commonly used methods in single-cell RNA sequencing (scRNA-seq) data analysis and in other fields of biology. Traditional clustering methods usually take data from a single source as input (e.g., scRNA-seq data). However, as data become increasingly complex and contain information from multiple sources, a clustering method that can integrate multiple data sources is required. Here, we present inClust (integrated clustering), a clustering method that integrates information from multiple sources based on a variational autoencoder and vector arithmetic in latent space. inClust performs information integration and clustering jointly, and it can use the labeling information in the data as regulation information. It is a flexible framework that can accomplish different tasks under different modes, ranging from supervised to unsupervised. We demonstrate the capability of inClust in conditional out-of-distribution generation under supervised mode, label transfer under semi-supervised mode and guided clustering mode, and spatial domain identification under unsupervised mode. inClust performs well in all tasks, indicating that it is an excellent general framework for clustering and task-related clustering in the era of multi-omics.
Competing Interest Statement: The authors have declared no competing interest.
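As a rough illustration of what "vector arithmetic in latent space" combined with joint clustering could look like, the following is a minimal toy sketch, not the authors' implementation. It assumes that an expression profile and a covariate (here a batch label) are each mapped into the same latent space, that the covariate effect is removed by vector subtraction, and that cluster assignments come from a softmax over the corrected latent vector. All names (`W_enc`, `W_cov`, `W_clu`), dimensions, and the random weights are hypothetical. The sampling step and the training of a real variational autoencoder are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions, chosen only for illustration.
n_cells, n_genes, n_latent, n_batches, n_clusters = 6, 20, 4, 2, 3

# Toy inputs: a count matrix and a one-hot covariate (e.g. a batch label).
x = rng.poisson(2.0, size=(n_cells, n_genes)).astype(float)
batch = np.eye(n_batches)[rng.integers(0, n_batches, size=n_cells)]

# Randomly initialised weights standing in for trained layers (assumption).
W_enc = rng.normal(scale=0.1, size=(n_genes, n_latent))     # expression encoder
W_cov = rng.normal(scale=0.1, size=(n_batches, n_latent))   # covariate embedding
W_clu = rng.normal(scale=0.1, size=(n_latent, n_clusters))  # clustering head


def softmax(a):
    """Row-wise softmax with max-subtraction for numerical stability."""
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


z_expr = np.log1p(x) @ W_enc   # latent code from expression (encoder mean only)
z_cov = batch @ W_cov          # covariate embedded in the same latent space
z_clean = z_expr - z_cov       # vector arithmetic: subtract the covariate effect

probs = softmax(z_clean @ W_clu)   # soft cluster assignment per cell
clusters = probs.argmax(axis=1)    # hard cluster label per cell
```

In this sketch, adding `z_cov` back to `z_clean` recovers `z_expr`, which is the property a decoder could rely on to reconstruct, or conditionally generate, expression for a chosen covariate value.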