Integration of millions of transcriptomes using batch-aware triplet neural networks

Lukas M. Simon, Yin-Ying Wang, Zhongming Zhao
DOI: https://doi.org/10.1038/s42256-021-00361-8
IF: 23.8
2021-06-21
Nature Machine Intelligence
Abstract: Efficient integration of heterogeneous and increasingly large single-cell RNA sequencing data poses a major challenge for analysis and, in particular, comprehensive atlasing efforts. Here we developed a novel deep learning algorithm called INSCT (Insight) to overcome batch effects using batch-aware triplet neural networks. We use simulated and real data to demonstrate that INSCT generates an embedding space that accurately integrates cells across experiments, platforms and species. Our benchmark comparisons with current state-of-the-art single-cell RNA sequencing integration methods revealed that INSCT outperforms competing methods in scalability while achieving comparable accuracies. Moreover, using INSCT in semisupervised mode enables users to classify unlabelled cells by projecting them into a reference collection of annotated cells. To demonstrate scalability, we applied INSCT to integrate more than 2.6 million transcriptomes from four independent studies of mouse brains in less than 1.5 h using less than 25 GB of memory. This feature empowers researchers to perform atlasing-scale data integration in a typical desktop computer environment. INSCT is freely available at https://github.com/lkmklsmn/insct.
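The abstract names triplet neural networks but does not spell out the objective. As a rough illustration only, the sketch below computes a generic triplet margin loss with NumPy. It is not taken from the INSCT code base: the function name, the margin value and the toy embeddings are assumptions, and the idea that a "batch-aware" positive is a similar cell drawn from a different batch is an interpretation of the abstract rather than something it states.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge loss max(0, d(a, p) - d(a, n) + margin) over rows.

    Generic triplet loss; the margin of 1.0 is an assumed default.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-to-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-to-negative distance
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

# Toy 2-D embeddings for two triplets (hypothetical values).
# Assumption: each positive would be a similar cell from another batch,
# each negative a dissimilar cell.
anchor = np.array([[0.0, 0.0], [1.0, 1.0]])
positive = np.array([[0.0, 1.0], [1.0, 1.0]])
negative = np.array([[3.0, 4.0], [1.0, 1.5]])

loss = triplet_margin_loss(anchor, positive, negative)
print(loss)
```

Minimizing a loss of this shape pulls each anchor towards its positive and pushes it away from its negative, which is the general mechanism by which a triplet network could place matching cells from different batches close together in the embedding.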
computer science, artificial intelligence, interdisciplinary applications