scNovel: a scalable deep learning-based network for novel rare cell discovery in single-cell transcriptomics

Chuanyang Zheng,Yixuan Wang,Yuqi Cheng,Xuesong Wang,Hongxin Wei,Irwin King,Yu Li
DOI: https://doi.org/10.1093/bib/bbae112
IF: 9.5
2024-03-27
Briefings in Bioinformatics
Abstract: Single-cell RNA sequencing has achieved massive success in biological research fields. Discovering novel cell types from single-cell transcriptomics has been demonstrated to be essential in the field of biomedicine, yet it is time-consuming and requires prior knowledge. With the unprecedented boom in cell atlases, auto-annotation tools have become more prevalent due to their speed, accuracy and user-friendly features. However, existing tools have mostly focused on general cell-type annotation and have not adequately addressed the challenge of discovering novel rare cell types. In this work, we introduce scNovel, a powerful deep learning-based neural network that specifically focuses on novel rare cell discovery. By testing our model on diverse datasets with different scales, protocols and degrees of imbalance, we demonstrate that scNovel significantly outperforms previous state-of-the-art novel cell detection models, reaching the highest AUROC performance (it is the only method whose average AUROC is above 94%, up to 16.26% higher than the second-best method). We further validate scNovel's performance on a million-scale dataset to illustrate its scalability. Applying scNovel to a clinical COVID-19 dataset, three potential novel subtypes of macrophages are identified, and deeper analysis shows that the COVID-related differential genes have consistent expression patterns. We believe that our proposed pipeline will be an important tool for high-throughput clinical data in a wide range of applications.
biochemical research methods,mathematical & computational biology
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses the discovery of novel rare cell types in single-cell transcriptomics. It proposes **scNovel**, a scalable deep learning framework for automatically detecting novel rare cell types in single-cell transcriptome data.

#### Main Issues:

1. **Limitations of Existing Tools**: Existing automatic annotation tools mainly focus on annotating general cell types and do not adequately address the challenge of discovering novel rare cell types.
2. **Biomedical Importance**: Discovering novel rare cells is important in biology and medicine. For example, in cancer research, specific cancer stem cells (which often constitute 1-2% of the total tumor cells) have remarkable proliferative capacity. Identifying novel rare cells in individual patients can therefore enable more personalized treatment plans.
3. **Technical Requirements**: In practice, query datasets may contain novel cell types not seen in the reference datasets, and there is often a significant distribution shift between the reference and query sets. A tool is therefore needed that can both classify known cell types accurately and identify novel rare cells.

#### Solutions:

1. **scNovel Framework**: The paper proposes an integrated neural network framework, scNovel, which utilizes barcode preprocessing for automatic detection tasks. Its performance is demonstrated on different datasets under four query-stage modes: traditional classification mode, single combination mode, parallel combination mode, and sequential combination mode.
2. **Performance Evaluation**: Tests on datasets of different scales, protocols, and degrees of imbalance show that scNovel has a significant advantage in novel rare cell detection. Its average AUROC exceeds 94%, up to 16.26% higher than that of the second-best method.
3. **Clinical Application**: Applied to a clinical COVID-19 dataset, scNovel identified three potential novel macrophage subtypes, and the expression patterns of the disease-related genes were analyzed further.

Overall, scNovel significantly outperforms existing methods and also demonstrates scalability and efficiency on large-scale datasets.
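
The AUROC figures quoted above score how well a detector ranks novel cells above known ones. The following is a minimal sketch of that evaluation, not scNovel's implementation: it assumes a generic max-softmax novelty score (one minus the classifier's top class probability) as a stand-in for whatever scoring rule the paper uses, and the function names `max_softmax_novelty` and `auroc`, as well as the toy logits, are hypothetical.

```python
import math
from typing import Sequence


def max_softmax_novelty(logits: Sequence[float]) -> float:
    """Novelty score = 1 - max softmax probability.

    ASSUMPTION: a generic baseline, not scNovel's actual scoring rule.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return 1.0 - max(exps) / sum(exps)


def auroc(scores: Sequence[float], is_novel: Sequence[bool]) -> float:
    """AUROC as the probability that a random novel cell scores higher
    than a random known cell, with ties counted as 0.5.

    Pairwise O(n_novel * n_known) loop: fine for a sketch, too slow
    for million-scale data.
    """
    novel = [s for s, y in zip(scores, is_novel) if y]
    known = [s for s, y in zip(scores, is_novel) if not y]
    if not novel or not known:
        raise ValueError("need at least one novel and one known cell")
    wins = 0.0
    for p in novel:
        for n in known:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(novel) * len(known))


# Hypothetical classifier logits over three known cell types.
toy_logits = [
    [4.0, 0.5, 0.2],  # known cell, confident prediction
    [0.3, 3.5, 0.1],  # known cell, confident prediction
    [1.0, 1.1, 0.9],  # novel cell, near-uniform prediction
    [2.0, 1.8, 0.2],  # novel cell, split between two types
]
toy_is_novel = [False, False, True, True]

toy_scores = [max_softmax_novelty(z) for z in toy_logits]
print(f"AUROC on toy data: {auroc(toy_scores, toy_is_novel):.2f}")
```

On this toy data both novel cells receive higher novelty scores than both known cells, so the ranking is perfect. For real datasets a rank-based implementation such as scikit-learn's `roc_auc_score` would be the more practical choice.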