Domain generalization enables general cancer cell annotation in single-cell and spatial transcriptomics

Zhixing Zhong, Junchen Hou, Zhixian Yao, Lei Dong, Feng Liu, Junqiu Yue, Tiantian Wu, Junhua Zheng, Gaoliang Ouyang, Chaoyong Yang, Jia Song
DOI: https://doi.org/10.1038/s41467-024-46413-6
IF: 16.6
2024-03-03
Nature Communications
Abstract: Single-cell and spatial transcriptome sequencing, two recently optimized transcriptome sequencing methods, are increasingly used to study cancer and related diseases. Cell annotation, particularly for malignant cell annotation, is essential and crucial for in-depth analyses in these studies. However, current algorithms lack accuracy and generalization, making it difficult to consistently and rapidly infer malignant cells from pan-cancer data. To address this issue, we present Cancer-Finder, a domain generalization-based deep-learning algorithm that can rapidly identify malignant cells in single-cell data with an average accuracy of 95.16%. More importantly, by replacing the single-cell training data with spatial transcriptomic datasets, Cancer-Finder can accurately identify malignant spots on spatial slides. Applying Cancer-Finder to 5 clear cell renal cell carcinoma spatial transcriptomic samples, Cancer-Finder demonstrates a good ability to identify malignant spots and identifies a gene signature consisting of 10 genes that are significantly co-localized and enriched at the tumor-normal interface and have a strong correlation with the prognosis of clear cell renal cell carcinoma patients. In conclusion, Cancer-Finder is an efficient and extensible tool for malignant cell annotation.
multidisciplinary sciences
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address the issues of accuracy and generalization in annotating malignant cells in single-cell and spatial transcriptomics. Specifically:

1. **Limitations of Current Algorithms**:
   - Current algorithms for identifying malignant cells lack accuracy and generalization, making it difficult to consistently and quickly infer malignant cells from cross-cancer data.
   - Existing methods primarily rely on marker genes or copy number variation (CNV) events to identify malignant cells, but these methods are susceptible to technical artifacts (such as dropout and high sparsity), leading to false-negative results.
   - There is no universal set of cancer-specific marker genes, and the existing knowledge of cancer marker genes is insufficient to distinguish malignant cells from normal cells in all tumor microenvironments.
   - Machine learning-based methods (such as ikarus and Casee) have limited performance on single-cell data and cannot be applied to spatial transcriptomics data.

2. **Proposed New Method**:
   - To address the above issues, the authors propose Cancer-Finder, a domain generalization-based deep learning algorithm that can quickly and accurately identify malignant cells in single-cell data, with an average accuracy of 95.16%.
   - More importantly, by replacing single-cell training data with spatial transcriptomics datasets, Cancer-Finder can accurately identify malignant spots on spatial slices.

3. **Application and Validation**:
   - The authors applied Cancer-Finder to 5 spatial transcriptomics samples of clear cell renal cell carcinoma (ccRCC), successfully identifying malignant spots and discovering a gene signature composed of 10 genes that significantly co-localize at the tumor-normal tissue interface and are closely related to the prognosis of ccRCC patients.
   - Through internal and external validation with multiple datasets, Cancer-Finder has demonstrated stability and efficiency across different tissue types and datasets.

### Summary

Cancer-Finder is an efficient and scalable tool that can accurately annotate malignant cell states, applicable to both single-cell and spatial transcriptomics data. By employing a domain generalization strategy, it enhances the model's generalization ability and accuracy, showing promise for significant contributions to cancer research.
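
### Illustrative Sketch

The summary above does not say which training objective Cancer-Finder uses, so the following is only a minimal sketch of one common domain generalization recipe, not the authors' implementation. It treats each dataset as a separate domain, trains a logistic classifier on the mean of the per-domain risks plus a penalty on their variance (a risk-variance penalty of the kind used in V-REx), and then measures accuracy on a dataset that was held out entirely. The data are synthetic, a linear model stands in for the deep network, and every name and parameter here (`make_domain`, `train`, `beta`, `shift_scale`, and so on) is a hypothetical choice made for illustration.

```python
import math
import random

def make_domain(rng, n=200, n_feat=4, shift_scale=0.5):
    """Simulate one dataset (domain) as (features, label) pairs.
    Synthetic stand-in for expression data; not real transcriptomes."""
    signal = [1.5, 1.5] + [0.0] * (n_feat - 2)   # only two informative features
    shift = [rng.gauss(0.0, shift_scale) for _ in range(n_feat)]  # dataset-specific offset
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)                    # 1 = malignant, 0 = normal
        sign = 2 * y - 1
        x = [sign * s + d + rng.gauss(0.0, 1.0) for s, d in zip(signal, shift)]
        data.append((x, y))
    return data

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def risk_and_grad(w, b, data):
    """Mean cross-entropy on one domain, with gradients w.r.t. w and b."""
    n = len(data)
    risk, gw, gb = 0.0, [0.0] * len(w), 0.0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)      # keep log() finite
        risk -= (y * math.log(p) + (1 - y) * math.log(1.0 - p)) / n
        err = (p - y) / n
        gw = [g + err * xi for g, xi in zip(gw, x)]
        gb += err
    return risk, gw, gb

def train(domains, beta=1.0, lr=0.2, steps=150):
    """Gradient descent on mean(risks) + beta * variance(risks) over domains."""
    w, b, k = [0.0] * len(domains[0][0][0]), 0.0, len(domains)
    for _ in range(steps):
        stats = [risk_and_grad(w, b, d) for d in domains]
        mean_risk = sum(s[0] for s in stats) / k
        for risk, gw, gb in stats:
            # d/dR_d of [mean + beta * population variance] of the risks
            coef = (1.0 + 2.0 * beta * (risk - mean_risk)) / k
            w = [wi - lr * coef * g for wi, g in zip(w, gw)]
            b -= lr * coef * gb
    return w, b

def accuracy(w, b, data):
    """Fraction of samples whose predicted class matches the label."""
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1)
               for x, y in data)
    return hits / len(data)

def leave_one_domain_out(domains, **kwargs):
    """Train on all domains but one; report accuracy on the held-out one."""
    accs = []
    for i, held_out in enumerate(domains):
        w, b = train(domains[:i] + domains[i + 1:], **kwargs)
        accs.append(accuracy(w, b, held_out))
    return accs

rng = random.Random(0)
domains = [make_domain(rng) for _ in range(4)]
accs = leave_one_domain_out(domains)
mean_acc = sum(accs) / len(accs)
print("held-out accuracy per dataset:", [round(a, 3) for a in accs])
print("average held-out accuracy:", round(mean_acc, 3))
```

Averaging accuracy over held-out datasets, as in the last lines, is one plausible way to arrive at a single figure such as the reported average accuracy; the paper's actual evaluation protocol may differ. Setting `beta=0.0` reduces the objective to plain pooled training, which gives a simple baseline for comparison.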