The Curated Cancer Cell Atlas: comprehensive characterisation of tumours at single-cell resolution

Michael Tyler, Avishai Gavish, Chaya Barbolin, Roi Tschernichovsky, Rouven Hoefflin, Michael Mints, Sidharth V Puram, Itay Tirosh
DOI: https://doi.org/10.1101/2024.10.11.617836
2024-10-12
Abstract: Single-cell RNA-seq (scRNA-seq) has transformed the study of cancer biology. Recent years have seen a rapid expansion in the number of single-cell cancer studies, yet most of these studies profiled few tumours, such that individual datasets have limited statistical power. Combining the data and results across studies holds great promise but also involves various challenges. We recently began to address these challenges by curating a large collection of cancer scRNA-seq datasets, and leveraging it for systematic analyses of tumor heterogeneity. Here we significantly extend this repository to 124 datasets for over 40 cancer types, together comprising 2,822 samples, with improved data annotations, visualisations and exploration. Utilising this vast cohort, we systematically quantified context-dependent gene expression and proliferation patterns across cell types and cancer types. These data, annotations and analysis results are all freely available for exploration and download via the Curated Cancer Cell Atlas (3CA) website (https://www.weizmann.ac.il/sites/3CA/), a central source of data and analyses for the cancer research community that opens new avenues in cancer research.
Cancer Biology
What problem does this paper attempt to address?
This paper addresses several key problems with single-cell RNA sequencing (scRNA-seq) data in cancer research:

1. **Limited statistical power**: Most single-cell cancer studies profile only a small number of tumor samples, so each individual dataset has limited statistical power, making it difficult to identify robust and clinically significant gene expression patterns.
2. **Difficulty in cross-study data comparison**: Batch effects and differences in technical methods between studies make it difficult to compare data directly.
3. **Lack of systematic analysis**: Although the amount of single-cell data is increasing rapidly, there is no comprehensive resource for systematically analyzing tumor heterogeneity.

To address these problems, the authors built a large repository of cancer scRNA-seq datasets, the Curated Cancer Cell Atlas (3CA), and carried out the following work:

- **Data integration and standardization**: Collected 124 datasets covering more than 40 cancer types, for a total of 2,822 samples and more than 5.5 million single cells. Standardized the format of these data and verified the cell annotations.
- **Systematic analysis**: Using this cohort, systematically quantified context-dependent gene expression and proliferation patterns across cell types and cancer types.
- **Online resource**: Provided an online portal where researchers can freely download data, view data visualizations, and carry out exploratory analyses.

Through these efforts, 3CA offers the cancer research community a central source of high-resolution single-cell transcriptome data, supporting a deeper understanding of tumor heterogeneity and the development of new treatment strategies.