Martin Hemberg, Federico Marini, Shila Ghazanfar, Ahmad Al Ajami, Najla Abassi, Benedict Anchang, Bérénice A. Benayoun, Yue Cao, Ken Chen, Yesid Cuesta-Astroz, Zach DeBruine, Calliope A. Dendrou, Iwijn De Vlaminck, Katharina Imkeller, Ilya Korsunsky, Alex R. Lederer, Pieter Meysman, Clint Miller, Kerry Mullan, Uwe Ohler, Nikolaos Patikas, Jonas Schuck, Jacqueline HY Siu, Timothy J. Triche Jr., Alex Tsankov, Sander W. van der Laan, Masanao Yajima, Jean Yang, Fabio Zanini, Ivana Jelic

Abstract: The field of single-cell biology is growing rapidly and is generating large amounts of data from a variety of species, disease conditions, tissues, and organs. Coordinated efforts such as CZI CELLxGENE, HuBMAP, Broad Institute Single Cell Portal, and DISCO allow researchers to access large volumes of curated datasets. Although the majority of the data is from scRNAseq experiments, a wide range of other modalities is represented as well. These resources have created an opportunity to build and expand the computational biology ecosystem to develop the tools necessary for data reuse and for extracting novel biological insights. Here, we highlight achievements made so far, areas where further development is needed, and specific challenges that need to be overcome.

What problem does this paper attempt to address?

The paper examines the applications, opportunities, and challenges of large-scale cell atlases in single-cell biology. Specifically, it addresses the following key issues:

1. **Data Integration and Sharing**: Advances in single-cell sequencing technology have generated a large number of datasets. The paper considers how to integrate these data and share them through unified platforms so that researchers can access and analyze them more easily.
2. **Data Preprocessing and Standardization**: Data quality and consistency are crucial for constructing cell atlases. The paper discusses how to preprocess data from different sources to reduce batch effects and ensure that the data conform to standard formats.
3. **Metadata and Ontology**: Reanalysis of existing datasets requires comprehensive metadata standards and cell type ontologies. These support standardized data management and improve the reproducibility and accuracy of data analysis.
4. **Data Integration and Meta-Analysis**: The paper considers how to combine data from different studies and experimental conditions for large-scale meta-analysis that can reveal new biological insights. This requires handling various confounding factors and technical differences.
5. **Applications in Biomedical Research**: The ultimate goal is to use cell atlases to accelerate research in disease management and treatment. Examples include analyzing the association between specific gene loci and cell types to uncover the molecular mechanisms of complex diseases, and identifying disease-related cell states to discover potential drug targets.
6. **Integration of New Technologies**: The paper considers how artificial intelligence and other advanced technologies can be applied to single-cell data analysis to improve research efficiency and accuracy.

In summary, the paper aims to provide a comprehensive perspective on single-cell biology by discussing the issues above, thereby promoting further development of the field.

Insights, opportunities and challenges provided by large cell atlases
