Cross-species imputation and comparison of single-cell transcriptomic profiles

Ran Zhang, Mu Yang, Jacob Schreiber, Diana R O'Day, James Turner, Jay Shendure, Christine M Disteche, Xinxian Deng, William S Noble
DOI: https://doi.org/10.1101/2023.10.19.563173
2024-08-12
Abstract: Cross-species comparison and prediction of gene expression profiles are important to understand regulatory changes during evolution and to transfer knowledge learned from model organisms to humans. Single-cell RNA-seq (scRNA-seq) profiles enable us to capture gene expression profiles with respect to variations among individual cells; however, cross-species comparison of scRNA-seq profiles is challenging because of data sparsity, batch effects, and the lack of one-to-one cell matching across species. Moreover, single-cell profiles are challenging to obtain in certain biological contexts, limiting the scope of hypothesis generation. Here we developed Icebear, a neural network framework that decomposes single-cell measurements into factors representing cell identity, species, and batch factors. Icebear enables accurate prediction of single-cell gene expression profiles across species, thereby providing high-resolution cell type and disease profiles in under-characterized contexts. Icebear also facilitates direct cross-species comparison of single-cell expression profiles for conserved genes that are located on the X chromosome in eutherian mammals but on autosomes in chicken. This comparison, for the first time, revealed evolutionary and diverse adaptations of X-chromosome upregulation in mammals.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses several key challenges in comparing and predicting single-cell transcriptomic profiles across species. The authors developed a neural network framework called Icebear to tackle the following issues:

1. **Data Sparsity**: Single-cell RNA sequencing (scRNA-seq) data typically contain many missing values and substantial noise.
2. **Batch Effects**: Data from different experimental batches differ for technical rather than biological reasons.
3. **Cell Matching Difficulty**: Cells cannot be matched one-to-one across species, so directly comparing the expression profiles of individual cells is challenging.
4. **Single-Cell Data Acquisition Limitations**: Single-cell data are difficult to obtain in certain biological contexts, which limits the scope of hypothesis generation.

By addressing these issues, Icebear aims to:

- Accurately predict single-cell gene expression profiles across species.
- Characterize cell types and disease states at high resolution in contexts where single-cell data are scarce.
- Directly compare single-cell expression profiles between species, especially for conserved genes that are located on the X chromosome in eutherian mammals but on autosomes in chicken.

The paper focuses in particular on X chromosome evolution and uses Icebear to reveal the evolutionary patterns and adaptive diversity of X-chromosome upregulation (XCU) in mammals. By analyzing single-cell data from mice, opossums, and chickens, Icebear shows that XCU varies among mammalian species and among X-linked genes, providing new insights into the evolution of the X chromosome.
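The abstract describes Icebear as decomposing each single-cell measurement into cell identity, species, and batch factors, and then predicting profiles across species. The toy sketch below illustrates only that general idea: encode a cell into a latent "cell identity" vector, then decode it with a different species label. It is not the authors' architecture. The layer sizes, batch labels, linear layers, and all function names (`encode`, `decode`, `predict_in_species`) are hypothetical, and untrained random weights stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and labels, chosen only for illustration.
N_GENES, N_LATENT = 50, 8
SPECIES = ["mouse", "opossum", "chicken"]
BATCHES = ["batch_a", "batch_b"]
N_COND = len(SPECIES) + len(BATCHES)

# Untrained random weights stand in for parameters a real model would learn.
W_enc = rng.normal(scale=0.1, size=(N_GENES + N_COND, N_LATENT))
W_dec = rng.normal(scale=0.1, size=(N_LATENT + N_COND, N_GENES))


def one_hot(label, vocab):
    """Return a one-hot vector marking the position of `label` in `vocab`."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(label)] = 1.0
    return vec


def condition(species, batch):
    """Concatenate the species and batch one-hot factors."""
    return np.concatenate([one_hot(species, SPECIES), one_hot(batch, BATCHES)])


def encode(counts, species, batch):
    """Map log-transformed counts plus labels to a latent cell-identity vector."""
    x = np.concatenate([np.log1p(counts), condition(species, batch)])
    return np.tanh(x @ W_enc)


def decode(z, species, batch):
    """Map a cell-identity vector plus labels back to non-negative expression."""
    h = np.concatenate([z, condition(species, batch)])
    return np.log1p(np.exp(h @ W_dec))  # softplus keeps outputs positive


def predict_in_species(counts, src_species, src_batch, dst_species, dst_batch):
    """Keep the cell-identity factor, swap the species and batch factors."""
    z = encode(counts, src_species, src_batch)
    return decode(z, dst_species, dst_batch)


# A simulated sparse mouse cell, "translated" to chicken.
mouse_cell = rng.poisson(lam=1.0, size=N_GENES).astype(float)
z = encode(mouse_cell, "mouse", "batch_a")
reconstruction = decode(z, "mouse", "batch_a")
chicken_pred = predict_in_species(mouse_cell, "mouse", "batch_a", "chicken", "batch_b")
```

Because the cell-identity vector `z` is kept separate from the species and batch labels, the same cell can be decoded under any label combination. In a trained model of this kind, that separation is what would make cross-species prediction and like-for-like comparison possible.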