Cross-modality representation and multi-sample integration of spatially resolved omics data

Zhen Li, Xuejian Cui, Xiaoyang Chen, Zijing Gao, Yuyao Liu, Yan Pan, Shengquan Chen, Rui Jiang
DOI: https://doi.org/10.1101/2024.06.10.598155
2024-06-11
Abstract: Spatially resolved sequencing technologies have revolutionized the characterization of biological regulatory processes within microenvironment by simultaneously accessing the states of genomic regions, genes and proteins, along with the spatial coordinates of cells, necessitating advanced computational methods for the cross-modality and multi-sample integrated analysis of spatial omics datasets. To address this gap, we propose PRESENT, an effective and scalable contrastive learning framework, for the cross-modality representation of spatially resolved omics data. Through comprehensive experiments on massive spatially resolved datasets, PRESENT achieves superior performance across various species, tissues, and sequencing technologies, including spatial epigenomics, transcriptomics, and multi-omics. Specifically, PRESENT empowers the incorporation of spatial dependency and complementary omics information simultaneously, facilitating the detection of spatial domains and uncovering biological regulatory mechanisms within microenvironment. Furthermore, PRESENT can be extended to the integrative analysis of horizontal and vertical samples across different dissected regions or developmental stages, thereby promoting the identification of hierarchical structures from a spatiotemporal perspective.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses cross-modal representation and multi-sample integration of spatially resolved omics data. Specifically:

1. **Cross-modal representation**: Spatially resolved sequencing technologies capture the spatial coordinates of cells together with the states of genomic regions, genes, and proteins, so analyzing them requires computational methods that can handle multi-modal data. The paper proposes PRESENT, an effective and scalable contrastive learning framework for cross-modal representation of spatially resolved omics data. In comprehensive experiments, PRESENT shows superior performance on datasets spanning different species, tissues, and sequencing technologies (spatial epigenomics, transcriptomics, and multi-omics).
2. **Multi-sample integration**: Because of technical limitations, each spatial sequencing sample usually covers only a specific anatomical region or condition, and spatial coordinates differ between samples. This hinders the joint exploration of samples across multiple anatomical regions or developmental stages. The paper therefore also proposes PRESENT-BC, an extended multi-sample integration framework that removes batch effects while preserving shared and sample-specific biological variation, allowing hierarchical functional structures to be detected from a spatiotemporal perspective.
3. **Improvement of low-quality data**: The paper explores how external reference data can improve the analysis of samples with low sequencing depth and low signal-to-noise ratio. By introducing this prior information, PRESENT shows a significant performance improvement on low-quality samples.
Overall, this paper aims to provide a unified, advanced, and scalable framework for cross-modal representation and multi-sample integration analysis of spatially resolved omics data, in order to systematically reveal gene regulation processes and cell activities in the tissue microenvironment.
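For readers unfamiliar with contrastive learning across modalities, the general idea can be sketched as follows. This is a generic InfoNCE-style loss, not PRESENT's actual objective, which this summary does not specify. The names `info_nce`, `z_rna`, and `z_atac` are hypothetical, and the embeddings are random toy data standing in for per-spot representations from two omics layers.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(z_a, z_b, temperature=0.1):
    """Generic InfoNCE loss (an illustrative assumption, not the paper's loss).

    Row i of z_a and row i of z_b are treated as two views of the same spot
    (the positive pair); every other row of z_b acts as a negative.
    """
    z_a = [normalize(v) for v in z_a]
    z_b = [normalize(v) for v in z_b]
    losses = []
    for i, a in enumerate(z_a):
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in z_b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])  # -log softmax at the positive pair
    return sum(losses) / len(losses)


# Toy data: 8 spots, 16-dimensional embeddings per modality (hypothetical).
random.seed(0)
z_rna = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
z_atac = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in z_rna]

aligned = info_nce(z_rna, z_atac)         # matching spots are paired
shuffled = info_nce(z_rna, z_atac[::-1])  # pairing deliberately broken
```

In this toy setup the loss is lower when the two modalities agree on which spots belong together, which is the signal such a framework would use to pull the representations of the same spot toward each other.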