StereoMM: A Graph Fusion Model for Integrating Spatial Transcriptomic Data and Pathological Images
Bingying Luo,Fei Teng,Guo Tang,Weixuan Chen,Chi Qu,Xuanzhu Liu,Xin Liu,Xing Liu,Huaqiang Huang,Yu Feng,Xue Zhang,Min Jian,Mei Li,Feng Xi,Guibo Li,Sha Liao,Ao Chen,Xun Xu,Jiajun Zhang
DOI: https://doi.org/10.1101/2024.05.04.592486
2024-05-07
Abstract:Spatially resolved omics technologies generating multimodal and high-throughput data lead to the urgent need for advanced analysis to allow the biological discoveries by comprehensively utilizing information from multi-omics data. The H&E image and spatial transcriptomic data indicate abundant features which are different and complementary to each other. AI algorithms can perform nonlinear analysis on these aligned or unaligned complex datasets to decode tumoral heterogeneity for detecting functional domain. However,the interpretability of AI-generated outcomes for human experts is a problem hindering application of multi-modal analysis in clinic. We presented a machine learning based toolchain called StereoMM, which is a graph fusion model that can integrate gene expression, histological images, and spatial location. StereoMM firstly performs information interaction on transcriptomic and imaging features through the attention module, guaranteeing explanations for its decision-making processes. The interactive features are input into the graph autoencoder together with the graph of spatial position, so that multimodal features are fused in a self-supervised manner. Here, StereoMM was subjected to mouse brain tissue, demonstrating its capability to discern fine tissue architecture, while highlighting its advantage in computational speed. Utilizing data from Stereo-seq of human lung adenosquamous carcinoma and 10X Visium of human breast cancer, we showed its superior performance in spatial domain recognition over competing software and its ability to reveal tumor heterogeneity. The fusion approach for imaging and gene expression data within StereoMM aids in the more accurate identification of domains, unveils critical molecular features, and elucidates the connections between different domains, thereby laying the groundwork for downstream analysis.
Bioinformatics