SpaNCMG: improving spatial domains identification of spatial transcriptomics using neighborhood-complementary mixed-view graph convolutional network

Zhihao Si, Hanshuang Li, Wenjing Shang, Yanan Zhao, Lingjiao Kong, Chunshen Long, Yongchun Zuo, Zhenxing Feng
DOI: https://doi.org/10.1093/bib/bbae259
IF: 9.5
2024-06-01
Briefings in Bioinformatics
Abstract: The advancement of spatial transcriptomics (ST) technology contributes to a more profound comprehension of the spatial properties of gene expression within tissues. However, due to the high dimensionality, pronounced noise and dynamic limitations of ST data, integrating gene expression and spatial information to accurately identify spatial domains remains challenging. This paper proposes SpaNCMG, an algorithm for precise spatial domain description and localization based on a neighborhood-complementary mixed-view graph convolutional network. The algorithm adapts better to ST data at different resolutions by integrating the local information from KNN and the global structure from r-radius into a complementary neighborhood graph. It also introduces an attention mechanism to achieve adaptive fusion of different reconstructed expressions, and uses the KPCA method for dimensionality reduction. The application of SpaNCMG to five datasets from four sequencing platforms demonstrates superior performance to eight existing advanced methods. Specifically, the algorithm achieved the highest ARI accuracies of 0.63 and 0.52 on the datasets of the human dorsolateral prefrontal cortex and mouse somatosensory cortex, respectively. It accurately identified the spatial locations of marker genes in the mouse olfactory bulb tissue and inferred the biological functions of different regions. When handling larger datasets such as mouse embryos, SpaNCMG not only identified the main tissue structures but also explored unlabeled domains. Overall, the good generalization ability and scalability of SpaNCMG make it an outstanding tool for understanding tissue structure and disease mechanisms. Our code is available at https://github.com/ZhihaoSi/SpaNCMG.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
The paper attempts to address the challenge of accurately identifying spatial domains in spatial transcriptomics (ST) data. Specifically, because ST data are high-dimensional, noisy, and subject to dynamic limitations, combining gene expression with spatial information to accurately identify spatial domains remains difficult. To address these issues, the paper proposes a new algorithm called SpaNCMG, which is based on a neighborhood-complementary mixed-view graph convolutional network and aims to achieve precise description and localization of spatial domains. SpaNCMG improves adaptability to ST data of different resolutions by integrating local information (from K-nearest neighbors, KNN) and global structure (from r-radius) into a complementary neighborhood graph. Additionally, the algorithm introduces an attention mechanism to adaptively fuse different reconstructed expressions and employs kernel principal component analysis (KPCA) for dimensionality reduction. Experimental results show that SpaNCMG outperforms eight existing advanced methods on five datasets from four sequencing platforms, achieving the highest ARI accuracies of 0.63 and 0.52 on the human dorsolateral prefrontal cortex and mouse somatosensory cortex datasets, respectively. It also accurately identifies the locations of marker genes in the mouse olfactory bulb tissue and infers the biological functions of different regions. For larger datasets such as the mouse embryo, SpaNCMG not only identifies major tissue structures but also explores unannotated domains, demonstrating good generalization ability and scalability.
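The idea of a complementary neighborhood graph can be illustrated with a small sketch. This is not the authors' implementation (their code is in the repository linked in the abstract). It assumes that the KNN and r-radius edge sets are combined by a simple union, and the function names `knn_edges`, `radius_edges`, and `complementary_graph`, the toy coordinates, and the values of `k` and `r` are all hypothetical choices for illustration.

```python
import math


def knn_edges(coords, k):
    """Undirected edges linking each spot to its k nearest spots."""
    edges = set()
    for i, p in enumerate(coords):
        # Rank every other spot by Euclidean distance to spot i.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def radius_edges(coords, r):
    """Undirected edges between all pairs of spots at distance <= r."""
    edges = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= r:
                edges.add((i, j))
    return edges


def complementary_graph(coords, k, r):
    """Assumed combination rule: the union of the KNN and radius edges."""
    return knn_edges(coords, k) | radius_edges(coords, r)


# Toy layout: three nearby spots plus one isolated spot (index 3).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.2), (5.0, 6.0)]

knn = knn_edges(coords, k=1)        # {(0, 1), (0, 2), (2, 3)}
rad = radius_edges(coords, r=1.6)   # {(0, 1), (0, 2), (1, 2)}
mixed = complementary_graph(coords, k=1, r=1.6)

print(sorted(mixed))  # [(0, 1), (0, 2), (1, 2), (2, 3)]
```

In this toy case the radius graph leaves the isolated spot without any neighbor, while the KNN graph misses the edge between spots 1 and 2. The union keeps both, which is one plausible reading of how the two views complement each other.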