Hypergraph Neural Networks Reveal Spatial Domains from Single-cell Transcriptomics Data

Mehrad Soltani, Luis Rueda
2024-10-24
Abstract: The task of spatial clustering of transcriptomics data is of paramount importance. It enables the classification of tissue samples into diverse subpopulations of cells, which, in turn, facilitates the analysis of the biological functions of clusters, tissue reconstruction, and cell-cell interactions. Many approaches leverage gene expression, spatial locations, and histological images to detect spatial domains; however, Graph Neural Networks (GNNs), as state-of-the-art models, are limited by the assumption of pairwise connections between nodes. In domain detection for spatial transcriptomics, some cells are not directly related yet are grouped into the same domain, which shows that GNNs cannot capture implicit connections among cells. While a graph edge connects only two nodes, a hyperedge connects an arbitrary number of nodes, which lets Hypergraph Neural Networks (HGNNs) capture and utilize richer and more complex structural information than traditional GNNs. We use autoencoders, which are well-suited for unsupervised learning, to address the lack of actual labels. Our model achieves the highest iLISI score, 1.843, among the compared methods. This score indicates the greatest diversity of cell types identified by our method. Furthermore, our model outperforms other methods in downstream clustering, achieving the highest ARI value of 0.51 and Leiden score of 0.60.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to reveal spatial domains from single-cell transcriptomics data: that is, how to analyze spatial transcriptomics data so that tissue samples can be classified into distinct cell subpopulations, giving a better understanding of the spatial relationships between cells, their biological functions, and cell-cell interactions.

### Background and Challenges of the Problem

1. **Importance of Spatial Transcriptomics Data**

   Spatial transcriptomics data can help identify new cell types and provide a better understanding of biological processes in the tissue microenvironment. Such data can be obtained through imaging techniques and sequencing techniques. Each technique has its own advantages and disadvantages, but all generate multi-modal, multi-scale, and high-resolution data.

2. **Limitations of Existing Methods**

   - Many existing methods rely on gene expression, spatial location, and histological images to detect spatial domains, but they fall short in capturing the internal relationships or spatial dependencies of two-dimensional features.
   - Graph Neural Networks (GNNs), the state-of-the-art models, assume pairwise connections between nodes and cannot capture some of the implicit, complex relationships between cells. For example, in spatial transcriptomics, some cells belong to the same domain even though they are not directly related, which exposes the limits of GNNs in capturing implicit connections between cells.

3. **Research Objectives**

   - The paper aims to address these problems by introducing Hypergraph Neural Networks (HGNNs). HGNNs can capture higher-order relationships because each hyperedge can connect any number of nodes, thus encoding richer structural information.
   - Because actual labels are difficult to obtain, the authors use autoencoders for unsupervised learning to detect patterns and anomalies.
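To make the contrast between pairwise edges and hyperedges concrete, here is a minimal sketch in plain Python. It is an illustration only, not code from the paper: the helper names `clique_edges` and `incidence_matrix` and the four-cell toy data are assumptions introduced here. It shows that a group of three related cells needs three pairwise edges in an ordinary graph but only a single column in a hypergraph incidence matrix.

```python
from itertools import combinations


def clique_edges(group):
    """Expand a group of nodes into the pairwise edges a plain graph would need."""
    return sorted(combinations(sorted(group), 2))


def incidence_matrix(num_nodes, hyperedges):
    """Build H where H[v][e] is 1 if node v belongs to hyperedge e, else 0."""
    H = [[0] * len(hyperedges) for _ in range(num_nodes)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


# Hypothetical toy data: four cells; cells 0, 1, 2 form one group, cells 2, 3 another.
hyperedges = [{0, 1, 2}, {2, 3}]

H = incidence_matrix(4, hyperedges)        # one column per hyperedge
pairwise = clique_edges(hyperedges[0])     # edges a plain graph needs for the first group

print(H)
print(pairwise)
```

The incidence matrix keeps the group as one unit, whereas the pairwise expansion grows quadratically with group size and loses the information that the nodes were grouped together.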
### Overview of the Method

- **Hypergraph Construction**: The K-Nearest Neighbor algorithm finds each cell's nearest cells and groups them into hyperedges, forming the hypergraph structure.
- **Autoencoder**: Generates a latent representation of gene expression so that unlabeled data can be handled.
- **Hypergraph Neural Network**: Learns node representations through a two-step message-passing process (vertex to hyperedge, then hyperedge to vertex) to capture the complex relationships between cells.
- **Optimization and Evaluation**: The model is optimized by minimizing the error between the reconstructed similarity matrix and the original adjacency matrix, and is evaluated with metrics such as iLISI and ARI.

### Experimental Results

The model performs well on multiple evaluation metrics. It obtains the highest iLISI score (1.843) among the compared methods, which indicates that it identifies the most diverse cell types. It also performs well in downstream clustering, obtaining the highest ARI value (0.51) and Leiden score (0.60).

In conclusion, the paper addresses the shortcomings of existing methods in capturing complex relationships between cells by introducing hypergraph neural networks, providing new ideas and technical means for revealing spatial domains from single-cell transcriptomics data.
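The hypergraph construction and two-step message passing described in the method overview can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it assumes one hyperedge per cell (the cell plus its `k` nearest neighbours by Euclidean distance) and uses plain mean aggregation with no learned weights or degree normalization. The function names and the toy coordinates are hypothetical.

```python
import math


def knn_hyperedges(coords, k):
    """One hyperedge per cell: the cell itself plus its k nearest neighbours."""
    hyperedges = []
    for i, p in enumerate(coords):
        # Rank all other cells by Euclidean distance to cell i.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        hyperedges.append(sorted([i] + others[:k]))
    return hyperedges


def mean_rows(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def message_pass(features, hyperedges):
    """One round of vertex -> hyperedge -> vertex mean aggregation."""
    # Step 1: each hyperedge averages the features of its member vertices.
    edge_feats = [mean_rows([features[v] for v in members]) for members in hyperedges]
    # Step 2: each vertex averages the features of the hyperedges that contain it.
    out = []
    for v in range(len(features)):
        incident = [edge_feats[e] for e, members in enumerate(hyperedges) if v in members]
        out.append(mean_rows(incident))
    return out


# Hypothetical toy data: two spatially separated groups of cells.
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
features = [[0.0], [2.0], [4.0], [10.0], [10.0]]

edges = knn_hyperedges(coords, k=1)
updated = message_pass(features, edges)
print(edges)
print(updated)
```

Because every cell belongs at least to its own hyperedge, the second aggregation step always has something to average. A trained HGNN layer would additionally apply learnable weight matrices and a nonlinearity at each step.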