Abstract: In this paper, we propose a large-scale sparse graph downsampling method based on a sparse random graph model, which allows for the adjustment of different sparsity levels. We combine sparsity and topological similarity: the sparse graph model reduces the node connection probability as the graph size increases, while the downsampling method preserves a specific topological connection pattern during this change. Based on the downsampling method, we derive a theoretical transferability bound for downsampling sparse graph convolutional networks (GCNs), showing that higher sampling rates, greater average degree expectations, and smaller initial graph sizes lead to better downsampling transferability performance.
What problem does this paper attempt to address?
The paper addresses the transferability of large-scale sparse graph convolutional networks (GCNs) under downsampling training. Specifically, the researchers propose a downsampling method for large-scale sparse graphs, based on a sparse random graph model that allows the sparsity level to be adjusted. The method combines sparsity and topological similarity: the sparse graph model lowers the node connection probability as the graph scale increases, while the downsampling method keeps a specific topological connection pattern unchanged during this change. Based on this downsampling method, the researchers derive a theoretical transferability bound, which indicates that a higher sampling rate, a larger expected average degree, and a smaller initial graph scale lead to better downsampling transferability performance.
### Main Contributions
1. **Proposed a downsampling method based on the sparse graph model**: Established the connection between sparsity and topological similarity.
2. **Proved the transferability theorem**: Bounded the distance between the GCN outputs on the initial large-scale sparse graph and on its downsampled smaller-scale graph. The bound depends on the initial scale, the expected average node degree, and the downsampling rate.
### Research Background
- **Applications of large-scale graph convolutional networks (GCNs)**: Large-scale graph data is widely used in many fields, and GCNs have achieved remarkable success on tasks involving such data.
- **Training challenges**: Training on large-scale graphs requires a large amount of storage and time, so multiple downsampling methods have been proposed to accelerate training.
- **Shortcomings of existing research**: Existing research mainly focuses on how specific sampling methods affect transferability, while less consideration is given to the influence of the topological properties of the initial sparse graph on transferability.
### Methods
1. **Sparse random graph model**: Proposed a simple sparse random graph model with an adjustable sparsity level, used to generate large-scale sparse graphs with a fixed expected average degree.
2. **Downsampling method**: Proposed a downsampling method for large-scale graphs that maintains a similar topological structure in the sampled graph.
3. **Theoretical analysis**: Derived the transferability bound of downsampling training on large-scale sparse graphs and proved the relevant theorems.
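
The first two steps can be illustrated with a minimal sketch. It is not the paper's construction: it assumes an Erdős–Rényi-style model in which every node pair is connected independently with probability `p = d / (n - 1)`, and it uses plain uniform node sampling with the induced subgraph as a stand-in for the downsampling method. The function names (`sparse_random_graph`, `downsample`, `edge_density`, `average_degree`) and the values of `N`, `n`, and `d` are hypothetical.

```python
import numpy as np

def sparse_random_graph(n, d, rng):
    """Symmetric 0/1 adjacency matrix with expected average degree d.

    Assumption (not from the paper): each node pair is connected
    independently with probability p = d / (n - 1), so the connection
    probability shrinks as the graph size n grows.
    """
    p = d / (n - 1)
    upper = np.triu(rng.random((n, n)) < p, k=1)  # keep pairs i < j only
    return (upper | upper.T).astype(np.int8)      # mirror to make it undirected

def downsample(adj, m, rng):
    """Induced subgraph on m nodes drawn uniformly without replacement.

    Stand-in for the paper's downsampling method, which is not specified here.
    """
    keep = np.sort(rng.choice(adj.shape[0], size=m, replace=False))
    return adj[np.ix_(keep, keep)]

def edge_density(adj):
    """Number of edges divided by the number of node pairs n(n-1)/2."""
    n = adj.shape[0]
    return (adj.sum() / 2) / (n * (n - 1) / 2)

def average_degree(adj):
    """Mean number of neighbours per node."""
    return adj.sum(axis=1).mean()

rng = np.random.default_rng(0)
N, n, d = 2000, 500, 10            # hypothetical sizes and target degree
big = sparse_random_graph(N, d, rng)
small = downsample(big, n, rng)

print("initial graph:    avg degree", average_degree(big), "density", edge_density(big))
print("downsampled graph: avg degree", average_degree(small), "density", edge_density(small))
```

Under these assumptions the edge density of the initial graph is close to `d / (N - 1)`, and uniform induced sampling keeps roughly that density while lowering the average degree by about a factor of `n / N`. This interaction between sampling rate, initial size, and average degree is what the transferability analysis quantifies.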
### Experimental Results
- **Influence of different initial scales**: The experimental results show that as the sampling scale increases, the transferability error decreases, but graphs with a larger initial scale usually have a larger error.
- **Influence of different expected average degrees**: The experiment also shows that as the sampling scale increases, the transferability error decreases, and graphs with a larger expected average degree usually have a smaller error.
### Formula Summary
- **Expected edge density of the sparse graph model**:
\[
\epsilon(n)=\frac{\text{actual number of edges}}{n(n - 1)/2}
\]
- **Transferability bound after downsampling**:
\[
E\left\{\left\|I\cdot\Phi(\tilde{S}_N,x_N,H)-I\cdot\Phi(\tilde{S}_n,x_n,H)\right\|^2\right\}\leq C_m Ah\left\{\sqrt{\frac{1 - L_2^2}{L_1^2}\frac{N}{d}}+A_R^+\sqrt{\frac{6t_N\sqrt{N}}{d}\left(1+\sqrt{\frac{N}{n}}\right)}\right\}+A_s\sqrt{6\left(\frac{1}{\sqrt{N}}+\frac{1}{\sqrt{n}}\right)}+2C_m\Delta h(\lambda)
\]
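
To see how the bound reacts to the initial size $N$, the sampled size $n$, and the expected average degree $d$, the sketch below evaluates only the square-root factors that contain these quantities. The constants $C_m$, $L_1$, $L_2$, $A_R^+$, $A_s$ and the term $\Delta h(\lambda)$ are not defined in this summary and are left out, and `t_N` is set to a placeholder value of 1, so the numbers show trends only, not actual error bounds. The function name `bound_factors` is hypothetical.

```python
import math

def bound_factors(N, n, d, t_N=1.0):
    """Return the N-, n- and d-dependent factors of the bound.

    All multiplicative constants are omitted, and t_N is a placeholder
    (its definition is not given in this summary).
    """
    # sqrt(N / d): first term inside the braces, without (1 - L_2^2) / L_1^2
    f_degree = math.sqrt(N / d)
    # sqrt(6 t_N sqrt(N) / d * (1 + sqrt(N / n))): second term, without A_R^+
    f_sampling = math.sqrt(6 * t_N * math.sqrt(N) / d * (1 + math.sqrt(N / n)))
    # sqrt(6 (1/sqrt(N) + 1/sqrt(n))): third term, without A_s
    f_size = math.sqrt(6 * (1 / math.sqrt(N) + 1 / math.sqrt(n)))
    return f_degree, f_sampling, f_size

# Hypothetical settings: vary the sampled size n for a fixed initial graph.
for n in (100, 1000, 10000):
    print(n, bound_factors(N=10000, n=n, d=10))
```

With `N` and `d` fixed, the second and third factors shrink as `n` grows, and the first two factors shrink as `d` grows or `N` decreases. This matches the stated conclusion that higher sampling rates, larger expected average degrees, and smaller initial graphs give better transferability.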
### Conclusion
This research, through theoretical analysis and experimental verification, shows how to optimize the downsampling transferability of large-scale sparse graph convolutional networks by adjusting the sampling rate, the initial graph scale, and the expected average degree. This provides new ideas and methods for the efficient processing of large-scale graph data.