Accelerating Graph Neural Networks with a Novel Matrix Compression Format

João N. F. Alves, Samir Moustafa, Siegfried Benkner, Alexandre P. Francisco, Wilfried N. Gansterer, Luís M. S. Russo
2024-09-04
Abstract: The inference and training stages of Graph Neural Networks (GNNs) are often dominated by the time required to compute a long sequence of matrix multiplications between the sparse graph adjacency matrix and its embedding. To accelerate these stages, we first propose the Compressed Binary Matrix (CBM) storage format to succinctly represent the binary adjacency matrix of an unweighted graph. Then, we show how to generalize this representation to normalized adjacency matrices of unweighted graphs which arise in the context of GNNs. Finally, we develop efficient matrix multiplication kernels based on this compressed representation. The matrix multiplication kernels proposed in this work never require more scalar operations than classic sparse matrix multiplication algorithms. Experimental evaluation shows that the matrix multiplication strategies proposed outperform the current state-of-the-art implementations provided by Intel MKL, achieving speedups close to 5$\times$. Furthermore, our optimized matrix-multiplication strategies accelerated the inference time of a GNN by up to $3\times$.
Data Structures and Algorithms
What problem does this paper attempt to address?
This paper aims to accelerate the matrix multiplications at the core of Graph Neural Networks (GNNs). The inference and training stages of a GNN are usually dominated by a long sequence of multiplications between the sparse graph adjacency matrix and the node embedding matrix, and these operations account for most of the computation time. To accelerate these stages, the authors propose a new matrix compression format, the **Compressed Binary Matrix (CBM)** storage format.

### Problem Background

In GNNs, especially in architectures that rely on Message Passing Layers (MPLs), each node in a hidden layer aggregates the embeddings of its neighboring nodes and updates its own embedding from the collected information. For example, in the widely used Graph Convolutional Networks (GCNs), the message generated by each layer is the product of the graph's adjacency matrix and the current embedding. For a two-layer GCN, the computation is:

\[ \hat{A} \, \sigma(\hat{A} X W_0) W_1 \]

where:

- \(\hat{A}\) is the normalized adjacency matrix, defined as \(\hat{A} = D^{-\frac{1}{2}} (A + I) D^{-\frac{1}{2}}\),
- \(A\) is the binary adjacency matrix of the graph and \(I\) is the identity matrix,
- \(D\) is the diagonal degree matrix of the graph,
- \(\sigma\) is the element-wise activation function,
- \(X\) is the node feature matrix,
- \(W_0\) and \(W_1\) are the learnable dense weight matrices of the first and second layers.

In practical scenarios the adjacency matrix is usually much larger than the other operands, so the multiplications that involve it are the main computational burden of GNN training and inference.

### Solution

To accelerate these matrix multiplications, the authors propose the CBM format. The format takes advantage of the fact that the adjacency matrix of an unweighted graph is binary, and it exploits the similarity between matrix rows through differential (delta) compression.
Specifically, the CBM format stores a row only as its differences (deltas) relative to another row. During multiplication, the result already computed for the reference row can be reused, so only the deltas cost scalar operations, which reduces the total number of operations required.

### Main Contributions

1. **The CBM format**: an efficient binary matrix compression scheme that reduces the memory footprint of unweighted graphs.
2. **A new multiplication algorithm**: significantly accelerates the sequence of matrix multiplications between the (possibly normalized) adjacency matrix of the graph and its embeddings.
3. **A worst-case guarantee**: even when no compression is possible, the number of scalar operations required with the CBM format does not exceed the number required with a classic sparse storage format.
4. **Implementation and experiments**: the CBM format and the corresponding matrix multiplication kernels are implemented and integrated into deep learning frameworks such as PyTorch. Experimental results show speedups close to 5$\times$ over the state-of-the-art SpMM implementations provided by Intel MKL, in both sequential and parallel environments, and a reduction in the inference time of a two-layer GCN of up to 3$\times$.

### Experimental Evaluation

The authors ran their experiments on an Intel Xeon Gold 6130 CPU, measuring the compression ratio and the reduction in running time. The results show that the CBM format offers significant advantages on multiple real-world graph datasets, both for sparse-dense matrix multiplication (SpMM) and for GCN inference.

In conclusion, the paper addresses the cost of matrix multiplication in GNNs by introducing the CBM format, thereby accelerating both GNN training and inference.
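
The delta idea described under Solution, and the worst-case guarantee listed among the contributions, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the summary does not say how reference rows are chosen, so the sketch assumes a naive rule (each row picks the earlier row with the fewest differences, or no reference at all when that is cheaper). The function names `compress_rows` and `delta_matmul` are hypothetical and exist only for this example.

```python
import numpy as np


def compress_rows(A):
    """Encode each row of a binary matrix as deltas against an earlier row.

    Returns a list of (parent, added_cols, removed_cols) triples.
    parent == -1 means the row is stored against the all-zero row, i.e. as a
    plain list of its nonzero columns, like a classic sparse format would.
    ASSUMPTION: the greedy parent choice below is illustrative only.
    """
    A = np.asarray(A, dtype=bool)
    encoded = []
    for i in range(A.shape[0]):
        # Baseline: no reference row, cost equals the row's nonzero count.
        best_parent, best_cost = -1, int(A[i].sum())
        for j in range(i):  # naive quadratic search, fine for a toy example
            cost = int(np.count_nonzero(A[i] ^ A[j]))
            if cost < best_cost:
                best_parent, best_cost = j, cost
        if best_parent >= 0:
            ref = A[best_parent]
        else:
            ref = np.zeros(A.shape[1], dtype=bool)
        added = np.flatnonzero(A[i] & ~ref)    # ones in row i, zeros in ref
        removed = np.flatnonzero(~A[i] & ref)  # zeros in row i, ones in ref
        encoded.append((best_parent, added, removed))
    return encoded


def delta_matmul(encoded, X):
    """Compute A @ X from the delta encoding, reusing parent result rows."""
    X = np.asarray(X, dtype=float)
    Y = np.zeros((len(encoded), X.shape[1]))
    for i, (parent, added, removed) in enumerate(encoded):
        # A parent always has a smaller index, so Y[parent] is already final.
        if parent >= 0:
            Y[i] = Y[parent]
        Y[i] += X[added].sum(axis=0) - X[removed].sum(axis=0)
    return Y


# Toy binary adjacency-like matrix with similar rows.
A = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
])
X = np.arange(8, dtype=float).reshape(4, 2)

encoded = compress_rows(A)
num_deltas = sum(len(a) + len(r) for _, a, r in encoded)
Y = delta_matmul(encoded, X)
```

On this toy matrix the encoding stores 6 deltas where a plain sparse format would store 11 nonzeros, and `Y` matches `A @ X`. Because a row falls back to the all-zero reference whenever no earlier row is closer, its delta count never exceeds its nonzero count, which mirrors the worst-case guarantee in spirit.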