GNNShap: Scalable and Accurate GNN Explanation using Shapley Values

Selahattin Akkas, Ariful Azad
DOI: https://doi.org/10.1145/3589334.3645599
2024-02-23
Abstract: Graph neural networks (GNNs) are popular machine learning models for graphs with many applications across scientific domains. However, GNNs are considered black-box models, and it is challenging to understand how the model makes predictions. Game-theoretic Shapley value approaches are popular explanation methods in other domains but are not well studied for graphs. Some studies have proposed Shapley-value-based GNN explanations, yet they have several limitations: they consider limited samples to approximate Shapley values; some mainly focus on small and large coalition sizes; and they are an order of magnitude slower than other explanation methods, making them inapplicable to even moderate-size graphs. In this work, we propose GNNShap, which provides explanations for edges since they provide more natural and more fine-grained explanations for graphs. We overcome the limitations by sampling from all coalition sizes, parallelizing the sampling on GPUs, and speeding up model predictions by batching. GNNShap gives better fidelity scores and faster explanations than baselines on real-world datasets. The code is available at
Machine Learning, Social and Information Networks
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper "GNNShap: Scalable and Accurate GNN Explanation using Shapley Values" addresses the challenge of interpreting graph neural networks (GNNs). Although GNNs perform well on graph-structured data, they are often regarded as "black-box" models, and it is difficult to understand the mechanisms behind their predictions. Specifically, the paper attempts to solve the following key problems:

1. **Accuracy of explanation**: Existing Shapley-value-based GNN explanation methods offer some explanatory power but are not accurate enough. They often consider only a limited number of samples, or only coalitions of particular sizes, so the resulting explanations are neither comprehensive nor precise.
2. **Computational efficiency**: Existing Shapley value methods have high computational cost, especially on large-scale graph data. They usually require so many computational resources that they are impractical in real-world applications.
3. **Fine-grained explanation**: Existing methods often focus on explaining node features or entire subgraphs while ignoring the importance of edges. The method proposed in the paper (GNNShap) assigns importance to edges, giving more fine-grained explanations of how a GNN arrives at its prediction.

### Main contributions

To address the above challenges, the paper proposes GNNShap, a GNN explanation model based on Shapley values. The main contributions are as follows:

1. **Providing edge importance scores**: GNNShap provides importance scores for all relevant edges, giving a more natural and fine-grained explanation of the prediction for the target node.
2. **Improving sampling coverage**: By sampling over all possible coalition sizes, GNNShap improves the accuracy of the explanation. Every coalition size is represented in the samples, which avoids the bias of existing methods toward small and large coalitions.
3. **Accelerating computation**: GNNShap significantly improves computational efficiency by parallelizing sampling and batching model predictions. This makes GNNShap an order of magnitude faster than other Shapley-value-based methods when dealing with large-scale graph data.

### Method overview

The main steps of GNNShap are:

1. **Obtaining the pruned computational graph**: For a two-layer GNN, find the computational graph of the target node and prune redundant edges to reduce computational cost.
2. **Coalition sampling**: Sample subgraphs (edge coalitions) across all coalition sizes, generate a binary mask matrix, and calculate the weight of each sample.
3. **Model prediction**: Use each sample to generate a prediction vector for the target node, improving efficiency through batching and parallel computation.
4. **Shapley value calculation**: Calculate the Shapley values of all edges from the model predictions, and finally generate an explanatory subgraph.

### Experimental results

The paper reports experiments on multiple real-world datasets, including Cora, CiteSeer, PubMed, Coauthor-CS, Coauthor-Physics, and Facebook. The results show that GNNShap outperforms existing baseline methods in both fidelity score and computational speed.

### Conclusion

GNNShap addresses the accuracy and efficiency problems in GNN interpretation by improving the sampling strategy and optimizing the computation. The method provides more fine-grained explanations and also runs efficiently on large-scale graph data, offering a new direction for research on GNN interpretability.
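### Illustrative sketch

The coalition sampling, batched prediction, and Shapley value calculation steps from the method overview can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a Kernel SHAP-style weighted least-squares fit, uses Shapley kernel weights as the per-sample weights, runs on the CPU rather than the GPU, and replaces the GNN with a hypothetical `toy_predict_batch` function whose per-edge effects are known. The function names `sample_coalitions` and `estimate_edge_shapley` are invented for illustration.

```python
import numpy as np


def sample_coalitions(num_edges, num_samples, rng):
    """Sample binary edge masks whose sizes cover every value in 1..num_edges-1."""
    # Coalition sizes are drawn uniformly, so no size is systematically skipped.
    sizes = rng.integers(1, num_edges, size=num_samples)
    masks = np.zeros((num_samples, num_edges))
    for row, size in enumerate(sizes):
        kept = rng.choice(num_edges, size=size, replace=False)
        masks[row, kept] = 1.0
    # Assumption: each sample is weighted by the total Shapley kernel mass of
    # its coalition size, which is proportional to (M - 1) / (s * (M - s)).
    weights = (num_edges - 1) / (sizes * (num_edges - sizes))
    return masks, weights


def estimate_edge_shapley(predict_batch, num_edges, num_samples=256, seed=0):
    """Estimate one Shapley score per edge by weighted least squares."""
    rng = np.random.default_rng(seed)
    masks, weights = sample_coalitions(num_edges, num_samples, rng)
    # Pin the empty and full coalitions with a large weight so the scores
    # sum (approximately) to f(all edges) - f(no edges).
    masks = np.vstack([np.zeros(num_edges), np.ones(num_edges), masks])
    weights = np.concatenate([[1e6, 1e6], weights])
    # A single batched call evaluates the model on every sampled coalition.
    preds = predict_batch(masks)
    # Weighted least squares: intercept column plus one column per edge.
    design = np.hstack([np.ones((len(masks), 1)), masks])
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], preds * root_w, rcond=None)
    return coef[1:]  # drop the intercept, keep the per-edge scores


# Hypothetical stand-in for a GNN evaluated on masked computational graphs:
# an additive model, for which the exact Shapley values equal edge_effect.
edge_effect = np.array([0.5, -0.2, 0.0, 0.3, 0.1])


def toy_predict_batch(masks):
    return 0.1 + masks @ edge_effect


scores = estimate_edge_shapley(toy_predict_batch, num_edges=len(edge_effect))
print(np.round(scores, 3))
```

Because the toy model is additive, the fitted scores can be checked against `edge_effect` directly. For a real GNN, `predict_batch` would instead run the model on each masked computational graph and return the target node's prediction, and the scores would only approximate the exact Shapley values.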