Defining and Extracting generalizable interaction primitives from DNNs

Lu Chen,Siyu Lou,Benhao Huang,Quanshi Zhang
2024-09-13
Abstract:Faithfully summarizing the knowledge encoded by a deep neural network (DNN) into a few symbolic primitive patterns without losing much information represents a core challenge in explainable AI. To this end, Ren et al. (2024) have derived a series of theorems to prove that the inference score of a DNN can be explained as a small set of interactions between input variables. However, the lack of generalization power makes it still hard to consider such interactions as faithful primitive patterns encoded by the DNN. Therefore, given different DNNs trained for the same task, we develop a new method to extract interactions that are shared by these DNNs. Experiments show that the extracted interactions can better reflect common knowledge shared by different DNNs.
Machine Learning,Artificial Intelligence,Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper attempts to address the problem of extracting interpretable and generalizable interaction patterns in deep neural networks (DNNs). Specifically, the researchers aim to improve the generalizability of these patterns by extracting shared interaction patterns across different DNNs. The main contributions of the paper are as follows: 1. **Definition and Extraction of Interaction Patterns**: The researchers defined two types of interaction patterns—AND interactions and OR interactions—and demonstrated that these interaction patterns can be used to explain the output of DNNs. 2. **Addressing the Fidelity Issue of Interaction Patterns**: Existing methods for extracting interaction patterns face two main challenges: first, the decomposition of interaction patterns is not unique, leading to diverse extracted interaction patterns; second, ensuring that the extracted interaction patterns have generalizability across different DNNs. 3. **Proposing a New Method**: To overcome the above challenges, the researchers proposed a new method to extract generalizable interaction patterns. This method optimizes a loss function to ensure that the extracted interaction patterns are not only sparse but also have high consistency across multiple DNNs. 4. **Experimental Validation**: The researchers conducted experiments on multiple tasks, including sentiment classification, dialogue tasks, and image classification, to validate the effectiveness and generalizability of the proposed interaction pattern extraction method. In summary, the paper aims to enhance the interpretability of deep neural networks by extracting shared interaction patterns across different DNNs and ensuring that these interaction patterns have good generalization performance across different models.