On the limits of neural network explainability via descrambling

Shashank Sule, Richard G. Spencer, Wojciech Czaja
2024-09-03
Abstract: We characterize the exact solutions to neural network descrambling--a mathematical model for explaining the fully connected layers of trained neural networks (NNs). By reformulating the problem to the minimization of the Brockett function arising in graph matching and complexity theory we show that the principal components of the hidden layer preactivations can be characterized as the optimal explainers or descramblers for the layer weights, leading to descrambled weight matrices. We show that in typical deep learning contexts these descramblers take diverse and interesting forms including (1) matching largest principal components with the lowest frequency modes of the Fourier basis for isotropic hidden data, (2) discovering the semantic development in two-layer linear NNs for signal recovery problems, and (3) explaining CNNs by optimally permuting the neurons. Our numerical experiments indicate that the eigendecompositions of the hidden layer data--now understood as the descramblers--can also reveal the layer's underlying transformation. These results illustrate that the SVD is more directly related to the explainability of NNs than previously thought and offers a promising avenue for discovering interpretable motifs for the hidden action of NNs, especially in contexts of operator learning or physics-informed NNs, where the input/output data has limited human readability.
Machine Learning, Signal Processing, Numerical Analysis
### What problem does this paper attempt to solve?

This paper addresses a key problem in neural network explainability: how to explain trained neural networks (NNs) through a mathematical model, in particular the weights of their fully connected layers. Specifically, the authors study a method called "descrambling", which transforms the weight matrices of a network so that they can be visualized and read by a human.

#### Main research questions

1. **Theoretical basis of the descrambling transformation**: The authors provide a theoretical framework for how descrambling interprets neural network weights. Beyond reverse-engineering a given network, they aim to characterize the descrambling transformations that make reverse-engineering possible.
2. **Optimization of the descrambling transformation**: A new method based on eigendecomposition is proposed for finding minimizers of the descrambling objective, and it is shown that this method may give a clearer explanation than previous gradient-descent-based methods.
3. **Descrambling under different input distributions and network architectures**: The behavior and limiting cases of the descrambling transformation are studied for different input data distributions (such as isotropic data and noisy signal inputs) and different network architectures (such as linear networks and convolutional networks).
4. **Descrambling and network training**: The relationship between the descrambling transformation and neural network training is explored, in particular how descrambling reflects the training dynamics in the Saxe-McClelland-Ganguli (SMG) model.
#### Mathematical models and methods

- **Objective function of the descrambling transformation**: The optimal descrambling transformation \(P\) is found by minimizing the smoothness loss \(\eta_{SC}(Pf_k(X))\):

  \[ \eta_{SC}(Pf_k(X)) = \|DPf_k(X)\|_F^2 \]

  where \(D\) is a second-order differential operator or a Fourier differentiation matrix, and \(f_k(X)\) is the pre-activation data of the \(k\)-th layer.
- **Eigendecomposition method**: The minimization problem is solved by eigendecomposition, and the resulting descrambling transformation is written as

  \[ \hat{P} = TU^{\top} \]

  where the columns of \(T\) are the right singular vectors of \(D^{\top}D\) (equivalently its eigenvectors, since \(D^{\top}D\) is symmetric), and the columns of \(U\) are the principal components of the pre-activation data, that is, the eigenvectors of its autocorrelation matrix.

#### Experimental results

The descrambling transformation is evaluated experimentally, in particular on the DEERNet network, including the influence of different initialization methods on the descrambling result. The results show that the descrambling transformation obtained by the eigendecomposition method explains the network weights better and exhibits clearer band-pass filtering characteristics in the frequency domain.

### Summary

The main goal of this paper is to provide new theory and methods, built on the descrambling transformation, for explaining the weights of trained neural networks and thereby improving their explainability. The authors propose a new optimization method, analyze the behavior of the descrambling transformation under different conditions, and support their findings with numerical experiments.
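
As an illustration of the eigendecomposition construction \(\hat{P} = TU^{\top}\) described under "Mathematical models and methods", the following is a minimal NumPy sketch, not the authors' implementation. It rests on several assumptions that the summary does not state: \(P\) is restricted to orthogonal matrices, \(D\) is a simple second-order finite-difference matrix, the pre-activations \(f_k(X)\) are replaced by random stand-in data, and the eigenvectors are paired so that the largest principal components meet the smallest eigenvalues of \(D^{\top}D\) (the pairing suggested by the abstract). The function names `second_difference`, `smoothness_loss`, and `eig_descrambler` are hypothetical.

```python
import numpy as np


def second_difference(n):
    """Second-order finite-difference matrix (an assumed choice for D)."""
    return (np.diag(-2.0 * np.ones(n))
            + np.diag(np.ones(n - 1), 1)
            + np.diag(np.ones(n - 1), -1))


def smoothness_loss(D, P, X):
    """Smoothness loss ||D P X||_F^2 from the objective above."""
    return np.linalg.norm(D @ P @ X, "fro") ** 2


def eig_descrambler(D, X):
    """Build P_hat = T U^T from two symmetric eigendecompositions.

    Assumption: P ranges over orthogonal matrices, so the loss equals
    trace(P^T (D^T D) P (X X^T)), which is smallest when the eigenvalues
    of the two matrices are paired in opposite order.
    """
    # np.linalg.eigh returns eigenvalues in ascending order.
    _, T = np.linalg.eigh(D.T @ D)   # smallest eigenvalue of D^T D first
    _, U = np.linalg.eigh(X @ X.T)   # smallest variance first
    U = U[:, ::-1]                   # reorder: largest principal component first
    return T @ U.T


# Stand-in pre-activation data: n hidden units, m samples (not real network data).
rng = np.random.default_rng(0)
n, m = 16, 200
X = rng.standard_normal((n, n)) @ rng.standard_normal((n, m))

D = second_difference(n)
P_hat = eig_descrambler(D, X)

loss_hat = smoothness_loss(D, P_hat, X)      # loss after descrambling
loss_id = smoothness_loss(D, np.eye(n), X)   # loss with no descrambling
print(f"identity: {loss_id:.3f}  descrambled: {loss_hat:.3f}")
```

Under these assumptions the descrambled loss should not exceed the loss of the identity or of any other orthogonal matrix, which gives a quick sanity check when swapping in real pre-activations or a different \(D\).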