Abstract: We characterize the exact solutions to neural network descrambling--a mathematical model for explaining the fully connected layers of trained neural networks (NNs). By reformulating the problem to the minimization of the Brockett function arising in graph matching and complexity theory we show that the principal components of the hidden layer preactivations can be characterized as the optimal explainers or descramblers for the layer weights, leading to descrambled weight matrices. We show that in typical deep learning contexts these descramblers take diverse and interesting forms including (1) matching largest principal components with the lowest frequency modes of the Fourier basis for isotropic hidden data, (2) discovering the semantic development in two-layer linear NNs for signal recovery problems, and (3) explaining CNNs by optimally permuting the neurons. Our numerical experiments indicate that the eigendecompositions of the hidden layer data--now understood as the descramblers--can also reveal the layer's underlying transformation. These results illustrate that the SVD is more directly related to the explainability of NNs than previously thought and offers a promising avenue for discovering interpretable motifs for the hidden action of NNs, especially in contexts of operator learning or physics-informed NNs, where the input/output data has limited human readability.

### What problem does this paper attempt to solve?

This paper addresses a key problem in neural network explainability: how to explain trained neural networks (NNs), and in particular the weights of their fully connected layers, through a mathematical model. Specifically, the authors analyze a method called "descrambling", which transforms the weight matrices of a neural network so that they can be visualized and presented in a human-readable way.

#### Main research questions

1. **Theoretical basis of the descrambling transformation**: The authors aim to provide a theoretical framework that explains how the descrambling transformation interprets neural network weights. They focus not only on how to reverse-engineer the network but also on characterizing the descrambling transformations that make reverse-engineering possible.
2. **Optimization method for the descrambling transformation**: A new method based on eigendecomposition is proposed to find the minimizer of the descrambling problem, and it is shown that this method may provide a clearer explanation than the previous gradient-descent-based methods.
3. **Descrambling behavior under different input distributions and network architectures**: The behavior and limiting cases of the descrambling transformation are studied under different input data distributions (such as isotropic data and noisy signal inputs) and different network architectures (such as linear networks and convolutional networks).
4. **Combination of descrambling and network training**: The relationship between the descrambling transformation and neural network training is explored, in particular how the descrambling transformation reflects the training dynamics in the Saxe-McClelland-Ganguli (SMG) model.

#### Mathematical models and methods

- **Objective function of the descrambling transformation**: The optimal descrambling transformation \(P\) is found by minimizing the smoothness loss function \(\eta_{SC}(Pf_k(X))\):

  \[ \eta_{SC}(Pf_k(X)) = \|DPf_k(X)\|_F^2 \]

  where \(D\) is the second-order differential operator or Fourier differentiation matrix, and \(f_k(X)\) is the preactivation data of the \(k\)-th layer.
- **Eigendecomposition method**: Eigendecomposition is used to solve the minimization problem, and the descrambling transformation is represented as:

  \[ \hat{P} = TU^{\top} \]

  where the columns of \(T\) are the right singular vectors of \(D^{\top}D\), and the columns of \(U\) are the principal components of the autocorrelation matrix of the preactivation data.

#### Experimental results

The effectiveness of the descrambling transformation is verified through experiments, especially on the DEERNet network, which show how different initialization methods influence the descrambling result. The results show that the descrambling transformation obtained by the eigendecomposition method explains the network weights better and exhibits clearer band-pass filtering characteristics in the frequency domain.

### Summary

The main goal of this paper is to provide new theory and methods, built on the descrambling transformation, for explaining the weights of trained neural networks, thereby improving the explainability of neural networks. The authors not only propose a new optimization method but also explore the behavior of the descrambling transformation under different conditions and demonstrate its effectiveness through experiments.
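
The closed-form descrambler \(\hat{P} = TU^{\top}\) described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that \(P\) is restricted to orthogonal matrices, that \(D\) is a simple second-difference matrix, and that the eigenvectors are paired so the largest principal components meet the smallest eigenvalues of \(D^{\top}D\), as the abstract suggests. The names `second_difference`, `descrambler`, and `smoothness_loss` are hypothetical.

```python
import numpy as np


def second_difference(n):
    """Assumed choice of D: an (n-2) x n second-difference matrix."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D


def descrambler(F, D):
    """Return P = T U^T for preactivations F (neurons x samples).

    U: eigenvectors of F F^T, sorted by DESCENDING eigenvalue.
    T: eigenvectors of D^T D, sorted by ASCENDING eigenvalue.
    Pairing them this way minimizes tr(P F F^T P^T D^T D) over
    orthogonal P (an assumption of this sketch).
    """
    _, U = np.linalg.eigh(F @ F.T)   # eigh returns ascending order
    _, T = np.linalg.eigh(D.T @ D)   # ascending order, kept as is
    U = U[:, ::-1]                   # flip to descending order
    return T @ U.T


def smoothness_loss(P, F, D):
    """Smoothness loss ||D P F||_F^2 from the summary."""
    return np.linalg.norm(D @ P @ F, "fro") ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F = rng.standard_normal((8, 50))   # synthetic stand-in for f_k(X)
    D = second_difference(8)
    P = descrambler(F, D)
    print("loss with P = I    :", smoothness_loss(np.eye(8), F, D))
    print("loss with P = T U^T:", smoothness_loss(P, F, D))
```

Under these assumptions the loss at \(\hat{P}\) equals the sum of products of the descending eigenvalues of \(f_k(X)f_k(X)^{\top}\) with the ascending eigenvalues of \(D^{\top}D\), which gives a quick numerical check of the construction.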

On the limits of neural network explainability via descrambling

Which Neural Network Makes More Explainable Decisions? An Approach Towards Measuring Explainability

Neural Networks Decoded: Targeted and Robust Analysis of Neural Network Decisions via Causal Explanations and Reasoning

Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

Scalable Partial Explainability in Neural Networks via Flexible Activation Functions

Library network, a possible path to explainable neural networks

A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts

Deeper Interpretability of Deep Networks

NeuroView: Explainable Deep Network Decision Making

Learning local discrete features in explainable-by-design convolutional neural networks

Understanding Neural Networks through Representation Erasure

Explainable Neural Networks: Achieving Interpretability in Neural Models

Interpretable Disentanglement of Neural Networks by Extracting Class-Specific Subnetwork

Less is More: Discovering Concise Network Explanations

Unsupervised Learning of Neural Networks to Explain Neural Networks (extended abstract)

Unsupervised Learning of Neural Networks to Explain Neural Networks

Visual Interpretability for Deep Learning

Neural Network Layer Matrix Decomposition reveals Latent Manifold Encoding and Memory Capacity

Understanding CNN Hidden Neuron Activations Using Structured Background Knowledge and Deductive Reasoning

Corrupting Neuron Explanations of Deep Visual Features

Foiling Explanations in Deep Neural Networks