Abstract: This study examines the dynamics of trained deep neural networks and how they relate to the network parameters. Trained networks continue training predominantly in a single direction, known as the drift mode. This drift mode can be explained by a quadratic potential model of the loss function, which suggests a slow exponential decay towards the potential minima. We show a correlation between Hessian eigenvectors and network weights. This relationship depends on the magnitude of the eigenvalues and allows us to discern parameter directions within the network. The significance of these directions relies on two attributes: the curvature of their potential wells (indicated by the magnitude of the Hessian eigenvalues) and their alignment with the weight vectors. We also decompose the weight matrices through singular value decomposition. This approach proves practical for identifying critical directions within the Hessian, considering both their magnitude and curvature. Furthermore, we show that principal component analysis can be used to approximate the Hessian, with update parameters proving a better choice than weights for this purpose. We find that the largest Hessian eigenvalues of individual layers are similar to those of the entire network, and that higher eigenvalues are concentrated more in deeper layers. Building on these insights, we address catastrophic forgetting, the difficulty neural networks have in retaining knowledge of previous tasks while learning new ones. Applying our findings, we formulate an effective strategy to mitigate catastrophic forgetting, offering a possible solution that can be applied to networks of varying scales, including larger architectures.
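
The quadratic potential picture mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' experiment: it assumes plain gradient descent on a two-parameter quadratic loss whose Hessian is diagonal (so the eigenvectors are the coordinate axes), and the eigenvalues, learning rate, and function name `gd_on_quadratic` are hypothetical values chosen for illustration. Under those assumptions each eigen-direction decays exponentially at a rate set by its eigenvalue, and late updates point almost entirely along the flattest direction, which is one way to read the "drift mode" described above.

```python
import math

def gd_on_quadratic(eigenvalues, start, lr, steps):
    """Gradient descent on L(w) = 0.5 * sum_i lam_i * w_i**2 (minimum at w = 0)."""
    w = list(start)
    history = [tuple(w)]
    for _ in range(steps):
        # The gradient along eigen-direction i is lam_i * w_i.
        w = [wi - lr * lam * wi for wi, lam in zip(w, eigenvalues)]
        history.append(tuple(w))
    return history

# Hypothetical values: one stiff direction and one nearly flat direction.
eigenvalues = (1.0, 0.01)
lr = 0.5
steps = 200
history = gd_on_quadratic(eigenvalues, start=(1.0, 1.0), lr=lr, steps=steps)

# Closed form for this toy model: component i decays as (1 - lr * lam_i) ** t.
predicted = [(1 - lr * lam) ** steps for lam in eigenvalues]

# Direction of the final update, normalised to unit length.
last_step = [b - a for a, b in zip(history[-2], history[-1])]
norm = math.hypot(*last_step)
direction = [s / norm for s in last_step]

print("final weights:   ", history[-1])
print("closed form:     ", predicted)
print("update direction:", direction)
```

In this toy setting the stiff component vanishes within a few dozen steps, while the flat component is still shrinking slowly after 200 steps, so the update vector is dominated by a single direction.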

Neural Network Learning for Principal Component Analysis: A Multistage Decomposition Approach

A Multistage Decomposition Approach for Adaptive Principal Component Analysis

Adaptive dimensionality reduction for neural network-based online principal component analysis

Biologically Plausible Online Principal Component Analysis Without Recurrent Neural Dynamics

Compressing Recurrent Neural Network Models Through Principal Component Analysis

Forecasting Approach for Short-term Traffic Flow Based on Principal Component Analysis and Combined Neural Network

A Unified Self-Stabilizing Neural Network Algorithm for Principal and Minor Components Extraction

A Unified Self-Stabilizing Neural Network Algorithm for Principal Takagi Component Extraction

Neural Network Learning to Non-Linear Principal Component Analysis

Concise Coupled Neural Network Algorithm for Principal Component Analysis

Distributed Principal Component Analysis Based on Randomized Low-Rank Approximation

Fuzzy Neural Network Based on Principal Component

Fast principal component extraction by a weighted information criterion

Fast Principal Component Extraction by a Homogeneous Neural Network

Dynamic Principal Component Analysis in High Dimensions

Hessian Eigenvectors and Principal Component Analysis of Neural Network Weight Matrices

A Neural Network Learning for Adaptively Extracting Cross-Correlation Features Between Two High-Dimensional Data Streams

Condition Identification Using Direction of Principal Component and Neural Network

New Nonlinear Principal Analysis Method Based on RBF Neural Network

nPCA: a linear dimensionality reduction method using a multilayer perceptron

Neural Network Characterization and Entropy Regulated Data Balancing through Principal Component Analysis