Abstract: The evolution of a deep neural network trained by gradient descent can be described by its neural tangent kernel (NTK), introduced in [20], where it was proven that in the infinite-width limit the NTK converges to an explicit limiting kernel and stays constant during training. The NTK was also implicit in other recent papers [6,13,14]. In the overparameterized regime, a fully trained deep neural network is equivalent to the kernel regression predictor using the limiting NTK, and gradient descent achieves zero training loss. However, it was observed in [5] that there is a performance gap between kernel regression using the limiting NTK and deep neural networks. This gap likely originates from the change of the NTK during training due to finite-width effects, and this change is central to describing the generalization features of deep neural networks. In the current paper, we study the dynamics of the NTK for finite-width deep fully-connected neural networks. We derive an infinite hierarchy of ordinary differential equations, the neural tangent hierarchy (NTH), which captures the gradient descent dynamics of the deep neural network. Moreover, under certain conditions on the neural network width and the data set dimension, we prove that the truncated NTH approximates the dynamics of the NTK up to arbitrary precision. This description makes it possible to directly study the change of the NTK for deep neural networks, and sheds light on the observation that deep neural networks outperform kernel regression using the corresponding limiting NTK.
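The finite-width effect described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes a two-layer network with tanh activation in the NTK parameterization (the paper treats deep fully-connected networks), a squared loss, and small synthetic data. All function names (`init_params`, `forward`, `jacobian`, `empirical_ntk`, `train_and_track`) and hyperparameters are hypothetical. It computes the empirical NTK as the Gram matrix of parameter gradients and measures how far it moves during gradient descent.

```python
import numpy as np

def init_params(width, dim, rng):
    # Standard normal initialization; the 1/sqrt(width) scaling lives in forward().
    W = rng.standard_normal((width, dim))
    a = rng.standard_normal(width)
    return W, a

def forward(W, a, X):
    # f(x) = (1/sqrt(m)) * sum_r a_r * tanh(w_r . x)
    return np.tanh(X @ W.T) @ a / np.sqrt(W.shape[0])

def jacobian(W, a, X):
    # Gradients of f(x_i) with respect to a (shape n x m) and W (shape n x m x d).
    m = W.shape[0]
    act = np.tanh(X @ W.T)
    d_a = act / np.sqrt(m)
    d_W = ((1.0 - act ** 2) * a)[:, :, None] * X[:, None, :] / np.sqrt(m)
    return d_a, d_W

def empirical_ntk(W, a, X):
    # K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>
    d_a, d_W = jacobian(W, a, X)
    J = np.concatenate([d_a, d_W.reshape(X.shape[0], -1)], axis=1)
    return J @ J.T

def train_and_track(width, X, y, steps=200, lr=0.05, seed=0):
    # Full-batch gradient descent on 0.5 * sum_i (f(x_i) - y_i)^2.
    rng = np.random.default_rng(seed)
    W, a = init_params(width, X.shape[1], rng)
    K0 = empirical_ntk(W, a, X)
    for _ in range(steps):
        resid = forward(W, a, X) - y
        d_a, d_W = jacobian(W, a, X)
        a = a - lr * (d_a.T @ resid)
        W = W - lr * np.einsum("n,nmd->md", resid, d_W)
    K1 = empirical_ntk(W, a, X)
    # Relative Frobenius-norm change of the kernel between initialization and end.
    rel_change = np.linalg.norm(K1 - K0) / np.linalg.norm(K0)
    loss = 0.5 * np.sum((forward(W, a, X) - y) ** 2)
    return rel_change, loss

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((6, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
    y = rng.choice([-1.0, 1.0], size=6)
    for width in (16, 256, 4096):
        change, loss = train_and_track(width, X, y)
        print(f"width={width:5d}  relative NTK change={change:.4f}  loss={loss:.4f}")
```

In this toy setting the relative kernel change is expected to shrink as the width grows, consistent with the NTK staying constant only in the infinite-width limit. The sketch tracks the kernel itself and does not implement the higher-order terms of the hierarchy.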

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks

On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models

A Revision of Neural Tangent Kernel-based Approaches for Neural Networks

How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent

Dynamics of Deep Neural Networks and Neural Tangent Hierarchy

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK

A Theory of Neural Tangent Kernel Alignment and Its Influence on Training

On the Generalization Power of the Overfitted Three-Layer Neural Tangent Kernel Model

Evolution of Neural Tangent Kernels under Benign and Adversarial Training

How does a kernel based on gradients of infinite-width neural networks come to be widely used: a review of the neural tangent kernel

Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory?

When and why PINNs fail to train: A neural tangent kernel perspective

A Unified Kernel for Neural Network Learning

Stochastic Gradient Descent for Two-layer Neural Networks

Exact Convergence Rates of the Neural Tangent Kernel in the Large Depth Limit

A generalized neural tangent kernel for surrogate gradient learning

Tensor Programs II: Neural Tangent Kernel for Any Architecture

Optimal Convergence Rates for Neural Operators

A Comparative Analysis of Optimization and Generalization Properties of Two-Layer Neural Network and Random Feature Models under Gradient Descent Dynamics

On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains