Abstract: Deep fully connected neural networks (FCNNs) are the workhorses of deep learning and are broadly applicable because of their "agnostic" structure. In general, the learning capability of an FCNN improves as the number of layers and the width of each layer increase, but so does the computational cost of training. To alleviate this difficulty, we develop a gradually reinforced differentiable homotopy (GRDH) method for training FCNNs. Specifically, by introducing an extra variable t ranging between zero and one, we design a series of auxiliary functions that are continuous and monotonically increasing in t. With these functions, we formulate an optimization problem for training an artificial FCNN, which progressively incorporates more layers and nodes into the network as t changes from one to zero and eventually becomes an FCNN with the target number of layers or nodes per layer. We prove that the solution set of the artificial problem contains an everywhere differentiable path, which starts from a uniquely given point at t=1 and ends at the weights and biases of the target FCNN as t goes to zero. The GRDH method brings differentiable homotopy methods into the training of deep learning models and retains the theoretical convergence property of classical homotopy methods. To assess its practical value, we implement the GRDH method and another efficient method, HTA, to train the same FCNNs, and find that the GRDH method outperforms HTA in both computational time and number of iterations when obtaining a solution of similar or even higher accuracy. Numerical results further confirm the effectiveness of the GRDH method for solving classification problems.
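
The idea of gradually admitting nodes as t moves from one to zero can be illustrated with a minimal sketch. It is not the GRDH construction itself: the smoothstep `gate` function, the staged gradient descent in `train`, and all parameter names and defaults are assumptions made for illustration, and the sketch follows t on a fixed grid with warm starts rather than tracing a differentiable path. At t=1 every hidden unit is switched off, and as t decreases the units are blended in one after another until the full two-layer network is active at t=0.

```python
import numpy as np

def gate(t, k, n_units):
    """Hypothetical gate for hidden unit k (illustrative, not from the paper).

    Returns 0 at t=1 and rises smoothly to 1 as t decreases to 0;
    units with a smaller index k are switched on earlier.
    """
    s = 1.0 - t                             # progress from 0 (start) to 1 (target)
    x = np.clip(s * n_units - k, 0.0, 1.0)  # each unit gets its own window of width 1/n_units
    return float(x * x * (3.0 - 2.0 * x))   # smoothstep: continuously differentiable ramp

def train(X, y, n_units=8, stages=21, inner_steps=200, lr=0.05, seed=0):
    """Warm-started gradient descent on a gated two-layer tanh network.

    Returns the mean squared error measured at the end of each stage.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(size=(d, n_units))
    b1 = np.zeros(n_units)
    W2 = 0.1 * rng.normal(size=(n_units, 1))
    b2 = np.zeros(1)
    losses = []
    for t in np.linspace(1.0, 0.0, stages):  # t runs from one down to zero
        g = np.array([gate(t, k, n_units) for k in range(n_units)])
        for _ in range(inner_steps):
            A = np.tanh(X @ W1 + b1)         # hidden activations
            H = A * g                        # gated activations
            R = H @ W2 + b2 - y              # residuals
            dP = 2.0 * R / n                 # gradient of the MSE w.r.t. predictions
            dZ = (dP @ W2.T) * g * (1.0 - A ** 2)
            W2 -= lr * (H.T @ dP)
            b2 -= lr * dP.sum(axis=0)
            W1 -= lr * (X.T @ dZ)
            b1 -= lr * dZ.sum(axis=0)
        A = np.tanh(X @ W1 + b1)
        losses.append(float(np.mean(((A * g) @ W2 + b2 - y) ** 2)))
    return losses
```

Calling `train` on a small regression set, for example `X = np.linspace(-1, 1, 64).reshape(-1, 1)` with `y = np.sin(3 * X)`, returns one loss value per value of t, so the effect of each newly admitted unit can be inspected stage by stage.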

Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks

Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH)

Latent Assistance Networks: Rediscovering Hyperbolic Tangents in RL

Convex Formulations for Training Two-Layer ReLU Neural Networks

Differentiable homotopy methods for gradually reinforcing the training of fully connected neural networks

Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK

Phase Diagram for Two-layer ReLU Neural Networks at Infinite-width Limit

Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks

Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization

Training a Two Layer ReLU Network Analytically

Convergence Analysis of Two-layer Neural Networks with ReLU Activation

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks

A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks

Fast Finite Width Neural Tangent Kernel

Tightening convex relaxations of trained neural networks: a unified approach for convex and S-shaped activations

Equidistribution-based training of Free Knot Splines and ReLU Neural Networks

Dynamics of Deep Neural Networks and Neural Tangent Hierarchy

Optimization Over Trained Neural Networks: Taking a Relaxing Walk

Hyperbolic Linear Units For Deep Convolutional Neural Networks

Hadamard Representations: Augmenting Hyperbolic Tangents in RL

Accelerated analysis on the triple momentum method for a two-layer ReLU neural network