Abstract: A good weight initialization is crucial to accelerate the convergence of the weights in a neural network. However, training a neural network is still time-consuming, despite recent advances in weight initialization approaches. In this paper, we propose a mathematical framework for the weight initialization in the last layer of a neural network. We first analytically derive a tight constraint on the weights that accelerates their convergence during back-propagation. We then use linear regression and Lagrange multipliers to analytically derive the optimal initial weights and initial bias of the last layer, i.e., those that minimize the initial training loss subject to the derived constraint. We also show that the restrictive assumption of traditional weight initialization algorithms, namely that the expected value of the weights is zero, is redundant for our approach. We first apply our proposed weight initialization approach to a Convolutional Neural Network that predicts the Remaining Useful Life of aircraft engines. The initial training and validation losses are relatively small, the weights do not get stuck in a local optimum, and the convergence of the weights is accelerated. We compare our approach with several benchmark strategies. Compared to the best-performing state-of-the-art initialization strategy (Kaiming initialization), our approach needs 34% fewer epochs to reach the same validation loss. We also apply our approach to ResNets for the CIFAR-100 dataset, combined with transfer learning. Here, the initial accuracy is already at least 53%. This gives a faster weight convergence and a higher test accuracy than the benchmark strategies.
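To make the regression step concrete, the following is a minimal sketch of initializing a linear last layer by least squares on the activations of the preceding layer. It is not the authors' method: the abstract does not state the derived constraint, so the constraint and the Lagrange multiplier step are omitted and only the unconstrained linear regression is shown. The function names `init_last_layer` and `mse`, the synthetic data, and the small random baseline are all assumptions introduced for illustration.

```python
import numpy as np


def init_last_layer(features, targets):
    """Least-squares initial weights and bias for a linear last layer.

    features: (n_samples, n_features) activations feeding the last layer
    targets:  (n_samples,) or (n_samples, n_outputs) regression targets

    Assumption: the constraint from the abstract is NOT applied here,
    because the abstract does not specify it.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Append a column of ones so the bias is fitted jointly with the weights.
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    solution, *_ = np.linalg.lstsq(design, targets, rcond=None)
    weights, bias = solution[:-1], solution[-1]
    return weights, bias


def mse(features, targets, weights, bias):
    """Mean squared error of the linear last layer on the given data."""
    predictions = features @ weights + bias
    return float(np.mean((predictions - targets) ** 2))


# Synthetic stand-in for penultimate-layer activations and targets.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
targets = features @ true_w + 3.0 + 0.1 * rng.normal(size=200)

# Regression-based initialization versus a small random baseline.
w_ls, b_ls = init_last_layer(features, targets)
w_rand, b_rand = rng.normal(scale=0.1, size=8), 0.0

loss_ls = mse(features, targets, w_ls, b_ls)
loss_rand = mse(features, targets, w_rand, b_rand)
print(f"initial loss, least squares: {loss_ls:.4f}")
print(f"initial loss, random:        {loss_rand:.4f}")
```

Because least squares minimizes the mean squared error over all weight and bias choices, the regression-based initial loss on this data cannot exceed that of the random baseline. In a real network the `features` array would be obtained by a forward pass of the training data through all layers before the last one.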

A weight initialization based on the linear product structure for neural networks

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

Improved weight initialization for deep and narrow feedforward neural network

On weight initialization in deep neural networks

A mathematical framework for improved weight initialization of neural networks using Lagrange multipliers

A Sober Look at Neural Network Initializations

Weight initialization based‐rectified linear unit activation function to improve the performance of a convolutional neural network model

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

Deep activity propagation via weight initialization in spiking neural networks

Using linear initialisation to improve speed of convergence and fully-trained error in Autoencoders

On the Role of Initialization on the Implicit Bias in Deep Linear Networks

Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks

Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications

Improving performance of recurrent neural network with relu nonlinearity

Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training

Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis

Principled Weight Initialisation for Input-Convex Neural Networks

Towards Understanding the Condensation of Neural Networks at Initial Training

On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks

Principles for Initialization and Architecture Selection in Graph Neural Networks with ReLU Activations