Abstract: This paper presents a comprehensive empirical investigation into the interactions between various randomization techniques in Deep Neural Networks (DNNs) and their impact on learning performance. It is well established that injecting randomness into the training process of DNNs, through various approaches and at different stages, is often beneficial for reducing overfitting and improving generalization. Nonetheless, the interactions between randomness techniques such as weight noise, dropout, and many others remain poorly understood. Consequently, it is challenging to determine which methods can be effectively combined to optimize DNN performance. To address this issue, we categorize the existing randomness techniques into four key types: injection of noise/randomness at the data, model structure, optimization, or learning stage. We use this classification to identify gaps in the current coverage of potential mechanisms for introducing randomness, which leads us to propose two new techniques: adding noise to the loss function and random masking of the gradient updates. In our empirical study, we employ a Particle Swarm Optimizer (PSO) for hyperparameter optimization (HPO) to explore the space of possible configurations and determine where and how much randomness should be injected to maximize DNN performance. We assess the impact of various types and levels of randomness for DNN architectures across standard computer vision benchmarks: MNIST, FASHION-MNIST, CIFAR10, and CIFAR100. Across more than 30 000 evaluated configurations, we perform a detailed examination of the interactions between randomness techniques and their combined impact on DNN performance. Our findings reveal that randomness through data augmentation and in weight initialization are the main contributors to performance improvement. Additionally, correlation analysis demonstrates that different optimizers, such as Adam and Gradient Descent with Momentum, prefer distinct types of randomization during the training process. A GitHub repository with the complete implementation and generated dataset is available.
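The two proposed techniques can be illustrated with a minimal sketch. The abstract does not specify the form of the loss noise or the masking distribution, so everything below is an assumption made for illustration: the loss noise is taken to be multiplicative Gaussian noise (noise that is purely additive and independent of the parameters would leave the gradients unchanged), and the gradient mask is taken to be an independent Bernoulli mask per component. The names `noisy_loss`, `mask_gradient`, `sgd_step`, `sigma`, and `drop_prob` are hypothetical and are not taken from the paper's repository.

```python
import random


def noisy_loss(loss, sigma, rng):
    """Scale the loss by (1 + eps), eps ~ N(0, sigma^2).

    Assumption: multiplicative noise, chosen here so that the
    perturbation also rescales the gradient of the loss.
    """
    return loss * (1.0 + rng.gauss(0.0, sigma))


def mask_gradient(grad, drop_prob, rng):
    """Zero each gradient component independently with probability drop_prob.

    Assumption: an independent Bernoulli mask per component.
    """
    return [0.0 if rng.random() < drop_prob else g for g in grad]


def sgd_step(weights, grad, lr, drop_prob, rng):
    """One plain gradient-descent step using only the unmasked components."""
    masked = mask_gradient(grad, drop_prob, rng)
    return [w - lr * g for w, g in zip(weights, masked)]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [1.0, -2.0, 0.5]
    loss = sum(w * w for w in weights)   # toy quadratic loss
    grad = [2.0 * w for w in weights]    # its exact gradient
    print("perturbed loss:", noisy_loss(loss, sigma=0.01, rng=rng))
    print("updated weights:", sgd_step(weights, grad, lr=0.1, drop_prob=0.5, rng=rng))
```

In this sketch `sigma` and `drop_prob` play the role of the randomness levels that a hyperparameter search such as PSO would tune; setting either to zero recovers the unperturbed loss or the ordinary gradient step.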
