Abstract: This paper presents a comprehensive empirical investigation into the interactions between various randomization techniques in Deep Neural Networks (DNNs) and their impact on learning performance. It is well established that injecting randomness into DNN training, through various approaches and at different stages, is often beneficial for reducing overfitting and improving generalization. Nonetheless, the interactions between randomness techniques such as weight noise, dropout, and many others remain poorly understood. Consequently, it is difficult to determine which methods can be effectively combined to optimize DNN performance. To address this issue, we categorize existing randomness techniques into four key types, according to whether noise or randomness is injected at the data, model structure, optimization, or learning stage. We use this classification to identify gaps in the current coverage of possible mechanisms for introducing randomness, which leads us to propose two new techniques: adding noise to the loss function and randomly masking gradient updates. In our empirical study, we use a Particle Swarm Optimizer (PSO) for hyperparameter optimization (HPO), exploring the space of possible configurations to determine where, and how much, randomness should be injected to maximize DNN performance. We assess the impact of various types and levels of randomness on DNN architectures across standard computer vision benchmarks: MNIST, Fashion-MNIST, CIFAR10, and CIFAR100. Across more than 30 000 evaluated configurations, we examine in detail the interactions between randomness techniques and their combined impact on DNN performance. Our findings reveal that randomness in data augmentation and in weight initialization contributes most to performance improvement.
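The two proposed techniques can be illustrated on a toy problem. The abstract does not specify their exact form, so the following is a minimal sketch under stated assumptions, not the authors' implementation. Loss noise is assumed to be multiplicative, L' = (1 + eps) * L with eps drawn from N(0, sigma^2), because an additive noise term that does not depend on the weights would leave the gradient unchanged. Gradient masking is assumed to zero each gradient component independently with probability p, without rescaling. The model is a two-weight linear regressor trained with plain SGD, and all function names (`noisy_loss_scale`, `mask_gradient`, `make_linear_data`, `train`) are hypothetical.

```python
import random

def noisy_loss_scale(sigma, rng):
    # Hypothetical loss-noise hook (assumption: multiplicative noise).
    # Returns 1 + eps with eps ~ N(0, sigma^2); scaling the loss by this
    # factor scales its gradient by the same factor.
    return 1.0 + rng.gauss(0.0, sigma)

def mask_gradient(grad, drop_prob, rng):
    # Hypothetical gradient-masking hook (assumption: independent Bernoulli
    # mask per component, no rescaling). Each component is zeroed with
    # probability drop_prob, so that weight skips this update.
    return [g if rng.random() >= drop_prob else 0.0 for g in grad]

def make_linear_data(n, seed=1):
    # Noiseless toy data from y = 2*x0 - 3*x1, with x uniform on [-1, 1]^2.
    rng = random.Random(seed)
    xs = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(n)]
    return [(x, 2.0 * x[0] - 3.0 * x[1]) for x in xs]

def train(data, lr=0.05, epochs=200, loss_sigma=0.1, grad_drop=0.3, seed=0):
    # Plain SGD on a two-weight linear model with both randomness hooks applied.
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            err = w[0] * x[0] + w[1] * x[1] - y
            # Gradient of the per-sample loss 0.5 * err**2 with respect to w.
            grad = [err * x[0], err * x[1]]
            # Loss noise: d/dw[(1 + eps) * L] = (1 + eps) * dL/dw.
            scale = noisy_loss_scale(loss_sigma, rng)
            grad = [scale * g for g in grad]
            # Gradient masking: randomly drop per-weight updates.
            grad = mask_gradient(grad, grad_drop, rng)
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

weights = train(make_linear_data(50))
print(weights)  # approximately [2.0, -3.0]
```

Because the toy data are noiseless, every per-sample loss shares the same minimizer, so the injected randomness perturbs the trajectory but does not prevent convergence. In a real DNN, the same two hooks could sit between the backward pass and the optimizer step.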
Additionally, correlation analysis shows that different optimizers, such as Adam and Gradient Descent with Momentum, prefer distinct types of randomization during training. A GitHub repository with the complete implementation and the generated dataset is available¹.
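The PSO-based hyperparameter search described above can likewise be sketched. The abstract does not give the paper's PSO variant, coefficients, or search space, so the following assumes a standard global-best PSO with an inertia weight and commonly used coefficients (inertia 0.7, c1 = c2 = 1.5), with positions clipped to box bounds. Each particle encodes three randomness levels (a dropout rate, a loss-noise sigma, and a gradient-mask probability), chosen here for illustration; `toy_validation_error` is a hypothetical stand-in for training a DNN with those settings and returning its validation error.

```python
import random

def pso(objective, bounds, n_particles=20, iters=100, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    # Minimal global-best PSO that minimizes `objective` over the box `bounds`.
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_validation_error(params):
    # Hypothetical stand-in for "train a DNN with these randomness levels and
    # return its validation error"; the minimum (0.3) is at (0.2, 0.05, 0.1).
    dropout, loss_sigma, grad_mask = params
    return 0.3 + (dropout - 0.2) ** 2 + (loss_sigma - 0.05) ** 2 + (grad_mask - 0.1) ** 2

best, best_err = pso(toy_validation_error, bounds=[(0.0, 0.5)] * 3)
print(best, best_err)
```

With a real objective, each call would be a full training run, so the product of particle count and iteration count roughly bounds the number of configurations evaluated.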

Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks

Randomness Regularization with Simple Consistency Training for Neural Networks

How Does Data Diversity Shape the Weight Landscape of Neural Networks?

Learning Randomized Algorithms with Transformers

Quantifying Inherent Randomness in Machine Learning Algorithms

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

Understanding Stochastic Optimization Behavior at the Layer Update Level (Student Abstract)

A Random Focusing Method with Jensen–Shannon Divergence for Improving Deep Neural Network Performance Ensuring Architecture Consistency

Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack

Approximate Random Dropout

Improving the Diversity of Bootstrapped DQN by Replacing Priors With Noise

Slot Machines: Discovering Winning Combinations of Random Weights in Neural Networks

Exploring the effect of training-time randomness on the performance of deep neural networks for intrusion detection

On Using Quasirandom Sequences in Machine Learning for Model Weight Initialization

Harvesting Randomness to Optimize Distributed Systems

Uniform Learning in a Deep Neural Network via "Oddball" Stochastic Gradient Descent

"Oddball SGD": Novelty Driven Stochastic Gradient Descent for Training Deep Neural Networks

Towards Dropout Training for Convolutional Neural Networks

NeuFair: Neural Network Fairness Repair with Dropout

Meta-probability Weighting for Improving Reliability of DNNs to Label Noise