Abstract: This paper presents a comprehensive empirical investigation of how various randomization techniques in Deep Neural Networks (DNNs) interact and how they affect learning performance. It is well established that injecting randomness into DNN training, through various approaches and at different stages, often reduces overfitting and improves generalization. Nonetheless, the interactions between randomness techniques such as weight noise and dropout remain poorly understood, which makes it difficult to determine which methods can be combined effectively to optimize DNN performance. To address this issue, we categorize existing randomness techniques into four key types according to where the noise or randomness is injected: the data, the model structure, the optimization, or the learning stage. We use this classification to identify gaps in the current coverage of mechanisms for introducing randomness, which led us to propose two new techniques: adding noise to the loss function and randomly masking gradient updates. In our empirical study, we use a Particle Swarm Optimizer (PSO) for hyperparameter optimization (HPO) to explore the space of possible configurations and determine where, and how much, randomness should be injected to maximize DNN performance. We assess the impact of various types and levels of randomness on DNN architectures across standard computer vision benchmarks: MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100. Across more than 30,000 evaluated configurations, we examine in detail the interactions between randomness techniques and their combined impact on DNN performance. Our findings show that randomness introduced through data augmentation and weight initialization contributes most to performance improvement.
Additionally, correlation analysis shows that different optimizers, such as Adam and gradient descent with momentum, prefer distinct types of randomization during training. A GitHub repository with the complete implementation and the generated dataset is available.¹
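The two proposed optimization-stage techniques, loss noise and gradient masking, can be illustrated with a minimal sketch. It is not the authors' implementation (that lives in their repository) and rests on two labeled assumptions: loss noise is modeled as a multiplicative factor (1 + ε) with ε drawn from a zero-mean Gaussian, because adding a constant to the loss would leave its gradient unchanged; and gradient masking zeroes each gradient component independently with probability `p_mask`, without rescaling. The helper names `noisy_loss_scale`, `mask_gradient`, and `train_linear`, and the toy linear-regression setup, are hypothetical.

```python
import random


def noisy_loss_scale(rng: random.Random, sigma: float) -> float:
    """Return a loss perturbation factor (1 + eps) with eps ~ N(0, sigma^2).

    Assumption (not from the paper): the loss is scaled rather than shifted,
    since adding a constant to the loss would not change its gradient.
    """
    return 1.0 + rng.gauss(0.0, sigma)


def mask_gradient(grad: list[float], rng: random.Random, p_mask: float) -> list[float]:
    """Zero each gradient component independently with probability p_mask.

    Assumption (not from the paper): surviving components are not rescaled.
    """
    return [0.0 if rng.random() < p_mask else g for g in grad]


def train_linear(xs, ys, epochs=200, lr=0.05, loss_sigma=0.1, p_mask=0.2, seed=0):
    """Hypothetical toy example: SGD on y ~ w*x + b with squared error,
    applying loss noise and gradient masking at every update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            # Gradient of 0.5 * err**2 with respect to (w, b).
            grad = [err * x, err]
            # Loss noise: scaling the loss by (1 + eps) scales its gradient by the same factor.
            scale = noisy_loss_scale(rng, loss_sigma)
            grad = [scale * g for g in grad]
            # Gradient masking: randomly drop components of this update.
            grad = mask_gradient(grad, rng, p_mask)
            w -= lr * grad[0]
            b -= lr * grad[1]
    return w, b
```

On noise-free data such as `xs = [i / 10 for i in range(-10, 11)]` and `ys = [2 * x + 1 for x in xs]`, both perturbations only rescale or drop components of a gradient that vanishes at the optimum, so `train_linear(xs, ys)` still ends up close to `w = 2`, `b = 1`. In a real DNN, the same two steps would sit between the backward pass and the optimizer step for every parameter tensor.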

A Random Focusing Method with Jensen–Shannon Divergence for Improving Deep Neural Network Performance Ensuring Architecture Consistency

FocusNet: Classifying Better by Focusing on Confusing Classes

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

Randomness Regularization with Simple Consistency Training for Neural Networks

Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks

NeuFair: Neural Network Fairness Repair with Dropout

Shakeout: A New Approach to Regularized Deep Neural Network Training

Domain-Aware Fine-Tuning: Enhancing Neural Network Adaptability

The Overfocusing Bias of Convolutional Neural Networks: A Saliency-Guided Regularization Approach

Rolling the Dice for Better Deep Learning Performance: A Study of Randomness Techniques in Deep Neural Networks

Regularizing neural networks with adaptive local drop

AntiDote: Attention-based Dynamic Optimization for Neural Network Runtime Efficiency

Theoretical analysis of skip connections and batch normalization from generalization and optimization perspectives

Approximate Random Dropout

NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks

Growing Deep Neural Network Considering with Similarity between Neurons

FlexiDrop: Theoretical Insights and Practical Advances in Random Dropout Method on GNNs

Continuous Dropout

A Progressive Subnetwork Searching Framework for Dynamic Inference

Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training

Direct Feedback Alignment With Sparse Connections for Local Learning