Dropout Reduces Underfitting

Zhuang Liu, Zhiqiu Xu, Joseph Jin, Zhiqiang Shen, Trevor Darrell

2023-06-01

Abstract: Introduced by Hinton et al. in 2012, dropout has stood the test of time as a regularizer for preventing overfitting in neural networks. In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training. During the early phase, we find dropout reduces the directional variance of gradients across mini-batches and helps align the mini-batch gradients with the entire dataset's gradient. This helps counteract the stochasticity of SGD and limit the influence of individual batches on model training. Our findings lead us to a solution for improving performance in underfitting models - early dropout: dropout is applied only during the initial phases of training, and turned off afterwards. Models equipped with early dropout achieve lower final training loss compared to their counterparts without dropout. Additionally, we explore a symmetric technique for regularizing overfitting models - late dropout, where dropout is not used in the early iterations and is only activated later in training. Experiments on ImageNet and various vision tasks demonstrate that our methods consistently improve generalization accuracy. Our results encourage more research on understanding regularization in deep learning and our methods can be useful tools for future neural network training, especially in the era of large data. Code is available at https://github.com/facebookresearch/dropout.
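The two gradient quantities named in the abstract can be illustrated with a small sketch. The abstract does not give formulas, so the definitions here are assumptions: directional variance is taken as the mean pairwise cosine distance between mini-batch gradients (scaled to [0, 1]), and alignment as the mean cosine similarity between each mini-batch gradient and the full-dataset gradient. The function names are hypothetical, and gradients are assumed to be flattened, non-zero vectors.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def directional_variance(grads):
    """Assumed metric: mean pairwise cosine distance over all gradient pairs.

    0.0 means every mini-batch gradient points the same way;
    1.0 means every pair points in opposite directions.
    """
    pairs = list(combinations(grads, 2))
    return sum(0.5 * (1.0 - cosine_similarity(u, v)) for u, v in pairs) / len(pairs)


def alignment_with_full_gradient(grads, full_grad):
    """Assumed metric: mean cosine similarity to the full-dataset gradient."""
    return sum(cosine_similarity(g, full_grad) for g in grads) / len(grads)
```

Under these assumed definitions, the abstract's claim would correspond to a lower `directional_variance` and a higher `alignment_with_full_gradient` when dropout is active early in training.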

Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper asks whether dropout can be used to reduce underfitting, not only overfitting, when training deep learning models. The traditional view holds that dropout is mainly a tool for preventing overfitting. The authors find that using dropout in the early stage of training reduces the directional variance of gradients across mini-batches, so that mini-batch gradient directions are more consistent with one another and better aligned with the gradient of the entire dataset. This helps counteract the randomness of stochastic gradient descent (SGD) and limits the influence of individual batches on training. The point matters most when the amount of data is very large, since the model is then more likely to underfit than to overfit. The paper therefore proposes two dropout strategies:

- **Early dropout**: Use dropout only in the early stage of training, then turn it off. This helps the model fit the training data better and reduces the final training loss.
- **Late dropout**: Do not use dropout in the early stage of training, and turn it on only in the later stage. This helps improve the generalization performance of the model, especially for large-scale models.

With these methods, the paper aims to offer a new perspective on the role of dropout in deep learning and to provide useful tools for future neural network training.
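The two strategies described above can be sketched as a drop-rate schedule. This is a minimal illustration, not the authors' implementation: the function name, the parameter names, the default values, and the hard on/off switch at a fixed fraction of training are all assumptions made for the example.

```python
def dropout_rate(step, total_steps, mode="early", peak_rate=0.1, switch_fraction=0.1):
    """Return the dropout probability to use at a given training step.

    All names and defaults are hypothetical. The schedule is a hard switch:
    - "early": dropout is on before the switch point and off afterwards.
    - "late":  dropout is off before the switch point and on afterwards.
    """
    if mode not in ("early", "late"):
        raise ValueError(f"unknown mode: {mode!r}")
    switch_step = int(switch_fraction * total_steps)
    in_early_phase = step < switch_step
    if mode == "early":
        return peak_rate if in_early_phase else 0.0
    return 0.0 if in_early_phase else peak_rate
```

In a PyTorch training loop, one way to apply such a schedule would be to write the returned value to the `p` attribute of each `torch.nn.Dropout` module before the forward pass. A gradual decay instead of a hard switch is an equally plausible design and would only change the body of this function.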

Wordreg: Mitigating the Gap Between Training and Inference with Worst-Case Drop Regularization

R-Drop: Regularized Dropout for Neural Networks.

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

Implicit Regularization of Dropout

Effective and Efficient Dropout for Deep Convolutional Neural Networks

Dropout, a basic and effective regularization method for a deep learning model: a case study

Guided Dropout: Improving Deep Networks Without Increased Computation

Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization.

Shakeout: A New Approach to Regularized Deep Neural Network Training

Reducing Overfitting in Deep Networks by Decorrelating Representations

Dropout in Training Neural Networks: Flatness of Solution and Noise Structure

Drop-Activation: Implicit Parameter Reduction and Harmonic Regularization

Data Dropout: Optimizing Training Data for Convolutional Neural Networks

Heuristic dropout: an efficient regularization method for medical image segmentation models

Continuous Dropout

Self-Balanced Dropout

Surrogate Dropout: Learning Optimal Drop Rate Through Proxy.

$\beta$-Dropout: A Unified Dropout

AutoDropout: Learning Dropout Patterns to Regularize Deep Networks
