Abstract: Variational dropout (VD) is a generalization of Gaussian dropout that infers the posterior of the network weights under a log-uniform prior, so that the weights and the dropout rate are learned simultaneously. The log-uniform prior not only explains the regularization capacity of Gaussian dropout in network training, but also underpins the inference of this posterior. However, the log-uniform prior is an improper prior (i.e., its integral is infinite), which makes the posterior inference ill-posed and thus restricts the regularization performance of VD. To address this problem, we present a new generalization of Gaussian dropout, termed variational Bayesian dropout (VBD), which instead exploits a hierarchical prior on the network weights and infers a new joint posterior. Specifically, we implement the hierarchical prior as a zero-mean Gaussian distribution with variance sampled from a uniform hyper-prior. We then incorporate this prior into the inference of the joint posterior over the network weights and the variance in the hierarchical prior, with which both network training and dropout rate estimation can be cast as a joint optimization problem. More importantly, the hierarchical prior is a proper prior, which makes the posterior inference well-posed. In addition, we show that the proposed VBD can be seamlessly applied to network compression. Experiments on classification and network compression demonstrate the superior performance of the proposed VBD in regularizing network training.
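
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only draws samples and performs no posterior inference. The function names, the choice of one variance per weight, the uniform bounds `var_low` and `var_high`, and the noise variance `alpha` are all assumptions made for illustration.

```python
import math
import random


def gaussian_dropout(weights, alpha, rng):
    """Multiply each weight by noise drawn from N(1, alpha) (Gaussian dropout)."""
    return [w * rng.gauss(1.0, math.sqrt(alpha)) for w in weights]


def sample_hierarchical_prior(n, var_low, var_high, rng):
    """Draw n weights from a zero-mean Gaussian whose variance is itself
    drawn from a uniform hyper-prior on [var_low, var_high].

    Assumption: one variance per weight; the bounds are hypothetical.
    """
    variances = [rng.uniform(var_low, var_high) for _ in range(n)]
    weights = [rng.gauss(0.0, math.sqrt(v)) for v in variances]
    return weights, variances


rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights, variances = sample_hierarchical_prior(5, 0.01, 1.0, rng)
noisy_weights = gaussian_dropout(weights, alpha=0.5, rng=rng)
```

Because the uniform hyper-prior has bounded support, the resulting prior over weights integrates to one, which is the property the abstract contrasts with the improper log-uniform prior.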

A Bayesian encourages dropout

Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization

$\beta$-Dropout: A Unified Dropout

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

Dropout Reduces Underfitting

Variational Bayesian Dropout With A Hierarchical Prior

Implicit Regularization of Dropout

Self-Balanced Dropout

Soft Dropout and Its Variational Bayes Approximation

Stochastic Modified Equations and Dynamics of Dropout Algorithm

Adaptive Dropout Method Based on Biological Principles

Interpreting and Boosting Dropout from a Game-Theoretic View

Surrogate Dropout: Learning Optimal Drop Rate Through Proxy

Heuristic dropout: an efficient regularization method for medical image segmentation models

R-Drop: Regularized Dropout for Neural Networks

Towards Understanding and Improving Dropout in Game Theory

Learning Rate Dropout