WordReg: Mitigating the Gap Between Training and Inference with Worst-Case Drop Regularization

Jun Xia, Ge Wang, Bozhen Hu, Cheng Tan, Jiangbin Zheng, Yongjie Xu, Stan Z. Li
DOI: https://doi.org/10.1109/icassp49357.2023.10095552
2023-01-01
Abstract: Dropout has emerged as one of the most frequently used techniques for training deep neural networks (DNNs). Although effective, it introduces an inconsistency: the sub-model sampled by random dropout during training differs from the full model (without dropout) used during inference. To mitigate this undesirable gap, we propose WordReg, a simple yet effective regularization built on dropout that enforces consistency between the outputs of different sub-models sampled by dropout. Specifically, WordReg first obtains the worst-case dropout by maximizing the divergence between the outputs of two sub-models with different random dropouts. It then encourages agreement between the outputs of the two sub-models with the worst-case divergence. Extensive experiments on diverse DNNs and tasks reveal that WordReg achieves notable and consistent improvements over non-regularized models and yields some state-of-the-art results. Theoretically, we verify that WordReg can reduce the gap between training and inference.
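The two steps described in the abstract (find the dropout pair with the largest output divergence, then penalize that divergence) can be illustrated with a minimal sketch. The abstract does not say how the maximization is carried out, so this sketch assumes a simple random search over a handful of candidate masks and a symmetric KL divergence; the function names, the one-layer model, and the parameters `drop_rate` and `num_candidates` are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    # KL(p || q) with a small epsilon to avoid log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    # Assumed divergence measure; the abstract does not name one.
    return 0.5 * (kl(p, q) + kl(q, p))


def sample_mask(n, drop_rate, rng):
    # Inverted dropout: kept units are scaled by 1 / (1 - drop_rate).
    keep = 1.0 - drop_rate
    return [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(n)]


def forward(x, weights, mask):
    # Toy sub-model: dropout on the input, one linear layer, then softmax.
    dropped = [xi * mi for xi, mi in zip(x, mask)]
    logits = [sum(w * d for w, d in zip(row, dropped)) for row in weights]
    return softmax(logits)


def worst_case_consistency(x, weights, drop_rate=0.5, num_candidates=8, seed=0):
    # Step 1 (assumed random search): fix one reference sub-model, sample
    # several other sub-models, and measure each one's divergence from it.
    rng = random.Random(seed)
    reference = forward(x, weights, sample_mask(len(x), drop_rate, rng))
    divergences = []
    for _ in range(num_candidates):
        candidate = forward(x, weights, sample_mask(len(x), drop_rate, rng))
        divergences.append(symmetric_kl(reference, candidate))
    # Step 2: the largest divergence is the consistency term that training
    # would add to the task loss and minimize.
    return max(divergences), divergences


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5, 3.0]
    weights = [[0.2, -0.1, 0.4, 0.3],
               [-0.5, 0.3, 0.1, 0.2],
               [0.1, 0.1, -0.2, 0.4]]
    worst, all_divs = worst_case_consistency(x, weights)
    print("worst-case divergence:", worst)
    print("mean divergence:", sum(all_divs) / len(all_divs))
```

In a real training loop the worst-case term would be computed with an autodiff framework and combined with the task loss through a weighting coefficient; both details are left out here because the abstract does not specify them.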