Abstract: One major challenge in training Deep Neural Networks is preventing overfitting. Many techniques such as data augmentation and novel regularizers such as Dropout have been proposed to prevent overfitting without requiring a massive amount of training data. In this work, we propose a new regularizer called DeCov which leads to significantly reduced overfitting (as indicated by the difference between train and val performance), and better generalization. Our regularizer encourages diverse or non-redundant representations in Deep Neural Networks by minimizing the cross-covariance of hidden activations. This simple intuition has been explored in a number of past works but surprisingly has never been applied as a regularizer in supervised learning. Experiments across a range of datasets and network architectures show that this loss always reduces overfitting while almost always maintaining or increasing generalization performance and often improving performance over Dropout.

What problem does this paper attempt to address?

The main problem this paper addresses is **preventing overfitting in deep neural networks (DNNs)**. Specifically, the authors propose a new regularizer, the **DeCov loss**, which reduces redundant representations by minimizing the cross-covariance between hidden-layer activations, thereby improving the model's generalization.

### Problem Background

A common challenge when training deep neural networks is **preventing overfitting**. Even with a large amount of labeled data, deep networks remain prone to overfitting, which shows up as a model that performs well on the training set but poorly on the validation or test set. The problem is more pronounced in new domains and tasks, because each new task usually requires collecting and labeling a large amount of data again.

### Existing Solutions

Several techniques are currently used to prevent overfitting:

- **Data Augmentation**: Improves the robustness of the model by generating more diverse training data.
- **Dropout**: Randomly drops a subset of neurons during training to prevent co-adaptation, in which multiple hidden units rely on each other to perform a function and therefore become highly correlated.
- **Traditional regularization**: Methods such as **L2 regularization** and **Lasso**.

These methods are effective but leave room for improvement. In particular, the authors observe that high correlation between hidden-layer activations is associated with overfitting, and they propose to improve generalization further by explicitly reducing this correlation.

### Proposed New Method

The authors introduce the **DeCov loss**, whose core idea is to encourage diverse, non-redundant representations by minimizing the Frobenius norm of the cross-covariance matrix of hidden-layer activations.
The specific formula is as follows:

\[ C_{i,j}=\frac{1}{N}\sum_{n}(h_{n,i}-\mu_{i})(h_{n,j}-\mu_{j}) \]

Here, \(h_{n,i}\) is the activation of the \(i\)-th hidden unit for the \(n\)-th sample, and \(\mu_{i}\) is the mean activation of the \(i\)-th hidden unit over the \(N\) samples. The final DeCov loss is defined as:

\[ L_{\text{DeCov}}=\frac{1}{2}(\|C\|_{F}^{2}-\|\text{diag}(C)\|_{2}^{2}) \]

Here, \(\|\cdot\|_{F}\) is the Frobenius norm, and \(\text{diag}(C)\) extracts the diagonal elements of the matrix \(C\). Subtracting the diagonal term removes each unit's own variance, so only the covariances between distinct hidden units are penalized.

### Experimental Results

The authors verify the effectiveness of DeCov through a series of experiments, including:

- **MNIST Bimodal Experiment**: In a synthetic task, predict two adjacent handwritten digits simultaneously. The results show that DeCov can significantly reduce overfitting and improve generalization performance.
- **Image Classification Experiment**: Image classification on datasets such as CIFAR10, CIFAR100, and ImageNet. The experiments show that DeCov not only improves test-set accuracy but also significantly reduces the gap between training and validation performance.

### Conclusion

This paper proposes a new regularization method, the DeCov loss, which prevents overfitting by reducing the correlation between hidden-layer activations and thereby improves the model's generalization. Experimental results show that DeCov effectively reduces overfitting across multiple datasets and network architectures, and usually maintains or improves model performance.
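
The two formulas above for \(C\) and \(L_{\text{DeCov}}\) can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function name `decov_loss` and the assumption that activations arrive as an array of shape `(N, D)` (samples by hidden units) are choices made here for the example.

```python
import numpy as np


def decov_loss(h):
    """Sketch of the DeCov loss for activations h of shape (N, D).

    Hypothetical helper: the name and the (N, D) layout are assumptions
    made for this example, not taken from the paper's code.
    """
    h = np.asarray(h, dtype=np.float64)
    n = h.shape[0]

    # Center each hidden unit by its mean over the N samples (mu_i).
    centered = h - h.mean(axis=0, keepdims=True)

    # C[i, j] = (1/N) * sum_n (h[n, i] - mu_i) * (h[n, j] - mu_j)
    cov = centered.T @ centered / n

    # 0.5 * (||C||_F^2 - ||diag(C)||_2^2): only off-diagonal terms remain.
    frob_sq = np.sum(cov ** 2)
    diag_sq = np.sum(np.diag(cov) ** 2)
    return 0.5 * (frob_sq - diag_sq)


# Two perfectly correlated units give a positive penalty...
correlated = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
# ...while two units with zero covariance give no penalty.
uncorrelated = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]

print(decov_loss(correlated))
print(decov_loss(uncorrelated))
```

In training, a term like this would typically be added to the task loss with a weighting coefficient. The summary does not state that coefficient, so it is left out of the sketch.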

Reducing Overfitting in Deep Networks by Decorrelating Representations

On Feature Decorrelation in Self-Supervised Learning

DCCD: Reducing Neural Network Redundancy Via Distillation

Wordreg: Mitigating the Gap Between Training and Inference with Worst-Case Drop Regularization

Reducing Overfitting in Deep Convolutional Neural Networks Using Redundancy Regularizer

Dropout Reduces Underfitting

Regularizing Deep Convolutional Neural Networks with a Structured Decorrelation Constraint

Improving Deep Neural Network Sparsity Through Decorrelation Regularization

Shakeout: A New Approach to Regularized Deep Neural Network Training

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

Drop-Activation: Implicit Parameter Reduction and Harmonic Regularization

Overfitting Remedy by Sparsifying Regularization on Fully-Connected Layers of CNNs

Subdomain contraction in deep networks for robust representation learning

Shakedrop Regularization for Deep Residual Learning

R-Drop: Regularized Dropout for Neural Networks

Variance-Covariance Regularization Improves Representation Learning

Decorrelation-Based Deep Learning for Bias Mitigation

Efficient Deep Learning with Decorrelated Backpropagation

Dropout, a basic and effective regularization method for a deep learning model: a case study

Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization

Effective and Efficient Dropout for Deep Convolutional Neural Networks