Abstract: Theoretical and empirical evidence highlights a positive correlation between the flatness of loss landscapes around minima and generalization. However, most current approaches that seek flat minima either incur high computational costs or struggle to balance generalization, training stability, and convergence. This work proposes reshaping the loss landscape to steer the optimizer toward flat regions, an approach that has negligible computational cost and does not compromise training stability, convergence, or efficiency. We focus on nonlinear, loss-dependent reshaping functions grounded in theoretical insights. To design these functions, we first identify where and how they should be applied. With the aid of recently developed tools in stochastic optimization, theoretical analysis shows that steepening the low-loss landscape improves the rate of escape from sharp minima, while flattening the high- and ultralow-loss landscapes enhances training stability and optimization performance, respectively. Simulations and experiments reveal that the carefully designed reshaping functions not only induce optimizers to find flat minima and improve generalization performance but also stabilize training, promote optimization, and preserve efficiency. Our approach is evaluated on image classification, adversarial robustness, and natural language processing (NLP) tasks and achieves significant improvement in generalization performance with negligible computational cost. We believe that the new perspective introduced in this work will broadly impact the field of deep neural network training. The code is available at https://github.com/LongJin-lab/LLR.
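
To make the idea of a loss-dependent reshaping function concrete, the following is a minimal sketch and not the authors' implementation. It assumes a piecewise-linear function g applied to the scalar loss, so that by the chain rule the gradient of g(L(θ)) is g'(L) times the original gradient: a slope above 1 steepens a loss band and a slope below 1 flattens it. The band edges (`ULTRALOW`, `HIGH`), the slopes (`FLAT`, `STEEP`), and the function names are all hypothetical choices made for illustration; the abstract does not specify the actual functional form.

```python
"""Toy loss-dependent reshaping function (illustrative only)."""

# All thresholds and slopes below are hypothetical values chosen for
# illustration; they are not taken from the paper or its code release.
ULTRALOW = 0.05   # assumed upper edge of the "ultralow-loss" band
HIGH = 1.0        # assumed lower edge of the "high-loss" band
FLAT = 0.5        # slope < 1: flattens the landscape in that band
STEEP = 2.0       # slope > 1: steepens the landscape in that band


def reshape_slope(loss: float) -> float:
    """Return g'(loss), the factor that rescales the original gradient."""
    if loss < ULTRALOW or loss > HIGH:
        return FLAT   # ultralow-loss and high-loss bands are flattened
    return STEEP      # low-loss band is steepened


def reshape(loss: float) -> float:
    """Return g(loss): continuous, piecewise linear, with g(0) = 0."""
    if loss < 0:
        raise ValueError("loss must be non-negative")
    # Split the loss into the portion that falls inside each band.
    ultralow_part = min(loss, ULTRALOW)
    low_part = min(max(loss - ULTRALOW, 0.0), HIGH - ULTRALOW)
    high_part = max(loss - HIGH, 0.0)
    return FLAT * ultralow_part + STEEP * low_part + FLAT * high_part


if __name__ == "__main__":
    for raw in (0.01, 0.5, 3.0):
        print(f"loss={raw:.2f}  g(loss)={reshape(raw):.4f}  "
              f"g'(loss)={reshape_slope(raw):.1f}")
```

Because g only rescales the gradient by a scalar that depends on the current loss value, a scheme of this shape adds essentially one scalar evaluation per step, which is consistent with the negligible overhead the abstract claims. A smooth g would likely be preferable in practice; the piecewise-linear form is used here only to keep the three bands easy to read.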

Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks

Embedding Principle of Loss Landscape of Deep Neural Networks

On the Depth of Deep Neural Networks: A Theoretical View

Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks

Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks

The Shallow End: Empowering Shallower Deep-Convolutional Networks through Auxiliary Outputs

Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding

Going Deeper, Generalizing Better: an Information-Theoretic View for Deep Learning

Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

Layer-Peeled Model: Toward Understanding Well-Trained Deep Neural Networks

Visualizing the Loss Landscape of Neural Nets

Depth Selection for Deep ReLU Nets in Feature Extraction and Generalization

Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization

A Mathematical Principle of Deep Learning: Learn the Geodesic Curve in the Wasserstein Space

Effect of Depth and Width on Local Minima in Deep Learning

Exploring the Geometry and Topology of Neural Network Loss Landscapes

Effective Rank and the Staircase Phenomenon: New Insights into Neural Network Training Dynamics

The Loss Surface of Deep Linear Networks Viewed Through the Algebraic Geometry Lens

Efficient Loss Landscape Reshaping for Convolutional Neural Networks