Abstract: Theoretical and empirical evidence points to a positive correlation between the flatness of the loss landscape around minima and generalization. However, most current approaches for finding flat minima either incur high computational costs or struggle to balance generalization, training stability, and convergence. This work proposes reshaping the loss landscape to steer the optimizer toward flat regions, an approach that adds negligible computational cost and does not compromise training stability, convergence, or efficiency. We focus on nonlinear, loss-dependent reshaping functions whose design is grounded in theoretical analysis. To design these functions, we first identify where and how they should be applied. Using recently developed tools in stochastic optimization, our theoretical analysis shows that steepening the low-loss landscape increases the rate of escape from sharp minima, while flattening the high-loss and ultralow-loss landscapes improves training stability and optimization performance, respectively. Simulations and experiments show that carefully designed reshaping functions not only guide optimizers to flat minima and improve generalization but also stabilize training, aid optimization, and preserve efficiency. We evaluate our approach on image classification, adversarial robustness, and natural language processing (NLP) tasks, where it achieves significant improvements in generalization performance at negligible computational cost. We believe that the perspective introduced in this work will have a broad impact on deep neural network training. The code is available at https://github.com/LongJin-lab/LLR.
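The abstract describes the reshaping functions only qualitatively, so the following is an illustrative sketch under stated assumptions, not the paper's LLR functions (their exact form is not given here). Suppose training minimizes f(L(θ)) instead of L(θ) for some strictly increasing f. By the chain rule, the gradient becomes f'(L(θ)) ∇L(θ), so reshaping only rescales each step by a loss-dependent factor, and because f is strictly increasing the set of minimizers is unchanged. A slope f' > 1 in the low-loss band "steepens" that region, and a slope f' < 1 in the high-loss and ultralow-loss bands "flattens" them. The class `PiecewiseReshaper`, the helper `reshaped_sgd_step`, the thresholds, and the slopes are all hypothetical choices made for illustration; the sketch also assumes a nonnegative loss.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PiecewiseReshaper:
    """Hypothetical monotone, piecewise-linear reshaping function f(L).

    Illustration only, not the LLR functions from the paper.
    Assumes loss >= 0.
    """
    ultralow: float = 0.05        # below this loss, flatten (slope < 1)
    high: float = 2.0             # at or above this loss, flatten (slope < 1)
    slope_ultralow: float = 0.5
    slope_low: float = 2.0        # steepen the low-loss band (slope > 1)
    slope_high: float = 0.5

    def __post_init__(self) -> None:
        # Positive slopes keep f strictly increasing, so minimizers are preserved.
        if not 0.0 < self.ultralow < self.high:
            raise ValueError("require 0 < ultralow < high")
        if min(self.slope_ultralow, self.slope_low, self.slope_high) <= 0.0:
            raise ValueError("all slopes must be positive")

    def slope(self, loss: float) -> float:
        """f'(L): the factor that multiplies the raw gradient."""
        if loss < self.ultralow:
            return self.slope_ultralow
        if loss < self.high:
            return self.slope_low
        return self.slope_high

    def __call__(self, loss: float) -> float:
        """f(L), built by integrating f' from 0 so that f is continuous."""
        a, b = self.ultralow, self.high
        if loss < a:
            return self.slope_ultralow * loss
        f_a = self.slope_ultralow * a
        if loss < b:
            return f_a + self.slope_low * (loss - a)
        f_b = f_a + self.slope_low * (b - a)
        return f_b + self.slope_high * (loss - b)


def reshaped_sgd_step(params: List[float], grad: List[float], loss: float,
                      reshaper: PiecewiseReshaper, lr: float) -> List[float]:
    """One plain SGD step on f(L): theta <- theta - lr * f'(L) * grad L."""
    scale = reshaper.slope(loss)
    return [p - lr * scale * g for p, g in zip(params, grad)]


if __name__ == "__main__":
    r = PiecewiseReshaper()
    # In the low-loss band the effective step is doubled (slope 2.0).
    print(reshaped_sgd_step([1.0], [2.0], loss=0.5, reshaper=r, lr=0.1))
```

Because the rescaling is a single scalar per step, a scheme of this shape costs essentially nothing beyond the ordinary gradient computation, which is consistent with the negligible-overhead claim in the abstract. Piecewise-linear slopes were chosen only for readability; a smooth f with the same qualitative shape would serve the same illustrative purpose.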

The Loss Surface of Deep Linear Networks Viewed Through the Algebraic Geometry Lens

The loss landscape of deep linear neural networks: a second-order analysis

Exploring the Geometry and Topology of Neural Network Loss Landscapes

Emergent properties of the local geometry of neural loss landscapes

Exact Solutions of a Deep Linear Network

Visualizing the Loss Landscape of Neural Nets

Efficient Loss Landscape Reshaping for Convolutional Neural Networks

Exploring the loss landscape of regularized neural networks via convex duality

Asymmetric Valleys: Beyond Sharp and Flat Local Minima

Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes

A topological description of loss surfaces based on Betti Numbers

Geometry and Local Recovery of Global Minima of Two-layer Neural Networks at Overparameterization

Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding

Dynamics in Deep Classifiers Trained with the Square Loss: Normalization, Low Rank, Neural Collapse, and Generalization Bounds

Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes

On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems

On the curvature of the loss landscape

On the Landscape of Sparse Linear Networks

A simple connection from loss flatness to compressed representations in neural networks

Avoiding Spurious Local Minima in Deep Quadratic Networks

Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks