Abstract: The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Recent works demonstrated that deep nets trained with mean squared error (MSE) loss perform comparably to those trained with CE. As a preliminary, we empirically establish, through experiments on three canonical networks and five benchmark datasets, that NC emerges in such MSE-trained deep nets as well. We provide, in a Google Colab notebook, PyTorch code for reproducing MSE-NC and CE-NC at https://colab.research.google.com/github/neuralcollapse/neuralcollapse/blob/main/neuralcollapse.ipynb. The analytically tractable MSE loss offers more mathematical opportunities than the hard-to-analyze CE loss, inspiring us to leverage MSE loss towards the theoretical investigation of NC. We develop three main contributions: (I) We show a new decomposition of the MSE loss into (A) terms that are directly interpretable through the lens of NC and that assume the last-layer classifier is exactly the least-squares classifier; and (B) a term capturing the deviation from this least-squares classifier. (II) We exhibit experiments on canonical datasets and networks demonstrating that term (B) is negligible during training. This motivates us to introduce a new theoretical construct: the central path, where the linear classifier stays MSE-optimal for feature activations throughout the dynamics. (III) By studying renormalized gradient flow along the central path, we derive exact dynamics that predict NC.
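The split in contribution (I) can be illustrated numerically. The block below is a minimal sketch, not the authors' code: it assumes an unregularized MSE loss of the form (1/2N)·||W H + b 1ᵀ − Y||², with no weight decay, and uses random features in place of real last-layer activations. The function name `mse_decomposition` and all variable names are hypothetical. Under these assumptions, the loss of any linear classifier equals the loss of the least-squares classifier plus a non-negative deviation term, which follows from the normal equations. The paper's exact decomposition may differ in its details.

```python
import numpy as np

def mse_decomposition(H, Y, W, b):
    """Split the MSE loss of a linear classifier into a least-squares part
    and a deviation part (sketch; assumes no weight decay).

    H: (d, N) last-layer features, Y: (C, N) one-hot targets,
    W: (C, d) classifier weights,  b: (C,) classifier bias.
    """
    N = H.shape[1]
    # Absorb the bias by appending a row of ones to the features.
    H_ext = np.vstack([H, np.ones((1, N))])          # (d+1, N)
    W_ext = np.hstack([W, b[:, None]])               # (C, d+1)

    # Least-squares classifier for the fixed features:
    # argmin over W_ext of ||W_ext H_ext - Y||_F^2.
    W_ls = np.linalg.lstsq(H_ext.T, Y.T, rcond=None)[0].T  # (C, d+1)

    total = 0.5 / N * np.sum((W_ext @ H_ext - Y) ** 2)
    ls_part = 0.5 / N * np.sum((W_ls @ H_ext - Y) ** 2)
    deviation = 0.5 / N * np.sum(((W_ext - W_ls) @ H_ext) ** 2)
    return total, ls_part, deviation

# Toy data: random features stand in for real activations (an assumption).
rng = np.random.default_rng(0)
d, C, n_per_class = 16, 4, 25
labels = np.repeat(np.arange(C), n_per_class)
Y = np.eye(C)[:, labels]                             # (C, N) one-hot targets
H = rng.normal(size=(d, C * n_per_class))
W = rng.normal(size=(C, d))
b = rng.normal(size=C)

total, ls_part, deviation = mse_decomposition(H, Y, W, b)
print(total, ls_part + deviation)                    # the two values should agree
```

In this simplified setting, a classifier on the central path described in the abstract would correspond to `deviation` being zero, so the loss depends on the features alone.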

Leveraging Intermediate Neural Collapse with Simplex ETFs for Efficient Deep Neural Networks

Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

Prevalence of Neural Collapse during the terminal phase of deep learning training

Beyond Unconstrained Features: Neural Collapse for Shallow Neural Networks with General Data

Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data

On the Role of Neural Collapse in Meta Learning Models for Few-shot Learning

An Unconstrained Layer-Peeled Perspective on Neural Collapse

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

Towards understanding neural collapse in supervised contrastive learning with the information bottleneck method

Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model

On the Robustness of Neural Collapse and the Neural Collapse of Robustness

The Persistence of Neural Collapse Despite Low-Rank Bias: An Analytic Perspective Through Unconstrained Features

Perturbation Analysis of Neural Collapse

Progressive Feedforward Collapse of ResNet Training

Understanding and Improving Transfer Learning of Deep Models via Neural Collapse

Inducing Neural Collapse to a Fixed Hierarchy-Aware Frame for Reducing Mistake Severity

Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model

Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class Incremental Learning

Learning Equi-angular Representations for Online Continual Learning

Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains

Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path