Abstract: We present a smoothly broken power law functional form (which we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e., how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, out-of-distribution detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, fairness, molecules, computer programming/coding, math word problems, "emergent" "phase transitions / changes", arithmetic, unsupervised/self-supervised learning, & reinforcement learning (single agent & multi-agent). When compared to other functional forms for neural scaling behavior, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models & extrapolates scaling behavior that other functional forms are incapable of expressing, such as the non-monotonic transitions present in the scaling behavior of phenomena such as double descent & the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws
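The abstract does not write out the functional form. The sketch below assumes the form given as the main equation of the BNSL paper, y = a + b·x^(-c0) · ∏_{i=1..n} (1 + (x/d_i)^(1/f_i))^(-c_i·f_i). Here a is the asymptotic value, b and c0 set the offset and exponent of the first power-law segment, d_i is the location of break i, c_i is the change in exponent across that break, and f_i controls its sharpness (smaller is sharper). The helper names `bnsl` and `loglog_slope` are hypothetical. They illustrate the formula only and are not the API of the linked repository.

```python
import math
from typing import Callable, Sequence


def bnsl(x: float, a: float, b: float, c0: float,
         c: Sequence[float], d: Sequence[float], f: Sequence[float]) -> float:
    """Smoothly broken power law with n = len(c) breaks (hypothetical helper).

    Assumed form: y = a + b * x**(-c0) * prod_i (1 + (x / d_i)**(1 / f_i))**(-c_i * f_i)
    """
    if not (len(c) == len(d) == len(f)):
        raise ValueError("c, d and f must have one entry per break")
    y = b * x ** (-c0)  # first power-law segment
    for ci, di, fi in zip(c, d, f):
        # This factor is ~1 for x << d_i and ~(x / d_i)**(-c_i) for x >> d_i;
        # f_i sets how abruptly the switch between the two regimes happens.
        y *= (1.0 + (x / di) ** (1.0 / fi)) ** (-ci * fi)
    return a + y


def loglog_slope(fn: Callable[[float], float], x1: float, x2: float) -> float:
    """Secant slope of fn on a log-log plot between x1 and x2 (fn must be > 0)."""
    return (math.log(fn(x2)) - math.log(fn(x1))) / (math.log(x2) - math.log(x1))


if __name__ == "__main__":
    # One sharp break at x = 10: the exponent goes from 0.5 to 0.5 + 0.5 = 1.0.
    one_break = lambda x: bnsl(x, a=0.0, b=1.0, c0=0.5, c=[0.5], d=[10.0], f=[0.1])
    print(f"before break: {loglog_slope(one_break, 1e-3, 1e-2):.3f}")
    print(f"after break:  {loglog_slope(one_break, 1e6, 1e7):.3f}")

    # Two breaks with c_1 < -c0: the curve decreases, increases, then decreases
    # again, a double-descent-like shape that a single power law cannot express.
    two_breaks = lambda x: bnsl(x, a=0.0, b=1.0, c0=1.0,
                                c=[-2.0, 2.0], d=[10.0, 1e4], f=[0.1, 0.1])
    for lo, hi in [(1e-2, 1e-1), (2e2, 5e2), (1e6, 1e7)]:
        print(f"slope on [{lo:g}, {hi:g}]: {loglog_slope(two_breaks, lo, hi):.3f}")
```

Far below and far above a break, y - a behaves like a pure power law, so its log-log slope moves from -c0 to -(c0 + c_1). A negative c_i flips the sign of that slope, which is how one expression can describe a non-monotonic curve, and a small f_i makes a break abrupt, as with the delayed, sharp inflection points mentioned for arithmetic.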
