Abstract: Deep-learning models can extract a rich assortment of features from data. Which features a model uses depends not only on \emph{predictivity} -- how reliably a feature indicates training-set labels -- but also on \emph{availability} -- how easily the feature can be extracted from inputs. The literature on shortcut learning has noted examples in which models privilege one feature over another, for example texture over shape and image backgrounds over foreground objects. Here, we test hypotheses about which input properties are more available to a model, and systematically study how predictivity and availability interact to shape models' feature use. We construct a minimal, explicit generative framework for synthesizing classification datasets with two latent features that vary in predictivity and in factors we hypothesize to relate to availability, and we quantify a model's shortcut bias -- its over-reliance on the shortcut (more available, less predictive) feature at the expense of the core (less available, more predictive) feature. We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias. Our empirical findings are consistent with a theoretical account based on Neural Tangent Kernels. Finally, we study how models used in practice trade off predictivity and availability in naturalistic datasets, discovering availability manipulations which increase models' degree of shortcut bias. Taken together, these findings suggest that the propensity to learn shortcut features is a fundamental characteristic of deep nonlinear architectures warranting systematic study given its role in shaping how models solve tasks.
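
To make the setup in the abstract concrete, the following is a minimal illustrative sketch of a two-feature synthetic classification dataset. It is not the paper's generative framework. It assumes binary labels in {-1, +1}, treats predictivity as the probability that a latent feature's sign agrees with the label, and uses a per-feature amplitude as a hypothetical stand-in for availability. The names `make_two_feature_dataset` and `empirical_predictivity` and all parameters are invented for this illustration.

```python
import numpy as np


def make_two_feature_dataset(n, p_core, p_shortcut, amp_core, amp_shortcut, seed=0):
    """Sample a toy dataset with a core and a shortcut latent feature.

    Assumptions (illustrative only, not taken from the paper):
    - labels y are drawn uniformly from {-1, +1};
    - each latent agrees in sign with y with probability p (its predictivity);
    - amplitude scales the latent in input space, as a proxy for availability.
    """
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n)

    # Each latent copies the label with probability p, otherwise flips it.
    z_core = np.where(rng.random(n) < p_core, y, -y)
    z_shortcut = np.where(rng.random(n) < p_shortcut, y, -y)

    # Input columns: [shortcut, core], each scaled by its amplitude.
    x = np.stack([amp_shortcut * z_shortcut, amp_core * z_core], axis=1).astype(float)
    return x, y, z_shortcut, z_core


def empirical_predictivity(z, y):
    """Fraction of examples on which the sign of latent z matches label y."""
    return float(np.mean(np.sign(z) == y))


if __name__ == "__main__":
    # Core feature: more predictive, smaller amplitude.
    # Shortcut feature: less predictive, larger amplitude.
    x, y, z_s, z_c = make_two_feature_dataset(
        n=20000, p_core=0.95, p_shortcut=0.80, amp_core=1.0, amp_shortcut=5.0
    )
    print("core predictivity:", empirical_predictivity(z_c, y))
    print("shortcut predictivity:", empirical_predictivity(z_s, y))
```

Under these assumptions, a model's feature reliance could be probed by training it on `x` and comparing its predictions on examples where the two latents disagree. How the paper itself measures shortcut bias is not specified in the abstract.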

On a Sparse Shortcut Topology of Artificial Neural Networks

Analyze and Design Network Architectures by Recursion Formulas

Demystifying ResNet

Understand the Effectiveness of Shortcuts through the Lens of DCA

Be Persistent: Towards a Unified Solution for Mitigating Shortcuts in Deep Learning

On the Foundations of Shortcut Learning

Shortcut learning in deep neural networks

ResNet or DenseNet? Introducing Dense Shortcuts to ResNet

Generalization Properties of NAS under Activation and Skip Connection Search

A Deep Graph Neural Networks Architecture Design: From Global Pyramid-like Shrinkage Skeleton to Local Topology Link Rewiring

Deep, Skinny Neural Networks are not Universal Approximators

Universal structural patterns in sparse recurrent neural networks

Towards Understanding the Importance of Shortcut Connections in Residual Networks

Explore the Knowledge contained in Network Weights to Obtain Sparse Neural Networks

On the Expressive Power of Neural Networks

Sparsity-aware generalization theory for deep neural networks

Learning Connectivity of Neural Networks from a Topological Perspective

Generalization and Expressivity for Deep Nets

ClosNets: a Priori Sparse Topologies for Faster DNN Training

The Expressive Power of Neural Networks: A View from the Width