Task structure and nonlinearity jointly determine learned representational geometry

Matteo Alleman, Jack W Lindsey, Stefano Fusi

2024-01-25

Abstract: The utility of a learned neural representation depends on how well its geometry supports performance in downstream tasks. This geometry depends on the structure of the inputs, the structure of the target outputs, and the architecture of the network. By studying the learning dynamics of networks with one hidden layer, we discovered that the network's activation function has an unexpectedly strong impact on the representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference is consistently observed across a broad class of parameterized tasks in which we modulated the degree of alignment between the geometry of the task inputs and that of the task labels. We analyzed the learning dynamics in weight space and show how the differences between the networks with Tanh and ReLU nonlinearities arise from the asymmetric asymptotic behavior of ReLU, which leads feature neurons to specialize for different regions of input space. By contrast, feature neurons in Tanh networks tend to inherit the task label structure. Consequently, when the target outputs are low dimensional, Tanh networks generate neural representations that are more disentangled than those obtained with a ReLU nonlinearity. Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.

Machine Learning

What problem does this paper attempt to address?

This paper investigates how the geometry of the representations learned by a neural network depends on the geometry of the inputs, the geometry of the labels, and the network's activation function. The study focuses on the learning dynamics of networks with a single hidden layer and finds that Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference holds across a broad class of parameterized tasks in which the degree of alignment between input geometry and label geometry is varied. The authors quantify representational geometry with metrics such as linear decoding capacity, kernel alignment, and parallelism score. When the target outputs are low-dimensional, Tanh networks produce representations that are more disentangled than those of ReLU networks. The paper also analyzes the learning dynamics in weight space and traces the difference to the asymmetric asymptotic behavior of ReLU, which leads feature neurons to specialize for different regions of input space; feature neurons in Tanh networks instead tend to inherit the structure of the task labels. These findings clarify the interplay between input-output geometry, nonlinearity, and learned representations, and may inform network design where generalization and transfer learning matter.
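
To make the kernel alignment metric mentioned above concrete, here is a minimal NumPy sketch. It rests on assumptions: the summary does not give the paper's exact definition, so this uses a common centered, linear-kernel form of alignment. The function name `centered_kernel_alignment`, the toy data, and the random (untrained) hidden layer are all hypothetical and serve only to show how a hidden representation can be compared with the input geometry and the label geometry.

```python
import numpy as np


def centered_kernel_alignment(a, b):
    """Alignment between the centered linear kernels of two arrays.

    Both arrays have shape (n_samples, n_features). This is one common
    definition, assumed here for illustration; it is not taken from the paper.
    """
    a = a - a.mean(axis=0, keepdims=True)  # center each feature
    b = b - b.mean(axis=0, keepdims=True)
    kernel_a = a @ a.T  # sample-by-sample Gram matrices
    kernel_b = b @ b.T
    # Normalized Frobenius inner product of the two Gram matrices.
    return float(
        np.sum(kernel_a * kernel_b)
        / (np.linalg.norm(kernel_a) * np.linalg.norm(kernel_b))
    )


# Hypothetical toy task: 10-dimensional inputs, two binary labels.
rng = np.random.default_rng(0)
inputs = rng.standard_normal((64, 10))
labels = np.sign(inputs[:, :2])

# Random, untrained single hidden layer, used only to produce representations.
weights = rng.standard_normal((10, 100)) / np.sqrt(10)
hidden_tanh = np.tanh(inputs @ weights)
hidden_relu = np.maximum(inputs @ weights, 0.0)

for name, hidden in [("tanh", hidden_tanh), ("relu", hidden_relu)]:
    input_alignment = centered_kernel_alignment(hidden, inputs)
    label_alignment = centered_kernel_alignment(hidden, labels)
    print(f"{name}: input alignment {input_alignment:.3f}, "
          f"label alignment {label_alignment:.3f}")
```

Because the weights here are random rather than trained, the printed values say nothing about the paper's results. To probe the reported effect, one would compute the same two alignments on hidden activations after training each network on the task.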

Neural population geometry and optimal coding of tasks with shared latent structure

Understanding Dynamics of Nonlinear Representation Learning and Its Application

Task structure tailors the geometry of neural representations in human lateral prefrontal cortex

The role of optimization geometry in single neuron learning

When Representations Align: Universality in Representation Learning Dynamics

Feature learning as alignment: a structural property of gradient descent in non-linear neural networks

Neural population geometry: An approach for understanding biological and artificial neural networks

Topological obstruction to the training of shallow ReLU neural networks

Geometry interaction network alignment

Low-dimensional Intrinsic Dimension Reveals a Phase Transition in Gradient-Based Learning of Deep Neural Networks

Investigating the Compositional Structure Of Deep Neural Networks

Symmetry Induces Structure and Constraint of Learning

Relational Constraints On Neural Networks Reproduce Human Biases towards Abstract Geometric Regularity

Signatures of task learning in neural representations

Randomly Weighted Neuromodulation in Neural Networks Facilitates Learning of Manifolds Common Across Tasks

How deep learning works --The geometry of deep learning

Low-Rank Learning by Design: the Role of Network Architecture and Activation Linearity in Gradient Rank Collapse

Graph Neural Networks Uncover Geometric Neural Representations in Reinforcement-Based Motor Learning

Effects of Nonlinearity and Network Architecture on the Performance of Supervised Neural Networks

Neural networks learn to magnify areas near decision boundaries