Task structure and nonlinearity jointly determine learned representational geometry

Matteo Alleman, Jack W Lindsey, Stefano Fusi
2024-01-25
Abstract: The utility of a learned neural representation depends on how well its geometry supports performance in downstream tasks. This geometry depends on the structure of the inputs, the structure of the target outputs, and the architecture of the network. By studying the learning dynamics of networks with one hidden layer, we discovered that the network's activation function has an unexpectedly strong impact on the representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference is consistently observed across a broad class of parameterized tasks in which we modulated the degree of alignment between the geometry of the task inputs and that of the task labels. We analyzed the learning dynamics in weight space and show how the differences between the networks with Tanh and ReLU nonlinearities arise from the asymmetric asymptotic behavior of ReLU, which leads feature neurons to specialize for different regions of input space. By contrast, feature neurons in Tanh networks tend to inherit the task label structure. Consequently, when the target outputs are low dimensional, Tanh networks generate neural representations that are more disentangled than those obtained with a ReLU nonlinearity. Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.
Machine Learning
What problem does this paper attempt to address?
This paper investigates how the geometry of the representations learned by a neural network depends on three factors: the geometry of the inputs, the geometry of the labels, and the network's activation function. The study focuses on the learning dynamics of networks with a single hidden layer. It finds that networks with a Tanh activation function tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference holds across a broad class of parameterized tasks in which the alignment between input geometry and label geometry is varied. The authors quantify representational geometry with metrics such as linear decodability, kernel alignment, and parallelism score. When the target output is low-dimensional, the representations produced by Tanh networks are more disentangled than those produced by ReLU networks. The paper explains this difference by analyzing the learning dynamics in weight space: the asymmetric asymptotic behavior of ReLU leads feature neurons to specialize for different regions of input space, whereas feature neurons in Tanh networks tend to inherit the structure of the task labels. These findings clarify the interplay between input-output geometry, nonlinearity, and learned representations in neural networks, and may inform architecture choices when generalization or transfer to downstream tasks matters.
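To make the kernel alignment metric mentioned above concrete, here is a minimal sketch in Python. It assumes a centered linear-kernel alignment, which is one common definition; the summary does not state the exact formula used in the paper, so treat this as illustrative only. The function names, the random data, and the untrained random hidden layer are all hypothetical and are not taken from the paper.

```python
import numpy as np


def linear_kernel(x):
    """Gram matrix of an array with shape (n_samples, n_features)."""
    return x @ x.T


def center(k):
    """Double-center a Gram matrix, i.e. remove the mean in feature space."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h


def kernel_alignment(x, y):
    """Centered alignment between the linear kernels of x and y.

    Assumed definition: normalized Frobenius inner product of the two
    centered Gram matrices. Values lie in [0, 1]; 1 means the two
    geometries match up to scale.
    """
    kx = center(linear_kernel(x))
    ky = center(linear_kernel(y))
    return np.sum(kx * ky) / (np.linalg.norm(kx) * np.linalg.norm(ky))


# Hypothetical toy data: Gaussian inputs and one binary label read off
# the first input coordinate.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(200, 20))
labels = np.sign(inputs[:, :1])

# Untrained random hidden layer, used only to exercise the metric.
# It does NOT reproduce the trained networks studied in the paper.
weights = rng.normal(size=(20, 100)) / np.sqrt(20)
pre_activation = inputs @ weights
hidden_tanh = np.tanh(pre_activation)
hidden_relu = np.maximum(pre_activation, 0.0)

for name, hidden in [("tanh", hidden_tanh), ("relu", hidden_relu)]:
    print(
        name,
        "input alignment:", kernel_alignment(hidden, inputs),
        "label alignment:", kernel_alignment(hidden, labels),
    )
```

Under this assumed definition, a hidden representation that mirrors the label structure would show high label alignment, while one that preserves the raw input structure would show high input alignment. Comparing the two numbers for trained Tanh and ReLU networks is one way to probe the contrast the summary describes.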