Abstract: Transformers have revolutionized machine learning models of language and vision, but their connection with neuroscience remains tenuous. Built from attention layers, they require a mass comparison of queries and keys that is difficult to perform using traditional neural circuits. Here, we show that neurons can implement attention-like computations using short-term, Hebbian synaptic potentiation. We call our mechanism the match-and-control principle. It proposes that when activity in an axon is synchronous, or matched, with the somatic activity of a neuron that it synapses onto, the synapse can be briefly and strongly potentiated, allowing the axon to take over, or control, the activity of the downstream neuron for a short time. In our scheme, the keys and queries are represented as spike trains, and comparisons between the two are performed in individual spines, allowing for hundreds of key comparisons per query and roughly as many keys and queries as there are neurons in the network.

Many of the most impressive recent advances in machine learning, from generating images from text to human-like chatbots, are based on a neural network architecture known as the transformer. Transformers are built from so-called attention layers, which perform large numbers of comparisons between the vector outputs of the previous layers, allowing information to flow through the network in a more dynamic way than previous designs. This large number of comparisons is computationally expensive and has no known analogue in the brain. Here, we show that a variation on a learning mechanism familiar in neuroscience, Hebbian learning, can implement a transformer-like attention computation if the synaptic weight changes are large and rapidly induced. We call our method the match-and-control principle. It proposes that when presynaptic and postsynaptic spike trains match up, small groups of synapses can be transiently potentiated, allowing a few presynaptic axons to control the activity of a neuron. To demonstrate the principle, we build a model of a pyramidal neuron and use it to illustrate the power and limitations of the idea.
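
The idea of comparing a query spike train against many key spike trains can be illustrated with a toy sketch. The code below is not the pyramidal neuron model described above. It is a minimal illustration under simplifying assumptions: spike trains are binary lists over time bins, the "match" is a plain coincidence count, and the transient potentiation is stood in for by a softmax over those counts. The function names, the `gain` parameter, and the scalar `values` carried by each axon are all hypothetical and introduced only for this example.

```python
import math


def match_score(pre, post):
    """Coincidence count: number of time bins in which both trains spike.

    Assumption: spike trains are equal-length lists of 0/1 values.
    """
    return sum(a * b for a, b in zip(pre, post))


def transient_weights(query, keys, gain=1.0):
    """Toy stand-in for short-term potentiation (an assumption, not the paper's rule).

    Each key (presynaptic train) is compared with the query (postsynaptic
    train). A softmax over the scaled coincidence counts gives the relative
    synaptic strengths; a large gain lets the best-matching synapse dominate.
    """
    scores = [gain * match_score(key, query) for key in keys]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, gain=1.0):
    """Weighted sum of the values carried by each axon (hypothetical scalars)."""
    weights = transient_weights(query, keys, gain)
    return sum(w * v for w, v in zip(weights, values))


# Hand-built example: the query matches the second key exactly.
keys = [
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
]
query = [0, 1, 0, 1, 0, 1]
values = [10.0, 20.0, 30.0]

weights = transient_weights(query, keys, gain=10.0)
output = attend(query, keys, values, gain=10.0)
```

With a large `gain`, almost all of the weight falls on the key whose spikes coincide with the query, so the output is dominated by that axon's value. This loosely mirrors the "control" step, in which a matched axon briefly takes over the downstream neuron. A smaller `gain` spreads the weight across partially matching keys, closer to the soft weighting used in transformer attention.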

Self-attention as an attractor network: transient memories without backpropagation

Dynamical Mean-Field Theory of Self-Attention Neural Networks

Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers

Gated recurrent neural networks discover attention

Memory Dynamics in Attractor Networks with Saliency Weights.

Dynamic metastability in the self-attention model

Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention

Easy attention: A simple attention mechanism for temporal predictions with transformers

Mapping of attention mechanisms to a generalized Potts model

Short-term Hebbian learning can implement transformer-like attention

The Attention Mechanism Demystified

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth

Mechanics of Next Token Prediction with Self-Attention

A Primal-Dual Framework for Transformers and Neural Networks

Attention as an RNN

Augmenting Self-attention with Persistent Memory

Dynamics of feed forward induced interference training

MLP Can Be A Good Transformer Learner

A Gaussian attractor network for memory and recognition with experience-dependent learning.

Highway Transformer: Self-Gating Enhanced Self-Attentive Networks