Abstract: We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model provides a model-based interpretation of an outlier-efficient attention mechanism (${\rm Softmax}_1$): ${\rm Softmax}_1$ is an approximation of the memory retrieval process of $\mathrm{OutEffHop}$. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like $\mathtt{Clipped\_Softmax}$ and $\mathtt{Gated\_Attention}$. Notably, $\mathrm{OutEffHop}$ achieves an average reduction of 22+\% in average kurtosis and 26+\% in the maximum infinity norm of model outputs across four models. Code is available at \href{https://github.com/MAGICS-LAB/OutEffHop}{GitHub}; models are on \href{https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f}{Hugging Face Hub}; future updates are on \href{https://arxiv.org/abs/2404.03828}{arXiv}.
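The abstract names ${\rm Softmax}_1$ but does not define it. The sketch below assumes the commonly used definition ${\rm Softmax}_1(x)_i = \exp(x_i)/(1+\sum_j \exp(x_j))$ and plugs it into a standard one-step modern Hopfield retrieval $\xi^{\rm new} = \Xi\,{\rm Softmax}_1(\beta\,\Xi^\top \xi)$. That update rule, the helper names (`softmax_1`, `retrieve`, `kurtosis`, `inf_norm`) and the choice of Pearson (non-excess) kurtosis are illustrative assumptions, not the authors' implementation; the linked GitHub repository is the authoritative source.

```python
import math
from typing import Callable, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Standard softmax: the weights always sum to exactly 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_1(scores: Sequence[float]) -> List[float]:
    """Assumed definition: Softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).

    The extra 1 behaves like an implicit slot with score 0, so the
    weights may sum to less than 1 (the layer can attend "to nothing").
    """
    # Shift by m = max(0, max(x)) for numerical stability; the implicit
    # zero-score slot then contributes exp(0 - m) to the denominator.
    m = max(0.0, max(scores))
    exps = [math.exp(s - m) for s in scores]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]


def retrieve(
    memories: Sequence[Sequence[float]],
    query: Sequence[float],
    beta: float = 1.0,
    normalize: Callable[[Sequence[float]], List[float]] = softmax_1,
) -> List[float]:
    """Hypothetical one-step retrieval: xi_new = Xi @ normalize(beta * Xi^T @ xi).

    `memories` holds the stored patterns (the columns of Xi), each of length d.
    """
    # Similarity of the query to every stored pattern, scaled by beta.
    scores = [beta * sum(a * q for a, q in zip(mem, query)) for mem in memories]
    weights = normalize(scores)
    d = len(query)
    # Weighted sum of stored patterns, one output coordinate at a time.
    return [sum(w * mem[k] for w, mem in zip(weights, memories)) for k in range(d)]


def kurtosis(values: Sequence[float]) -> float:
    """Pearson (non-excess) kurtosis: E[(x - mu)^4] / Var(x)^2."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return sum((v - mu) ** 4 for v in values) / n / var**2


def inf_norm(values: Sequence[float]) -> float:
    """Infinity norm: the largest absolute entry."""
    return max(abs(v) for v in values)


if __name__ == "__main__":
    stored = [[1.0, 0.0], [0.0, 1.0]]
    weak_query = [-3.0, -3.0]  # matches neither stored pattern
    print(retrieve(stored, weak_query, normalize=softmax))    # [0.5, 0.5]: weights must sum to 1
    print(retrieve(stored, weak_query, normalize=softmax_1))  # both entries about 0.045
```

With standard softmax, a query that matches no stored pattern is still forced onto a convex mix of memories; with ${\rm Softmax}_1$ the implicit zero-score slot absorbs most of the mass, so the output stays small. This "no-op" option is the usual intuition offered for why ${\rm Softmax}_1$ curbs outlier activations. The two metric helpers mirror the quantities the abstract reports (kurtosis and maximum infinity norm), here computed over a flat list of activations.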

Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models

Nonparametric Modern Hopfield Models

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models

Storage Capacity of the Hopfield Network Associative Memory

Provably Optimal Memory Capacity for Modern Hopfield Models: Transformer-Compatible Dense Associative Memories as Spherical Codes

On Sparse Modern Hopfield Model

Input-Driven Dynamics for Robust Memory Retrieval in Hopfield Networks

Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval

STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction

Large Associative Memory Problem in Neurobiology and Machine Learning

Dynamic Capacity Estimation in Hopfield Networks

Efficient Continuous-Time Asymmetric Hopfield Networks for Memory Retrieval

A generalized Hopfield model to store and retrieve mismatched memory patterns

Modern Hopfield Networks meet Encoded Neural Representations -- Addressing Practical Considerations

Accelerating Hopfield Network Dynamics: Beyond Synchronous Updates and Forward Euler

Long Sequence Hopfield Memory

Dense Associative Memory Through the Lens of Random Features

Sparse and Structured Hopfield Networks

Improved Robustness and Hyperparameter Selection in the Dense Associative Memory