Abstract: Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memory retriever and reader. Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness. Enhanced with memory-augmented adaptation training, LongMem can thus memorize long past context and use long-term memory for language modeling. The proposed memory retrieval module can handle unlimited-length context in its memory bank to benefit various downstream tasks. For example, LongMem can enlarge the long-form memory to 65k tokens and thus cache many-shot extra demonstration examples as long-form memory for in-context learning. Experiments show that our method outperforms strong long-context models on ChapterBreak, a challenging long-context modeling benchmark, and achieves remarkable improvements on memory-augmented in-context learning over LLMs. The results demonstrate that the proposed method is effective in helping language models to memorize and utilize long-form content. Our code is open-sourced at https://aka.ms/LongMem.
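The abstract describes the memory bank only at a high level: past context is encoded once, cached, updated as new input arrives, and later retrieved and read. The snippet below is a minimal sketch of that general idea under stated assumptions, not the LongMem implementation. It assumes cached entries are plain (key, value) vectors, that the oldest entries are evicted first once a fixed capacity is reached, that retrieval ranks entries by dot product with a query, and that "reading" is a softmax-weighted sum of the retrieved values. The names `MemoryBank`, `cache`, `retrieve`, and `read` are hypothetical; the paper's actual retrieval granularity and fusion with the side-network may differ.

```python
import heapq
import math
from collections import deque
from typing import Deque, List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class MemoryBank:
    """Hypothetical bounded cache of (key, value) vectors from past context.

    Assumption: first-in-first-out eviction, so the bank always holds the
    `capacity` most recently cached entries.
    """

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops its oldest (leftmost) item on overflow.
        self.entries: Deque[Pair] = deque(maxlen=capacity)

    def cache(self, keys: Sequence[Sequence[float]], values: Sequence[Sequence[float]]) -> None:
        """Append key/value pairs, e.g. as produced by a frozen encoder."""
        for k, v in zip(keys, values):
            self.entries.append((list(k), list(v)))

    def retrieve(self, query: Sequence[float], top_k: int) -> List[Pair]:
        """Return the top_k cached pairs ranked by dot product with the query."""
        return heapq.nlargest(top_k, self.entries, key=lambda e: dot(query, e[0]))


def read(query: Sequence[float], pairs: Sequence[Pair]) -> Vector:
    """Softmax-attend over retrieved pairs and return the weighted value."""
    if not pairs:
        raise ValueError("no retrieved memories to read")
    scores = [dot(query, key) for key, _ in pairs]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    out = [0.0] * len(pairs[0][1])
    for w, (_, value) in zip(weights, pairs):
        for i, x in enumerate(value):
            out[i] += (w / total) * x
    return out


# Usage: capacity 3, so caching four entries evicts the first one.
bank = MemoryBank(capacity=3)
bank.cache([[1, 0], [0, 1], [1, 1], [-1, 0]], [[10, 0], [0, 10], [5, 5], [-10, 0]])
hits = bank.retrieve([1, 0], top_k=2)  # keys [1, 1] then [0, 1]
fused = read([1, 0], hits)
```

The bounded deque keeps eviction O(1) and mirrors the idea of a cache that is refreshed as new context arrives; at the 65k-token scale mentioned above, a practical system would more likely use an approximate nearest-neighbor index than scoring every entry.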

Learning to Remember Translation History with a Continuous Cache

Improving Neural Language Models with a Continuous Cache

Neural Machine Translation with Monolingual Translation Memory

Memory-augmented Neural Machine Translation

Neural Machine Translation with Contrastive Translation Memories

Graph Based Translation Memory for Neural Machine Translation

Learning Efficient Lexically-Constrained Neural Machine Translation with External Memory

Continual Learning for Neural Machine Translation

F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation

Neural Machine Translation with Key-Value Memory-Augmented Attention

CMT: A Memory Compression Method for Continual Knowledge Learning of Large Language Models

Cached Transformers: Improving Transformers with Differentiable Memory Cache

Self-generated Replay Memories for Continual Neural Machine Translation

Augmenting Language Models with Long-Term Memory

Pluggable Neural Machine Translation Models Via Memory-augmented Adapters

Memory-augmented Chinese-Uyghur neural machine translation

Predicting Anchored Text from Translation Memories for Machine Translation Using Deep Learning Methods

Cache Friendly Parallelization of Neural Encoder-Decoder Models Without Padding on Multi-core Architecture

MemoNet: Memorizing All Cross Features' Representations Efficiently via Multi-Hash Codebook Network for CTR Prediction

Exploiting Cross-Sentence Context for Neural Machine Translation

Modeling Past and Future for Neural Machine Translation