Abstract: Existing large language models (LLMs) can only process fixed-size inputs because of their input length limit, which prevents them from using rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long histories. We design a novel decoupled network architecture in which the original backbone LLM is frozen and serves as a memory encoder, while an adaptive residual side-network serves as a memory retriever and reader. This decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness. Enhanced with memory-augmented adaptation training, LongMem can memorize long past contexts and use long-term memory for language modeling. The proposed memory retrieval module can handle unlimited-length context in its memory bank to benefit various downstream tasks. For example, LongMem can enlarge the long-form memory to 65k tokens and thus cache many-shot extra demonstration examples as long-form memory for in-context learning. Experiments show that our method outperforms strong long-context models on ChapterBreak, a challenging long-context modeling benchmark, and achieves remarkable improvements on memory-augmented in-context learning over LLMs. The results demonstrate that the proposed method is effective in helping language models memorize and utilize long-form content. Our code is open-sourced at https://aka.ms/LongMem.
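
The cache-and-retrieve idea in the abstract can be illustrated with a small sketch. The abstract does not specify how the memory bank is organized, so everything below is an assumption made for illustration: the `MemoryBank` class and its method names are hypothetical, entries are stored as (key, value) vector pairs, similarity is a plain dot product, and the oldest entries are evicted first once the capacity is reached. In the framing of the abstract, the cached vectors would come from the frozen backbone and the query from the side-network.

```python
from collections import deque
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain inner product, used here as an assumed similarity score."""
    return sum(x * y for x, y in zip(a, b))


class MemoryBank:
    """Hypothetical fixed-capacity cache of (key, value) pairs, oldest evicted first."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) drops the oldest item automatically once it is full.
        self.entries: deque = deque(maxlen=capacity)

    def cache(self, keys: List[Vector], values: List[Vector]) -> None:
        """Store representations of past context (assumed to come from a frozen encoder)."""
        for key, value in zip(keys, values):
            self.entries.append((key, value))

    def retrieve(self, query: Vector, top_k: int) -> List[Tuple[Vector, Vector]]:
        """Return the top_k cached pairs whose keys score highest against the query."""
        ranked = sorted(self.entries, key=lambda kv: dot(query, kv[0]), reverse=True)
        return ranked[:top_k]


bank = MemoryBank(capacity=3)
bank.cache(keys=[[1.0, 0.0], [0.0, 1.0]], values=[[10.0], [20.0]])
# Caching two more pairs exceeds the capacity of 3, so the oldest pair is evicted.
bank.cache(keys=[[0.9, 0.1], [-1.0, 0.0]], values=[[30.0], [40.0]])
hits = bank.retrieve(query=[1.0, 0.0], top_k=2)
```

Because the cached entries are never recomputed in this sketch, updating the bank only requires appending new pairs, which loosely mirrors the abstract's point that a frozen encoder avoids memory staleness. A real system would likely replace the exhaustive sort with an approximate nearest-neighbor index.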

RECURRENT NEURAL NETWORK BASED LANGUAGE MODELING WITH CONTROLLABLE EXTERNAL MEMORY

Attention-based Memory Selection Recurrent Network for Language Modeling

Recurrent Memory Networks for Language Modeling

On extended long short-term memory and dependent bidirectional recurrent neural network

Augmenting Language Models with Long-Term Memory

Extending Memory for Language Modelling

MEMORYLLM: Towards Self-Updatable Large Language Models

NEWLSTM: an Optimized Long Short-Term Memory Language Model for Sequence Prediction

Learning Efficient Lexically-Constrained Neural Machine Translation with External Memory

ELSTM: An improved long short‐term memory network language model for sequence learning

Nonrecurrent Neural Structure for Long-Term Dependence

LaMemo: Language Modeling with Look-Ahead Memory

Generic Memory Modeling with Recurrent Neural Network

Recurrent Neural Network Based Language Model Adaptation for Accent Mandarin Speech

Learning Longer Memory in Recurrent Neural Networks

RecallM: An Adaptable Memory Mechanism with Temporal Understanding for Large Language Models

Non-local Recurrent Neural Memory for Supervised Sequence Modeling

Needle in the Haystack for Memory Based Large Language Models

$\text{Memory}^3$: Language Modeling with Explicit Memory

Schrodinger's Memory: Large Language Models