SirLLM: Streaming Infinite Retentive LLM

Yao Yao, Zuchao Li, Hai Zhao
2024-05-21
Abstract: As Large Language Models (LLMs) become increasingly prevalent in various domains, their ability to process inputs of any length and maintain a degree of memory becomes essential. However, the one-off input of overly long texts is limited, as studies have shown that when input lengths exceed the LLMs' pre-trained text length, there is a dramatic decline in text generation capabilities. Moreover, simply extending the length of pre-training texts is impractical due to the difficulty in obtaining long text data and the substantial memory consumption costs this would entail for LLMs. Recent efforts have employed streaming inputs to alleviate the pressure of excessively long text inputs, but this approach can significantly impair the model's long-term memory capabilities.
Computation and Language
What problem does this paper attempt to address?
The paper addresses how large language models (LLMs) can retain memory while processing inputs of unbounded length. Existing LLMs show a sharp decline in generation quality once the input exceeds the text length used in pre-training. Simply pre-training on longer texts is impractical: very long text data is hard to obtain, and longer contexts greatly increase memory consumption. Streaming the input relieves the length pressure, but it can significantly impair the model's long-term memory.
To address these issues, the paper proposes the Streaming Infinite Retentive LLM (SirLLM). SirLLM uses a Token Entropy metric together with a memory decay mechanism to filter key phrases, so that the model keeps a persistent yet flexible memory over conversations of unbounded length. The authors report that SirLLM improves performance across different tasks and across various LLMs.
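The filtering idea described above can be illustrated with a small sketch. This is not the authors' implementation: the summary does not give the Token Entropy formula, so the sketch assumes a per-token score of -log(p) (surprisal) as a stand-in, and it operates on plain token strings where a real system would operate on cached model state. The names `MemoryEntry`, `token_score`, `update_memory`, `capacity`, and `decay` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    token: str
    score: float  # importance score; shrinks each round via decay


def token_score(prob: float) -> float:
    """Assumed stand-in for the paper's Token Entropy: surprisal -log(p)."""
    return -math.log(prob)


def update_memory(memory, new_tokens, capacity, decay=0.9):
    """Decay old scores, score new tokens, keep the `capacity` highest.

    memory:     list of MemoryEntry kept from earlier rounds
    new_tokens: list of (token, probability) pairs from the current round
    """
    # Memory decay: older entries gradually lose importance.
    decayed = [MemoryEntry(e.token, e.score * decay) for e in memory]
    fresh = [MemoryEntry(tok, token_score(p)) for tok, p in new_tokens]
    candidates = decayed + fresh

    # Pick the indices of the highest-scoring candidates ...
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: candidates[i].score,
        reverse=True,
    )
    keep = sorted(ranked[:capacity])
    # ... and return them in their original (chronological) order.
    return [candidates[i] for i in keep]


# Round 1: the rare token "Zanzibar" scores far higher than common words.
memory = update_memory([], [("the", 0.9), ("Zanzibar", 0.01), ("is", 0.8)], capacity=2)
# Round 2: old scores are halved before competing with the new tokens.
memory = update_memory(memory, [("an", 0.95), ("archipelago", 0.02)], capacity=2, decay=0.5)
print([e.token for e in memory])
```

With a decay factor below 1, a token survives across rounds only while its decayed score still beats newer candidates, which is one way to get memory that is persistent for salient tokens yet flexible enough to forget stale ones.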