Context Compression and Extraction: Efficiency Inference of Large Language Models
Junyao Zhou, Ruiqing Du, Yushan Tan, Jintao Yang, Zonghao Yang, Wei Luo, Zhunchen Luo, Xian Zhou, Wenpeng Hu
DOI: https://doi.org/10.1007/978-981-97-5663-6_19
2024-01-01
Abstract: Large language models have shown great capability in dealing with long contexts. However, in question-answering tasks, excessively long contexts inevitably contain redundant information, which can lead to a loss of significant details. Retaining the information related to the user's query intent in long contexts is therefore a challenge. To address this problem, our study proposes a novel Context Compression and Extraction (CCE) technique, which takes the impact of the user query into account. CCE computes the mutual information between the query and its context and integrates it with self-information to preserve query-relevant information in the compressed context. We validate our approach on diverse datasets that require integrated context processing capabilities, such as an arXiv paper dataset and a news article dataset. Our method is effective in various tasks, including summarization, question answering, and the reconstruction of original contexts. Experimental results show that our method outperforms a strong baseline on several evaluation metrics, significantly enhancing the quality of text generated in downstream tasks.
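The idea of combining self-information with a query-dependent term can be illustrated with a toy sketch. The abstract does not give CCE's exact formulation, so everything below is an assumption made for illustration: probabilities come from smoothed unigram counts over the context rather than from a language model, the query term is a pointwise-mutual-information proxy rather than the mutual information the paper computes, the two terms are mixed linearly with a hypothetical weight `alpha`, and compression operates on whole sentences. The function names (`tokenize`, `self_information`, `query_relevance`, `compress`) are likewise hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def self_information(tokens, counts, total):
    """Mean -log2 p(token) under an add-one-smoothed unigram model (assumed)."""
    if not tokens:
        return 0.0
    vocab = len(counts)
    bits = (-math.log2((counts[t] + 1) / (total + vocab)) for t in tokens)
    return sum(bits) / len(tokens)

def query_relevance(tokens, query_tokens, counts, total):
    """PMI-style proxy: sum of log2 p(t | sentence) / p(t) over query tokens
    that occur in the sentence. A stand-in for the paper's mutual information."""
    if not tokens:
        return 0.0
    local = Counter(tokens)
    score = 0.0
    for t in set(query_tokens):
        if local[t] > 0:
            p_given_sentence = local[t] / len(tokens)
            p_context = counts[t] / total
            score += math.log2(p_given_sentence / p_context)
    return score

def compress(sentences, query, keep, alpha=0.5):
    """Keep the `keep` highest-scoring sentences, preserving original order.
    `alpha` is a hypothetical mixing weight, not a parameter from the paper."""
    tokenized = [tokenize(s) for s in sentences]
    counts = Counter(t for toks in tokenized for t in toks)
    total = sum(counts.values())
    q = tokenize(query)
    scores = [
        alpha * self_information(toks, counts, total)
        + (1 - alpha) * query_relevance(toks, q, counts, total)
        for toks in tokenized
    ]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [sentences[i] for i in kept]

context = [
    "The weather was mild on Tuesday.",
    "The model compresses the context by dropping low information tokens.",
    "Lunch was served at noon.",
]
query = "How does the model compress the context?"
print(compress(context, query, keep=1))
```

With `alpha=1.0` the score reduces to self-information alone, which ignores the query; lowering `alpha` shifts weight toward sentences that share informative tokens with the query.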