Efficient and Accurate Memorable Conversation Model using DPO based on sLLM

Youngkyung Seo, Yoonseok Heo, Jun-Seok Koh, Du-Seong Chang
2024-08-27
Abstract: In a multi-session dialog system, it is essential to continuously update the memory as the sessions progress. Simply accumulating memory can make it difficult to focus on the content of the conversation during inference because the input length is limited. An efficient and accurate conversation model that manages memory so that it continuously reflects the conversation history is therefore necessary. This paper presents a conversation model that efficiently manages memory as sessions progress and incorporates it into the model to reflect the conversation history accurately, using three methodologies: SFT, DPO, and DPO with an SFT model. Our model using the DPO algorithm improves memory accuracy by about 0.0591 in BERTScore, and the rate of responses reflecting the memory increases as well. Response generation performance also improves by about 4.292 in fluency, 3.935 in coherence, and 2.896 in consistency. This paper describes a training method that lets a smaller model outperform models with more than twice as many parameters. Thus, our model demonstrates efficiency not only in terms of accuracy but also in resource utilization.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the problem of continuously updating memory in multi-session dialogue systems, particularly when the system is built on a small LLM (sLLM). Specifically, the paper focuses on the following points:

1. **Efficient Memory Management**: As the sessions progress, memory must be managed and updated effectively. Simply accumulating memory makes inference difficult because the input length is limited.
2. **Accurate Reflection of Dialogue History**: The paper proposes a dialogue model that continuously reflects the dialogue history. Memory is managed and incorporated into the model through three methods: SFT, DPO, and DPO with an SFT model.
3. **Resource Utilization Efficiency**: Despite the small scale of the model, training with the DPO algorithm improves memory accuracy, and fluency, coherence, and consistency of the generated responses also improve.

In summary, the paper mainly addresses how to build an efficient multi-session dialogue system with a small language model under resource-constrained conditions, while ensuring that the system accurately references past dialogue information.
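Since the abstract names DPO without showing its objective, the following is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not the authors' implementation: the function name `dpo_loss`, the default `beta=0.1`, and the reading of "chosen" as a memory update or response that reflects the conversation accurately (and "rejected" as one that does not) are illustrative assumptions, since the abstract does not specify how the preference pairs or hyperparameters were set.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the trainable policy
    and under a frozen reference model. beta=0.1 is an assumed value,
    not one reported in the abstract.
    """
    # How much more the policy likes each output than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin between the chosen and rejected outputs.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities: the policy has shifted toward the
# chosen output relative to the reference, so the loss drops below log(2).
loss = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2; the loss falls as the policy assigns relatively more probability to the chosen output than the reference does.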