Mixed-Session Conversation with Egocentric Memory

Jihyoung Jang, Taeyoung Kim, Hyounghun Kim
2024-10-03
Abstract: Recently introduced dialogue systems have demonstrated high usability. However, they still fall short of reflecting real-world conversation scenarios. Current dialogue systems cannot replicate dynamic, continuous, long-term interactions involving multiple partners. This shortfall arises because there have been limited efforts to account for both aspects of real-world dialogues: deeply layered interactions over long-term dialogue and widely expanded conversation networks involving multiple participants. To incorporate both aspects, we introduce Mixed-Session Conversation, a dialogue system designed to construct conversations with various partners in a multi-session dialogue setup. We propose a new dataset called MiSC to implement this system. The dialogue episodes of MiSC consist of 6 consecutive sessions, with four speakers (one main speaker and three partners) appearing in each episode. We also propose a new dialogue model with a novel memory management mechanism, called Egocentric Memory Enhanced Mixed-Session Conversation Agent (EMMA). EMMA collects and retains memories from the main speaker's perspective during conversations with partners, enabling seamless continuity in subsequent interactions. Extensive human evaluations validate that the dialogues in MiSC demonstrate a seamless conversational flow, even when conversation partners change in each session. EMMA trained with MiSC is also evaluated to maintain high memorability without contradiction throughout the entire conversation.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the shortcomings of current dialogue systems in simulating real-world conversation scenarios. Specifically, existing dialogue systems struggle to achieve dynamic, continuous, long-term interactions involving multiple conversation partners. These limitations show up in two aspects:

1. **Long-term Dialogue**: Existing dialogue systems typically fail to maintain long-term dialogue context, especially across multiple sessions.
2. **Multi-participant Dialogue**: Existing dialogue systems are usually limited to two-person conversations and cannot handle complex dialogue networks involving multiple participants.

To overcome these limitations, the paper introduces a new dialogue system, **Mixed-Session Conversation**. This system is designed to converse with different partners in a multi-session setting, which aligns more closely with real-world dialogue scenarios. To this end, the paper proposes a new dataset, **MiSC**, and a new dialogue model, **Egocentric Memory Enhanced Mixed-Session Conversation Agent (EMMA)**.

### Key Points of the Solution

1. **MiSC Dataset**:
   - Contains 8.5K dialogue episodes of 6 sessions each, totaling 51K sessions.
   - Each episode involves 4 speakers: one main speaker and three conversation partners.
   - Dialogues are generated through carefully designed methods to ensure coherence and consistency between sessions.
2. **Egocentric Memory**:
   - Collects and retains memories of each conversation partner from the main speaker's perspective.
   - Keeps memories consistent and free of contradiction by linking memories across different sessions.
   - Generates and updates memories at the end of each session to ensure seamless transitions in subsequent sessions.
3. **EMMA Dialogue Model**:
   - Based on the FLAN-T5 model, capable of generating dialogues and managing memories.
   - Ensures natural and smooth dialogues and efficient memory utilization through the collaboration of dialogue and retrieval modules.

### Experimental Results

Through extensive human evaluations, the paper validates the quality of MiSC and EMMA:

- **MiSC**:
  - Scores 4.83 and 4.90 in consistency and coherence, respectively.
  - Egocentric Memory achieves a pass rate of over 96% in memory summarization, linking, and tagging.
- **EMMA**:
  - Scores 4.62, 4.70, and 4.66 in humanness, engagingness, and memorability, respectively.
  - Maintains high dialogue quality and memory accuracy even with changing conversation partners.

Overall, by introducing MiSC and EMMA, the paper addresses the limitations of existing dialogue systems in long-term and multi-participant dialogues, providing a new approach for constructing systems that better align with real-world dialogue scenarios.
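
### Illustrative Sketch of Egocentric Memory

The memory behaviour summarized above (store memories from the main speaker's perspective, link them across sessions, update at the end of each session, retrieve them in later sessions) can be illustrated with a small toy example. This is only a sketch under stated assumptions, not the paper's implementation: EMMA uses FLAN-T5 and a learned retrieval module, whereas the code below substitutes plain word overlap for both linking and retrieval. The names `Memory`, `EgocentricMemory`, `update`, and `retrieve` are hypothetical and chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Memory:
    """One memory entry, written from the main speaker's point of view."""
    session: int   # session in which the memory was created
    partner: str   # partner the main speaker was talking to
    text: str      # short summary sentence
    links: list[int] = field(default_factory=list)  # indices of related earlier memories


class EgocentricMemory:
    """Toy memory store (hypothetical; word overlap stands in for learned models)."""

    def __init__(self) -> None:
        self.entries: list[Memory] = []

    @staticmethod
    def _tokens(text: str) -> set[str]:
        # Crude content-word filter: lowercase, trim punctuation, drop short words.
        words = (w.strip(".,!?'\"").lower() for w in text.split())
        return {w for w in words if len(w) > 3}

    def update(self, session: int, partner: str, summaries: list[str]) -> None:
        """Called at the end of a session: store summaries and link them
        to memories from earlier sessions that share content words."""
        for text in summaries:
            new_tokens = self._tokens(text)
            links = [
                i for i, m in enumerate(self.entries)
                if m.session < session and new_tokens & self._tokens(m.text)
            ]
            self.entries.append(Memory(session, partner, text, links))

    def retrieve(self, query: str, top_k: int = 2) -> list[Memory]:
        """Return up to top_k memories sharing the most content words with the query."""
        q = self._tokens(query)
        scored = [(len(q & self._tokens(m.text)), i) for i, m in enumerate(self.entries)]
        ranked = sorted((s, i) for s, i in scored if s > 0)
        ranked.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then oldest
        return [self.entries[i] for _, i in ranked[:top_k]]


# Example: three sessions with two different partners (made-up utterance summaries).
memory = EgocentricMemory()
memory.update(1, "Alice", ["Alice adopted a puppy named Milo."])
memory.update(2, "Bob", ["Bob is training for a marathon."])
memory.update(3, "Alice", ["Alice's puppy Milo learned to sit."])

for m in memory.retrieve("How is Milo the puppy doing?"):
    print(m.session, m.partner, m.text, m.links)
```

In this toy version, the session 3 memory is linked to the session 1 memory because both mention the puppy, which mirrors in a very simplified way how cross-session links can keep later memories consistent with earlier ones even after the partner changes in between.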