Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search

Hideaki Joko, Shubham Chatterjee, Andrew Ramsay, Arjen P. de Vries, Jeff Dalton, Faegheh Hasibi
DOI: https://doi.org/10.1145/3626772.3657815
2024-05-06
Abstract: The future of conversational agents will provide users with personalized information responses. However, a significant challenge in developing models is the lack of large-scale dialogue datasets that span multiple sessions and reflect real-world user preferences. Previous approaches rely on experts in a wizard-of-oz setup that is difficult to scale, particularly for personalized tasks. Our method, LAPS, addresses this by using large language models (LLMs) to guide a single human worker in generating personalized dialogues. This method has proven to speed up the creation process and improve quality. LAPS can collect large-scale, human-written, multi-session, and multi-domain conversations, including extracting user preferences. When compared to existing datasets, LAPS-produced conversations are as natural and diverse as expert-created ones, which stands in contrast to fully synthetic methods. The collected dataset is suited to train preference extraction and personalized response generation. Our results show that responses generated explicitly using extracted preferences better match users' actual preferences, highlighting the value of using extracted preferences over simple dialogue history. Overall, LAPS introduces a new method to leverage LLMs to create realistic personalized conversational data more efficiently and effectively than previous methods.
Information Retrieval
What problem does this paper attempt to address?
The paper addresses a central obstacle to building personalized multi-session dialogue systems: the lack of large-scale, multi-session dialogue datasets that reflect real user preferences. Specifically:

1. **Lack of large-scale multi-session dialogue data**: Existing dialogue datasets are usually small and mostly consist of single sessions, so they cannot capture how user preferences carry over and change across sessions.
2. **Difficulty in generating high-quality dialogue data**: Traditional expert-driven collection is hard to scale, while fully synthetic methods produce dialogues that lack diversity and naturalness and fail to reflect real user preferences.

To address these issues, the paper proposes LAPS (LLM-Augmented Personalized Self-Dialogue), which uses large language models (LLMs) to assist a human worker in writing personalized multi-session dialogues. LAPS can:

- **Improve data generation efficiency**: LLM-generated guidance helps human workers write high-quality dialogues more quickly.
- **Ensure dialogue diversity and naturalness**: Compared to fully synthetic methods, dialogues collected with LAPS are more natural and diverse.
- **Extract and store user preferences**: After each dialogue session, user preferences are extracted from the dialogue and stored in a preference memory for use in subsequent sessions.

With these capabilities, LAPS can collect large-scale, multi-domain, multi-session dialogue data that includes real user preferences, providing high-quality training data for future personalized dialogue systems.
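
The preference-memory step described above (extract preferences at the end of a session, store them, and reuse them in the next session) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the `PreferenceMemory` class, the `end_session` and `build_guidance` helpers, the domain names, and especially `toy_extractor`, a keyword rule that stands in for the LLM-based extraction the paper relies on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types: the summary does not specify how preferences are stored.
Preference = str
Extractor = Callable[[List[str]], List[Preference]]


@dataclass
class PreferenceMemory:
    """Per-user store of preferences, grouped by domain (illustrative only)."""
    by_domain: Dict[str, List[Preference]] = field(default_factory=dict)

    def update(self, domain: str, preferences: List[Preference]) -> None:
        """Append new preferences for a domain, skipping exact duplicates."""
        stored = self.by_domain.setdefault(domain, [])
        for pref in preferences:
            if pref not in stored:
                stored.append(pref)

    def recall(self, domain: str) -> List[Preference]:
        """Return a copy of the preferences stored for a domain."""
        return list(self.by_domain.get(domain, []))


def toy_extractor(utterances: List[str]) -> List[Preference]:
    """Stand-in for LLM-based extraction: keeps explicit preference statements."""
    markers = ("i like ", "i love ", "i prefer ", "i don't like ")
    found = []
    for text in utterances:
        if text.lower().strip().startswith(markers):
            found.append(text.strip().rstrip("."))
    return found


def end_session(memory: PreferenceMemory, domain: str,
                user_utterances: List[str],
                extract: Extractor = toy_extractor) -> None:
    """After a session ends, extract preferences and store them."""
    memory.update(domain, extract(user_utterances))


def build_guidance(memory: PreferenceMemory, domain: str) -> str:
    """Compose a guidance string for the next session from stored preferences."""
    prefs = memory.recall(domain)
    if not prefs:
        return f"No stored preferences for '{domain}'; ask the user what they like."
    return f"Known '{domain}' preferences: " + "; ".join(prefs)


if __name__ == "__main__":
    memory = PreferenceMemory()
    # Session 1: preferences are extracted once the dialogue is over.
    end_session(memory, "recipe", [
        "I like spicy food.",
        "What can I cook tonight?",
        "I prefer vegetarian dishes",
    ])
    # Session 2: stored preferences inform the guidance for the new dialogue.
    print(build_guidance(memory, "recipe"))
```

Keeping the extractor as a pluggable callable reflects the separation the summary describes between extracting preferences and storing them; in practice the toy rule would be replaced by a model call.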