Abstract: Recent progress on large language models (LLMs) has enabled dialogue agents to generate highly naturalistic and plausible text. However, current LLM language generation focuses on responding accurately to questions and requests with a single effective response. In reality, many dialogues are interactive: an agent's utterances influence its conversational partner, elicit information, or change the partner's opinion. Accounting for how an agent can effectively steer a conversation is a crucial ability in many dialogue tasks, from healthcare to preference elicitation. Existing methods for fine-tuning dialogue agents to accomplish such tasks rely on curating some amount of expert data. However, doing so often requires understanding the underlying cognitive processes of the conversational partner, a skill that neither humans nor LLMs trained on human data reliably possess. Our key insight is that while LLMs may not be adept at identifying effective strategies for steering conversations a priori, or in the middle of an ongoing conversation, they can do so post hoc, in hindsight, after seeing how their conversational partner responds. We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations. We apply our approach to two domains that require understanding human mental state, intelligent interaction, and persuasion: mental health support and soliciting charitable donations. Our results in a user study with human participants show that our approach greatly outperforms existing state-of-the-art dialogue agents.

Investigating deep reinforcement learning techniques in personalized dialogue generation

Deep Reinforcement Learning for Dialogue Generation

Multitask Learning and Reinforcement Learning for Personalized Dialog Generation: an Empirical Study

"In Dialogues We Learn": Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning

Less is More: Learning to Refine Dialogue History for Personalized Dialogue Generation

Reinforcement Learning for Personalized Dialogue Management

Personalized Dialogue Response Generation Learned from Monologues

Interactive Narrative Personalization with Deep Reinforcement Learning

A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations

Learning Personalized End-to-End Goal-Oriented Dialog

Exploring implicit persona knowledge for personalized dialogue generation

Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue

DLVGen: A Dual Latent Variable Approach to Personalized Dialogue Generation

Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations

Personalized Response Generation via Domain adaptation

Deep RL with Hierarchical Action Exploration for Dialogue Generation

Learning to Know Myself: A Coarse-to-Fine Persona-Aware Training Framework for Personalized Dialogue Generation

Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion Dialogues via Reinforcement Learning and Human Demonstration

Fine Grained Knowledge Transfer for Personalized Task-oriented Dialogue Systems

Cascaded LSTMs based Deep Reinforcement Learning for Goal-driven Dialogue