Abstract: Recent progress on large language models (LLMs) has enabled dialogue agents to generate highly naturalistic and plausible text. However, current LLM language generation focuses on responding accurately to questions and requests with a single effective response. In reality, many dialogues are interactive: an agent's utterances influence its conversational partner, elicit information, or change the partner's opinion. Accounting for how an agent can effectively steer a conversation is a crucial ability in many dialogue tasks, from healthcare to preference elicitation. Existing methods for fine-tuning dialogue agents to accomplish such tasks rely on curating some amount of expert data. However, doing so often requires understanding the underlying cognitive processes of the conversational partner, a skill that neither humans nor LLMs trained on human data reliably possess. Our key insight is that while LLMs may not be adept at identifying effective strategies for steering conversations a priori, or in the middle of an ongoing conversation, they can do so post hoc, in hindsight, after seeing how their conversational partner responds. We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations. We apply our approach to two domains that require understanding human mental state, intelligent interaction, and persuasion: mental health support and soliciting charitable donations. Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations