Abstract: Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks. However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome. For example, a teacher might try to understand their student's current comprehension level to tailor their instruction accordingly, and a travel agent might ask questions of their customer to understand their preferences in order to recommend activities they might enjoy. LLMs trained with supervised fine-tuning or "single-step" RL, as with standard RLHF, might struggle with tasks that require such goal-directed behavior, since they are not trained to optimize for overall conversational outcomes after multiple turns of interaction. In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue. Our key insight is that, though LLMs might not effectively solve goal-directed dialogue tasks out of the box, they can provide useful data for solving such tasks by simulating suboptimal but human-like behaviors. Given a textual description of a goal-directed dialogue task, we leverage LLMs to sample diverse synthetic rollouts of hypothetical in-domain human-human interactions. Our algorithm then utilizes this dataset with offline reinforcement learning to train an interactive conversational agent that can optimize goal-directed objectives over multiple turns. In effect, the LLM produces examples of possible interactions, and RL then processes these examples to learn to perform more optimal interactions. Empirically, we show that our proposed approach achieves state-of-the-art performance in various goal-directed dialogue tasks that include teaching and preference elicitation.

What problem does this paper attempt to address?

This paper addresses the limited ability of large language models (LLMs) to conduct goal-directed conversations. Although existing LLMs perform well on many natural language tasks, they can struggle on tasks that require interacting with a person over several turns to reach a specific outcome. For example, a teacher may need to gauge a student's current level of understanding in order to adjust their instruction, and a travel agent may need to ask about a customer's preferences in order to recommend activities the customer will enjoy. Such tasks require a conversational agent to optimize the outcome of the whole conversation across multiple turns, rather than only generate an accurate response to a single query.

The paper proposes a new method that uses reinforcement learning (RL) to adapt LLMs for goal-directed dialogue. The core idea is that, although LLMs may not solve goal-directed dialogue tasks effectively on their own, they can supply useful data for solving them by simulating suboptimal but human-like behavior. Given a textual description of a goal-directed dialogue task, the authors use LLMs to sample diverse synthetic conversations of hypothetical human-human interactions. The algorithm then trains an interactive conversational agent on this dataset with offline reinforcement learning, so that the agent can optimize goal-directed objectives over multiple turns.

In short, the paper asks how to make LLMs more effective in goal-directed conversations, particularly those that need multiple turns to reach a specific goal. By combining the simulation ability of LLMs with the optimization ability of RL, the proposed method aims to improve a conversational agent's performance on complex dialogue tasks.
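To make the data flow described above concrete, here is a minimal sketch of how one imagined dialogue could be turned into transitions for an offline RL learner. This is an illustration under stated assumptions, not the authors' implementation: the names `Transition`, `render`, and `dialogue_to_transitions` are hypothetical, the reward is assumed to be a single scalar assigned at the agent's final turn, and both the LLM sampling step and the offline RL algorithm itself are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance), speaker is "agent" or "human"


@dataclass
class Transition:
    state: str       # dialogue history the agent sees before it speaks
    action: str      # the agent's utterance
    reward: float    # assumed sparse: nonzero only on the agent's last turn
    next_state: str  # history up to (not including) the agent's next turn
    done: bool       # True on the agent's last turn


def render(turns: List[Turn]) -> str:
    """Flatten a list of turns into a single text history."""
    return "\n".join(f"{speaker}: {utterance}" for speaker, utterance in turns)


def dialogue_to_transitions(turns: List[Turn], final_reward: float) -> List[Transition]:
    """Split one synthetic dialogue into one transition per agent turn."""
    agent_idx = [i for i, (speaker, _) in enumerate(turns) if speaker == "agent"]
    transitions = []
    for k, i in enumerate(agent_idx):
        is_last = k == len(agent_idx) - 1
        # The next state ends just before the agent's next utterance,
        # or at the end of the dialogue for the final agent turn.
        end = len(turns) if is_last else agent_idx[k + 1]
        transitions.append(
            Transition(
                state=render(turns[:i]),
                action=turns[i][1],
                reward=final_reward if is_last else 0.0,
                next_state=render(turns[:end]),
                done=is_last,
            )
        )
    return transitions


# Hypothetical imagined dialogue for a travel-agent task.
example = [
    ("human", "I want to plan a trip."),
    ("agent", "Do you prefer beaches or cities?"),
    ("human", "Beaches."),
    ("agent", "You might enjoy snorkeling."),
]
dataset = dialogue_to_transitions(example, final_reward=1.0)
```

A dataset built this way from many sampled dialogues is the kind of input an offline RL method consumes. Because the reward arrives only at the end, the learner has to credit earlier information-gathering questions for the final outcome, which is the multi-turn behavior the summary describes.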

Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations

Dialogue Learning with Human-in-the-Loop

Deep Reinforcement Learning for Dialogue Generation

Learning through Dialogue Interactions by Asking Questions

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations

Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation

RL Zero: Zero-Shot Language to Behaviors without any Supervision

Goal Inference from Open-Ended Dialog

Synthetic Dialogue Dataset Generation using LLM Agents

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

Offline RL for Natural Language Generation with Implicit Language Q Learning

Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems

Words as Beacons: Guiding RL Agents with High-Level Language Prompts

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Decision-Oriented Dialogue for Human-AI Collaboration

Dialogue Shaping: Empowering Agents through NPC Interaction

Simulating User Agents for Embodied Conversational-AI

Learning to Generate Better Than Your LLM

End-to-End Optimization of Task-Oriented Dialogue Model with Deep Reinforcement Learning