Abstract: Identifying user intents in information-seeking dialogs is crucial for a system to meet users' information needs. Intent prediction (IP) is challenging and demands sufficient dialogs with human-labeled intents for training. However, manually annotating intents is resource-intensive. While large language models (LLMs) have been shown to be effective in generating synthetic data, there is no study on using LLMs to generate intent-aware information-seeking dialogs. In this paper, we focus on leveraging LLMs for zero-shot generation of large-scale, open-domain, and intent-aware information-seeking dialogs. We propose SOLID, which has novel self-seeding and multi-intent self-instructing schemes. The former improves the generation quality by using the LLM's own knowledge scope to initiate dialog generation; the latter prompts the LLM to generate utterances sequentially, and mitigates the need for manual prompt design by asking the LLM to autonomously adapt its prompt instruction when generating complex multi-intent utterances. Furthermore, we propose SOLID-RL, which is further trained to generate a dialog in one step on the data generated by SOLID. We propose a length-based quality estimation mechanism to assign varying weights to SOLID-generated dialogs based on their quality during the training process of SOLID-RL. We use SOLID and SOLID-RL to generate more than 300k intent-aware dialogs, surpassing the size of existing datasets. Experiments show that IP methods trained on dialogs generated by SOLID and SOLID-RL achieve better IP quality than ones trained on human-generated dialogs.

What problem does this paper attempt to address?

The paper aims to address the problem of user intent prediction (IP) in information-seeking dialogues. Specifically, it focuses on the following points:

1. **Challenges of Intent Prediction**: Identifying user intent in information-seeking dialogues is crucial for meeting users' information needs. However, intent prediction is challenging and requires a large amount of dialogue data with manually annotated intents for training. Manual intent annotation is resource-intensive, which limits the amount of data available for training intent prediction methods.
2. **Utilizing Large Language Models to Generate Intent-Aware Dialogues**: Although large language models (LLMs) have been shown to be effective at generating synthetic data, no prior study has explored using them to generate intent-aware information-seeking dialogues. The paper therefore proposes a new method called SOLID (Self-seeding and Multi-intent Self-instructing LLMs for Generating Large-scale Intent-aware Information-Seeking Dialogues) for zero-shot generation of large-scale, open-domain, intent-aware information-seeking dialogues.
3. **Self-Guiding Mechanism**: SOLID improves the quality of dialogue generation through a self-seeding scheme and a multi-intent self-instructing scheme. The former uses the LLM's own knowledge scope to initiate dialogue generation. The latter prompts the LLM to generate utterances sequentially and, for complex multi-intent utterances, asks the LLM to adapt its own prompt instruction, which reduces the need for manual prompt design.
4. **SOLID-RL to Improve Efficiency**: To further enhance generation efficiency, the paper also proposes SOLID-RL, a model further trained on SOLID-generated data to generate an entire dialogue in one step. A length-based quality estimation mechanism assigns varying weights to SOLID-generated dialogues according to their quality during the training of SOLID-RL.

In summary, the main goal of this paper is to develop an efficient, high-quality method for generating intent-aware information-seeking dialogues, thereby overcoming the small size of existing datasets and the high cost of manual annotation, and ultimately improving the effectiveness of intent prediction methods.
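To make the self-seeding and multi-intent self-instructing steps more concrete, here is a minimal Python sketch of how such a loop might be organized. It is an illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the prompt wordings, the `intent_plan` input, and the intent labels in the usage example are all hypothetical, and the summary above does not specify how seeds or intent sequences are actually chosen.

```python
from typing import Callable, List, Tuple

# Hypothetical LLM interface: takes a prompt string, returns generated text.
LLM = Callable[[str], str]
Turn = Tuple[List[str], str]  # (intent labels, utterance)


def self_seed(llm: LLM, domain: str) -> str:
    """Self-seeding: let the LLM pick the dialog's starting point from what it knows."""
    return llm(f"Name one {domain} topic you know well and summarize it in two sentences.")


def build_instruction(llm: LLM, intents: List[str]) -> str:
    """Return the instruction used to generate one utterance."""
    if len(intents) == 1:
        # Single intent: a fixed template is enough (template wording is assumed).
        return f"Write the next utterance so that it expresses the intent '{intents[0]}'."
    # Multi-intent self-instructing: the LLM writes its own instruction.
    return llm(
        "Write a one-sentence instruction for producing a single utterance that "
        f"expresses all of these intents: {', '.join(intents)}."
    )


def generate_dialog(llm: LLM, domain: str, intent_plan: List[List[str]]) -> List[Turn]:
    """Generate utterances one at a time; labels come from the plan, not from annotation."""
    seed = self_seed(llm, domain)
    dialog: List[Turn] = []
    for intents in intent_plan:
        history = "\n".join(utterance for _, utterance in dialog) or "(empty)"
        instruction = build_instruction(llm, intents)
        utterance = llm(f"Seed: {seed}\nDialog so far:\n{history}\n{instruction}")
        dialog.append((intents, utterance))
    return dialog


if __name__ == "__main__":
    def echo_llm(prompt: str) -> str:
        # Stand-in for a real model so the sketch runs offline.
        return f"<text for: {prompt[:40]}...>"

    # Hypothetical intent labels, one list per utterance.
    plan = [["original_question"], ["answer", "clarifying_question"]]
    for intents, utterance in generate_dialog(echo_llm, "cooking", plan):
        print(intents, utterance)
```

Because each utterance is generated to match intents that are fixed beforehand, every turn comes out already labeled, which is what removes the manual annotation step in this sketch. The cost is at least one LLM call per utterance, which is the inefficiency a one-step generator such as SOLID-RL is meant to address.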

Self-seeding and Multi-intent Self-instructing LLMs for Generating Intent-aware Information-Seeking dialogs

Are Human-generated Demonstrations Necessary for In-context Learning?

Large Language Models for Intent-Driven Session Recommendations

LUCID: LLM-Generated Utterances for Complex and Interesting Dialogues

IntentGPT: Few-shot Intent Discovery with Large Language Models

A New Dialogue Response Generation Agent for Large Language Models by Asking Questions to Detect User's Intentions

Intention and Context Elicitation with Large Language Models in the Legal Aid Intake Process

Intent Detection in the Age of LLMs

Dial-In LLM: Human-Aligned Dialogue Intent Clustering with LLM-in-the-loop

ILLUMINER: Instruction-tuned Large Language Models as Few-shot Intent Classifier and Slot Filler

UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity

Efficient Data Generation for Source-grounded Information-seeking Dialogs: A Use Case for Meeting Transcripts

Balancing Accuracy and Efficiency in Multi-Turn Intent Classification for LLM-Powered Dialog Systems in Production

Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations

Interpreting User Requests in the Context of Natural Language Standing Instructions

Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent Detection

Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification

Exploring In-Context Learning for Knowledge Grounded Dialog Generation

Synthetic Dialogue Dataset Generation using LLM Agents

Leveraging LLMs for Dialogue Quality Measurement

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations