Abstract: We introduce a technique for multi-document grounded multi-turn synthetic dialog generation that incorporates three main ideas. First, we control the overall dialog flow using taxonomy-driven user queries that are generated with Chain-of-Thought (CoT) prompting. Second, we support the generation of multi-document grounded dialogs by mimicking real-world use of retrievers to update the grounding documents after every user turn in the dialog. Third, we apply LLM-as-a-Judge to filter out queries with incorrect answers. Human evaluation of the synthetic dialog data suggests that the data is diverse, coherent, and includes mostly correct answers. Both human and automatic evaluations of answerable queries indicate that models fine-tuned on synthetic dialogs consistently outperform those fine-tuned on existing human-generated training data across four publicly available multi-turn document-grounded benchmark test sets.

What problem does this paper attempt to address?

This paper addresses the problem of generating high-quality multi-turn dialog data grounded in multiple documents, in order to improve document-grounded dialog systems. Specifically, the authors aim to generate diverse, coherent synthetic dialogs with correct answers by simulating how Retrieval-Augmented Generation (RAG) is deployed in the real world. Models fine-tuned on this synthetic data outperform models fine-tuned on existing human-generated data on several public benchmark test sets.

### Main Challenges

1. **Diversity**: Ensure that the generated questions and dialogs are sufficiently diverse.
2. **Coherence**: Ensure that the dialog is natural and fluent, so that it reads as a conversation rather than a collection of unrelated question-answer pairs.
3. **Faithfulness**: Ensure that the model's answers are faithful to the content of the retrieved documents, rather than relying on knowledge stored in the model's parameters.

### Solutions

To address these challenges, the authors propose the following techniques:

1. **Dialog Flow Control**: Control the dialog flow through taxonomy-driven user query generation with Chain-of-Thought (CoT) prompting.
2. **Multi-Document Grounding**: Mimic the way retrievers are used in the real world by updating the grounding documents after each user turn.
3. **LLM-as-a-Judge**: Use a large language model as a judge to filter out queries with incorrect answers.

### Technical Details

- **Question Taxonomies**: Two question taxonomies are designed, one for the first turn of the dialog (ST-QT) and one for subsequent turns (MT-QT). They include types such as direct questions, comparison questions, aggregation questions, and unanswerable questions.
- **CoT Prompts**: CoT prompts are used to generate queries that conform to the predefined question types and to keep the generated answers consistent with the documents.
- **Dialog Generation Pipeline**: There are two modes, single-document and multi-document. The former generates dialogs grounded in a single document, while the latter uses the retriever to select relevant passages dynamically.
- **LLM-as-a-Judge**: An LLM evaluates each generated pair of dialog context and answer to check that the answer is correct.

### Experimental Results

In experiments with two instruction-tuned models (MERLINITE-7B and LLAMA-2-13B-CHAT), the authors show that models trained on synthetic data perform better than those trained on existing human-generated data on four public multi-turn dialog benchmark datasets (CoQA, MultiDoc2Dial, QuAC, and OR-QuAC). The benefit of synthetic data is especially pronounced on multi-document grounded tasks such as OR-QuAC.

### Summary

The main contributions of this paper are:

1. Propose the first multi-document grounded multi-turn dialog generation pipeline that simulates real-world RAG deployment.
2. Ensure data diversity by driving query generation with a question taxonomy instead of relying solely on language models to generate queries.
3. Verify the quality of the generated dialogs through human evaluation.
4. Demonstrate the effectiveness of synthetic data on answerable queries.
5. Commit to releasing all code and synthetic data.

With these methods, the authors address the challenges of generating high-quality multi-turn dialog data grounded in multiple documents and provide a new approach for improving dialog systems.
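To make the generation loop described above more concrete, here is a minimal Python sketch of its three ideas: a question type drawn from a taxonomy, grounding passages refreshed by a retriever after each user turn, and a judge that filters turns. It is an illustration under stated assumptions, not the authors' implementation. The names `retrieve`, `generate_dialog`, `Turn`, and the `stub_*` functions are hypothetical, the word-overlap scorer stands in for a real retriever, the callables stand in for LLM calls, and seeding with one random passage and dropping rejected turns are simplifications chosen for this example.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Question types named in the summary above. The paper uses separate
# taxonomies for first and later turns; that split is not modeled here.
QUESTION_TYPES = ["direct", "comparison", "aggregation", "unanswerable"]


@dataclass
class Turn:
    question_type: str
    query: str
    answer: str
    passages: List[str]  # grounding passages used for this turn


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy stand-in for a retriever: rank passages by word overlap."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def generate_dialog(
    corpus: List[str],
    n_turns: int,
    gen_query: Callable[[str, List[Turn], List[str]], str],
    gen_answer: Callable[[str, List[Turn], List[str]], str],
    judge: Callable[[str, str, List[str]], bool],
    rng: random.Random,
) -> List[Turn]:
    """Build one dialog, refreshing the grounding passages after each user turn."""
    history: List[Turn] = []
    passages = [rng.choice(corpus)]  # assumed seed: one random passage
    for _ in range(n_turns):
        qtype = rng.choice(QUESTION_TYPES)  # the taxonomy picks the type
        query = gen_query(qtype, history, passages)
        passages = retrieve(query, corpus)  # update grounding after the user turn
        answer = gen_answer(query, history, passages)
        if judge(query, answer, passages):  # keep only turns the judge accepts
            history.append(Turn(qtype, query, answer, passages))
    return history


# Hypothetical stubs so the sketch runs without a model.
def stub_query(qtype: str, history: List[Turn], passages: List[str]) -> str:
    return f"{qtype} question about {passages[0]}"


def stub_answer(query: str, history: List[Turn], passages: List[str]) -> str:
    return f"According to the passages: {passages[0]}"


def stub_judge(query: str, answer: str, passages: List[str]) -> bool:
    return any(p in answer for p in passages)


if __name__ == "__main__":
    demo_corpus = [
        "the retriever ranks passages",
        "dialogs have many turns",
        "judges filter wrong answers",
    ]
    dialog = generate_dialog(
        demo_corpus, 3, stub_query, stub_answer, stub_judge, random.Random(0)
    )
    for turn in dialog:
        print(turn.question_type, "|", turn.query, "->", turn.answer)
```

In a real setting, `gen_query` and `gen_answer` would wrap CoT-prompted LLM calls and `judge` would wrap an LLM-as-a-Judge prompt. Passing them as callables keeps the control flow testable without a model.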

Multi-Document Grounded Multi-Turn Synthetic Dialog Generation
