Abstract: Dialogue machine reading comprehension requires language models to effectively decouple and model multi-turn dialogue passages. Because a dialogue develops according to the intentions of its participants, its topic may not remain constant throughout the passage. Detecting and leveraging topic shifts is therefore a non-trivial part of dialogue modeling. Topic modeling, although widely studied for plain text, remains underused in dialogue reading comprehension. This paper proposes to model multi-turn dialogues from a topic-aware perspective. It starts with a dialogue segmentation algorithm that splits a dialogue passage into topic-concentrated fragments in an unsupervised way. These fragments then serve as topic-aware language processing units in further dialogue comprehension. On the one hand, the split segments indicate specific topics rather than mixed intentions, which makes them convenient for in-domain topic detection and location. For this task, this paper designs a clustering system with a self-training auto-encoder and builds two datasets for evaluation. On the other hand, the split segments are appropriate elements for multi-turn dialogue response selection. For this purpose, this paper further presents a novel model, the Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements and matches response candidates with dual cross-attention. Empirical studies on three public benchmarks show substantial improvements over baselines. Our work continues previous studies on document topics and brings dialogue modeling to a novel topic-aware perspective with exhaustive experiments and analyses.
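
To make the idea of unsupervised topic segmentation concrete, the following is a minimal illustrative sketch. It is not the paper's algorithm, which the abstract does not specify. It assumes a much simpler stand-in technique: a new fragment is opened whenever the bag-of-words cosine similarity between two adjacent utterances falls below a threshold. The function names (bow, cosine, segment), the threshold value, and the sample dialogue are all hypothetical.

```python
import math
import re
from collections import Counter


def bow(utterance):
    """Lowercased bag-of-words count vector for one utterance."""
    return Counter(re.findall(r"[a-z']+", utterance.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(utterances, threshold=0.1):
    """Open a new fragment wherever adjacent-utterance similarity drops
    below `threshold` (an assumed, untuned value)."""
    if not utterances:
        return []
    segments = [[utterances[0]]]
    for prev, cur in zip(utterances, utterances[1:]):
        if cosine(bow(prev), bow(cur)) < threshold:
            segments.append([cur])      # similarity dropped: start a new fragment
        else:
            segments[-1].append(cur)    # same topic: extend the current fragment
    return segments


# Hypothetical four-turn dialogue with one topic shift after the second turn.
dialogue = [
    "Did you watch the football match last night?",
    "Yes, the football match was great.",
    "What should we cook for dinner?",
    "Let's cook pasta for dinner.",
]
fragments = segment(dialogue)
```

On this toy input the sketch yields two fragments of two utterances each. A realistic system would likely replace the word-count vectors with contextual embeddings and compare windows of utterances rather than single turns.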

Creating a Japanese Dialogue Corpus with Multi-level Topic Analysis

A Manually Annotated Chinese Corpus for Non-task-oriented Dialogue Systems

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling

Building a Dialogue Corpus Annotated with Expressed and Experienced Emotions

A Natural Language Corpus of Common Grounding under Continuous and Partially-Observable Context

JMultiWOZ: A Large-Scale Japanese Multi-Domain Task-Oriented Dialogue Dataset

Building Context-Related Dialogue Systems Based on Chinese-Script-Dialogue Corpus

Topic-switch adapted Japanese Dialogue System based on PLATO-2

Multi-turn dialogue comprehension from a topic-aware perspective

NaturalConv: A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation

XDailyDialog: A Multilingual Parallel Dialogue Corpus

Designing the Business Conversation Corpus

A large synchronous corpus as monitoring corpus: Some comparative content analysis of Chinese and Japanese language developments

STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent

Document-aligned Japanese-English Conversation Parallel Corpus

Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

A corpus-based approach for cooperative response generation in a dialog system

Searching for Snippets of Open-Domain Dialogue in Task-Oriented Dialogue Datasets

Automatic Construction of Discourse Corpora for Dialogue Translation

Chinese Dialogue Analysis Using Multi-Task Learning Framework

JDDC 2.1: A Multimodal Chinese Dialogue Dataset with Joint Tasks of Query Rewriting, Response Generation, Discourse Parsing, and Summarization