Abstract: Multi-domain dialogue state tracking (MDST) is a crucial component of task-oriented dialogue systems. In multi-turn dialogues between a user and the system, MDST must continuously track dialogue states based on the current dialogue utterance and the dialogue states from the preceding turn. Recent work handles multi-domain dialogue tasks by treating each state as an individual label, but neglects the domain-specific information associated with these states. At the same time, existing models do not effectively model the explicit correlations between dialogue context semantics and dialogue states. In this paper, we introduce a multi-domain gate module and an interactive dual attention module to address these concerns. To exploit domain-specific information within states efficiently, we use the multi-domain gate as an index that amplifies the states pertinent to the domain of the current utterance while filtering out irrelevant states. Interactive dual attention comprises utterance attention and slot attention, and models the correlation between dialogue utterances and slots. It also ensures that each dialogue utterance interacts with the slots once to derive all state updates, which keeps computation efficient. Specifically, slot attention models the associations between slots by incorporating semantic features to predict updates in slot values. Meanwhile, utterance attention captures the semantics of the dialogue context and integrates them with slot name features to generate dialogue states. All of these modules are built on a slot-independent framework, which allows the number of slots to scale efficiently and avoids issues related to model input limitations.
Experimental results on the multi-domain dialogue dataset MultiWOZ 2.4 show that our model outperforms the baselines. We also analyze the effectiveness of the multi-domain gate and interactive dual attention modules, illustrating their contribution to model performance through visualization and case studies.
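
The gating idea described in the abstract (amplify states in the domain of the current utterance, filter out the rest) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `domain_gate`, the `threshold` parameter, the use of a softmax over domain scores, and the "domain-slot" naming of slots are all assumptions made here for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def domain_gate(domain_logits, slot_scores, threshold=0.1):
    """Weight each slot's update score by the gate value of its domain.

    domain_logits: dict of domain name -> raw relevance score for the
        current utterance (hypothetical input, e.g. from a classifier).
    slot_scores: dict of "domain-slot" name -> update score.
    threshold: hypothetical cutoff; slots whose domain gate falls below
        it are dropped entirely.
    """
    domains = list(domain_logits)
    gate_values = softmax([domain_logits[d] for d in domains])
    gates = dict(zip(domains, gate_values))

    gated = {}
    for slot, score in slot_scores.items():
        # Assumed naming convention: the domain is the prefix before "-".
        domain = slot.split("-", 1)[0]
        gate = gates.get(domain, 0.0)
        if gate >= threshold:
            gated[slot] = gate * score
    return gated


# Illustrative values only: the utterance is assumed to be about hotels.
example = domain_gate(
    {"hotel": 4.0, "train": 0.0, "taxi": 0.0},
    {"hotel-area": 0.8, "hotel-stars": 0.5, "train-day": 0.9},
)
```

In this sketch, `example` keeps only the two hotel slots, scaled by the hotel gate value, while `train-day` is filtered out even though its raw score is the highest. A learned gate would replace the hand-set logits and hard threshold used here.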

Multimodal Dialogue Understanding via Holistic Modeling and Sequence Labeling

Hierarchical and Bidirectional Joint Multi-Task Classifiers for Natural Language Understanding

Overview of the NLPCC 2023 Shared Task 10: Learn to Watch TV: Multimodal Dialogue Understanding and Response Generation

Multimodal Analysis for Deep Video Understanding with Video Language Transformer

UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model

Multi-Level Multimodal Transformer Network for Multimodal Recipe Comprehension

FC-MTLF: A Fine- and Coarse-grained Multi-Task Learning Framework for Cross-Lingual Spoken Language Understanding

Chinese Dialogue Analysis Using Multi-Task Learning Framework

Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken Conversations

Human–Machine Multi-Turn Language Dialogue Interaction Based on Deep Learning

Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

Unsupervised Mutual Learning of Dialogue Discourse Parsing and Topic Segmentation

UniTranSeR: A Unified Transformer Semantic Representation Framework for Multimodal Task-Oriented Dialog System

Bridging Text and Video: A Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

Overview of the NLPCC 2022 Shared Task: Multi-modal Dialogue Understanding and Generation

Hierarchical Context Enhanced Multi-Domain Dialogue System for Multi-domain Task Completion

Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

Multi-domain gate and interactive dual attention for multi-domain dialogue state tracking

Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis

Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention