Abstract: As an intelligent way to interact with computers, the dialog system has been attracting increasing attention. However, most research efforts focus only on text-based dialog systems, ignoring the rich semantics conveyed by visual cues. Indeed, the demand for multimodal task-oriented dialog systems is growing with the rapid expansion of many domains, such as online retailing and travel. Moreover, few works explicitly consider the hierarchical product taxonomy and the users' attention to products. In fact, users tend to express their attention to semantic attributes of products, such as color and style, as the dialog goes on. To this end, we present a hierarchical User attention-guided Multimodal Dialog system, named UMD for short. At the high level, UMD leverages a bidirectional Recurrent Neural Network to model the ongoing dialog between users and chatbots; at the low level, the multimodal encoder and decoder encode multimodal utterances and generate multimodal responses, respectively. The multimodal encoder learns the visual representation of images with the help of a taxonomy-attribute combined tree, and the visual features then interact with textual features through an attention mechanism, whereas the multimodal decoder selects the required images and generates textual responses according to the dialog history. To evaluate the proposed model, we conduct extensive experiments on a public multimodal dialog dataset in the retailing domain. Experimental results demonstrate that our model outperforms existing state-of-the-art methods by integrating multimodal utterances and encoding visual features based on the users' attribute-level attention.
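The attribute-level attention described in the abstract can be illustrated with a minimal sketch. It is not the UMD implementation: the dot-product scoring, the function names `softmax` and `attend`, the attribute names, and the toy vectors are all assumptions made for illustration. The sketch only shows the general idea of letting a textual feature vector weight several attribute-level visual feature vectors and fusing them into one visual representation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(text_vec, attribute_vecs):
    """Weight attribute-level visual features by similarity to the text.

    text_vec: textual feature vector (hypothetical utterance encoding).
    attribute_vecs: dict mapping an attribute name to its visual feature
        vector (hypothetical; all vectors share the length of text_vec).
    Returns (attention weight per attribute, fused visual vector).
    """
    names = list(attribute_vecs)
    # Dot-product score between the text and each attribute feature
    # (an assumed scoring function, chosen for simplicity).
    scores = [
        sum(t * v for t, v in zip(text_vec, attribute_vecs[name]))
        for name in names
    ]
    weights = softmax(scores)
    # Fused representation: attention-weighted sum of attribute features.
    fused = [
        sum(w * attribute_vecs[name][i] for w, name in zip(weights, names))
        for i in range(len(text_vec))
    ]
    return dict(zip(names, weights)), fused


# Toy example: the utterance encoding aligns with the "color" feature.
text_vec = [1.0, 0.0, 0.0]
attribute_vecs = {
    "color": [2.0, 0.0, 0.0],
    "style": [0.0, 1.0, 0.0],
    "material": [0.0, 0.0, 1.0],
}
weights, fused = attend(text_vec, attribute_vecs)
```

In this toy setting the "color" attribute receives the largest weight because its feature vector is the only one aligned with the text vector, which mirrors the intuition that a user who mentions color should steer the visual encoding toward color-related features.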

Short-attention Mechanism for Generative Dialogue System

Hagan: Hierarchical Attentive Adversarial Learning For Task-Oriented Dialogue System

SAC: Accelerating and Structuring Self-Attention Via Sparse Adaptive Connection

A Static and Dynamic Attention Framework for Multi Turn Dialogue Generation

Get The Point of My Utterance! Learning Towards Effective Responses with Multi-Head Attention Mechanism

Entropy-Enhanced Multimodal Attention Model for Scene-Aware Dialogue Generation

Recurrent Attention Network with Reinforced Generator for Visual Dialog

Sequence Generation with Target Attention

Sequence to Backward and Forward Sequences: A Content-Introducing Approach to Generative Short-Text Conversation

Word Attention for Sequence to Sequence Text Understanding

Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection

A non-hierarchical attention network with modality dropout for textual response generation in multimodal dialogue systems

An Emotional Dialogue System Using Conditional Generative Adversarial Networks with a Sequence-to-Sequence Transformer Encoder

Multimodal Dialogue Response Generation Based on Selective Attention and Gating Mechanisms

Multi-turn Dialogue Generation Using Self-attention and Nonnegative Matrix Factorization

User Attention-guided Multimodal Dialog Systems

Coherent Dialogue with Attention-based Language Models

Guiding Attention in Sequence-to-Sequence Models for Dialogue Act Prediction

Picking the Underused Heads: A Network Pruning Perspective of Attention Head Selection for Fusing Dialogue Coreference Information

Improve Retrieval-based Dialogue System via Syntax-Informed Attention

An Introductory Survey on Attention Mechanisms in NLP Problems