Abstract: Natural language understanding and dialogue policy learning are both essential in conversational systems that predict the next system actions in response to a current user utterance. Conventional approaches aggregate separate models of natural language understanding (NLU) and system action prediction (SAP) as a pipeline that is sensitive to noisy outputs of error-prone NLU. To address these issues, we propose an end-to-end deep recurrent neural network with limited contextual dialogue memory that jointly trains NLU and SAP on DSTC4 multi-domain human-human dialogues. Experiments show that our proposed model significantly outperforms the state-of-the-art pipeline models for both NLU and SAP. This indicates that our joint model is capable of mitigating the effects of noisy NLU outputs, and that the NLU model can be refined by error flows backpropagating from the extra supervised signals of system actions.
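The abstract describes one network trained on two supervised signals at once. Below is a minimal Python sketch of what such a joint objective could look like. It is not the authors' implementation. It makes three assumptions that the abstract does not specify: NLU is scored as per-token softmax cross-entropy over slot tags, SAP is multi-label sigmoid cross-entropy over system actions, and the "limited contextual dialogue memory" is a fixed-size window over recent utterance vectors. The mean-pooled context and the linear action scorer are hypothetical stand-ins for the paper's recurrent layers. Every name (`DialogueMemory`, `action_logits`, `joint_loss`, `sap_weight`) is illustrative.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """NLU loss for one token: -log softmax(logits)[target], computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def sigmoid_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """SAP loss: summed binary cross-entropy over independent system-action labels."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total


class DialogueMemory:
    """Hypothetical limited memory: keeps only the last `window` utterance vectors."""

    def __init__(self, window: int) -> None:
        self.buffer: Deque[List[float]] = deque(maxlen=window)

    def push(self, utterance_vector: List[float]) -> None:
        self.buffer.append(utterance_vector)  # deque drops the oldest vector when full

    def mean_context(self) -> List[float]:
        """Mean-pool the stored vectors (stand-in for a recurrent context encoder)."""
        if not self.buffer:
            raise ValueError("memory is empty")
        n = len(self.buffer)
        return [sum(col) / n for col in zip(*self.buffer)]


def action_logits(context: Sequence[float],
                  action_weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear scorer: one logit per system action."""
    return [sum(w * c for w, c in zip(ws, context)) for ws in action_weights]


def joint_loss(tag_logits: Sequence[Sequence[float]], tag_targets: Sequence[int],
               act_logits: Sequence[float], act_targets: Sequence[int],
               sap_weight: float = 1.0) -> float:
    """Single scalar objective over both tasks; with shared parameters, its gradient
    would reach the NLU layers from both the NLU term and the SAP term."""
    nlu = sum(softmax_cross_entropy(row, t) for row, t in zip(tag_logits, tag_targets))
    sap = sigmoid_bce(act_logits, act_targets)
    return nlu + sap_weight * sap


if __name__ == "__main__":
    memory = DialogueMemory(window=2)
    for vec in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
        memory.push(vec)
    context = memory.mean_context()
    print(context)  # [0.5, 1.0]: the first utterance has been evicted
    logits = action_logits(context, [[1.0, -1.0], [0.0, 0.0]])
    print(joint_loss([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0, 2], logits, [1, 0]))
```

Summing the two losses is the simplest way to combine them. `sap_weight` is a hypothetical knob for trading off the two tasks, and the paper may weight or structure its objective differently.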

What problem does this paper attempt to address?

This paper addresses the joint modeling of natural language understanding (NLU) and dialogue management (DM) in dialogue systems. Here, DM is framed as predicting the next system actions (system action prediction, SAP). Traditional methods treat NLU and DM as independent modules, which leads to three major problems:

1. **Error propagation**: In a traditional pipeline, the output of NLU is passed to DM. If that output is noisy or incorrect, the errors flow directly into DM and degrade the performance of the whole system.
2. **Complex feature engineering**: Traditional methods rely on manually designed features, which are time-consuming to build and difficult to optimize.
3. **Underused supervision signals**: Because NLU and DM are trained separately, neither can use the other's supervision signals to improve overall performance.

To solve these problems, the paper proposes an end-to-end deep recurrent neural network (RNN) with limited contextual dialogue memory. This network trains NLU and DM jointly. Specifically, the model aims to:

- **Reduce the influence of noisy NLU output on DM**: Joint training lets the model cope better with noisy NLU outputs, which improves the accuracy of DM.
- **Capture richer feature representations**: Traditional approaches aggregate separately trained models. The joint model can instead capture more complex feature representations, which improves overall performance.
- **Use additional supervision signals**: The NLU model is further refined by back-propagating the error gradient from system action prediction.

In summary, the main goal of the paper is to improve both natural language understanding and dialogue management through an end-to-end joint training framework. The framework is evaluated on multi-domain human-human dialogues (DSTC4).
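The first problem (error propagation) and the third aim (extra supervision for NLU) depend on the same mechanism: whether the SAP loss can send a gradient back into NLU. The toy scalar sketch below illustrates this. It assumes a single NLU intent logit `z` and a single logistic SAP unit with weights `w`, `b`. All of these are hypothetical and far simpler than the paper's RNN. The sketch compares a pipeline that passes a hard 0/1 NLU decision with a joint model that passes the soft probability. Whether the paper feeds soft NLU outputs to its SAP layers in exactly this way is an assumption here.

```python
import math
from typing import Callable


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sap_loss(nlu_output: float, w: float, b: float, y: int) -> float:
    """Cross-entropy of one hypothetical logistic system-action unit fed by NLU."""
    p = sigmoid(w * nlu_output + b)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def pipeline_forward(z: float) -> float:
    """Pipeline: NLU commits to a hard 0/1 intent decision before SAP sees it."""
    return 1.0 if z > 0 else 0.0


def joint_forward(z: float) -> float:
    """Joint model (assumed): SAP consumes the soft NLU probability instead."""
    return sigmoid(z)


def numeric_grad(forward: Callable[[float], float], z: float, w: float,
                 b: float, y: int, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of d(SAP loss)/d(NLU logit z)."""
    up = sap_loss(forward(z + eps), w, b, y)
    down = sap_loss(forward(z - eps), w, b, y)
    return (up - down) / (2.0 * eps)


def analytic_joint_grad(z: float, w: float, b: float, y: int) -> float:
    """Chain rule for the joint path: (p - y) * w * s * (1 - s), with s = sigmoid(z)."""
    s = sigmoid(z)
    p = sigmoid(w * s + b)
    return (p - y) * w * s * (1.0 - s)


if __name__ == "__main__":
    z, w, b, y = 0.3, 2.0, -1.0, 0  # toy values; the gold action is "not taken"
    print(numeric_grad(pipeline_forward, z, w, b, y))  # 0.0: no error reaches NLU
    g = numeric_grad(joint_forward, z, w, b, y)
    print(g, analytic_joint_grad(z, w, b, y))  # the two estimates agree closely
    # One gradient step on the NLU logit lowers the SAP loss in the joint model.
    z_new = z - 0.1 * g
    print(sap_loss(joint_forward(z_new), w, b, y) < sap_loss(joint_forward(z), w, b, y))  # True
```

A hard threshold has zero derivative almost everywhere and is undefined at the threshold itself. As a result, the pipeline's NLU never receives a signal about SAP mistakes. A differentiable interface is what lets the SAP error refine NLU. A real implementation would use automatic differentiation rather than finite differences.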

End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager

Deep Reinforcement Learning for Dialogue Generation

Cascaded LSTMs based Deep Reinforcement Learning for Goal-driven Dialogue

DialogAct2Vec: Towards End-to-End Dialogue Agent by Multi-Task Representation Learning

Performance Improvement on Traditional Chinese Task-Oriented Dialogue Systems With Reinforcement Learning and Regularized Dropout Technique

Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History

Joint Dual Learning with Mutual Information Maximization for Natural Language Understanding and Generation in Dialogues

A Self-Attention Joint Model for Spoken Language Understanding in Situational Dialog Applications

Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning

Dual Dynamic Memory Network for End-to-End Multi-turn Task-oriented Dialog Systems

Unsupervised Mutual Learning of Dialogue Discourse Parsing and Topic Segmentation

Learning Dialogue History for Spoken Language Understanding

End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders

Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding

SUMBT+LaRL: Effective Multi-domain End-to-end Neural Task-oriented Dialog System

Memory-to-Sequence learning with LSTM joint decoding for task-oriented dialogue systems

A Network-based End-to-End Trainable Task-oriented Dialogue System

Joint Spoken Language Understanding And Domain Adaptive Language Modeling

Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog

MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU

Multijugate Dual Learning for Low-Resource Task-Oriented Dialogue System