End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager

Xuesong Yang, Yun-Nung Chen, Dilek Hakkani-Tur, Paul Crook, Xiujun Li, Jianfeng Gao, Li Deng
DOI: https://doi.org/10.48550/arXiv.1612.00913
2017-01-04
Abstract: Natural language understanding and dialogue policy learning are both essential in conversational systems that predict the next system actions in response to a current user utterance. Conventional approaches aggregate separate models of natural language understanding (NLU) and system action prediction (SAP) as a pipeline that is sensitive to noisy outputs of error-prone NLU. To address these issues, we propose an end-to-end deep recurrent neural network with limited contextual dialogue memory by jointly training NLU and SAP on DSTC4 multi-domain human-human dialogues. Experiments show that our proposed model significantly outperforms the state-of-the-art pipeline models for both NLU and SAP, which indicates that our joint model is capable of mitigating the effects of noisy NLU outputs, and that the NLU model can be refined by error flows backpropagating from the extra supervised signals of system actions.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses the joint modeling of natural language understanding (NLU) and dialogue management (DM) in dialogue systems. Traditional methods usually treat NLU and DM as independent modules, which leads to several major problems:

1. **Error Propagation**: In traditional pipeline models, the output of NLU is passed to DM. If that output is noisy or incorrect, the errors pass directly to DM and degrade the overall performance of the system.
2. **Complex Feature Engineering**: Traditional methods rely on manually designed features, which are time-consuming to build and difficult to optimize.
3. **Inability to Fully Utilize Supervision Signals**: NLU and DM are trained separately, so neither model can use the other's supervision signals to improve overall performance.

To solve these problems, the paper proposes an end-to-end deep recurrent neural network (RNN) with limited contextual dialogue memory, and improves the performance of dialogue systems by jointly training NLU and DM. Specifically, this model aims to:

- **Reduce the Influence of NLU Output Noise on DM**: Through joint training, the model can better handle the noisy output of NLU, thereby improving the accuracy of DM.
- **Capture Richer Feature Representations**: Compared with traditional feature aggregation methods, the joint model can capture more complex feature representations, thereby improving the overall performance.
- **Utilize Additional Supervision Signals**: The NLU model is further refined by back-propagating the error gradient from system action prediction.

In summary, the main goal of this paper is to improve the performance of natural language understanding and dialogue management in dialogue systems through an end-to-end joint training framework, especially in multi-domain human-human dialogue scenarios.
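
The joint-training idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the shared representation is a plain vector standing in for the recurrent network's output, both heads are assumed to be single linear layers with multi-label sigmoid outputs, and the names `head_loss`, `joint_loss`, and `head_grad_wrt_shared` are hypothetical. The point it shows is that when both heads read the same representation, the system-action loss contributes a non-zero gradient to that shared representation, which is how the extra supervision could refine the NLU side.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def head_loss(shared, weights, targets):
    """Mean binary cross-entropy of one head (assumed multi-label output)."""
    eps = 1e-12  # guards against log(0)
    probs = [sigmoid(z) for z in linear(weights, shared)]
    total = sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(probs, targets)
    )
    return -total / len(probs)


def joint_loss(shared, w_nlu, w_sap, y_nlu, y_sap):
    """Assumed joint objective: NLU loss plus system-action (SAP) loss."""
    return head_loss(shared, w_nlu, y_nlu) + head_loss(shared, w_sap, y_sap)


def head_grad_wrt_shared(shared, weights, targets):
    """Analytic gradient of one head's loss w.r.t. the shared representation."""
    probs = [sigmoid(z) for z in linear(weights, shared)]
    n = len(probs)
    return [
        sum((p - t) * row[j] for p, t, row in zip(probs, targets, weights)) / n
        for j in range(len(shared))
    ]


# Toy values, chosen only for illustration.
shared = [0.5, -1.0]                 # stand-in for the shared hidden state
w_nlu = [[1.0, 0.0], [0.0, 1.0]]     # two hypothetical NLU labels
w_sap = [[0.5, 0.5]]                 # one hypothetical system-action label
y_nlu, y_sap = [1, 0], [1]

print("joint loss:", joint_loss(shared, w_nlu, w_sap, y_nlu, y_sap))
print("SAP gradient on shared state:", head_grad_wrt_shared(shared, w_sap, y_sap))
```

In a separately trained pipeline, the second gradient would never reach the parameters that produce the NLU representation; summing the two losses over a shared representation is one simple way to let it do so.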