Learning End-to-End Goal-Oriented Dialog with Maximal User Task Success and Minimal Human Agent Use

Janarthanan Rajendran, Jatin Ganhotra, Lazaros Polymenakos
DOI: https://doi.org/10.1162/tacl_a_00274
2019-07-18
Abstract: Neural end-to-end goal-oriented dialog systems showed promise to reduce the workload of human agents for customer service, as well as reduce wait time for users. However, their inability to handle new user behavior at deployment has limited their usage in real world. In this work, we propose an end-to-end trainable method for neural goal-oriented dialog systems which handles new user behaviors at deployment by transferring the dialog to a human agent intelligently. The proposed method has three goals: 1) maximize user's task success by transferring to human agents, 2) minimize the load on the human agents by transferring to them only when it is essential and 3) learn online from the human agent's responses to reduce human agents load further. We evaluate our proposed method on a modified-bAbI dialog task that simulates the scenario of new user behaviors occurring at test time. Experimental results show that our proposed method is effective in achieving the desired goals.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the inability of neural end-to-end goal-oriented dialogue systems to handle new user behaviors after deployment. When such a system encounters user behavior that did not occur during training, it may fail and leave the user's task incomplete. Such a failure not only affects the current user's experience but may also damage the enterprise's reputation and influence future users. The paper therefore proposes a method aimed at improving the robustness and practicality of the system through three goals:

1. **Maximize the success rate of user tasks**: Intelligently transfer the dialogue to a human agent in cases where the system may fail, so that the user's task can still be completed.
2. **Minimize the use of human agents**: Transfer the dialogue to a human agent only when necessary, to reduce dependence on human agents.
3. **Learn online from the responses of human agents**: Learn from the human agents' responses so that dependence on human agents gradually decreases over time.

To achieve these goals, the paper proposes an end-to-end trainable method that combines a neural dialogue model \(M\) and a neural classifier \(C\). The classifier \(C\) decides, based on the current dialogue state, whether the dialogue needs to be transferred to a human agent. After a transfer, the system learns from the human agent's response so that it can handle similar situations independently in the future. The paper also designs a reward function for training the classifier \(C\) through reinforcement learning (RL), enabling it to balance maximizing the success rate of user tasks against minimizing the use of human agents.

The paper conducted experiments on the modified bAbI dialogue task to verify the effectiveness of the proposed method. The experimental results show that, compared with the baseline method, the proposed method significantly improves the success rate of user tasks while reducing dependence on human agents.
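
The transfer-and-learn loop described above can be illustrated with a toy sketch. This is not the paper's implementation: a dictionary lookup stands in for the neural model \(M\), a simple membership check stands in for the learned classifier \(C\), and the reward values are hypothetical placeholders, since this summary does not state the paper's actual numbers. All names (`reward`, `ToyDialogSystem`, `human_agent`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical reward values (assumptions, not taken from the paper).
# The shape follows the summary: reward task success, penalize failures
# heavily, and discourage transfers that were not needed.
REWARD_MODEL_CORRECT = 1.0    # M answered on its own and was right
REWARD_MODEL_WRONG = -4.0     # M answered on its own and was wrong
REWARD_HUMAN_NEEDED = 1.0     # transferred, and M would have been wrong
REWARD_HUMAN_UNNEEDED = -1.0  # transferred, but M would have been right


def reward(transferred: bool, model_correct: bool) -> float:
    """Reward for one decision of the classifier C (toy version)."""
    if transferred:
        return REWARD_HUMAN_UNNEEDED if model_correct else REWARD_HUMAN_NEEDED
    return REWARD_MODEL_CORRECT if model_correct else REWARD_MODEL_WRONG


@dataclass
class ToyDialogSystem:
    # Stand-in for the dialogue model M: utterance -> response.
    known: Dict[str, str]
    # Stand-in for the human agent: any callable that returns a response.
    human_agent: Callable[[str], str]
    transfers: int = 0

    def should_transfer(self, utterance: str) -> bool:
        # Stand-in for classifier C: transfer only on unseen utterances.
        return utterance not in self.known

    def respond(self, utterance: str) -> str:
        if self.should_transfer(utterance):
            reply = self.human_agent(utterance)
            self.transfers += 1
            # "Online learning": remember the human's answer for next time.
            self.known[utterance] = reply
            return reply
        return self.known[utterance]
```

In this sketch, a new utterance triggers one transfer, and the same utterance is handled without a human afterwards, which mirrors the third goal. In the method the summary describes, both \(M\) and \(C\) are neural networks and \(C\) is trained with RL against a reward of this general kind, rather than using a fixed rule.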