Gaussian Process Based Deep Dyna-Q Approach for Dialogue Policy Learning

Guanlin Wu, Wenqi Fang, Ji Wang, Jiang Cao, Weidong Bao, Yang Ping, Xiaomin Zhu, Zheng Wang
DOI: https://doi.org/10.18653/v1/2021.findings-acl.156
2021-01-01
Abstract: Applying reinforcement learning to dialogue policy learning requires a prohibitively large number of human-machine interactions. To improve learning performance, the Deep Dyna-Q framework, which uses a world model that imitates real users, has been widely used in recent years. Unfortunately, how to build an effective world model and how to efficiently evaluate the experiences generated by the world model have not been well studied. To further improve the effectiveness and efficiency of dialogue policy learning, we present a novel Gaussian Process based Deep Dyna-Q approach in this paper. The Gaussian Process model, which is analytically tractable and well suited to small-sample problems, is introduced to build the world model. In addition, we design a highly efficient Kullback-Leibler divergence based discriminator to evaluate the quality of experiences generated by the world model. Extensive experiments validate the effectiveness and robustness of our proposed approach. The task-completion success rate can be improved by about 20% with fewer human-machine interactions.
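The abstract names two components, a Gaussian Process world model and a Kullback-Leibler divergence based discriminator, without giving their details. The Python sketch below illustrates only the generic techniques, under assumptions of our own: a zero-mean GP with an RBF kernel that predicts one scalar quantity (for example a reward) from a numeric feature vector, and a hypothetical filter that compares the GP's predictive Gaussian against a reference Gaussian using the closed-form KL divergence. The names `gp_predict`, `gaussian_kl`, `accept_experience`, the kernel hyperparameters, and `kl_threshold` are illustrative and do not come from the paper; the authors' actual world model and discriminator may be built quite differently.

```python
import math
from typing import List, Sequence, Tuple


def rbf_kernel(a: Sequence[float], b: Sequence[float],
               signal_var: float = 1.0, length_scale: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two input vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return signal_var * math.exp(-sq_dist / (2.0 * length_scale ** 2))


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = m (m must be symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_lower(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L x = b."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def solve_upper_from_lower(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Back substitution: solve L^T x = b using the lower factor L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / low[i][i]
    return x


def gp_predict(train_x: List[List[float]], train_y: List[float],
               query: List[float], noise_var: float = 1e-2) -> Tuple[float, float]:
    """Posterior mean and latent-function variance of a zero-mean GP at `query`."""
    n = len(train_x)
    k_mat = [[rbf_kernel(train_x[i], train_x[j]) + (noise_var if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    low = cholesky(k_mat)
    # alpha = (K + noise_var * I)^-1 y, computed via two triangular solves
    alpha = solve_upper_from_lower(low, solve_lower(low, train_y))
    k_star = [rbf_kernel(x, query) for x in train_x]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)  # v^T v = k_*^T (K + noise_var * I)^-1 k_*
    var = rbf_kernel(query, query) - sum(vi * vi for vi in v)
    return mean, max(var, 1e-12)  # clamp tiny negative values from round-off


def gaussian_kl(mu_p: float, var_p: float, mu_q: float, var_q: float) -> float:
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def accept_experience(pred_mean: float, pred_var: float, ref_mean: float,
                      ref_var: float, kl_threshold: float = 0.5) -> bool:
    """Hypothetical filter: keep a simulated experience only if its predictive
    distribution is within `kl_threshold` nats of a reference distribution."""
    return gaussian_kl(pred_mean, pred_var, ref_mean, ref_var) <= kl_threshold


if __name__ == "__main__":
    xs = [[0.0], [1.0], [2.0]]  # toy feature vectors (hypothetical)
    ys = [0.0, 1.0, 0.0]        # toy scalar targets (hypothetical)
    mean, var = gp_predict(xs, ys, [1.5])
    print(mean, var, accept_experience(mean, var, ref_mean=0.5, ref_var=0.25))
```

A GP is a natural fit for this kind of sketch because its predictive mean and variance come out in closed form, and when both distributions are Gaussian the KL divergence is also closed form, so scoring a simulated sample costs only a few arithmetic operations once the Cholesky factor is computed.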