Abstract: Reinforcement learning (RL) has emerged as a key technique for designing dialogue policies. However, the inflated action space of dialogue tasks places a heavy decision burden on dialogue policies and leads to incoherence. In this paper, we propose a novel decomposed deep Q-network (D2Q) that exploits the natural structure of dialogue actions to decompose the Q-function, enabling efficient and coherent dialogue policy learning. Instead of evaluating the Q-function directly, D2Q uses two separate estimators that share a common feature layer: one for the abstract action-value function and one for the specific action-value function. The abstract action-value function determines the speech act of the system action, while the specific action-value function focuses on the concrete action. This structure establishes a logical relationship between the speech acts of the user and the system, avoiding incoherence. Moreover, the abstract action-value function shields unreasonable specific actions in the inflated action space, reducing decision complexity. Our results show that incoherence is prevalent in existing approaches and significantly impairs the efficiency and quality of dialogue policy learning. Our D2Q architecture alleviates this problem and performs significantly better than competitive baselines in both evaluated and human experiments. Further experiments validate the generality of our method, which can be easily extended to other RL-based dialogue policy approaches.
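
As a rough illustration of the decomposition described in the abstract, the sketch below shows one way two Q-value heads could share a feature layer, with the abstract head choosing a speech act and the specific head choosing a concrete action within it. This is a minimal sketch under stated assumptions, not the authors' implementation: the layer sizes, the action inventory, the random weights, the masking rule, and the names `q_values` and `select_action` are all hypothetical, and the abstract does not specify how the two value functions are combined or trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and action inventory (assumptions, not from the paper).
STATE_DIM, HIDDEN_DIM = 8, 16
SPEECH_ACTS = ["inform", "request", "confirm"]
# Each concrete action belongs to exactly one speech act (index into SPEECH_ACTS).
SPECIFIC_ACTIONS = [
    ("inform(price)", 0),
    ("inform(address)", 0),
    ("request(area)", 1),
    ("request(food)", 1),
    ("confirm(booking)", 2),
]
ACT_OF_ACTION = np.array([act for _, act in SPECIFIC_ACTIONS])

# Randomly initialised weights stand in for trained parameters.
W_shared = rng.normal(size=(STATE_DIM, HIDDEN_DIM))
W_abstract = rng.normal(size=(HIDDEN_DIM, len(SPEECH_ACTS)))
W_specific = rng.normal(size=(HIDDEN_DIM, len(SPECIFIC_ACTIONS)))


def q_values(state):
    """Return (abstract Q, specific Q) computed from one shared feature layer."""
    features = np.maximum(0.0, state @ W_shared)  # shared ReLU feature layer
    return features @ W_abstract, features @ W_specific


def select_action(state):
    """Pick a speech act first, then the best concrete action within it."""
    q_abs, q_spec = q_values(state)
    act = int(np.argmax(q_abs))
    # Assumed shielding rule: mask concrete actions outside the chosen speech act.
    masked = np.where(ACT_OF_ACTION == act, q_spec, -np.inf)
    return act, int(np.argmax(masked))


state = rng.normal(size=STATE_DIM)
act, action = select_action(state)
print(SPEECH_ACTS[act], SPECIFIC_ACTIONS[action][0])
```

In this sketch the greedy choice over five concrete actions is reduced to a choice over three speech acts followed by a choice among at most two actions, which is one reading of how shielding could reduce decision complexity.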

Decomposed Deep Q-Network for Coherent Task-Oriented Dialogue Policy Learning