DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

Wentse Chen, Shiyu Huang, Yuan Chiang, Tim Pearce, Wei-Wei Tu, Ting Chen, Jun Zhu
DOI: https://doi.org/10.1609/aaai.v38i10.29019
2024-01-01
Abstract: Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or to improve the robustness of a policy to an unexpected perturbation. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternately places constraints on the diversity of the strategies and on the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and use policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks. Compared to baseline methods, DGPO achieves comparable rewards while discovering more diverse strategies, often with better sample efficiency.
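The abstract does not give the form of the intrinsic reward. As a rough illustration only, the sketch below uses a common instantiation of an information-theoretic diversity objective: the variational lower bound on the mutual information between states and a latent strategy index z, which yields the per-step reward log q(z|s) - log p(z). The discriminator logits, the uniform prior over strategies, and the function names are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax_log_probs(logits):
    """Numerically stable log-softmax over a list of discriminator logits."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def diversity_reward(discriminator_logits, z):
    """Hypothetical intrinsic reward: log q(z|s) - log p(z).

    discriminator_logits: unnormalized scores over strategy indices for the
        current state (assumed to come from a learned classifier q(z|s)).
    z: index of the strategy the policy is currently conditioned on.
    Assumes a uniform prior p(z) = 1 / number of strategies.
    """
    num_strategies = len(discriminator_logits)
    log_q = softmax_log_probs(discriminator_logits)[z]
    log_p = -math.log(num_strategies)
    return log_q - log_p


# A state the discriminator cannot attribute to any strategy earns about zero;
# a state it confidently attributes to the active strategy earns a positive reward.
uninformative = diversity_reward([0.0, 0.0, 0.0, 0.0], z=0)
distinctive = diversity_reward([5.0, 0.0, 0.0, 0.0], z=0)
mismatched = diversity_reward([5.0, 0.0, 0.0, 0.0], z=1)
```

Under these assumptions, the reward is positive when the visited state identifies the active strategy and negative when it looks like a different one, which pushes the strategies sharing one policy network toward distinguishable behavior.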