Abstract: Deceptive games are games whose reward structure steers the agent away from the global optimum, and they have become a major challenge for intelligent exploration in deep reinforcement learning. Most state-of-the-art exploration approaches, such as count-based and curiosity-driven methods, rely on intrinsic motivation and perform well in sparse-reward games, yet they still easily fall into local-optimum traps in deceptive games. To address this shortfall, we introduce an exploration approach called Maximum Entropy Explore (MEE). Building on entropy rewards and an off-policy actor-critic reinforcement learning algorithm, we divide the agent's exploration policy into two independent parts: the target policy and the explorer policy. The explorer policy, which takes maximizing the entropy of the target policy as its optimization goal, interacts with the environment and generates trajectories for the target policy. The target policy takes maximizing the external reward as its optimization goal in order to reach the global solution. To alleviate catastrophic forgetting, which destabilizes training during the off-policy exploration phase, optimal experience replay is applied. An on-policy mode-switch trick is used to prevent the instability and divergence caused by the deadly triad. We conduct experiments comparing our approach with state-of-the-art deep reinforcement learning algorithms and exploration methods in grid-world and StarCraft II environments with deceptive rewards. The experiments indicate that the MEE approach proposed in this paper effectively avoids the deceptive reward trap and learns the globally optimal strategy.
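To make the explorer/target split more concrete, the following is a minimal sketch of one possible reading of "taking the maximum entropy of the target policy as the optimization goal": the explorer is rewarded with the Shannon entropy of the target policy's action distribution at a state, so it is drawn toward states where the target policy is still undecided. The function names (`softmax`, `policy_entropy`, `explorer_reward`), the state labels, and the logit values are all hypothetical illustrations; the abstract does not specify the actual reward formula, network architecture, or replay mechanism, and none of those are reproduced here.

```python
import math


def softmax(logits):
    """Convert raw action preferences into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_entropy(logits):
    """Shannon entropy (in nats) of the softmax policy defined by `logits`."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def explorer_reward(target_logits_by_state, state):
    """Hypothetical intrinsic reward for the explorer policy.

    Assumption: the explorer is paid the entropy of the TARGET policy's
    action distribution at `state`. This is an illustrative choice, not
    the formula used in the paper.
    """
    return policy_entropy(target_logits_by_state[state])


# Hypothetical target-policy logits over four actions in two states.
target_logits = {
    # Near a deceptive reward the target policy is already confident.
    "near_trap": [5.0, 0.0, 0.0, 0.0],
    # In a rarely visited state the target policy is still uniform.
    "unvisited": [0.0, 0.0, 0.0, 0.0],
}

rewards = {s: explorer_reward(target_logits, s) for s in target_logits}
for state, r in rewards.items():
    print(f"{state}: explorer reward = {r:.3f}")
```

Under these assumptions the uniform state yields the maximum possible entropy for four actions, ln 4, while the state next to the deceptive reward yields a much smaller value, so an explorer trained on this signal would be pushed away from the region the target policy has already committed to.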

Sparse Online Maximum Entropy Inverse Reinforcement Learning Via Proximal Optimization and Truncated Gradient

DROP: Conservative Model-based Optimization for Offline Reinforcement Learning

AdaBoost Maximum Entropy Deep Inverse Reinforcement Learning with Truncated Gradient

Convergence Analysis of an Incremental Approach to Online Inverse Reinforcement Learning

RM-FSP: Regret Minimization Optimizes Neural Fictitious Self-Play

Maximum Entropy Reinforcement Learning with Evolution Strategies

Agent-Level Maximum Entropy Inverse Reinforcement Learning for Mean Field Games

Maximum Entropy Diverse Exploration: Disentangling Maximum Entropy Reinforcement Learning

Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field Games

Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning

An Efficient Deep Reinforcement Learning Algorithm for Solving Imperfect Information Extensive-Form Games

When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning

Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

Inverse Reinforcement Learning with Explicit Policy Estimates

An Effective Maximum Entropy Exploration Approach for Deceptive Game in Reinforcement Learning

Towards Multi-Objective Object Push-Grasp Policy Based on Maximum Entropy Deep Reinforcement Learning under Sparse Rewards

MQES: Max-Q Entropy Search for Efficient Exploration in Continuous Reinforcement Learning

Forward and inverse reinforcement learning sharing network weights and hyperparameters

Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models

A Framework and Method for Online Inverse Reinforcement Learning

Historical Decision-Making Regularized Maximum Entropy Reinforcement Learning