Curiosity-tuned experience replay for wargaming decision modeling without reward-engineering

Liwei Dong, Ni Li, Guanghong Gong
DOI: https://doi.org/10.1016/j.simpat.2023.102842
2023-01-01
Abstract: Reinforcement Learning (RL) has become a promising technique for the difficult decision modeling problem in the wargaming field. However, deploying current RL algorithms requires reward-engineering scenario by scenario, which is laborious given the large number of wargaming scenarios. To tackle this issue, this paper proposes an improved RL method, curiosity-tuned experience replay (CTER), which allows the RL-driven decision model to achieve a relatively effective policy under sparse rewards. CTER uses a curiosity mechanism to regulate the three critical procedures of learning with experience replay: the exploration, storage, and revisitation of experiences. Based on prediction-based curiosity, CTER generates an intrinsic reward to fill the sparse reward space, and further provides an adaptive exploration strategy to collect more informative experiences. Moreover, CTER develops a novel prioritized replay and memory updating mechanism to reuse experiences more efficiently. In a systematic evaluation and comparison on typical game tasks and wargaming tasks, CTER shows its effectiveness and generalization across different scenarios without reward-engineering. In particular, the policy performance of CTER-based RL with sparse rewards is almost equivalent to that of ordinary RL with dense engineered rewards. Our work may offer a relatively universal approach to wargaming decision modeling, one that can free RL-based decision modelers from laborious reward-engineering.
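The abstract does not give CTER's equations, so the following is only a generic sketch of the two ideas it names: a prediction-based curiosity signal used as an intrinsic reward, and replay sampling weighted by that signal. Every name here (`LinearForwardModel`, `intrinsic_reward`, `sample_indices`, and the parameters `beta`, `alpha`, `eps`) is hypothetical, and the linear forward model and power-law priority are assumptions chosen for brevity, not the authors' design.

```python
import random


class LinearForwardModel:
    """Hypothetical forward model: predicts the next state from (state, action).

    A linear map stands in for whatever predictor the paper actually uses.
    """

    def __init__(self, state_dim, lr=0.1):
        # One weight row per predicted state component; input is state + [action].
        self.w = [[0.0] * (state_dim + 1) for _ in range(state_dim)]
        self.lr = lr

    def predict(self, state, action):
        x = list(state) + [action]
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def error(self, state, action, next_state):
        # Mean squared prediction error, used here as the curiosity signal.
        pred = self.predict(state, action)
        return sum((p - t) ** 2 for p, t in zip(pred, next_state)) / len(next_state)

    def update(self, state, action, next_state):
        # One gradient step on the squared error (constant factors folded into lr).
        x = list(state) + [action]
        pred = self.predict(state, action)
        for i, row in enumerate(self.w):
            g = pred[i] - next_state[i]
            for j in range(len(row)):
                row[j] -= self.lr * g * x[j]


def intrinsic_reward(model, state, action, next_state, beta=1.0):
    """Assumed form: intrinsic reward proportional to the prediction error."""
    return beta * model.error(state, action, next_state)


def sample_indices(curiosities, k, alpha=0.6, eps=1e-3, rng=random):
    """Sample k replay indices with probability proportional to (curiosity + eps)^alpha.

    This power-law weighting is an assumption borrowed from standard
    prioritized replay, with curiosity in place of the TD error.
    """
    weights = [(c + eps) ** alpha for c in curiosities]
    return rng.choices(range(len(curiosities)), weights=weights, k=k)


# Usage: a transition that is seen repeatedly becomes less "curious".
model = LinearForwardModel(state_dim=2)
s, a, s_next = [1.0, 0.0], 1.0, [0.0, 1.0]
r_before = intrinsic_reward(model, s, a, s_next)
for _ in range(50):
    model.update(s, a, s_next)
r_after = intrinsic_reward(model, s, a, s_next)

# Transitions with higher stored curiosity are replayed more often.
batch = sample_indices([0.0, 0.0, 10.0], k=1000, rng=random.Random(0))
```

In this sketch the intrinsic reward shrinks as the forward model learns a transition, so novel transitions keep supplying a learning signal when the environment reward is sparse, while the sampler revisits the transitions that remain poorly predicted.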