Unified Curiosity-Driven Learning with Smoothed Intrinsic Reward Estimation.

Fuxian Huang, Weichao Li, Jiabao Cui, Yongjian Fu, Xi Li
DOI: https://doi.org/10.1016/j.patcog.2021.108352
IF: 8
2021-01-01
Pattern Recognition
Abstract:
• We propose a novel distribution-aware and policy-aware unified curiosity-driven learning framework that unifies state novelty and state-action novelty. DAW enables the agent to explore states diversely, and PAW encourages the agent to explore states in which the policy is uncertain about which action to take. The proposed approach improves the exploration ability of RL with a complete intrinsic reward.
• We propose to improve the robustness of policy learning by smoothing the intrinsic reward over a batch of transitions close to the current transition, and we employ an attention module to extract task-relevant features for a more precise estimation of the intrinsic reward.
• Extensive experiments on Atari games demonstrate the effectiveness of our approach.
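The abstract does not give formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes that "close" transitions are the k nearest neighbours in some feature space under Euclidean distance, that smoothing is a plain average, and that policy uncertainty can be approximated by the entropy of the action distribution. The function names, the choice of k, and the toy data are hypothetical and are not taken from the paper.

```python
import math


def policy_entropy(action_probs):
    """Shannon entropy of an action distribution.

    Assumption: entropy stands in for "the policy is uncertain about
    which action to take"; the paper may define this differently.
    """
    return -sum(p * math.log(p) for p in action_probs if p > 0.0)


def smooth_intrinsic_rewards(features, raw_rewards, k=3):
    """Average each raw intrinsic reward over its k nearest transitions.

    Assumption: closeness is Euclidean distance between feature vectors,
    and the current transition counts as its own nearest neighbour.
    """
    smoothed = []
    for f_i in features:
        # Rank every transition by its distance to the current one.
        order = sorted(
            range(len(features)),
            key=lambda j: math.dist(f_i, features[j]),
        )
        neighbours = order[:k]
        smoothed.append(sum(raw_rewards[j] for j in neighbours) / len(neighbours))
    return smoothed


# Toy batch (hypothetical): three nearby transitions and one outlier.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
raw = [1.0, 3.0, 2.0, 10.0]
smoothed = smooth_intrinsic_rewards(features, raw, k=3)
```

In this toy batch the three clustered transitions all receive the cluster mean of 2.0, and the outlier's raw reward of 10.0 is pulled down to 5.0, which illustrates how averaging over neighbours damps a single noisy estimate. A uniform two-action policy has entropy ln 2, while a deterministic one has entropy 0, so an entropy-based weight would favour states where the policy is undecided.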