Effective Interpretable Policy Distillation via Critical Experience Point Identification
Xiao Liu, Shuyang Liu, Bo An, Yang Gao, Shangdong Yang, Wenbin Li
DOI: https://doi.org/10.1109/mis.2023.3265868
IF: 6.744
2023-01-01
IEEE Intelligent Systems
Abstract: Interpretable Policy Distillation aims to imitate a Deep Reinforcement Learning (DRL) policy with a self-explainable model. However, the distilled policy usually does not generalize well to complex tasks. To investigate this phenomenon, we examine the experience pools of DRL tasks and find that these interactive experience distributions are heavy-tailed. Existing approaches largely ignore this critical issue and therefore do not fully utilize the less frequent but very critical experience points. To address this issue, we propose characterizing decision boundaries via minimum experience retention to handle heavy-tailed experience distributions. Our method identifies critical experience points that lie close to the model's decision boundaries; such points are more critical because they portray the prerequisites for a model to take an action. As a result, our method distills the DRL policy into a self-explainable structure without neural structures or ambiguous intermediate parameters. Through experiments on six games, we show that our method outperforms state-of-the-art baselines in cumulative reward, stability, and faithfulness.
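To make the idea of retaining only boundary-adjacent experience concrete, the following is a minimal sketch, not the authors' algorithm. It assumes experiences are (state, action) pairs labeled by a teacher policy, scores each unique experience by its distance to the nearest experience with a different action, and keeps the closest few per action. The function names `critical_points` and `distilled_policy`, the `keep_per_action` parameter, and the nearest-neighbor lookup used as the interpretable model are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def critical_points(experiences, keep_per_action=1):
    """Keep the experiences closest to a point with a different action.

    Illustrative assumption: closeness to a differently labeled point
    is used as a stand-in for closeness to the decision boundary.
    """
    # Heavy-tailed pools repeat common states many times; drop duplicates.
    unique = list(dict.fromkeys(experiences))
    scored = []
    for state, action in unique:
        dists = [math.dist(state, s) for s, a in unique if a != action]
        if dists:
            scored.append((min(dists), state, action))
    scored.sort()  # smallest boundary distance first
    kept, per_action = [], Counter()
    for _, state, action in scored:
        if per_action[action] < keep_per_action:
            kept.append((state, action))
            per_action[action] += 1
    return kept

def distilled_policy(kept, state):
    """Act like the nearest retained experience (a 1-nearest-neighbor rule)."""
    return min(kept, key=lambda sa: math.dist(sa[0], state))[1]

# Toy pool: a teacher that picks action 0 below 0.5 and action 1 above it.
# Frequent states sit far from the boundary; rare ones sit next to it.
experiences = (
    [((0.1,), 0)] * 50
    + [((0.9,), 1)] * 30
    + [((0.45,), 0), ((0.55,), 1)]
)
kept = critical_points(experiences, keep_per_action=1)
print(kept)
print(distilled_policy(kept, (0.2,)), distilled_policy(kept, (0.7,)))
```

In this toy pool, the two rare experiences near 0.5 are the ones retained, and the frequent experiences at 0.1 and 0.9 are discarded without changing the actions the lookup returns for them.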
computer science, artificial intelligence, engineering, electrical & electronic