Differentiable Logic Policy for Interpretable Deep Reinforcement Learning: A Study From an Optimization Perspective

Xin Li,Haojie Lei,Li Zhang,Mingzhong Wang
DOI: https://doi.org/10.1109/TPAMI.2023.3285634
Abstract:The interpretability of policies remains an important challenge in Deep Reinforcement Learning (DRL). This paper explores interpretable DRL via representing policy by Differentiable Inductive Logic Programming (DILP) and provides a theoretical and empirical study of DILP-based policy learning from an optimization perspective. We first identified a fundamental fact that DILP-based policy learning should be solved as a constrained policy optimization problem. We then proposed to use Mirror Descent for policy optimization (MDPO) to deal with the constraints of DILP-based policies. We derived the closed-form regret bound of MDPO with function approximation, which is helpful to the design of DRL frameworks. Moreover, we studied the convexity of DILP-based policy to further verify the benefits gained from MDPO. Empirically, we experimented MDPO, its on-policy variant, and 3 mainstream policy learning methods, and the results verified our theoretical analysis.
What problem does this paper attempt to address?