PADDLE: Logic Program Guided Policy Reuse in Deep Reinforcement Learning.

Hao Zhang, Tianpei Yang, Yan Zheng, Jianye Hao, Matthew E. Taylor
DOI: https://doi.org/10.5555/3635637.3663235
2024-01-01
Abstract: Learning new skills from previous experience is common in human life, and this is the core idea of Transfer Reinforcement Learning (TRL). TRL requires the agent to learn when to reuse a source policy, which source policy is best to reuse for the target task, and how to reuse it. Most TRL methods learn, transfer, and reuse black-box policies, which makes it hard to explain 1) when to reuse and 2) which source policy is effective, and which also reduces transfer efficiency. In this paper, we propose a novel TRL method called ProgrAm guiDeD poLicy rEuse (PADDLE). PADDLE can measure the logic similarities between tasks and transfer knowledge that reflects the logic behind the target task. To achieve this, we first propose a hybrid decision model that synthesizes high-level logic programs and learns a low-level deep reinforcement learning (DRL) policy to learn source tasks. Second, we propose a transferability metric that measures the logic similarity between the target task and source tasks. Last, we combine it with the low-level policy similarity to select the appropriate source policy as the guiding policy for the target task. Experimental results show that PADDLE can effectively select the appropriate source tasks to guide learning on the target task, outperforming black-box TRL methods.
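The selection step described in the abstract (combining a logic-level similarity with a low-level policy similarity to pick a guiding source policy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: logic programs are modeled as sets of rule strings, logic similarity is a Jaccard overlap, policy similarity is action agreement on a few probe states, and the two are mixed by a hypothetical `weight` parameter. The function names and the toy tasks are likewise hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

# A policy maps an (integer-coded) state to an action id. Assumed representation.
Policy = Callable[[int], int]


def logic_similarity(rules_a: FrozenSet[str], rules_b: FrozenSet[str]) -> float:
    """Jaccard overlap between two rule sets (a stand-in metric, not the paper's)."""
    if not rules_a and not rules_b:
        return 1.0
    return len(rules_a & rules_b) / len(rules_a | rules_b)


def policy_similarity(pi_a: Policy, pi_b: Policy, states: Sequence[int]) -> float:
    """Fraction of probe states on which the two policies choose the same action."""
    if not states:
        return 0.0
    agree = sum(1 for s in states if pi_a(s) == pi_b(s))
    return agree / len(states)


def select_source(
    target_rules: FrozenSet[str],
    target_policy: Policy,
    sources: Dict[str, Tuple[FrozenSet[str], Policy]],
    states: Sequence[int],
    weight: float = 0.5,  # hypothetical mixing weight between the two similarities
) -> Tuple[str, Dict[str, float]]:
    """Score every source task and return the best name plus all scores."""
    scores = {}
    for name, (rules, policy) in sources.items():
        logic = logic_similarity(target_rules, rules)
        low_level = policy_similarity(target_policy, policy, states)
        scores[name] = weight * logic + (1.0 - weight) * low_level
    best = max(scores, key=scores.get)
    return best, scores


# Toy usage with made-up tasks and rules.
target_rules = frozenset({"pick(key)", "open(door)", "goto(goal)"})
target_policy = lambda s: s % 2
sources = {
    "door_task": (frozenset({"pick(key)", "open(door)"}), lambda s: s % 2),
    "box_task": (frozenset({"push(box)", "goto(goal)"}), lambda s: 0),
}
best, scores = select_source(target_rules, target_policy, sources, states=range(4))
print(best, scores)
```

In this toy setup the source task that shares more rules and more actions with the target receives the higher score. A real system would need a similarity over synthesized programs and over learned DRL policies, which this sketch does not attempt.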