Synthesizing Programmatic Policy for Generalization Within Task Domain

Tianyi Wu,Liwei Shen,Zhen Dong,Xin Peng,Wenyun Zhao
DOI: https://doi.org/10.24963/ijcai.2024/577
2024-01-01
Abstract:Deep reinforcement learning struggles to generalize across tasks that remain unseen during training. Consider a neural process observed in humans and animals, where they not only learn new solutions but also deduce shared subroutines. These subroutines can be applied to tasks involving similar states to improve efficiency. Inspired by this phenomenon, we consider synthesizing a programmatic policy characterized by a conditional branch structure, which is capable of capturing subroutines and state patterns. This enables the learned policy to generalize to unseen tasks. The architecture of the programmatic policy is synthesized based on a context-free grammar. Such a grammar supports a nested If-Then-Else derivation and the incorporation of Recurrent Neural Network. The programmatic policy is trained across tasks in a domain through a meta-learning algorithm. We evaluate our approach in benchmarks, adapted from PDDLGym for task planning and Pybullet for robotic manipulation. Experimental results showcase the effectiveness of our approach across diverse benchmarks. Moreover, the learned policy demonstrates the ability to generalize to tasks that were not seen during training.
What problem does this paper attempt to address?