Meta Generative Flow Networks with Personalization for Task-Specific Adaptation

Xinyuan Ji, Xu Zhang, Wei Xi, Haozhi Wang, Olga Gadyatskaya, Yinchuan Li
2023-06-16
Abstract: Multi-task reinforcement learning and meta-reinforcement learning have been developed to quickly adapt to new tasks, but they tend to focus on tasks with higher rewards and more frequent occurrences, leading to poor performance on tasks with sparse rewards. To address this issue, GFlowNets can be integrated into meta-learning algorithms (GFlowMeta) by leveraging the advantages of GFlowNets on tasks with sparse rewards. However, GFlowMeta suffers from performance degradation when encountering heterogeneous transitions from distinct tasks. To overcome this challenge, this paper proposes a personalized approach named pGFlowMeta, which combines task-specific personalized policies with a meta policy. Each personalized policy balances the loss on its personalized task and the difference from the meta policy, while the meta policy aims to minimize the average loss of all tasks. The theoretical analysis shows that the algorithm converges at a sublinear rate. Extensive experiments demonstrate that the proposed algorithm outperforms state-of-the-art reinforcement learning algorithms in discrete environments.
Machine Learning, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper addresses the problems that multi-task reinforcement learning (multi-task RL) and meta-reinforcement learning (meta-RL) encounter when adapting to new tasks, in particular the performance degradation that occurs when tasks have sparse rewards or heterogeneous transitions.

#### Specific problem descriptions:

1. **Limitations of multi-task reinforcement learning and meta-reinforcement learning**:
   - Multi-task RL and meta-RL methods tend to focus on high-return and frequently occurring tasks, which leads to poor performance on tasks with sparse rewards.
   - When the transitions differ substantially between tasks (i.e., the tasks are heterogeneous), existing meta-learning algorithms (such as GFlowMeta) may suffer performance degradation.
2. **Deficiencies of existing methods**:
   - Although GFlowNets perform well on tasks with sparse rewards, the performance of GFlowMeta declines significantly when it encounters heterogeneous transitions from different tasks.
   - Existing meta-RL methods (such as MAML, E-MAML, and PEARL) can adapt quickly to new tasks, but when task similarity is low, the model is prone to divergence and generalizes poorly to new tasks.

#### Proposed solutions:

To solve these problems, the paper proposes a personalized method called pGFlowMeta (Personalized Meta Generative Flow Networks). The method combines task-specific personalized policies with a meta-policy to handle heterogeneity between tasks. Specifically:

- **Personalized policies**: Each task has a personalized policy that optimizes performance on that task.
- **Meta-policy**: A global meta-policy captures what the tasks have in common and minimizes the average loss over all tasks.
- **Balancing mechanism**: A proximal operator penalizes the difference between the meta-policy and each personalized policy, so that knowledge common to all tasks is shared while personalization is preserved.

#### Main contributions:

1. **The pGFlowMeta framework**: The framework achieves better performance across different tasks, especially when the tasks differ substantially.
2. **Alternating minimization algorithm**: An alternating minimization algorithm updates the personalized policies and the meta-policy in turn, and the paper proves that it converges at a sublinear rate.
3. **Experimental verification**: Experiments in multiple discrete environments verify the performance of pGFlowMeta across tasks, especially on sparse-reward and heterogeneous tasks.

In summary, the main goal of this paper is to improve the performance of multi-task RL and meta-RL on sparse-reward and heterogeneous tasks by combining personalized policies with a meta-policy.
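The interplay between personalized policies and the meta-policy can be illustrated with a toy sketch. This is not the paper's algorithm: each task's GFlowNet loss is replaced here by a scalar quadratic loss `0.5 * (x - target)^2`, and the function names, the penalty weight `lam`, and the step sizes are all assumptions chosen for illustration. The sketch only shows the general pattern of alternating between a proximal-penalized personalized update and a meta update.

```python
def personalized_step(theta_meta, target, lam, lr=0.1, steps=50):
    """Approximately minimize 0.5*(x - target)^2 + (lam/2)*(x - theta_meta)^2.

    The first term is a toy stand-in for a task loss (assumption);
    the second term penalizes distance from the meta parameter.
    """
    x = theta_meta
    for _ in range(steps):
        grad = (x - target) + lam * (x - theta_meta)
        x -= lr * grad
    return x


def alternating_minimization(targets, lam=1.0, meta_lr=0.5, rounds=200):
    """Alternate personalized updates and a meta update (hypothetical helper)."""
    theta_meta = 0.0
    personalized = [theta_meta] * len(targets)
    for _ in range(rounds):
        # Step 1: each task fits its own parameter near the meta parameter.
        personalized = [personalized_step(theta_meta, t, lam) for t in targets]
        # Step 2: move the meta parameter toward the personalized ones,
        # using the gradient of the averaged penalty term.
        grad_meta = lam * sum(theta_meta - x for x in personalized) / len(targets)
        theta_meta -= meta_lr * grad_meta
    return theta_meta, personalized


if __name__ == "__main__":
    meta, per_task = alternating_minimization([-1.0, 0.0, 4.0])
    print("meta parameter:", meta)
    print("personalized parameters:", per_task)
```

In this toy setting each personalized parameter settles between its task's optimum and the meta parameter, and a larger `lam` pulls the personalized parameters closer to the meta parameter, which mirrors the balance between personalization and shared knowledge described in the summary.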