Abstract: Multi-task reinforcement learning and meta-reinforcement learning have been developed to quickly adapt to new tasks, but they tend to focus on tasks with higher rewards and more frequent occurrences, leading to poor performance on tasks with sparse rewards. To address this issue, GFlowNets can be integrated into meta-learning algorithms (GFlowMeta) by leveraging the advantages of GFlowNets on tasks with sparse rewards. However, GFlowMeta suffers from performance degradation when encountering heterogeneous transitions from distinct tasks. To overcome this challenge, this paper proposes a personalized approach named pGFlowMeta, which combines task-specific personalized policies with a meta policy. Each personalized policy balances the loss on its personalized task and the difference from the meta policy, while the meta policy aims to minimize the average loss of all tasks. The theoretical analysis shows that the algorithm converges at a sublinear rate. Extensive experiments demonstrate that the proposed algorithm outperforms state-of-the-art reinforcement learning algorithms in discrete environments.

What problem does this paper attempt to address?

This paper addresses the difficulties that multi-task reinforcement learning (multi-task RL) and meta-reinforcement learning (meta-RL) face when adapting to new tasks, in particular the performance degradation that occurs when tasks have sparse rewards or heterogeneous transitions.

#### Specific problem descriptions:

1. **Limitations of multi-task reinforcement learning and meta-reinforcement learning**:
   - Multi-task RL and meta-RL methods tend to focus on high-return, frequently occurring tasks, so they perform poorly on tasks with sparse rewards.
   - When the transitions differ substantially between tasks (i.e., the tasks are heterogeneous), existing meta-learning algorithms such as GFlowMeta can suffer performance degradation.
2. **Deficiencies of existing methods**:
   - Although GFlowNets perform well on tasks with sparse rewards, the performance of GFlowMeta declines significantly when it encounters heterogeneous transitions from different tasks.
   - Existing meta-RL methods (such as MAML, E-MAML, and PEARL) can adapt quickly to new tasks, but when task similarity is low the model is prone to divergence and generalizes poorly to new tasks.

#### Proposed solutions:

To solve the above problems, the paper proposes a personalized method called pGFlowMeta (Personalized Meta Generative Flow Networks). The method combines task-specific personalized policies with a meta policy to handle the heterogeneity between tasks. Specifically:

- **Personalized policies**: Each task has a personalized policy that optimizes performance on that task.
- **Meta policy**: A global meta policy captures what the tasks have in common and minimizes the average loss over all tasks.
- **Balancing mechanism**: A proximal operator penalizes the difference between the meta policy and each personalized policy, so that knowledge common to all tasks is shared while personalization is preserved.

#### Main contributions:

1. **The pGFlowMeta framework**: The framework achieves better performance across tasks, especially when the tasks differ substantially.
2. **Alternating minimization algorithm**: An alternating minimization algorithm updates the personalized policies and the meta policy, and the paper proves that it converges at a sublinear rate.
3. **Experimental verification**: Experiments in multiple discrete environments verify the performance of pGFlowMeta across tasks, particularly on tasks with sparse rewards and heterogeneous transitions.

In summary, the main goal of this paper is to improve the performance of multi-task RL and meta-RL on sparse-reward and heterogeneous tasks by combining personalized policies with a meta policy.
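The balancing mechanism and the alternating updates described above can be illustrated with a small numerical sketch. It is not the paper's implementation: it assumes the penalty is a squared distance weighted by a hypothetical coefficient `lam`, replaces each task's GFlowNet training loss with a toy quadratic loss, and uses made-up function names (`personalized_step`, `alternating_minimization`) and step sizes.

```python
import numpy as np


def personalized_step(theta_meta, grad_fn, lam, lr=0.1, n_steps=50):
    """Approximately minimize f_i(phi) + (lam / 2) * ||phi - theta_meta||^2.

    grad_fn returns the gradient of the task loss f_i at phi. The penalty
    term pulls the personalized parameters toward the meta parameters.
    """
    phi = theta_meta.copy()
    for _ in range(n_steps):
        phi = phi - lr * (grad_fn(phi) + lam * (phi - theta_meta))
    return phi


def alternating_minimization(task_grads, dim, lam=1.0, meta_lr=0.5, n_rounds=200):
    """Alternate between personalized updates and a meta update."""
    theta = np.zeros(dim)
    for _ in range(n_rounds):
        # Step 1: each task fits its own parameters, anchored at theta.
        phis = [personalized_step(theta, g, lam) for g in task_grads]
        # Step 2: move theta toward the average personalized solution.
        # With a squared-distance penalty, lam * (theta - phi_i) is the
        # gradient of task i's penalized objective with respect to theta.
        meta_grad = lam * np.mean([theta - phi for phi in phis], axis=0)
        theta = theta - meta_lr * meta_grad
    phis = [personalized_step(theta, g, lam) for g in task_grads]
    return theta, phis


# Toy heterogeneous tasks (an assumption): f_i(phi) = 0.5 * ||phi - c_i||^2.
centers = [np.array([1.0, 0.0]), np.array([-1.0, 2.0]), np.array([3.0, 1.0])]
task_grads = [lambda phi, c=c: phi - c for c in centers]

theta, phis = alternating_minimization(task_grads, dim=2, lam=1.0)
print("meta parameters:", theta)
for c, phi in zip(centers, phis):
    print("task optimum", c, "-> personalized parameters", phi)
```

In this toy setting the meta parameters settle near the average of the task optima, while each set of personalized parameters lands between its own task optimum and the meta parameters. A larger `lam` keeps the personalized parameters closer to the meta parameters, and a smaller one lets them follow their own task more closely.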

Meta Generative Flow Networks with Personalization for Task-Specific Adaptation
