Shared-unique Features and Task-aware Prioritized Sampling on Multi-task Reinforcement Learning

Po-Shao Lin, Jia-Fong Yeh, Yi-Ting Chen, Winston H. Hsu
2024-06-02
Abstract: We observe that current state-of-the-art (SOTA) methods suffer from the performance imbalance issue when performing multi-task reinforcement learning (MTRL) tasks. While these methods may achieve impressive performance on average, they perform extremely poorly on a few tasks. To address this, we propose a new and effective method called STARS, which consists of two novel strategies: a shared-unique feature extractor and task-aware prioritized sampling. First, the shared-unique feature extractor learns both shared and task-specific features to enable better synergy of knowledge between different tasks. Second, the task-aware sampling strategy is combined with the prioritized experience replay for efficient learning on tasks with poor performance. The effectiveness and stability of our STARS are verified through experiments on the mainstream Meta-World benchmark. From the results, our STARS statistically outperforms current SOTA methods and alleviates the performance imbalance issue. Besides, we visualize the learned features to support our claims and enhance the interpretability of STARS.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses performance imbalance in multi-task reinforcement learning (MTRL): existing state-of-the-art methods may perform well on average while performing extremely poorly on a few tasks. The paper suggests that this imbalance may arise from ineffective use of shared and task-specific features, and from the lack of a mechanism that shifts attention toward tasks according to how well they are currently being learned. To address this, it proposes STARS, which consists of two strategies: a shared-unique feature extractor and task-aware prioritized sampling. The shared-unique feature extractor learns both features shared among tasks and task-specific features, enabling better synergy of knowledge between tasks. The task-aware sampling strategy is combined with prioritized experience replay so that learning focuses on underperforming tasks, and it dynamically adjusts the number of samples. Experiments on the mainstream Meta-World benchmark show that STARS statistically outperforms current state-of-the-art methods and mitigates the performance imbalance issue, improving stability and performance across tasks. The paper also visualizes the learned features to support these claims and to make STARS more interpretable.
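To make the sampling idea concrete, the following is a minimal sketch of one way a task-aware sampler could be combined with prioritized replay. It is not the authors' implementation: the weighting rule (one minus the task's success rate), the function names, the `min_per_task` floor, and the priority exponent `alpha` are all assumptions chosen for illustration, since the summary above does not specify them. The sketch first splits a batch across tasks so that weaker tasks receive more samples, then draws transitions within each task in proportion to their priority.

```python
import random


def task_sample_counts(success_rates, batch_size, min_per_task=1):
    """Split batch_size across tasks, giving weaker tasks a larger share.

    Assumption: a task's weight is (1 - success_rate); the paper's actual
    rule is not stated in the summary above.
    """
    n = len(success_rates)
    spare = batch_size - min_per_task * n
    if spare < 0:
        raise ValueError("batch_size is too small for min_per_task")
    # Small epsilon keeps the weights valid even if every task is solved.
    weights = [1.0 - s + 1e-6 for s in success_rates]
    total = sum(weights)
    shares = [spare * w / total for w in weights]
    counts = [min_per_task + int(x) for x in shares]
    # Largest-remainder rounding so the counts sum exactly to batch_size.
    leftover = batch_size - sum(counts)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def prioritized_sample(priorities, k, alpha=0.6, rng=random):
    """Draw k indices with probability proportional to priority ** alpha."""
    scaled = [p ** alpha for p in priorities]
    return rng.choices(range(len(priorities)), weights=scaled, k=k)


def sample_batch(buffers, success_rates, batch_size, rng=random):
    """Build a batch of (task_id, transition) pairs.

    buffers[t] is a list of (priority, transition) pairs for task t
    (a hypothetical layout used only for this sketch).
    """
    counts = task_sample_counts(success_rates, batch_size)
    batch = []
    for task_id, (buf, k) in enumerate(zip(buffers, counts)):
        idx = prioritized_sample([p for p, _ in buf], k, rng=rng)
        batch.extend((task_id, buf[i][1]) for i in idx)
    return batch
```

For example, with success rates of 0.9, 0.5, and 0.1 on three tasks, `task_sample_counts` assigns the smallest share of the batch to the first task and the largest to the third, which is the qualitative behavior the summary describes. A real implementation would also need importance-sampling corrections and priority updates, which are omitted here.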