Abstract: Reinforcement learning (RL) with diverse offline datasets can leverage the relations among multiple tasks and the common skills learned across those tasks, allowing complex real-world problems to be handled efficiently in a data-driven way. In offline RL, where only offline data is used and online interaction with the environment is restricted, it remains difficult to obtain the optimal policy for multiple tasks, especially when data quality varies across tasks. In this paper, we present a skill-based multi-task RL technique for heterogeneous datasets generated by behavior policies of different quality. To learn the shareable knowledge across those datasets effectively, we employ a task decomposition method in which common skills are jointly learned and used as guidance to reformulate a task into shared and achievable subtasks. In this joint learning, we use a Wasserstein auto-encoder (WAE) to represent both skills and tasks in the same latent space, and we use a quality-weighted loss as a regularization term to induce tasks to be decomposed into subtasks that are more consistent with high-quality skills than with others. To improve the performance of offline RL agents learned on the latent space, we also augment datasets with imaginary trajectories relevant to high-quality skills for each task. Through experiments, we show that our multi-task offline RL approach is robust to mixed configurations of different-quality datasets and outperforms other state-of-the-art algorithms on several robotic manipulation and drone navigation tasks.
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve
This paper aims to address several key issues encountered in Multi-task Offline Reinforcement Learning (MTO-RL):
1. **Handling Datasets of Different Quality**:
- In offline reinforcement learning, online interaction with the environment is restricted, so training data comes from pre-collected datasets. These datasets may be generated by behavior policies of varying quality, and low-quality data can degrade the learned policies. The paper proposes a skill-based multi-task learning method that learns robustly from such mixed-quality datasets.
2. **Task Decomposition and Subtask Generation**:
- To learn knowledge that can be shared across datasets, the paper introduces a task decomposition method: common skills are learned jointly and used as guidance to reformulate each task into shared, achievable subtasks, with a preference for subtasks consistent with high-quality skills.
3. **Data Augmentation**:
- To mitigate the negative impact of low-quality datasets, the paper augments each task's dataset with imagined trajectories relevant to high-quality skills. Because these trajectories resemble those of expert policies, they increase both the quality and the size of the dataset, which improves the performance of offline reinforcement learning agents.
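The augmentation idea in item 3 can be illustrated with a minimal sketch. The paper's actual trajectory generator is not described here, so everything below is an assumption: `toy_dynamics` is a hypothetical stand-in for a learned dynamics model, `skill_policy` is a hypothetical skill-conditioned policy that steps toward a target point, and the `min_quality` threshold is an illustrative rule for keeping only high-quality skills.

```python
import numpy as np

def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned dynamics model: s' = s + a."""
    return state + action

def skill_policy(state, skill, max_step=0.1):
    """Hypothetical skill-conditioned policy: step toward the skill's target point."""
    return np.clip(skill - state, -max_step, max_step)

def imagine_trajectories(start_states, skills, qualities, horizon=10, min_quality=0.8):
    """Roll out imagined (state, action, next_state) trajectories, but only for
    skills whose quality score is at least min_quality (assumed filtering rule)."""
    trajectories = []
    for skill, quality in zip(skills, qualities):
        if quality < min_quality:
            continue  # skip skills associated with low-quality data
        for s0 in start_states:
            state = np.asarray(s0, dtype=float)
            traj = []
            for _ in range(horizon):
                action = skill_policy(state, skill)
                next_state = toy_dynamics(state, action)
                traj.append((state, action, next_state))
                state = next_state
            trajectories.append(traj)
    return trajectories

starts = [np.zeros(2)]
skills = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
imagined = imagine_trajectories(starts, skills, qualities=[0.9, 0.3])
print(len(imagined), len(imagined[0]))  # 1 10
```

In this sketch the imagined transitions would simply be appended to the task's offline dataset before training the offline RL agent; only the skill with quality 0.9 contributes a trajectory.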
### Main Contributions
1. **Proposed a New Multi-task Offline Reinforcement Learning Model**:
- This model uses a quality-aware joint learning method to decompose tasks into achievable subtasks, ensuring robust learning across datasets of different quality.
2. **Designed a Data Augmentation Scheme Specific to Multi-task Offline Reinforcement Learning**:
- This scheme generates trajectories resembling those produced by expert policies, increasing both the quality and the size of the dataset.
3. **Evaluated in Multi-task Robotic Manipulation and Drone Navigation Scenarios**:
- Experimental results show that the model is robust to mixed configurations of different-quality datasets and outperforms other state-of-the-art algorithms.
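To make "decomposing a task into subtasks consistent with high-quality skills" concrete, here is a hypothetical sketch, not the paper's procedure. It assumes task segments and skills already have latent codes, scores each segment against each skill by squared distance minus `beta` times the skill's quality score (an assumed rule that favors high-quality skills on near ties), and merges consecutive segments assigned to the same skill into one subtask. The function name `decompose_task` is illustrative.

```python
import numpy as np

def decompose_task(segment_codes, skill_codes, skill_quality, beta=1.0):
    """Hypothetical quality-aware decomposition.

    segment_codes: (T, d) latent codes for consecutive task segments
    skill_codes:   (K, d) latent codes for learned skills
    skill_quality: (K,) quality scores (higher means better source data)
    Returns a list of (skill_index, start, end) subtasks, end exclusive.
    """
    seg = np.asarray(segment_codes, dtype=float)
    sk = np.asarray(skill_codes, dtype=float)
    q = np.asarray(skill_quality, dtype=float)
    # Squared Euclidean distance between every segment and every skill: (T, K)
    dists = ((seg[:, None, :] - sk[None, :, :]) ** 2).sum(axis=-1)
    # Assumed scoring rule: subtract a bonus proportional to skill quality
    labels = (dists - beta * q[None, :]).argmin(axis=1)
    subtasks = []
    for t, k in enumerate(labels):
        k = int(k)
        if subtasks and subtasks[-1][0] == k:
            subtasks[-1][2] = t + 1  # extend the current subtask
        else:
            subtasks.append([k, t, t + 1])  # start a new subtask
    return [tuple(s) for s in subtasks]

segments = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
skills = [[0.0, 0.0], [1.0, 1.0]]
print(decompose_task(segments, skills, skill_quality=[1.0, 0.0]))
# [(0, 0, 2), (1, 2, 4)]
```

A segment equidistant from two skills (for example `[0.5, 0.5]` above) is assigned to whichever skill has the higher quality score, which is the kind of bias the quality-weighted regularizer is meant to induce.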
### Method Overview
1. **Task Decomposition**:
- Using a Wasserstein Autoencoder (WAE) to represent skills and tasks in the same latent space, with a quality-weighted loss as a regularization term, inducing task decomposition into subtasks consistent with high-quality skills.
2. **Data Augmentation**:
- Generating imagined trajectories based on high-quality skills to improve the performance of offline reinforcement learning agents through data augmentation.
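The joint-learning objective in item 1 can be sketched as a WAE-MMD style loss plus a quality-weighted alignment term. This is an assumption-laden illustration, not the paper's loss: the RBF kernel, the squared-error reconstruction cost, and the alignment term that pulls each task code `z_task` toward a matched skill code `z_skill`, weighted by a per-sample `quality` score, are all choices made for this example.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd(z, p, sigma=1.0):
    """Unbiased MMD^2 estimate between encoded codes z and prior samples p."""
    n = z.shape[0]
    assert p.shape[0] == n, "this sketch assumes equal sample counts"
    off_diag = ~np.eye(n, dtype=bool)
    k_zz = rbf_kernel(z, z, sigma)[off_diag].mean()
    k_pp = rbf_kernel(p, p, sigma)[off_diag].mean()
    k_zp = rbf_kernel(z, p, sigma).mean()
    return k_zz + k_pp - 2.0 * k_zp

def quality_weighted_wae_loss(x, x_recon, z_task, z_skill, z_prior, quality,
                              lam_mmd=1.0, lam_q=1.0, sigma=1.0):
    """Reconstruction + MMD prior matching + quality-weighted task/skill alignment."""
    recon = ((x - x_recon) ** 2).sum(axis=-1).mean()
    prior_match = mmd(z_task, z_prior, sigma)
    w = quality / quality.sum()  # normalized per-sample quality weights
    align = (w * ((z_task - z_skill) ** 2).sum(axis=-1)).sum()
    return recon + lam_mmd * prior_match + lam_q * align

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
z_task = rng.normal(size=(4, 2))
z_skill = z_task + 0.1
z_prior = rng.normal(size=(4, 2))
quality = np.array([1.0, 1.0, 0.2, 0.2])
print(quality_weighted_wae_loss(x, x, z_task, z_skill, z_prior, quality))
```

Because the alignment term is normalized by total quality, samples from high-quality data dominate it, which is one plausible way to bias decomposition toward high-quality skills.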
### Experimental Validation
- **Experimental Setup**:
- Evaluations were conducted on multiple robotic manipulation tasks in the Meta-World environment and navigation tasks in the AirSim drone simulator.
- **Comparison Methods**:
- Compared with various multi-task reinforcement learning methods, including TD3+BC, PCGrad, and SoftMod, among others.
- **Experimental Results**:
- Across different dataset configurations, the proposed model achieved a higher average success rate than the other methods, especially under heterogeneous data conditions.
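Average success rate is commonly computed per task and then averaged across tasks. A minimal sketch, assuming each task is weighted equally and each evaluation episode is recorded as a success/failure boolean; the episode outcomes in the example are toy inputs, not results from the paper.

```python
from typing import Dict, List

def average_success_rate(outcomes: Dict[str, List[bool]]) -> float:
    """Mean of per-task success rates; assumes every task is weighted equally."""
    if not outcomes:
        raise ValueError("no tasks given")
    per_task = []
    for task, episodes in outcomes.items():
        if not episodes:
            raise ValueError(f"task {task!r} has no episodes")
        per_task.append(sum(episodes) / len(episodes))
    return sum(per_task) / len(per_task)

# Toy inputs for illustration only
toy = {"task_a": [True, False], "task_b": [True, True, True, True]}
print(average_success_rate(toy))  # 0.75
```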
### Conclusion
By introducing skill-based task decomposition and data augmentation methods, this paper effectively addresses the issues of data quality and learning robustness in multi-task offline reinforcement learning, providing new insights for efficiently solving complex real-world problems.