Offline Skill Graph (OSG): A Framework for Learning and Planning using Offline Reinforcement Learning Skills

Ben-ya Halevy, Yehudit Aperstein, Dotan Di Castro
2023-06-24
Abstract: Reinforcement Learning has received wide interest due to its success in competitive games. Yet its adoption in everyday applications (e.g., industrial, home, and healthcare settings) is limited. In this paper, we address this limitation by presenting a framework for planning over offline skills and solving complex tasks in real-world environments. Our framework comprises three modules that together enable the agent to learn from previously collected data and generalize over it to solve long-horizon tasks. We demonstrate our approach by testing it on a robotic arm that is required to solve complex tasks.
Robotics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper aims to address three limitations of current Reinforcement Learning (RL) methods:

1. **Long Environment Exploration Time**: RL in continuous environments requires long periods of environment exploration, which is time-consuming in the real world and may damage the robot or its surroundings.
2. **Single Task Focus**: Most RL algorithms are designed to solve a single task, so a new problem must be learned from scratch even when it is posed in the same environment.
3. **Complexity and Poor Interpretability of Solutions**: When deep neural networks are used to handle complex tasks, the resulting solutions are complex and difficult to interpret, which makes it harder to ensure safety and to make targeted improvements in practical applications.

To overcome these limitations, the authors propose a framework that combines Offline Reinforcement Learning skills with a planner and a controller to solve complex tasks. The framework includes three main modules: a set of offline-learned skills, an Offline Deep Skill Graph built from these skills, and a state classification network that maps the state space to the skill graph so that the initial skill can be selected reliably. These components are learned offline from previously collected datasets and can be used to plan and execute solutions across different tasks, so the framework can keep expanding over the agent's lifetime.
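To make the interplay of the three modules concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the skill graph can be treated as a directed graph whose nodes are abstract states and whose edges are skills. The node names, skill names, and 2-D centroids are invented for illustration, a nearest-centroid rule stands in for the state classification network, and breadth-first search stands in for the planner.

```python
from collections import deque

# Hypothetical skill graph: each node is an abstract state, and each
# outgoing edge is an offline-learned skill leading to another node.
SKILL_GRAPH = {
    "home": {"reach": "at_object"},
    "at_object": {"grasp": "holding"},
    "holding": {"move_to_bin": "over_bin", "release": "at_object"},
    "over_bin": {"release": "done"},
    "done": {},
}

# Hypothetical 2-D centroids, one per graph node. In the paper a learned
# state classification network plays this role; nearest-centroid lookup
# is only a stand-in so the example runs without training anything.
CENTROIDS = {
    "home": (0.0, 0.0),
    "at_object": (1.0, 0.0),
    "holding": (1.0, 1.0),
    "over_bin": (2.0, 1.0),
    "done": (2.0, 0.0),
}


def classify_state(state):
    """Map a raw state to the graph node with the closest centroid."""
    def sq_dist(node):
        return sum((a - b) ** 2 for a, b in zip(state, CENTROIDS[node]))
    return min(CENTROIDS, key=sq_dist)


def plan_skills(graph, start, goal):
    """Breadth-first search for the shortest skill sequence from start to goal.

    Returns a list of skill names, or None if the goal is unreachable.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for skill, next_node in graph[node].items():
            if next_node not in visited:
                visited.add(next_node)
                queue.append((next_node, path + [skill]))
    return None


if __name__ == "__main__":
    start_node = classify_state((0.1, -0.1))
    print(start_node, plan_skills(SKILL_GRAPH, start_node, "done"))
```

In this sketch, the classifier picks the node from which planning starts, and the planner returns the ordered skills a controller would then execute one by one. Adding a new skill amounts to adding an edge, which is one way to picture how such a graph could grow over time.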