Bidirectional Progressive Neural Networks With Episodic Return Progress for Emergent Task Sequencing and Robotic Skill Transfer

Suzan Ece Ada, Hanne Say, Emre Ugur, Erhan Oztop
DOI: https://doi.org/10.1109/access.2024.3402089
IF: 3.9
2024-05-24
IEEE Access
Abstract: The human brain and behavior provide a rich source of inspiration for novel control and learning methods in robotics. To exemplify such a development, and drawing inspiration from how humans acquire knowledge and transfer skills among tasks, we introduce a novel multi-task reinforcement learning framework named Episodic Return Progress with Bidirectional Progressive Neural Networks (ERP-BPNN). The proposed ERP-BPNN model 1) learns in a human-like interleaved manner, 2) switches tasks autonomously based on a novel intrinsic motivation signal, and, in contrast to existing methods, 3) allows bidirectional skill transfer among tasks. ERP-BPNN is a general architecture applicable to several multi-task learning settings; in this paper, we present the details of its neural architecture and show its ability to enable effective learning and skill transfer among morphologically different robots in a reaching task. The developed Bidirectional Progressive Neural Network (BPNN) architecture enables bidirectional skill transfer without requiring incremental training and integrates seamlessly with online task arbitration. The task arbitration mechanism is based on soft Episodic Return Progress (ERP), a novel intrinsic motivation (IM) signal. To evaluate our method, we use quantifiable robotics metrics such as 'expected distance to goal' and 'path straightness', in addition to episodic return, the reward-based measure commonly used in reinforcement learning. In simulation experiments, we show that, compared to the baselines, ERP-BPNN achieves faster cumulative convergence and improves performance on all metrics considered across morphologically different robots. Overall, our method provides a human-inspired and efficient multi-task reinforcement learning approach with interleaved learning, making it highly suitable for lifelong learning applications.
computer science, information systems; telecommunications; engineering, electrical & electronic