Data Informed Residual Reinforcement Learning for High-Dimensional Robotic Tracking Control

Cong Li, Fangzhou Liu, Yongchao Wang, Martin Buss
2024-06-07
Abstract: The learning inefficiency of reinforcement learning (RL) from scratch hinders its practical application towards continuous robotic tracking control, especially for high-dimensional robots. This work proposes a data-informed residual reinforcement learning (DR-RL) based robotic tracking control scheme applicable to robots with high dimensionality. The proposed DR-RL methodology outperforms common RL methods regarding sample efficiency and scalability. Specifically, we first decouple the original robot into low-dimensional robotic subsystems, and then utilize one-step backward (OSBK) data to construct incremental subsystems that are equivalent model-free representations of the above decoupled robotic subsystems. The formulated incremental subsystems allow for parallel learning to relieve computation load and offer us mathematical descriptions of robotic movements for conducting theoretical analysis. Then, we apply DR-RL to learn the tracking control policy, a combination of incremental base policy and incremental residual policy, under a parallel learning architecture. The incremental residual policy uses the guidance from the incremental base policy as the learning initialization and further learns from interactions with environments to endow the tracking control policy with adaptability towards dynamically changing environments. Our proposed DR-RL based tracking control scheme is developed with rigorous theoretical analysis of system stability and weight convergence. The effectiveness of our proposed method is validated numerically on a 7-DoF KUKA iiwa robot manipulator and experimentally on a 3-DoF robot manipulator, a task on which counterpart RL methods fail.
Systems and Control
What problem does this paper attempt to address?
The paper aims to achieve efficient tracking control for high-dimensional robots. Reinforcement learning (RL) methods that learn from scratch are sample-inefficient when applied to continuous robotic tracking control, especially for high-dimensional robots. In practice, this inefficiency not only demands a large amount of training data but can also cause mechanical wear and even damage to the robot and its environment. High dimensionality further exacerbates this sample-complexity problem. To overcome these issues, the paper proposes a data-informed residual reinforcement learning (DR-RL) method that improves sample efficiency and scalability. Its main contributions are:

1. **Data-efficient tracking control for high-dimensional robots**: A data-efficient and scalable DR-RL tracking control scheme for high-dimensional robots is proposed, which outperforms common RL methods on the reported tasks.
2. **Data-driven incremental subsystems**: The high-dimensional robot is decoupled into multiple low-dimensional subsystems, and incremental subsystems are constructed from one-step backward (OSBK) data. These incremental subsystems improve sample efficiency, provide a mathematical description of robot motion that enables theoretical analysis, and allow a parallel learning architecture that reduces the computational load.
3. **Proof of system stability and weight convergence**: Based on the constructed incremental subsystems and the off-policy experience data used for learning, the paper provides a theoretical proof of system stability and weight convergence.
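Contribution 2 rests on the idea that OSBK data can stand in for the unknown terms \( \mathbf{f}_i \) and \( \mathbf{g}_i \). The sketch below is not the paper's implementation; it is a toy under stated assumptions: a hypothetical single-joint subsystem with state \( [q, \dot q] \), made-up functions `f` and `g`, a constant and deliberately inexact gain guess `G_BAR` (playing the role of \( \bar{\mathbf{g}}_i \)), and explicit Euler simulation. At each step it predicts the current state derivative from the previous sample, \( \dot{\mathbf{x}}_{k} \approx \dot{\mathbf{x}}_{k-1} + \bar{\mathbf{g}}\,(\mathbf{u}_k-\mathbf{u}_{k-1}) \), without evaluating `f` or `g`, and reports the worst prediction error.

```python
import numpy as np


def f(x, t):
    """Toy drift term f_i: gravity, damping, and a time-varying coupling (hypothetical)."""
    q, qd = x
    coupling = 0.5 * np.sin(2.0 * t)  # stand-in for the influence of other joints
    return np.array([qd, -9.81 * np.sin(q) - 0.8 * qd + coupling])


def g(x):
    """Toy state-dependent input gain g_i (hypothetical)."""
    q = x[0]
    return np.array([0.0, 1.0 / (1.2 + 0.3 * np.cos(q))])


# Constant, deliberately inexact guess of g_i (assumed), used in place of g_bar_i.
G_BAR = np.array([0.0, 1.0 / 1.4])


def max_incremental_error(dt, t_end=5.0):
    """Simulate the joint with explicit Euler and return the worst-case norm of
    xdot_k - [xdot_{k-1} + G_BAR * (u_k - u_{k-1})], i.e. the error of the
    OSBK-based incremental prediction."""
    t = 0.0
    x = np.array([0.3, 0.0])
    u_prev = np.sin(3.0 * t)                  # u_{k-1}
    xdot_prev = f(x, t) + g(x) * u_prev       # xdot_{k-1}, the OSBK sample
    worst = 0.0
    for _ in range(int(round(t_end / dt))):
        x = x + dt * xdot_prev                # Euler step to x_k
        t += dt
        u = np.sin(3.0 * t)                   # exogenous test input u_k
        xdot_true = f(x, t) + g(x) * u        # "measured" xdot_k
        # Incremental prediction: uses only the previous sample and G_BAR, never f or g.
        xdot_pred = xdot_prev + G_BAR * (u - u_prev)
        worst = max(worst, float(np.linalg.norm(xdot_true - xdot_pred)))
        xdot_prev, u_prev = xdot_true, u      # current sample becomes the next OSBK sample
    return worst


err_coarse = max_incremental_error(dt=0.01)
err_fine = max_incremental_error(dt=0.001)
print(f"dt=0.01: {err_coarse:.4f}  dt=0.001: {err_fine:.4f}")
```

In this toy, the prediction error shrinks roughly in proportion to the step size, and the mismatch between `G_BAR` and the true gain enters only through the small increment \( \mathbf{u}_k-\mathbf{u}_{k-1} \), which is why a crude gain guess can be tolerable in incremental models of this kind. The remaining error is only a loose analogue of the estimation error \( \boldsymbol{\xi}_i \).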
### Specific problem description in the paper

The paper focuses on high-dimensional robotic tracking control (Problem 1): given a desired trajectory \( \mathbf{x}_d\in\mathbb{R}^n \), learn an efficient tracking control policy \( \mathbf{u}(\mathbf{x}) \) so that the high-dimensional robot accurately tracks this trajectory.

### Solutions

1. **Decoupling**: Decompose the high-dimensional robot into multiple low-dimensional subsystems, where the dynamics of the \( i \)-th subsystem are
   \[ \dot{\mathbf{x}}_i=\mathbf{f}_i + \mathbf{g}_i\mathbf{u}_i,\quad i = 1,2,\ldots,N \]
   Here \( \mathbf{x}_i\in\mathbb{R}^{n_i} \) and \( \mathbf{u}_i\in\mathbb{R}^{m_i} \) are the local state and control input of the \( i \)-th subsystem, \( \mathbf{f}_i\in\mathbb{R}^{n_i} \) combines the local internal dynamics and the coupling terms, and \( \mathbf{g}_i\in\mathbb{R}^{n_i\times m_i} \) is the local input gain matrix.
2. **Incremental subsystems**: Use OSBK data to estimate the unknown model knowledge \( \mathbf{f}_i \) and \( \mathbf{g}_i \), and construct the incremental subsystems
   \[ \dot{\mathbf{x}}_i=\dot{\mathbf{x}}_{i,0}+\bar{\mathbf{g}}_i(\Delta\mathbf{u}_i+\boldsymbol{\xi}_i) \]
   Here \( \dot{\mathbf{x}}_{i,0} \) and \( \mathbf{u}_{i,0} \) are the OSBK state derivative and control input, \( \bar{\mathbf{g}}_i \) is the approximated input gain, \( \Delta\mathbf{u}_i=\mathbf{u}_i-\mathbf{u}_{i,0} \) is the incremental policy, and \( \boldsymbol{\xi}_i \) is the estimation error.
3. **DR-RL tracking control scheme**: Under the parallel learning architecture, the incremental policy \( \Delta\mathbf{u}_i \) is designed as a combination of an incremental base policy \( \Delta\mathbf{u}_{ib} \) and an incremental residual policy \( \Delta\mathbf{u}_{ir} \):
   \[ \Delta\mathbf{u}_i =