Tobias Johannink, Shikhar Bahl, Ashvin Nair, Jianlan Luo, Avinash Kumar, Matthias Loskyll, Juan Aparicio Ojea, Eugen Solowjow, Sergey Levine
Abstract: Conventional feedback control methods can solve various types of robot control problems very efficiently by capturing the structure with explicit models, such as rigid body equations of motion. However, many control problems in modern manufacturing deal with contacts and friction, which are difficult to capture with first-order physical modeling. Hence, applying control design methodologies to these kinds of problems often results in brittle and inaccurate controllers, which have to be manually tuned for deployment. Reinforcement learning (RL) methods have been demonstrated to be capable of learning continuous robot controllers from interactions with the environment, even for problems that include friction and contacts. In this paper, we study how we can solve difficult control problems in the real world by decomposing them into a part that is solved efficiently by conventional feedback control methods, and the residual which is solved with RL. The final control policy is a superposition of both control signals. We demonstrate our approach by training an agent to successfully perform a real-world block assembly task involving contacts and unstable objects.
What problem does this paper attempt to address?
The question this paper addresses is: **how can robot control problems that involve contacts and friction be solved efficiently in the real world?** Conventional feedback control solves many types of robot control problems efficiently, but for tasks involving contacts and friction the resulting controllers are often brittle and hard to tune. Reinforcement learning (RL) can learn continuous robot controllers through interaction with the environment, but standard RL methods may behave unsafely early in training and require a large amount of interaction data.
To address both limitations, the authors propose a method that combines conventional feedback control with deep reinforcement learning, called **Residual Reinforcement Learning**. The method splits the control task into two parts:
1. **Conventional feedback control part**: handles the task structure that explicit models (such as the rigid-body equations of motion) can capture.
2. **Residual part**: learned with an RL algorithm; handles contacts and the dynamics of external objects.
The final control policy is the superposition of the two control signals. The method thereby keeps the efficiency of the conventional controller while using the flexibility of RL to handle contacts and friction, which makes it better suited to real-world manufacturing tasks.
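The superposition of the two control signals can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the proportional controller `p_controller`, its gain `k_p`, the class `LinearResidual`, and the helper `residual_action` are hypothetical names introduced here. In the paper's setting the residual would be a policy trained with an RL algorithm; the sketch uses a fixed linear map instead, with weights initialised to zero (a choice made for this sketch) so that the combined policy initially equals the hand-designed controller.

```python
from typing import List, Sequence

Vector = List[float]


def p_controller(s_m: Sequence[float], target: Sequence[float], k_p: float = 1.0) -> Vector:
    """Hypothetical hand-designed controller: proportional pull toward a target."""
    return [k_p * (t - s) for s, t in zip(s_m, target)]


class LinearResidual:
    """Hypothetical stand-in for the learned residual policy (a linear map)."""

    def __init__(self, state_dim: int, action_dim: int) -> None:
        # Zero weights: the residual contributes nothing before any training.
        self.weights = [[0.0] * state_dim for _ in range(action_dim)]

    def __call__(self, s_m: Sequence[float], s_o: Sequence[float]) -> Vector:
        state = list(s_m) + list(s_o)  # the residual sees robot and object state
        return [sum(w * x for w, x in zip(row, state)) for row in self.weights]


def residual_action(s_m, s_o, pi_h, pi_theta) -> Vector:
    """Combined control: hand-designed signal plus residual signal."""
    u_h = pi_h(s_m)            # conventional controller uses the robot state only
    u_r = pi_theta(s_m, s_o)   # residual uses robot and object state
    return [a + b for a, b in zip(u_h, u_r)]


# Usage: a 2-D robot state, a 2-D object state, a 2-D action.
target = [0.5, 0.0]
pi_h = lambda s_m: p_controller(s_m, target, k_p=2.0)
pi_theta = LinearResidual(state_dim=4, action_dim=2)

u_before = residual_action([0.0, 0.0], [0.1, 0.2], pi_h, pi_theta)

# Pretend training changed one weight: the first action component now
# also reacts to the first object-state component.
pi_theta.weights[0][2] = 1.0
u_after = residual_action([0.0, 0.0], [0.1, 0.2], pi_h, pi_theta)
```

With zero residual weights, `u_before` is exactly the output of the proportional controller; after the weight change, `u_after` differs from it only by the residual's contribution.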
### Formula summary
- The discrete-time state transition equation of the dynamical system:
\[
s_{t + 1}=\begin{bmatrix}
s_{m,t + 1}\\
s_{o,t + 1}
\end{bmatrix}=\begin{bmatrix}
A(s_{m,t})&0\\
B(s_{m,t},s_{o,t})&C(s_{o,t})
\end{bmatrix}\begin{bmatrix}
s_{m,t}\\
s_{o,t}
\end{bmatrix}+D\begin{bmatrix}
u_t\\
0
\end{bmatrix}
\]
where \(s_m\) is the state of the robot, \(s_o\) is the state of the objects in the environment, and \(u\) is the control input.
- The form of the reward function:
\[
r_t = f(s_m)+g(s_o)
\]
where \(f(s_m)\) is a reward term that depends on geometric relationships of the robot state, and \(g(s_o)\) is a reward term that depends on the state of the objects in the environment.
- The combined form of the control input:
\[
u=\pi_H(s_m)+\pi_\theta(s_m,s_o)
\]
where \(\pi_H(s_m)\) is a hand-designed controller, and \(\pi_\theta(s_m,s_o)\) is a policy learned with RL.
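To make the block structure of the transition equation and the additive reward concrete, here is a toy scalar instance in Python. Everything in it is an assumption made for illustration: the coefficients `a`, `b`, `c`, `d` are constants rather than state-dependent functions, `d` routes the control into the robot row only, and the two reward terms are negative absolute distances to hypothetical goals `goal_m` and `goal_o`. The point it shows is that the control input never enters the object row directly, so the object state changes only through its coupling to the robot state.

```python
def step(s_m: float, s_o: float, u: float,
         a: float = 1.0, b: float = 0.1, c: float = 0.9, d: float = 1.0):
    """One step of a scalar toy version of the block-structured dynamics.

    Robot row:  next_s_m = a * s_m + 0 * s_o + d * u
    Object row: next_s_o = b * s_m + c * s_o        (no direct control term)
    """
    next_s_m = a * s_m + d * u
    next_s_o = b * s_m + c * s_o
    return next_s_m, next_s_o


def reward(s_m: float, s_o: float, goal_m: float, goal_o: float) -> float:
    """Additive reward: a robot-state term plus an object-state term."""
    f = -abs(s_m - goal_m)   # hypothetical robot term: distance to goal_m
    g = -abs(s_o - goal_o)   # hypothetical object term: distance to goal_o
    return f + g


# Two-step rollout from rest.
s_m, s_o = 0.0, 0.0
s_m, s_o = step(s_m, s_o, u=1.0)   # robot moves; object has not reacted yet
first = (s_m, s_o)
s_m, s_o = step(s_m, s_o, u=0.0)   # object now responds through the coupling b
second = (s_m, s_o)
r = reward(s_m, s_o, goal_m=1.0, goal_o=0.1)
```

In this toy setting a controller that only sees \(s_m\) can drive the robot term of the reward, while the object term depends on a state it never observes, which is the gap the residual policy \(\pi_\theta(s_m,s_o)\) is meant to cover.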
With this approach, the authors show in both simulated and real environments that residual reinforcement learning achieves better performance with fewer samples and copes better with environmental variation and control noise.