Learning to Brachiate via Simplified Model Imitation

Daniele Reda, Hung Yu Ling, Michiel van de Panne
DOI: https://doi.org/10.1145/3528233.3530728
2022-05-09
Abstract: Brachiation is the primary form of locomotion for gibbons and siamangs, in which these primates swing from tree limb to tree limb using only their arms. It is challenging to control because of the limited control authority, the required advance planning, and the precision of the required grasps. We present a novel approach to this problem using reinforcement learning, as demonstrated on a finger-less 14-link planar model that learns to brachiate across challenging handhold sequences. Key to our method is the use of a simplified model, a point mass with a virtual arm, for which we first learn a policy that can brachiate across handhold sequences with a prescribed order. This facilitates the learning of the policy for the full model, for which it provides guidance by providing an overall center-of-mass trajectory to imitate, as well as for the timing of the holds. Lastly, the simplified model can also readily be used for planning suitable sequences of handholds in a given environment. Our results demonstrate brachiation motions with a variety of durations for the flight and hold phases, as well as emergent extra back-and-forth swings when this proves useful. The system is evaluated with a variety of ablations. The method enables future work towards more general 3D brachiation, as well as using simplified model imitation in other settings.
Machine Learning, Graphics, Robotics
What problem does this paper attempt to address?
This paper addresses how to learn brachiation, the arm-swinging locomotion of gibbons and siamangs, by imitating a simplified model. Specifically, the authors use reinforcement learning (RL) to control a finger-less 14-link planar model so that it swings from handhold to handhold like a gibbon. The task is challenging because brachiation requires precise grasps and advance planning, and the controller has only limited control authority.

The proposed method is a two-stage learning process:

1. **Simplified Model Learning**: A control policy is first learned on a simplified model, a point mass with a virtual arm. This model can explore the action space more easily and quickly yields effective control strategies. The goal of this stage is to brachiate across a handhold sequence with a prescribed order.
2. **Full-Model Imitation Learning**: The center-of-mass trajectory and the timing of the holds produced by the simplified model are then used as guidance to train the full multi-link model, which makes learning the complex behavior more efficient.

With this method, the full model brachiates across challenging handhold sequences and exhibits emergent behaviors such as extra back-and-forth swings that build enough momentum to cross large gaps. The simplified model can also be used to plan a suitable handhold sequence in a given environment, and the approach provides a basis for future work on more general 3D brachiation and on simplified-model imitation in other settings.
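
As a rough illustration of how guidance from the simplified model might enter the full model's training, the sketch below computes a per-step imitation reward from a center-of-mass tracking term and a grasp-timing term. This is not the paper's reward: the function name `imitation_reward`, the exponential form, and the weights `w_com`, `w_grasp`, and `scale` are all assumptions chosen for illustration only.

```python
import math


def imitation_reward(com_full, com_ref, grasp_full, grasp_ref,
                     w_com=1.0, w_grasp=0.5, scale=2.0):
    """Hypothetical per-step imitation reward (not the paper's formulation).

    com_full   -- (x, y) center of mass of the full 14-link model
    com_ref    -- (x, y) reference position from the simplified model
    grasp_full -- True if the full model is holding a handhold this step
    grasp_ref  -- True if the simplified model is holding at this step
    """
    dx = com_full[0] - com_ref[0]
    dy = com_full[1] - com_ref[1]
    dist_sq = dx * dx + dy * dy

    # Tracking term: 1.0 when the centers of mass coincide, decaying
    # smoothly toward 0.0 as the full model drifts from the reference.
    com_term = math.exp(-scale * dist_sq)

    # Timing term: rewarded only when the hold/flight phase matches.
    grasp_term = 1.0 if grasp_full == grasp_ref else 0.0

    return w_com * com_term + w_grasp * grasp_term


# Perfect tracking with matching phase gives the maximum reward.
best = imitation_reward((0.0, 0.0), (0.0, 0.0), True, True)

# Drifting from the reference and mismatching the phase lowers it.
worse = imitation_reward((1.0, 0.0), (0.0, 0.0), True, False)
```

In a setup like this, the reward would be summed along a rollout and typically combined with a task term (for example, reaching the next handhold), so that the full model is free to deviate from the reference when imitation alone is not enough.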