Decentralized Distributed PPO

Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra
Abstract: We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever ‘stale’), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim (Savva et al., 2019), DD-PPO exhibits near-linear scaling, achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience): over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially ‘solves’ the task: near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor.
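To put the scaling figures in context, the short Python sketch below works through the arithmetic implied by the abstract. It assumes that "scaling efficiency" means measured speedup divided by the number of GPUs, and that a month is roughly 30 days. The function names are illustrative and do not come from the paper.

```python
def scaling_efficiency(speedup: float, num_workers: int) -> float:
    """Fraction of ideal linear speedup achieved (assumed definition)."""
    return speedup / num_workers


def gpu_days(num_gpus: int, wall_clock_days: float) -> float:
    """Total GPU-time consumed, in GPU-days."""
    return num_gpus * wall_clock_days


# Figures quoted in the abstract: 107x speedup on 128 GPUs,
# and a run of under 3 days of wall-clock time on 64 GPUs.
efficiency = scaling_efficiency(107, 128)
budget = gpu_days(64, 3)  # upper bound, since the run took "under 3 days"

print(f"scaling efficiency: {efficiency:.1%}")
print(f"GPU-time upper bound: {budget:.0f} GPU-days (~{budget / 30:.1f} months)")
```

Under these assumptions, a 107x speedup on 128 GPUs is about 84% of ideal linear scaling, and 64 GPUs running for just under 3 days amount to a little under 192 GPU-days, which is consistent with the "over 6 months of GPU-time" stated above.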