Neural modular control for embodied question answering

Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra
2018-10-23
Abstract:We present a modular approach for learning policies for navigation over long planning horizons from language input. Our hierarchical policy operates at multiple timescales, where the higher-level master policy proposes subgoals to be executed by specialized sub-policies. Our choice of subgoals is compositional and semantic, ie they can be sequentially combined in arbitrary orderings, and assume human-interpretable descriptions (eg ‘exit room’,‘find kitchen’,‘find refrigerator’, etc.). We use imitation learning to warm-start policies at each level of the hierarchy, dramatically increasing sample efficiency, followed by reinforcement learning. Independent reinforcement learning at each level of hierarchy enables sub-policies to adapt to consequences of their actions and recover from errors. Subsequent joint hierarchical training enables the master policy to adapt to the sub-policies. On the challenging EQA [1] benchmark in House3D [2], requiring navigating diverse realistic indoor environments, our approach outperforms prior work by a significant margin, both in terms of navigation and question answering.
What problem does this paper attempt to address?