Embodied Question Answering

Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Abhishek Das, Dhruv Batra
DOI: https://doi.org/10.1109/CVPR.2018.00008
2017-11-30
Abstract: We present a new AI task, Embodied Question Answering (EmbodiedQA), where an agent is spawned at a random location in a 3D environment and asked a question ('What color is the car?'). In order to answer, the agent must first intelligently navigate to explore the environment, gather necessary visual information through first-person (egocentric) vision, and then answer the question ('orange'). EmbodiedQA requires a range of AI skills: language understanding, visual recognition, active perception, goal-driven navigation, commonsense reasoning, long-term memory, and grounding language into actions. In this work, we develop a dataset of questions and answers in House3D environments [1], evaluation metrics, and a hierarchical model trained with imitation and reinforcement learning.
Computer Science