Generative World Explorer

Taiming Lu, Tianmin Shu, Alan Yuille, Daniel Khashabi, Jieneng Chen
2024-11-19
Abstract: Planning with partial observation is a central challenge in embodied AI. A majority of prior works have tackled this challenge by developing agents that physically explore their environment to update their beliefs about the world state. In contrast, humans can $\textit{imagine}$ unseen parts of the world through a mental exploration and $\textit{revise}$ their beliefs with imagined observations. Such updated beliefs can allow them to make more informed decisions, without necessitating the physical exploration of the world at all times. To achieve this human-like ability, we introduce the $\textit{Generative World Explorer (Genex)}$, an egocentric world exploration framework that allows an agent to mentally explore a large-scale 3D world (e.g., urban scenes) and acquire imagined observations to update its belief. This updated belief will then help the agent to make a more informed decision at the current step. To train $\textit{Genex}$, we create a synthetic urban scene dataset, Genex-DB. Our experimental results demonstrate that (1) $\textit{Genex}$ can generate high-quality and consistent observations during long-horizon exploration of a large virtual physical world and (2) the beliefs updated with the generated observations can inform an existing decision-making model (e.g., an LLM agent) to make better plans.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the challenge of planning in partially observable environments, a central problem in embodied AI. Most prior work tackles it by building agents that physically explore their environment to update their beliefs about the world state. Humans, by contrast, can imagine unseen parts of the world through mental exploration and revise their beliefs using those imagined observations, which lets them make more informed decisions without having to physically explore at all times. To give agents this human-like ability, the paper introduces the Generative World Explorer (Genex), an egocentric world exploration framework that allows an agent to mentally explore a large-scale 3D world (such as an urban scene) and update its belief with imagined observations. The updated belief then helps the agent make a more informed decision at the current step. The main contributions of the paper are: (1) Genex, a novel framework that enables agents to explore imaginatively while generating high-quality, exploration-consistent observations; (2) what the paper presents as the first attempt to integrate generated video into partially observable decision-making, through imagination-driven belief updates; and (3) a demonstration of applications of Genex, including multi-agent decision-making. Through these contributions, the paper aims to improve the navigation and decision-making abilities of agents in complex, partially observable environments, particularly where physically exploring before every decision is impractical.
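The idea of revising a belief with an imagined observation and then deciding can be illustrated with a toy discrete Bayes update. This is a minimal sketch under stated assumptions, not the paper's method: Genex generates video observations, whereas here the imagined observation is reduced to a hand-written likelihood table, and the states, rewards, and function names (`update_belief`, `choose_action`) are all hypothetical.

```python
from typing import Dict


def update_belief(belief: Dict[str, float],
                  likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes rule over a discrete hidden state: posterior(s) ~ P(o | s) * prior(s)."""
    unnormalized = {s: belief[s] * likelihood.get(s, 0.0) for s in belief}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation is impossible under every state; keep the prior.
        return dict(belief)
    return {s: p / total for s, p in unnormalized.items()}


def choose_action(belief: Dict[str, float],
                  rewards: Dict[str, Dict[str, float]]) -> str:
    """Pick the action with the highest expected reward under the belief."""
    def expected(action: str) -> float:
        return sum(belief[s] * rewards[action][s] for s in belief)
    return max(rewards, key=expected)


# Hypothetical scenario: the agent cannot see whether a cross street is clear.
prior = {"clear": 0.5, "blocked": 0.5}
rewards = {
    "proceed": {"clear": 1.0, "blocked": -2.0},
    "wait": {"clear": -0.1, "blocked": 0.0},
}

# Assumed stand-in for an imagined observation: how likely the imagined view
# is under each hidden state (here, the imagined view looks mostly clear).
imagined_likelihood = {"clear": 0.9, "blocked": 0.2}

posterior = update_belief(prior, imagined_likelihood)
action_before = choose_action(prior, rewards)
action_after = choose_action(posterior, rewards)
```

In this toy setup the agent prefers to wait under the uniform prior and prefers to proceed once the belief has been revised, without taking any physical step. How the likelihood would actually be derived from generated video is outside the scope of this sketch.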