Abstract: Embodied AI agents that search for objects in large environments such as households often need to make efficient decisions by predicting object locations based on partial information. We pose this as a new type of link prediction problem: link prediction on partially observable dynamic graphs. Our graph is a representation of a scene in which rooms and objects are nodes and their relationships are encoded in the edges; only parts of the changing graph are known to the agent at each timestep. This partial observability poses a challenge to existing link prediction approaches, which we address. We propose a novel state representation, Scene Graph Memory (SGM), which captures the agent's accumulated set of observations, as well as a neural net architecture called a Node Edge Predictor (NEP) that extracts information from the SGM to search efficiently. We evaluate our method in the Dynamic House Simulator, a new benchmark that creates diverse dynamic graphs following the semantic patterns typically seen in homes, and show that NEP can be trained to predict the locations of objects in a variety of environments with diverse object movement dynamics, outperforming baselines both in terms of new scene adaptability and overall accuracy. The codebase and more can be found at https://www.scenegraphmemory.com.
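To make "link prediction on a partially observable dynamic graph" concrete, here is a minimal toy sketch. It assumes a tiny household in which every object has exactly one edge to the furniture it rests on, objects are moved at random between timesteps, and the agent sees only the edges in the room it currently occupies. `ROOMS`, `DynamicSceneGraph`, `step`, `observe`, and `move_prob` are hypothetical illustrations, not the Dynamic House Simulator's actual API or movement model.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy layout: each room holds some furniture (not the simulator's data).
ROOMS = {
    "kitchen": ["counter", "fridge"],
    "living_room": ["sofa", "coffee_table"],
    "bedroom": ["bed", "nightstand"],
}


@dataclass
class DynamicSceneGraph:
    """Ground-truth scene graph: each object has one edge to the furniture it rests on."""

    object_location: dict  # object -> furniture (the edges that change over time)
    furniture_room: dict = field(
        default_factory=lambda: {f: room for room, fs in ROOMS.items() for f in fs}
    )

    def step(self, rng: random.Random, move_prob: float = 0.2) -> None:
        """Advance one timestep: each object is relocated with probability move_prob."""
        all_furniture = list(self.furniture_room)
        for obj in self.object_location:
            if rng.random() < move_prob:
                self.object_location[obj] = rng.choice(all_furniture)

    def observe(self, room: str) -> dict:
        """Partial observation: only the object edges inside the agent's current room."""
        return {
            obj: furniture
            for obj, furniture in self.object_location.items()
            if self.furniture_room[furniture] == room
        }


# Usage: the agent in the kitchen sees the mug but not the book or the remote.
graph = DynamicSceneGraph({"mug": "counter", "book": "nightstand", "remote": "sofa"})
print(graph.observe("kitchen"))  # {'mug': 'counter'}
rng = random.Random(0)
for _ in range(5):
    graph.step(rng)  # edges may now differ from anything the agent has seen
```

In this toy setting, the prediction task is answering "which furniture is the mug on now?" using only the sequence of `observe()` results, since the agent never reads `object_location` directly.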

What problem does this paper attempt to address?

This paper addresses how an embodied AI agent can use its past partial observations to predict where objects are in a dynamic, partially observable environment. An agent searching for objects in a large-scale environment such as a home must make efficient decisions by predicting object locations. The authors formalize this as a new link prediction problem: link prediction on a partially observable dynamic graph. Rooms and objects are nodes, the relationships between them are encoded as edges, and the agent observes only part of the changing graph at each timestep. This partial observability challenges existing link prediction methods, which usually assume that past states of the graph are fully observed.

To meet this challenge, the authors propose a new state representation, Scene Graph Memory (SGM), which captures the set of observations the agent has accumulated, and a new neural network architecture, the Node Edge Predictor (NEP), which extracts information from the SGM to search efficiently. They also introduce a new benchmark, the Dynamic House Simulator, for evaluating agents in dynamic, partially observable environments. The simulator generates diverse dynamic scenes that follow the semantic patterns common in homes, which lets the researchers test NEP across a variety of environments and object movement dynamics.
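The idea of accumulating partial observations into a memory and then ranking candidate edges can be sketched as follows. This is a hypothetical simplification, not the authors' SGM or NEP: the class `SceneGraphMemory` and its methods `update`, `score`, and `rank` are illustrative names, and the hand-written recency/frequency score (with a `decay` parameter chosen arbitrarily) stands in for the learned NEP. The sketch assumes each observation reports which furniture was in view and which object edges were seen on it.

```python
from collections import defaultdict


class SceneGraphMemory:
    """Toy accumulator of partial observations (hypothetical; not the authors' code).

    Stores, for every object, the timesteps at which it was seen on each piece of
    furniture, plus the last timestep each piece of furniture was in view.
    """

    def __init__(self):
        self.sightings = defaultdict(lambda: defaultdict(list))  # obj -> furniture -> [t, ...]
        self.last_viewed = {}  # furniture -> last timestep it was in view

    def update(self, t, visible_furniture, observed_edges):
        """Merge one partial observation: the object edges seen at timestep t."""
        for furniture in visible_furniture:
            self.last_viewed[furniture] = t
        for obj, furniture in observed_edges.items():
            self.sightings[obj][furniture].append(t)

    def score(self, obj, furniture, t_now, decay=0.9):
        """Heuristic edge score: frequent, recent, and not contradicted by a later view."""
        times = self.sightings[obj].get(furniture, [])
        if not times:
            return 0.0
        last_seen = max(times)
        # Negative evidence: the furniture was viewed later and the object was absent.
        if self.last_viewed.get(furniture, last_seen) > last_seen:
            return 0.0
        return len(times) * decay ** (t_now - last_seen)

    def rank(self, obj, candidates, t_now):
        """Order candidate furniture by score, i.e. the order in which to search."""
        return sorted(candidates, key=lambda f: self.score(obj, f, t_now), reverse=True)


# Usage: the mug was seen on the counter, then on the coffee table,
# and a later look at the kitchen found the counter empty.
sgm = SceneGraphMemory()
sgm.update(0, ["counter", "fridge"], {"mug": "counter"})
sgm.update(1, ["sofa", "coffee_table"], {"mug": "coffee_table"})
sgm.update(2, ["counter", "fridge"], {})
print(sgm.rank("mug", ["counter", "coffee_table", "bed"], t_now=3))
# ['coffee_table', 'counter', 'bed']
```

The point of the sketch is the division of labor the summary describes: a memory that only ever grows from partial views, and a separate predictor that reads it to decide where to look next. In the paper, that predictor is a trained network rather than a fixed formula.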

Modeling Dynamic Environments with Scene Graph Memory