Multi-Object Manipulation via Object-Centric Neural Scattering Functions

Stephen Tian, Yancheng Cai, Hong-Xing Yu, Sergey Zakharov, Katherine Liu, Adrien Gaidon, Yunzhu Li, Jiajun Wu
DOI: https://doi.org/10.48550/arXiv.2306.08748
2023-06-15
Abstract: Learned visual dynamics models have proven effective for robotic manipulation tasks. Yet, it remains unclear how best to represent scenes involving multi-object interactions. Current methods decompose a scene into discrete objects, but they struggle with precise modeling and manipulation amid challenging lighting conditions as they only encode appearance tied with specific illuminations. In this work, we propose using object-centric neural scattering functions (OSFs) as object representations in a model-predictive control framework. OSFs model per-object light transport, enabling compositional scene re-rendering under object rearrangement and varying lighting conditions. By combining this approach with inverse parameter estimation and graph-based neural dynamics models, we demonstrate improved model-predictive control performance and generalization in compositional multi-object environments, even in previously unseen scenarios and harsh lighting conditions.
Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to perform accurate visual modeling and manipulation in multi-object interaction scenarios under complex and changing lighting conditions. Existing methods usually decompose the scene into discrete objects, but they encode appearance tied to specific illumination, so they perform poorly when lighting changes, especially under extremely harsh lighting. The paper proposes representing objects with object-centric neural scattering functions (OSFs) and combining them with graph neural networks (GNNs) that predict dynamics in multi-object environments, thereby achieving more accurate model-predictive control (MPC).

### Main Contributions

1. **Inverse Parameter Estimation**: Using object-centric neural scattering functions (OSFs), the method can estimate object pose and lighting direction under challenging and unseen lighting conditions.
2. **Long-term Prediction**: The method models the compositional structure of the scene and supports long-term prediction of future system states, which in turn supports downstream planning tasks.
3. **Manipulation under Extreme Lighting**: Experiments show that the method can successfully perform manipulation tasks in simulated multi-object scenes with extreme lighting directions.

### Method Overview

1. **Object-Centric Neural Scattering Functions (OSFs)**:
   - OSFs explicitly model the light transport of each object and predict the object's radiance transfer as a function of spatial position, incoming light direction, and outgoing light direction.
   - KiloOSFs are used to accelerate rendering. KiloOSFs extend NeRF-style rendering and can handle complex light transport and shadow effects.
2. **Inverse Parameter Estimation**:
   - Use covariance matrix adaptation (CMA) to optimize the 6D pose of each object and the light position.
   - Optimize the object pose and lighting parameters by minimizing the mean-squared error (MSE) between multi-view rendered images and observed images.
3. **Action-conditioned Dynamics Model**:
   - Train a graph neural network (GNN) dynamics model. The input is the current object poses and the action, and the output is the future 6D pose of each object.
   - The dynamics model predicts future states through multiple inter-object propagation steps to handle multi-object interactions.
4. **Visual Model-Predictive Control**:
   - Given a goal image and an initial visual observation, optimize the robot action sequence through sampling and forward prediction to reach the goal.
   - Use MPPI to update the action sampling distribution and execute the first action of the best sequence in the environment.
   - Update the object pose estimates through inverse parameter estimation and repeat the process to replan.

### Experimental Results

- **Visual Reconstruction**: KiloOSFs render plausible color changes and shadows of objects under extreme lighting conditions, outperforming existing compositional NeRFs.
- **Visual Prediction**: Compared with the FitVid model, which predicts directly in pixel space, the combination of the GNN dynamics model and the KiloOSFs rendering module is more accurate over long prediction horizons.
- **Model-Predictive Control**: Under random lighting conditions and unseen object configurations, the method performs better on model-predictive control tasks, especially in multi-object interaction scenarios.
- **Generalization Ability**: The method handles different numbers of objects (such as 2 or 4 objects) without retraining the model.
- **Real-World Application**: Under real-world extreme lighting conditions, the method can successfully estimate the lighting and object pose.

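The inverse parameter estimation step in the method overview (sample candidate pose and lighting parameters, render them, and keep the candidates whose renders best match the observation under MSE) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `render` is a hypothetical 1D stand-in for the OSF renderer, the parameters are a single object position and light angle rather than 6D poses, and the optimizer is a simple cross-entropy-style evolution strategy with diagonal variance, which is much simpler than full covariance matrix adaptation.

```python
import math
import random

N_PIXELS = 32
# Search bounds for (object position, light angle); hypothetical toy ranges.
BOUNDS = [(0.0, 1.0), (0.0, math.pi / 2)]


def render(params):
    """Toy stand-in for a renderer: a 1D 'image' of a blob lit from an angle."""
    x, theta = params
    brightness = 0.2 + 0.8 * math.cos(theta)
    return [
        brightness * math.exp(-((i / (N_PIXELS - 1) - x) ** 2) / (2 * 0.15 ** 2))
        for i in range(N_PIXELS)
    ]


def mse(a, b):
    """Mean-squared error between two images of equal length."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def clamp(v, lo, hi):
    return max(lo, min(hi, v))


def estimate_params(observed, iters=40, pop=64, n_elite=8, seed=0):
    """Sampling-based search for parameters whose render matches `observed`."""
    rng = random.Random(seed)
    mean = [0.5, 0.8]  # initial guess (assumed)
    std = [0.3, 0.3]   # initial search width (assumed)
    for _ in range(iters):
        # Sample candidates around the current mean, clamped to the bounds.
        samples = [
            [clamp(rng.gauss(m, s), lo, hi)
             for m, s, (lo, hi) in zip(mean, std, BOUNDS)]
            for _ in range(pop)
        ]
        # Rank candidates by photometric error against the observation.
        samples.sort(key=lambda p: mse(render(p), observed))
        elites = samples[:n_elite]
        # Refit the sampling distribution to the best candidates.
        mean = [sum(e[d] for e in elites) / n_elite for d in range(2)]
        std = [
            math.sqrt(sum((e[d] - mean[d]) ** 2 for e in elites) / n_elite) + 1e-6
            for d in range(2)
        ]
    return mean


true_params = [0.3, 0.6]
observed = render(true_params)
estimate = estimate_params(observed)
```

In this toy setting the search should recover values close to `true_params`. A real system would replace `render` with multi-view renders from the learned object representation and would likely use an off-the-shelf CMA implementation.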
In conclusion, the paper proposes a method that addresses visual modeling and manipulation under complex lighting conditions in multi-object interaction scenarios by combining object-centric neural scattering functions with graph neural network dynamics models.
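The visual model-predictive control loop summarized above (sample action sequences, roll them forward through a dynamics model, reweight them MPPI-style, execute the first action, then replan) can be sketched on a toy problem. The sketch rests on several assumptions: `toy_dynamics` is a hypothetical stand-in for the learned GNN dynamics model, the cost is a squared distance between states rather than a comparison against a goal image, the state is observed exactly rather than re-estimated through inverse parameter estimation, and all hyperparameters are illustrative.

```python
import math
import random


def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned dynamics model: the object moves by the action."""
    return (state[0] + action[0], state[1] + action[1])


def rollout_cost(state, actions, goal):
    """Sum of squared distances to the goal along the predicted trajectory."""
    cost = 0.0
    for a in actions:
        state = toy_dynamics(state, a)
        cost += (state[0] - goal[0]) ** 2 + (state[1] - goal[1]) ** 2
    return cost


def clip(v, limit):
    return max(-limit, min(limit, v))


def mppi_plan(state, goal, mean, rng, n_samples=128, n_iters=3,
              noise_std=0.1, temperature=0.05, max_step=0.2):
    """Refine the mean action sequence with MPPI-style exponential weighting."""
    horizon = len(mean)
    for _ in range(n_iters):
        samples, costs = [], []
        for _ in range(n_samples):
            # Perturb the current mean plan with Gaussian noise.
            seq = [
                tuple(clip(m + rng.gauss(0.0, noise_std), max_step) for m in mean_t)
                for mean_t in mean
            ]
            samples.append(seq)
            costs.append(rollout_cost(state, seq, goal))
        # Lower-cost sequences receive exponentially larger weight.
        best = min(costs)
        weights = [math.exp(-(c - best) / temperature) for c in costs]
        total = sum(weights)
        mean = [
            tuple(sum(w * s[t][d] for w, s in zip(weights, samples)) / total
                  for d in range(2))
            for t in range(horizon)
        ]
    return mean


def run_mpc(start, goal, n_steps=20, horizon=5, seed=0):
    rng = random.Random(seed)
    state = start
    mean = [(0.0, 0.0)] * horizon
    for _ in range(n_steps):
        mean = mppi_plan(state, goal, mean, rng)
        state = toy_dynamics(state, mean[0])  # execute only the first action
        mean = mean[1:] + [(0.0, 0.0)]        # shift the plan before replanning
    return state


goal = (1.0, 0.5)
final_state = run_mpc(start=(0.0, 0.0), goal=goal)
final_distance = math.hypot(final_state[0] - goal[0], final_state[1] - goal[1])
```

Executing only the first action and replanning at every step is what lets the controller correct for prediction errors, which is the role the repeated pose re-estimation plays in the pipeline described above.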