NeRF-Loc: Transformer-Based Object Localization Within Neural Radiance Fields

Jiankai Sun, Yan Xu, Mingyu Ding, Hongwei Yi, Chen Wang, Jingdong Wang, Liangjun Zhang, Mac Schwager
DOI: https://doi.org/10.1109/LRA.2023.3293308
2023-07-15
Abstract: Neural Radiance Fields (NeRFs) have become a widely applied scene representation technique in recent years, showing advantages for robot navigation and manipulation tasks. To further advance the utility of NeRFs for robotics, we propose a transformer-based framework, NeRF-Loc, to extract 3D bounding boxes of objects in NeRF scenes. NeRF-Loc takes a pre-trained NeRF model and a camera view as input and produces labeled, oriented 3D bounding boxes of objects as output. Using current NeRF training tools, a robot can train a NeRF environment model in real time and, using our algorithm, identify 3D bounding boxes of objects of interest within the NeRF for downstream navigation or manipulation tasks. Concretely, we design a pair of parallel transformer encoder branches, namely the coarse stream and the fine stream, to encode both the context and the details of target objects. The encoded features are then fused with attention layers to alleviate ambiguities for accurate object localization. We have compared our method with conventional RGB(-D) based methods that take rendered RGB images and depths from NeRFs as inputs. Our method outperforms these baselines.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
The paper primarily addresses the problem of 3D object localization in Neural Radiance Field (NeRF) environments. Specifically, the study proposes a Transformer-based framework called NeRF-Loc, which extracts 3D bounding boxes of objects from a pre-trained NeRF model. The key points of the core issues addressed by the paper are:

1. **Background and Motivation**:
   - NeRF, as an efficient scene representation method, has shown advantages in robotic navigation and manipulation tasks.
   - To further enhance the application value of NeRF in robotics, researchers need a method to detect objects directly in NeRF environments.
2. **Problem Definition**:
   - The task of object localization within Neural Radiance Fields is defined as follows: given a pre-constructed NeRF environment model and an observation position, design a Transformer network to estimate the 3D bounding boxes and categories of objects in the current view.
3. **Solution**:
   - The NeRF-Loc framework is proposed, which includes two parallel streams, the Fine Stream and the Coarse Stream, used to handle object details and contextual information, respectively.
   - A Cross-attention Fusion Module is designed to combine information from both streams to improve localization accuracy.
   - Experimental results show that NeRF-Loc significantly outperforms existing methods in the task of NeRF object localization.
4. **Contribution Overview**:
   - Introduced the problem of object localization within Neural Radiance Fields and explored its potential applications in NeRF-based robotic perception.
   - Proposed a framework, NeRF-Loc, that utilizes geometric information in neural representations for 3D object localization.
   - Conducted extensive experimental evaluations on the NeRF object localization task, demonstrating that the proposed method outperforms existing methods.
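To make the fusion step in the Solution item more concrete, the following is a minimal sketch of scaled dot-product cross-attention between two token sets. It is not the authors' implementation: the toy features, the names `fine_tokens` and `coarse_tokens`, the choice of fine tokens as queries, and the omission of learned projection matrices are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over all key rows."""
    dim = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(feature width).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Output row is the attention-weighted average of the value rows.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy features: 2 fine-stream tokens, 3 coarse-stream tokens, width 2.
fine_tokens = [[1.0, 0.0], [0.0, 1.0]]
coarse_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Assumption: fine tokens act as queries over the coarse context.
# Learned query/key/value projections are omitted for brevity.
fused_tokens = cross_attention(fine_tokens, coarse_tokens, coarse_tokens)
```

Each fused row is a weighted average of the coarse tokens, so a detail-level token ends up carrying context from the regions it is most similar to. A real module would add learned projections, multiple heads, and residual connections.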
In short, the main purpose of this paper is to develop a method that enables robots to localize objects effectively in environments represented by NeRF, thereby promoting the development of NeRF-based autonomous robotic systems.
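The output described above is a set of labeled, oriented 3D bounding boxes. One possible way to represent such a box is sketched below. The parameterization (center, full extents, and a single yaw angle about the vertical axis), the class name `OrientedBox3D`, and the example values are assumptions for illustration; the source does not specify how NeRF-Loc encodes its boxes.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OrientedBox3D:
    """Hypothetical box parameterization: label, center, size, and yaw about z."""
    label: str
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]  # full extents along the box's local x, y, z
    yaw: float  # rotation about the vertical (z) axis, in radians

    def corners(self) -> List[Tuple[float, float, float]]:
        """Return the 8 corner points in world coordinates."""
        cx, cy, cz = self.center
        hx, hy, hz = (s / 2.0 for s in self.size)
        cos_y, sin_y = math.cos(self.yaw), math.sin(self.yaw)
        points = []
        for sx in (-1.0, 1.0):
            for sy in (-1.0, 1.0):
                for sz in (-1.0, 1.0):
                    lx, ly, lz = sx * hx, sy * hy, sz * hz
                    # Rotate the local offset about z, then translate to the center.
                    wx = cx + cos_y * lx - sin_y * ly
                    wy = cy + sin_y * lx + cos_y * ly
                    points.append((wx, wy, cz + lz))
        return points


# Illustrative values only: a box rotated a quarter turn about the vertical axis.
box = OrientedBox3D(label="chair", center=(1.0, 2.0, 0.5),
                    size=(2.0, 1.0, 1.0), yaw=math.pi / 2)
corner_points = box.corners()
```

A downstream navigation or manipulation planner could consume the corner points directly, for example to test whether a candidate path intersects the box.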