Abstract: Neural Radiance Fields (NeRFs) have become a widely applied scene representation technique in recent years, showing advantages for robot navigation and manipulation tasks. To further advance the utility of NeRFs for robotics, we propose a transformer-based framework, NeRF-Loc, to extract 3D bounding boxes of objects in NeRF scenes. NeRF-Loc takes a pre-trained NeRF model and a camera view as input and produces labeled, oriented 3D bounding boxes of objects as output. Using current NeRF training tools, a robot can train a NeRF environment model in real time and, using our algorithm, identify 3D bounding boxes of objects of interest within the NeRF for downstream navigation or manipulation tasks. Concretely, we design a pair of parallel transformer encoder branches, namely the coarse stream and the fine stream, to encode both the context and the details of target objects. The encoded features are then fused with attention layers to alleviate ambiguities for accurate object localization. We compare our method with conventional RGB(-D) based methods that take rendered RGB images and depths from NeRFs as inputs. Our method outperforms the baselines.

What problem does this paper attempt to address?

The paper primarily addresses the problem of 3D object localization in Neural Radiance Fields (NeRF) environments. Specifically, the study proposes a Transformer-based framework called NeRF-Loc, which is used to extract 3D bounding boxes of objects from a pre-trained NeRF model. Below are the key points of the core issues addressed by the paper:

1. **Background and Motivation**:
   - NeRF, as an efficient scene representation method, has shown advantages in robotic navigation and manipulation tasks.
   - To further enhance the application value of NeRF in the robotics field, researchers need a method to directly detect objects in NeRF environments.
2. **Problem Definition**:
   - The task of object localization within Neural Radiance Fields is defined, i.e., given a pre-constructed NeRF environment model and an observation position, design a Transformer network to estimate the 3D bounding boxes and categories of objects in the current view.
3. **Solution**:
   - The NeRF-Loc framework is proposed, which includes two parallel streams: Fine Stream and Coarse Stream, used to handle object details and contextual information, respectively.
   - A Cross-attention Fusion Module is designed to combine information from both streams to improve localization accuracy.
   - Experimental results show that NeRF-Loc significantly outperforms existing methods in the task of NeRF object localization.
4. **Contribution Overview**:
   - Introduced the problem of object localization within Neural Radiance Fields and explored its potential applications in NeRF-based robotic perception.
   - Proposed a framework, NeRF-Loc, that utilizes geometric information in neural representations for 3D object localization.
   - Conducted extensive experimental evaluations on the NeRF object localization task, demonstrating that the proposed method outperforms existing methods.

In short, the main purpose of this paper is to develop a method that enables robots to localize objects effectively in environments represented by NeRF, thereby advancing NeRF-based autonomous robotic systems.
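
To make the fusion step more concrete, here is a minimal sketch of single-head cross-attention between two token sets, written in plain Python. It is not the authors' implementation. The direction of attention (fine-stream tokens querying coarse-stream tokens), the residual connection, the token counts, the feature width, and every name in the code (`cross_attention`, `fuse`, `random_matrix`) are assumptions chosen for illustration; the summary above only states that a cross-attention module combines the two streams.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) by a (k x m) matrix stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: queries attend to context."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def fuse(fine_tokens, coarse_tokens, w_q, w_k, w_v):
    """Assumed fusion: fine tokens query coarse tokens, then a residual add."""
    attended, weights = cross_attention(fine_tokens, coarse_tokens, w_q, w_k, w_v)
    fused = [[f + a for f, a in zip(f_row, a_row)]
             for f_row, a_row in zip(fine_tokens, attended)]
    return fused, weights


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 8  # hypothetical feature width
fine_tokens = random_matrix(6, dim, rng)    # stand-in for fine-stream features
coarse_tokens = random_matrix(4, dim, rng)  # stand-in for coarse-stream features
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

fused, weights = fuse(fine_tokens, coarse_tokens, w_q, w_k, w_v)
print(len(fused), len(fused[0]))  # one fused vector per fine token
```

Each row of `weights` is a probability distribution over the coarse tokens, so every fine token receives a context-weighted summary of the coarse stream. A real model would use learned projections, multiple heads, and a prediction head that regresses box parameters and class labels from the fused features.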

NeRF-Loc: Transformer-Based Object Localization Within Neural Radiance Fields

NeRF-Loc: Visual Localization with Conditional Neural Radiance Field

Loc-NeRF: Monte Carlo Localization using Neural Radiance Fields

The NeRFect Match: Exploring NeRF Features for Visual Localization

FVLoc-NeRF: Fast Vision-Only Localization Within Neural Radiation Field

NeRF-RPN: A general framework for object detection in NeRFs

PNeRFLoc: Visual Localization with Point-based Neural Radiance Fields

LATITUDE: Robotic Global Localization with Truncated Dynamic Low-pass Filter in City-scale NeRF

DReg-NeRF: Deep Registration for Neural Radiance Fields

Matching Query Image Against Selected NeRF Feature for Efficient and Scalable Localization

NeRF: Neural Radiance Field in 3D Vision, A Comprehensive Review

Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview

DiscoNeRF: Class-Agnostic Object Field for 3D Object Discovery

Open-NeRF: Towards Open Vocabulary NeRF Decomposition

Instance Neural Radiance Field

NeRF-In: Free-Form NeRF Inpainting with RGB-D Priors

NeRFuser: Large-Scale Scene Representation by NeRF Fusion

Obj-NeRF: Extract Object NeRFs from Multi-view Images

CloudLoc-NeRF: Point-cloud Assisted Volume Location for Neural Radiance Fields

Fast Global Localization on Neural Radiance Field