Abstract: As previous representations for reinforcement learning cannot effectively incorporate a human-intuitive understanding of the 3D environment, they usually suffer from sub-optimal performances. In this paper, we present Semantic-aware Neural Radiance Fields for Reinforcement Learning (SNeRL), which jointly optimizes semantic-aware neural radiance fields (NeRF) with a convolutional encoder to learn 3D-aware neural implicit representation from multi-view images. We introduce 3D semantic and distilled feature fields in parallel to the RGB radiance fields in NeRF to learn semantic and object-centric representation for reinforcement learning. SNeRL outperforms not only previous pixel-based representations but also recent 3D-aware representations both in model-free and model-based reinforcement learning.

What problem does this paper attempt to address?

The paper addresses a limitation of existing representations for reinforcement learning (RL): they cannot effectively incorporate the kind of 3D understanding of the environment that humans find intuitive, which leads to suboptimal performance. Specifically, traditional image-based RL methods usually learn visual representations from single-view observations only, so they lack 3D structural information and have difficulty obtaining object-related semantic representations.

To solve these problems, the paper proposes Semantic-aware Neural Radiance Fields for Reinforcement Learning (SNeRL), which learns 3D-aware neural implicit representations from multi-view images by jointly optimizing a convolutional encoder with semantic-aware Neural Radiance Fields (NeRF). SNeRL introduces a 3D semantic field and a distilled feature field in parallel with the RGB radiance field to learn semantic and object-centric representations suitable for RL tasks.

The main contributions of SNeRL are:

1. **Proposing a new framework**: SNeRL uses NeRF together with semantic and distilled feature fields to learn 3D-aware semantic representations, thereby improving the effectiveness of RL.
2. **Verifying effectiveness**: SNeRL performs well with both model-free and model-based methods, and it is the first work to utilize semantic-aware representations without using object masks in RL downstream tasks.
3. **Outperforming existing methods**: SNeRL outperforms previous single-view and multi-view image-based RL algorithms in four different 3D environments, especially in the Meta-world environment.

Through these improvements, SNeRL can learn more efficiently in complex control tasks and better capture the semantic information in 3D environments.
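The idea of semantic and feature fields running "in parallel" with the RGB radiance field can be illustrated with a minimal NumPy sketch. It assumes the standard NeRF volume-rendering rule and assumes that all three fields share one density, so the same per-sample weights composite every output. The function names, array shapes, and random inputs are hypothetical and chosen for illustration; the summary above does not specify SNeRL's actual architecture or losses.

```python
import numpy as np


def render_weights(sigma, deltas):
    """Standard NeRF compositing weights for the samples along one ray.

    sigma:  (N,) non-negative densities at the sample points
    deltas: (N,) distances between consecutive samples
    """
    alpha = 1.0 - np.exp(-sigma * deltas)  # opacity of each sample
    # Transmittance: fraction of light that reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return trans * alpha


def render_fields(sigma, deltas, rgb, sem_logits, feats):
    """Composite three per-sample fields with one shared set of weights.

    Assumption: the RGB, semantic, and distilled-feature heads share a
    single density, so they differ only in the quantity being blended.
    """
    w = render_weights(sigma, deltas)
    return {
        "rgb": w @ rgb,              # (3,) rendered colour
        "semantic": w @ sem_logits,  # (C,) rendered class logits
        "feature": w @ feats,        # (D,) rendered distilled feature
        "opacity": w.sum(),          # accumulated opacity, in [0, 1]
    }


# Hypothetical per-sample outputs standing in for the heads of a field network.
rng = np.random.default_rng(0)
n_samples, n_classes, feat_dim = 64, 5, 16
sigma = rng.uniform(0.0, 2.0, n_samples)
deltas = np.full(n_samples, 0.05)
out = render_fields(
    sigma,
    deltas,
    rng.uniform(size=(n_samples, 3)),
    rng.normal(size=(n_samples, n_classes)),
    rng.normal(size=(n_samples, feat_dim)),
)
print({k: np.shape(v) for k, v in out.items()})
```

In a training setup along these lines, each rendered output would be compared with its own target (pixel colours, semantic labels, and features from a teacher network, respectively), and the resulting losses would be summed to train the encoder and the fields jointly. The choice of targets and loss weighting here is an assumption, not a detail confirmed by the summary.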

SNeRL: Semantic-aware Neural Radiance Fields for Reinforcement Learning

Self-Evolving Neural Radiance Fields

GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

NeRF-Loc: Visual Localization with Conditional Neural Radiance Field

Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning

Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields

SG-NeRF: Semantic-guided Point-based Neural Radiance Fields

SegNeRF: 3D Part Segmentation with Neural Radiance Fields

Reconstructive Latent-Space Neural Radiance Fields for Efficient 3D Scene Representations

OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding

NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

Instant Continual Learning of Neural Radiance Fields

NeRF: Neural Radiance Field in 3D Vision, A Comprehensive Review

Semantic-aware Occlusion Filtering Neural Radiance Fields in the Wild

SN²eRF: A Framework for Neural Radiance Fields given Sparse and Noisy Poses

MEIL-NeRF: Memory-Efficient Incremental Learning of Neural Radiance Fields

Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention

NeSLAM: Neural Implicit Mapping and Self-Supervised Feature Tracking With Depth Completion and Denoising

CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering

DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features