Abstract: Recent works in spatiotemporal radiance fields can produce photorealistic free-viewpoint videos. However, they are inherently unsuitable for interactive streaming scenarios (e.g. video conferencing, telepresence) because they have an inevitable lag even if the training is instantaneous. This is because these approaches consume videos and thus have to buffer chunks of frames (often seconds) before processing. In this work, we take a step towards interactive streaming via a frame-by-frame approach that is naturally free of lag. Conventional wisdom holds that per-frame NeRFs are impractical due to prohibitive training costs and storage. We break this belief by introducing Incremental Neural Videos (INV), a per-frame NeRF that is efficiently trained and streamable. We designed INV based on two insights: (1) Our main finding is that MLPs naturally partition themselves into Structure and Color Layers, which store structural and color/texture information respectively. (2) We leverage this property to retain and improve upon knowledge from previous frames, thus amortizing training across frames and reducing redundant learning. As a result, with negligible changes to NeRF, INV can achieve good quality (>28.6 dB) in 8 min/frame. It can also outperform prior SOTA in 19% less training time. Additionally, our Temporal Weight Compression reduces the per-frame size to 0.3 MB/frame (6.6% of NeRF). More importantly, INV is free from buffer lag and is naturally fit for streaming. While this work does not achieve real-time training, it shows that incremental approaches like INV present new possibilities in interactive 3D streaming. Moreover, our discovery of natural information partition leads to a better understanding and manipulation of MLPs. Code and dataset will be released soon.

What problem does this paper attempt to address?

This paper aims to enable high-quality free-viewpoint video in interactive streaming scenarios such as video conferencing and telepresence. Existing spatiotemporal radiance field methods can generate realistic free-viewpoint videos, but they must buffer chunks of frames before processing, which introduces an unavoidable delay and makes them unsuitable for real-time interaction. The paper proposes Incremental Neural Videos (INV), which processes the video frame by frame. This removes the buffering delay and reduces both training time and storage cost, making the approach better suited to interactive 3D video streaming. The main contributions are:

1. **Natural partition into structure layers and color layers**: The paper finds that a multi-layer perceptron (MLP) naturally divides its layers into early layers that store structural information (structure layers) and later layers that store color/texture information (color layers). This finding clarifies how the MLP works internally and gives a more effective handle for manipulating it.

2. **Incremental Neural Videos (INV)**: Building on this finding, INV consists of two sub-modules: (1) a color module shared across frames, which encodes the color/texture of the scene; and (2) a structure module stored per frame, which encodes the changing structure of the dynamic scene. This design reduces storage requirements and improves training efficiency.

3. **Structure transfer**: An incremental training scheme that reuses what was learned for the previous frame to accelerate training of the next frame, significantly reducing training time.

In addition, the paper proposes Temporal Weight Compression, which further shrinks the model so that each INV frame takes only 0.3 MB, or 6.6% of the size of the original NeRF model. Together, these contributions let INV train each frame to high quality within minutes and make it suitable for streaming, opening up new possibilities for interactive 3D video applications.
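The split into a shared color module and a per-frame structure module, together with structure transfer, can be illustrated with a small sketch. This is not the authors' implementation: the class name `IncrementalVideo`, the layer widths, and the 4-value output are hypothetical choices made for illustration, and the training loop, positional encoding, and view-direction input of a real NeRF are omitted. The sketch only shows the weight layout: color layers are created once and shared, while each new frame gets its own structure layers that start as a copy of the previous frame's.

```python
import numpy as np


def init_layers(sizes, rng):
    """Return a list of (weight, bias) pairs for consecutive layer sizes."""
    return [
        (rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


class IncrementalVideo:
    """Hypothetical container: shared color layers plus per-frame structure layers."""

    def __init__(self, structure_sizes=(3, 64, 64), color_sizes=(64, 64, 4), seed=0):
        # Layer widths are illustrative assumptions, not values from the paper.
        self.rng = np.random.default_rng(seed)
        self.structure_sizes = structure_sizes
        self.color = init_layers(color_sizes, self.rng)  # shared across all frames
        self.frames = []  # one list of structure layers per frame

    def start_frame(self):
        """Create structure layers for a new frame (structure transfer).

        The first frame starts from a fresh initialization. Every later frame
        starts from a copy of the previous frame's structure layers, so that
        training (not shown) can continue from what was already learned.
        """
        if self.frames:
            structure = [(W.copy(), b.copy()) for W, b in self.frames[-1]]
        else:
            structure = init_layers(self.structure_sizes, self.rng)
        self.frames.append(structure)
        return structure

    def query(self, points, frame):
        """Evaluate the MLP for one frame: structure layers first, then color layers."""
        h = points
        for W, b in self.frames[frame]:
            h = np.maximum(h @ W + b, 0.0)  # ReLU
        for i, (W, b) in enumerate(self.color):
            h = h @ W + b
            if i < len(self.color) - 1:
                h = np.maximum(h, 0.0)  # no activation on the output layer
        return h

    def per_frame_params(self):
        """Number of parameters that must be stored for the latest frame."""
        return sum(W.size + b.size for W, b in self.frames[-1])


video = IncrementalVideo()
video.start_frame()  # frame 0: fresh structure layers
video.start_frame()  # frame 1: warm-started from frame 0
outputs = video.query(np.zeros((5, 3)), frame=1)  # shape (5, 4)
```

Because only the structure layers are stored per frame, the per-frame cost in this sketch is the parameter count of those layers alone; the color layers are stored once for the whole video.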

INV: Towards Streaming Incremental Neural Videos

Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos

NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields

NeuVV: Neural Volumetric Videos with Immersive Rendering and Editing

NeVRF: Neural Video-based Radiance Fields for Long-duration Sequences

4D Facial Avatar Reconstruction From Monocular Video via Efficient and Controllable Neural Radiance Fields

IL-NeRF: Incremental Learning for Neural Radiance Fields with Camera Pose Alignment

Learning Neural Volumetric Representations of Dynamic Humans in Minutes.

IOVS4NeRF:Incremental Optimal View Selection for Large-Scale NeRFs

Rate-aware Compression for NeRF-based Volumetric Video

3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos

Progressive Fourier Neural Representation for Sequential Video Compilation

Evaluation of strategies for efficient rate-distortion NeRF streaming

Towards Scalable Neural Representation for Diverse Videos

LiveNVS: Neural View Synthesis on Live RGB-D Streams

Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream

Controllable Free Viewpoint Video Reconstruction Based on Neural Radiance Fields and Motion Graphs.

NERV++: An Enhanced Implicit Neural Video Representation

KiloNeRF: Speeding Up Neural Radiance Fields with Thousands of Tiny MLPs

Efficient Dynamic-NeRF Based Volumetric Video Coding with Rate Distortion Optimization

Baking Neural Radiance Fields for Real-Time View Synthesis