StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation

Yining Shi, Kun Jiang, Ke Wang, Jiusi Li, Yunlong Wang, Mengmeng Yang, Diange Yang

2024-06-11

Abstract: Predicting the future occupancy states of the surrounding environment is a vital task for autonomous driving. However, current best-performing single-modality methods or multi-modality fusion perception methods are only able to predict uniform snapshots of future occupancy states and require strictly synchronized sensory data for sensor fusion. We propose a novel framework, StreamingFlow, to lift these strong limitations. StreamingFlow is a novel BEV occupancy predictor that ingests asynchronous multi-sensor data streams for fusion and performs streaming forecasting of the future occupancy map at any future timestamp. By integrating neural ordinary differential equations (N-ODE) into recurrent neural networks, StreamingFlow learns derivatives of BEV features over temporal horizons, updates the implicit sensor's BEV features as part of the fusion process, and propagates BEV states to the desired future time point. It shows good zero-shot generalization in prediction, reflected both in interpolation within the observed prediction time horizon and in reasonable inference over the unseen, farther future period. Extensive experiments on two large-scale datasets, nuScenes and Lyft L5, demonstrate that StreamingFlow significantly outperforms previous vision-based and LiDAR-based methods and shows superior performance compared to state-of-the-art fusion-based methods.
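The abstract describes learning the derivative of BEV features and propagating the state to any desired future time. The following is a minimal sketch of that continuous-time propagation step under explicit assumptions: the BEV feature map is flattened to a short list of floats, the learned N-ODE network is replaced by a hypothetical linear decay `toy_derivative`, and a fixed-step RK4 solver is used. The abstract does not specify StreamingFlow's actual network or solver, and the helper names (`axpy`, `rk4_step`, `propagate`) are illustrative only.

```python
import math
from typing import Callable, List

Vector = List[float]
Derivative = Callable[[float, Vector], Vector]


def axpy(a: float, x: Vector, y: Vector) -> Vector:
    """Return the elementwise combination a * x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def rk4_step(f: Derivative, t: float, h: Vector, dt: float) -> Vector:
    """One classical Runge-Kutta (RK4) step of dh/dt = f(t, h)."""
    k1 = f(t, h)
    k2 = f(t + dt / 2, axpy(dt / 2, k1, h))
    k3 = f(t + dt / 2, axpy(dt / 2, k2, h))
    k4 = f(t + dt, axpy(dt, k3, h))
    return [
        hi + dt / 6 * (a + 2 * b + 2 * c + d)
        for hi, a, b, c, d in zip(h, k1, k2, k3, k4)
    ]


def propagate(f: Derivative, h0: Vector, t0: float, t1: float, max_step: float = 0.05) -> Vector:
    """Integrate the latent state from t0 to any later time t1 (not tied to a frame rate)."""
    if t1 < t0:
        raise ValueError("t1 must not precede t0")
    n = max(1, math.ceil((t1 - t0) / max_step))  # number of solver sub-steps
    dt = (t1 - t0) / n
    h, t = list(h0), t0
    for _ in range(n):
        h = rk4_step(f, t, h, dt)
        t += dt
    return h


def toy_derivative(t: float, h: Vector) -> Vector:
    # Hypothetical stand-in for the learned derivative network: dh/dt = -0.5 h.
    return [-0.5 * x for x in h]


# Query the state at irregular future timestamps from one observed state at t = 0.
for query_t in (0.25, 0.5, 1.3):
    print(query_t, propagate(toy_derivative, [1.0, -2.0], 0.0, query_t))
```

Because the state is integrated rather than stepped frame by frame, the same `propagate` call serves 0.25 s, 0.5 s or 1.3 s alike, and splitting an interval (0 to 0.5 s, then 0.5 to 1.3 s) agrees with a single call up to solver error. That is the property a continuous-time model relies on to answer queries at arbitrary timestamps.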

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses the problem of predicting the future occupancy state of the surrounding environment in autonomous driving scenarios. Specifically:

1. **Continuous-Time Prediction**: Existing methods can only predict at fixed frequencies (e.g., every 0.1 s or every 0.2 s), whereas the proposed method can predict at any given timestamp. This allows autonomous driving algorithms to achieve lower latency and faster response.
2. **Asynchronous Multi-Modal Fusion**: Existing multi-sensor fusion methods require strictly synchronized data input. The proposed method can fuse asynchronous data streams effectively, thereby relaxing the requirement for sensor synchronization.

By introducing the Neural Ordinary Differential Equation (N-ODE) framework, the paper proposes a new method, StreamingFlow, which performs occupancy prediction over a continuous time range and supports asynchronous multi-modal sensor fusion. The method not only improves prediction accuracy but also generalizes better, performing especially well on long-horizon predictions beyond the time range seen during training.

### Main Contributions

1. Proposed the first streaming occupancy flow prediction framework that supports asynchronous multi-sensor fusion;
2. Designed a novel temporal feature propagation strategy to achieve high-density continuous occupancy prediction;
3. Achieved state-of-the-art performance on the widely used nuScenes and Lyft L5 datasets, validating the effectiveness and robustness of the algorithm.
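Continuous-time prediction and asynchronous fusion can be combined in an ODE-RNN-style streaming loop. Between sensor arrivals the fused state is propagated in continuous time, and each arrival, whatever its sensor and timestamp, triggers an update. The sketch below is illustrative only and rests on several assumptions: the derivative is a hypothetical linear decay, the recurrent fusion cell is replaced by a hypothetical per-sensor gated blend (`GATES`), explicit Euler integration is used for brevity, and features are short vectors rather than BEV grids. None of these names come from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vector = List[float]

# Hypothetical per-sensor blending weights standing in for a learned update cell.
GATES = {"lidar": 0.7, "camera": 0.4}


@dataclass
class Observation:
    t: float          # sensor timestamp in seconds (streams need not be aligned)
    sensor: str       # key into GATES, e.g. "lidar" or "camera"
    feature: Vector   # stand-in for an encoded BEV feature


def derivative(h: Vector) -> Vector:
    # Hypothetical stand-in for the learned N-ODE derivative: dh/dt = -0.5 h.
    return [-0.5 * x for x in h]


def euler_propagate(h: Vector, dt: float, max_step: float = 0.01) -> Vector:
    """Advance the state by dt >= 0 with explicit Euler sub-steps."""
    n = max(1, math.ceil(dt / max_step))
    sub = dt / n
    for _ in range(n):
        h = [hi + sub * di for hi, di in zip(h, derivative(h))]
    return h


def gated_update(h: Vector, x: Vector, gate: float) -> Vector:
    # Convex blend of the propagated state and the new observation.
    return [(1.0 - gate) * hi + gate * xi for hi, xi in zip(h, x)]


def stream_forecast(
    observations: List[Observation],
    query_times: List[float],
    h0: Vector,
    t0: float = 0.0,
) -> Dict[float, Vector]:
    """Fuse asynchronous observations in timestamp order; answer queries at arbitrary times."""
    timeline: List[Tuple[float, int, Optional[Observation]]] = [
        (o.t, 0, o) for o in observations
    ] + [(q, 1, None) for q in query_times]
    # Sort by time; on ties, observations (kind 0) are fused before queries (kind 1).
    timeline.sort(key=lambda item: (item[0], item[1]))

    h, t = list(h0), t0
    outputs: Dict[float, Vector] = {}
    for when, _, obs in timeline:
        if when < t0:
            raise ValueError("timestamps must not precede t0")
        h = euler_propagate(h, when - t)  # continuous-time propagation between events
        t = when
        if obs is not None:
            h = gated_update(h, obs.feature, GATES[obs.sensor])  # fusion step
        else:
            outputs[when] = list(h)  # prediction uses only observations up to `when`
    return outputs


# LiDAR and camera frames arrive out of order and at unaligned timestamps.
obs = [
    Observation(0.30, "camera", [1.0, 0.0]),
    Observation(0.10, "lidar", [0.0, 1.0]),
    Observation(0.25, "lidar", [0.5, 0.5]),
]
print(stream_forecast(obs, [0.2, 0.37, 1.0], h0=[0.0, 0.0]))
```

Placing queries on the same timeline as observations is one way to express the streaming behavior: the query at 0.2 s sees only the LiDAR frame at 0.10 s, while the queries at 0.37 s and 1.0 s also reflect the later frames and then decay under the toy dynamics.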

Self-supervised Multi-future Occupancy Forecasting for Autonomous Driving

FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving

Occupancy Flow Fields for Motion Forecasting in Autonomous Driving

Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving

Streaming Motion Forecasting for Autonomous Driving

Flow-guided Motion Prediction with Semantics and Dynamic Occupancy Grid Maps

Predicting Future Spatiotemporal Occupancy Grids with Semantics for Autonomous Driving

VectorFlow: Combining Images and Vectors for Traffic Occupancy and Flow Prediction

Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction

OFMPNet: Deep End-to-End Model for Occupancy and Flow Prediction in Urban Environment

UnO: Unsupervised Occupancy Fields for Perception and Forecasting

FutureNet-LOF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding

AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction

Unsupervised video forecasting with flow parsing mechanism of human visual system

StopNet: Scalable Trajectory and Occupancy Prediction for Urban Autonomous Driving

OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction

A multi-modal spatial–temporal model for accurate motion forecasting with visual fusion

StreamYOLO: Real-time Object Detection for Streaming Perception

ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction