Abstract: Scene flow depicts the dynamics of a 3D scene, which is critical for various applications such as autonomous driving, robot navigation, AR/VR, etc. Conventionally, scene flow is estimated from dense/regular RGB video frames. With the development of depth-sensing technologies, precise 3D measurements are available via point clouds which have sparked new research in 3D scene flow. Nevertheless, it remains challenging to extract scene flow from point clouds due to the sparsity and irregularity in typical point cloud sampling patterns. One major issue related to irregular sampling is identified as the randomness during point set abstraction/feature extraction -- an elementary process in many flow estimation scenarios. A novel Spatial Abstraction with Attention (SA^2) layer is accordingly proposed to alleviate the unstable abstraction problem. Moreover, a Temporal Abstraction with Attention (TA^2) layer is proposed to rectify attention in the temporal domain, leading to benefits with motions scaled in a larger range. Extensive analysis and experiments verified the motivation and significant performance gains of our method, dubbed Flow Estimation via Spatial-Temporal Attention (FESTA), when compared to several state-of-the-art benchmarks of scene flow estimation.

What problem does this paper attempt to address?

This paper addresses the problem of estimating scene flow from point-cloud data. Traditional scene-flow estimation methods rely mainly on dense, regular RGB video frames. Advances in depth-sensing technology now make accurate 3D measurements available as point clouds, which has opened a new research direction in 3D scene flow. Extracting scene flow from point clouds remains challenging, however, because point-cloud sampling patterns are sparse and irregular. In particular, randomness in point-set abstraction/feature extraction, a fundamental step in many flow-estimation pipelines, makes the abstraction unstable. To address this, the paper proposes two layers:

1. **Spatial Abstraction with Attention (SA2) layer**: It aims to alleviate the unstable abstraction problem. By introducing a trainable Aggregation Pooling (AP) module, the SA2 layer generates more stable down-sampled points and therefore defines more stable regions of attention.
2. **Temporal Abstraction with Attention (TA2) layer**: It rectifies attention in the temporal domain so that motions at different scales are handled better. Using an initial scene-flow estimate, the TA2 layer shifts the regions of attention over time toward positions where correspondences are more likely.

With these two layers, the proposed method (called FESTA) is reported to achieve significant performance gains on both synthetic and real-world scene-flow benchmarks.

### Main Contributions

1. **Proposing the SA2 layer**: It provides stable point-cloud abstraction, aiming to produce abstracted points whose positions are invariant to how the point cloud is sampled from the scene manifold, and thereby defines stable regions of attention. The paper supports the effectiveness of the SA2 layer with both theoretical analysis and experiments.
2. **Proposing the TA2 layer**: It allows both small-scale and large-scale motions to be estimated by emphasizing the regions where good matches are more likely to be found, regardless of the scale of the motion.
3. **FESTA architecture**: On synthetic and real-world benchmarks, the FESTA architecture is reported to achieve state-of-the-art performance in 3D point-cloud scene-flow estimation, significantly outperforming existing scene-flow estimation methods.

### Experimental Verification

The paper evaluates the stability of the SA2 layer and the overall performance of the FESTA architecture through a series of experiments. According to the reported results, the SA2 layer abstracts point clouds significantly more stably than the traditional Farthest Point Sampling (FPS) method, especially when the point-cloud sampling density is high. The FESTA architecture also performs well across the benchmarks, for both small-scale and large-scale motions.

### Conclusion

By introducing the SA2 and TA2 layers, the paper addresses a key difficulty in estimating scene flow from point clouds, namely unstable abstraction under irregular sampling, which is relevant to applications such as autonomous driving, robot navigation, and AR/VR.
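
The summary above compares the SA2 layer against Farthest Point Sampling (FPS) without showing what FPS does or why it can be unstable. The sketch below is a minimal NumPy illustration of plain greedy FPS only; it is not the paper's code and does not implement the SA2 layer or its Aggregation Pooling module. The unit sphere as a toy "scene manifold", the sizes (512 points, 16 centers), and the helper names `farthest_point_sampling` and `sample_unit_sphere` are all assumptions made for illustration. It draws two independent samplings of the same surface and measures how far apart the resulting FPS centers are.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, k: int, start: int = 0) -> np.ndarray:
    """Return the indices of k points chosen by greedy farthest point sampling."""
    n = points.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")
    chosen = np.empty(k, dtype=np.int64)
    chosen[0] = start
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, k):
        # Pick the point that is farthest from the current set of centers.
        chosen[i] = int(np.argmax(min_dist))
        dist = np.linalg.norm(points - points[chosen[i]], axis=1)
        min_dist = np.minimum(min_dist, dist)
    return chosen


def sample_unit_sphere(n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n points uniformly from the unit sphere (hypothetical toy surface)."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


rng = np.random.default_rng(0)
cloud_a = sample_unit_sphere(512, rng)
cloud_b = sample_unit_sphere(512, rng)  # same surface, different random sampling

idx_a = farthest_point_sampling(cloud_a, 16)
idx_b = farthest_point_sampling(cloud_b, 16)
centers_a = cloud_a[idx_a]
centers_b = cloud_b[idx_b]

# For each center from cloud A, the distance to the closest center from cloud B.
pairwise = np.linalg.norm(centers_a[:, None, :] - centers_b[None, :, :], axis=2)
gap = pairwise.min(axis=1)
print("mean nearest-center gap between the two samplings:", gap.mean())
```

Because FPS can only return points that are already in the input, its centers depend on the particular sampling of the surface, so the gap printed here is nonzero even though both clouds describe the same sphere. This is one way to picture the instability that, according to the summary, the trainable AP module in SA2 is meant to reduce.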

FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds

Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity.

DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds

Self-Supervised Scene Flow Estimation with Point-Voxel Fusion and Surface Representation

Spatial-frequency attention-based optical and scene flow with cross-modal knowledge distillation

STARFlow: Spatial Temporal Feature Re-embedding with Attentive Learning for Real-world Scene Flow

SAFIT: Segmentation-Aware Scene Flow with Improved Transformer

SSFlowNet: Semi-supervised Scene Flow Estimation On Point Clouds With Pseudo Label

Hierarchical Attention Learning of Scene Flow in 3D Point Clouds

Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction

Kalman-Based Scene Flow Estimation for Point Cloud Densification and 3D Object Detection in Dynamic Scenes

3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

Flow4D: Leveraging 4D Voxel Network for LiDAR Scene Flow Estimation

Active Scene Flow Estimation for Autonomous Driving via Real-Time Scene Prediction and Optimal Decision

Optical Flow as Spatial-Temporal Attention Learners

3D Point-Voxel Correlation Fields for Scene Flow Estimation.

3D Scene Flow Estimation on Pseudo-LiDAR: Bridging the Gap on Estimating Point Motion

Let-It-Flow: Simultaneous Optimization of 3D Flow and Object Clustering

TMA: Temporal Motion Aggregation for Event-based Optical Flow

FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving

DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement