Flow4D: Leveraging 4D Voxel Network for LiDAR Scene Flow Estimation

Jaeyeul Kim, Jungwan Woo, Ukcheol Shin, Jean Oh, Sunghoon Im
2024-07-11
Abstract: Understanding the motion states of the surrounding environment is critical for safe autonomous driving. These motion states can be accurately derived from scene flow, which captures the three-dimensional motion field of points. Existing LiDAR scene flow methods extract spatial features from each point cloud and then fuse them channel-wise, resulting in the implicit extraction of spatio-temporal features. Furthermore, they utilize 2D Bird's Eye View and process only two frames, missing crucial spatial information along the Z-axis and the broader temporal context, leading to suboptimal performance. To address these limitations, we propose Flow4D, which temporally fuses multiple point clouds after the 3D intra-voxel feature encoder, enabling more explicit extraction of spatio-temporal features through a 4D voxel network. However, while using 4D convolution improves performance, it significantly increases the computational load. For further efficiency, we introduce the Spatio-Temporal Decomposition Block (STDB), which combines 3D and 1D convolutions instead of using heavy 4D convolution. In addition, Flow4D further improves performance by using five frames to take advantage of richer temporal information. As a result, the proposed method achieves a 45.9% higher performance compared to the state-of-the-art while running in real-time, and won 1st place in the 2024 Argoverse 2 Scene Flow Challenge. The code is available at https://github.com/dgist-cvlab/Flow4D.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper aims to solve the performance and efficiency problems of LiDAR scene flow estimation in autonomous driving. Specifically, the existing LiDAR scene flow methods have the following limitations:

1. **Loss of spatial information**: Existing methods usually use a 2D Bird's Eye View (BEV) representation, which leads to the loss of spatial information along the Z-axis.
2. **Insufficient temporal information**: Most methods only process two frames and fail to fully utilize the richer historical temporal information.
3. **Inadequate spatio-temporal feature extraction**: Traditional methods first extract spatial features from each point cloud and then obtain temporal correlation through channel fusion. This approach cannot explicitly extract spatio-temporal features.

To solve these problems, the paper proposes Flow4D, a LiDAR scene flow estimation framework based on 4D voxel networks. The main improvements of Flow4D include:

- **4D voxel representation**: By adding a time dimension on the basis of 3D voxels to form a 4D voxel representation, spatio-temporal features can be explicitly extracted.
- **Spatio-Temporal Decomposition Block (STDB)**: To reduce the computational burden of 4D convolution, a method of decomposing 4D convolution into 3D spatial convolution and 1D temporal convolution is proposed.
- **Multi-frame fusion**: Five consecutive frames are used to capture richer spatio-temporal information, thereby improving the accuracy of scene flow estimation.

These improvements enable Flow4D to achieve a 45.9% higher performance than existing methods on the Argoverse 2 dataset and maintain high computational efficiency in real-time operation.
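
To make the efficiency argument behind the STDB concrete, the following sketch compares the weight count of a dense 4D convolution with that of a 3D spatial convolution followed by a 1D temporal convolution. The kernel sizes (3 along every axis) and the channel width of 16 are assumptions chosen for illustration, and `conv_weights` is a hypothetical helper; the actual block in Flow4D may differ in kernel size, normalization, activations, and residual connections.

```python
def conv_weights(kernel_shape, c_in, c_out):
    """Number of weights (bias excluded) in a dense convolution layer."""
    n = 1
    for k in kernel_shape:
        n *= k
    return n * c_in * c_out


C = 16  # assumed channel width, matching the 16-channel features in the summary

# Full 4D convolution: the kernel spans X, Y, Z and time at once (assumed 3x3x3x3).
full_4d = conv_weights((3, 3, 3, 3), C, C)

# Decomposed variant: a spatial-only kernel, then a temporal-only kernel.
spatial_3d = conv_weights((3, 3, 3, 1), C, C)
temporal_1d = conv_weights((1, 1, 1, 3), C, C)
decomposed = spatial_3d + temporal_1d

print(f"4D conv weights:        {full_4d}")
print(f"3D + 1D conv weights:   {decomposed}")
print(f"ratio (decomposed/4D):  {decomposed / full_4d:.3f}")
```

Under these assumptions the decomposed pair needs 30 kernel positions per channel pair instead of 81, i.e. roughly 37% of the weights of the full 4D kernel. This only counts parameters; actual runtime also depends on voxel sparsity and the convolution backend.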
### Formula summary

- **Scene flow vector decomposition**:
  \[ F_{t,t+1} = F_{t,t+1}^{\text{ego}} + F_{t,t+1}^{\text{motion}} \]
  where \(F_{t,t+1}^{\text{ego}}\) represents the ego-vehicle motion and \(F_{t,t+1}^{\text{motion}}\) represents the motion vectors of each point.
- **Voxelized feature extraction**:
  - Initial point feature \(f_{\tau}^{p} \in \mathbb{R}^{N_{\tau} \times 16}\)
  - Initial voxel feature \(f_{\tau}^{v} \in \mathbb{R}^{W \times L \times H \times 16}\)
- **4D voxel feature**:
  \[ f^{4D} \in \mathbb{R}^{W \times L \times H \times 5 \times 16} \]

These formulas and methods work together to enable Flow4D to achieve significant performance improvement in the scene flow estimation task.
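
The formulas above can be illustrated with a small NumPy sketch. It assumes that the ego component is the displacement produced by applying a rigid 4x4 transform to each point, and that the 4D feature is formed by stacking five per-frame voxel grids along a new time axis. The helper `ego_flow`, the transform, the sample points, and the grid size are all hypothetical values chosen for illustration; the dense arrays stand in for what would normally be sparse voxel tensors.

```python
import numpy as np


def ego_flow(points, transform):
    """Per-point displacement caused by a rigid transform alone: T(p) - p."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    moved = (homogeneous @ transform.T)[:, :3]
    return moved - points


# Hypothetical rigid transform: pure translation of 1 m along x, no rotation.
transform = np.eye(4)
transform[0, 3] = 1.0

# Two sample points: the first is static, the second lies on a moving object.
points = np.array([[0.0, 0.0, 0.0],
                   [5.0, 2.0, 1.0]])
f_ego = ego_flow(points, transform)
f_motion = np.array([[0.0, 0.0, 0.0],
                     [0.5, 0.0, 0.0]])

# Scene flow decomposition: F = F_ego + F_motion.
f_total = f_ego + f_motion

# 4D voxel feature: stack five W x L x H x 16 grids along a time axis.
W, L, H, C, NUM_FRAMES = 8, 8, 4, 16, 5
voxel_features = [np.zeros((W, L, H, C), dtype=np.float32)
                  for _ in range(NUM_FRAMES)]
f_4d = np.stack(voxel_features, axis=3)

print(f_total)
print(f_4d.shape)
```

Placing the time axis before the channel axis mirrors the \(W \times L \times H \times 5 \times 16\) layout in the summary, so a temporal convolution can slide along axis 3 while a spatial convolution operates on the first three axes.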