Fusion-attention network using dense scale-invariant feature transform flow image and point cloud for 3D pedestrian detection

DOI: https://doi.org/10.1007/s11042-024-19466-4
IF: 2.577
2024-06-06
Multimedia Tools and Applications
Abstract: In this paper, we introduce a fusion-attention network for three-dimensional (3D) pedestrian detection that fuses dense scale-invariant feature transform (SIFT) flow images with point cloud data. Because pedestrians are small, the point cloud data they produce are sparse, and this shortage of points must be supplemented with an RGB image. However, fusing the RGB image with the point cloud is difficult because the two differ in dimensionality. To fuse them, point-wise fusion is employed: features are extracted at the locations on the RGB image that correspond to points in the point cloud. To extract more meaningful features from images, the RGB image is replaced by dense SIFT flow, which represents the motion in the RGB image. To evaluate the proposed method, experimental results were compared with other state-of-the-art methods on the KITTI 3D detection validation set. Three ablation studies were then conducted. First, to verify the effect of dense SIFT flow, the results were compared with those obtained using optical flow and the RGB image. Second, various methods for fusing dense SIFT flow image features and point cloud features were analyzed. Third, to assess the applicability of the proposed algorithm in a real outdoor driving environment, a two-dimensional (2D) detection and tracking algorithm was used in place of the 2D ground-truth bounding boxes.
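The point-wise fusion step described in the abstract can be illustrated with a minimal sketch. This is not the paper's network: the function name `pointwise_fuse`, the 3x4 projection matrix `P`, nearest-pixel sampling, zero-filling for points that fall outside the image, and fusion by simple concatenation are all assumptions made for illustration. In the paper's setting, `image_feats` would come from a dense SIFT flow image rather than raw RGB.

```python
from typing import List, Sequence

Vector = List[float]


def pointwise_fuse(
    points: Sequence[Sequence[float]],         # N points, each (x, y, z) in the camera frame
    point_feats: Sequence[Sequence[float]],    # N per-point feature vectors
    image_feats: Sequence[Sequence[Sequence[float]]],  # H x W x C image feature map
    P: Sequence[Sequence[float]],              # hypothetical 3x4 camera projection matrix
) -> List[Vector]:
    """Attach to each 3D point the image feature at its projected pixel.

    Assumption: nearest-pixel lookup and concatenation. Points behind the
    camera or outside the image receive a zero image feature.
    """
    height, width = len(image_feats), len(image_feats[0])
    channels = len(image_feats[0][0])
    fused: List[Vector] = []
    for pt, feat in zip(points, point_feats):
        hom = (pt[0], pt[1], pt[2], 1.0)  # homogeneous coordinates
        u, v, w = (sum(r[i] * hom[i] for i in range(4)) for r in P)
        sampled = [0.0] * channels
        if w > 0:  # only points in front of the camera project validly
            col, row = int(round(u / w)), int(round(v / w))
            if 0 <= row < height and 0 <= col < width:
                sampled = list(image_feats[row][col])
        fused.append(list(feat) + sampled)  # point-wise concatenation
    return fused
```

Each output row has the point feature followed by the sampled image feature, so the fused vectors stay aligned one-to-one with the input points regardless of how many of them land inside the image.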
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering