BB-Align: A Lightweight Pose Recovery Framework for Vehicle-to-Vehicle Cooperative Perception
Lixing Song, William Valentine, Qing Yang, Honggang Wang, Hua Fang, Ye Liu
DOI: https://doi.org/10.1109/icdcs60910.2024.00098
2024-01-01
Abstract: Vehicle-to-Vehicle (V2V) cooperative perception has become increasingly popular in the field of autonomous driving, effectively overcoming the inherent limitations of single-vehicle perception systems, such as limited range and susceptibility to occlusions. In a V2V system, vehicles in close proximity can share perception data. To fuse this data, which each vehicle collects from a different viewpoint, accurate pose information (position and heading direction) is essential for transforming the received data into the receiving vehicle's viewpoint. However, pose errors, often caused by measurement noise or sensor failures, can lead to severe misalignment during data fusion, resulting in incorrect object detections and potentially hazardous decisions in autonomous driving systems. To address this challenge, we present BB-Align, a lightweight pose recovery framework that utilizes Lidar Bird's-eye View (BV) images and object bounding Boxes for relative pose estimation. Designed as a plug-and-play solution, the proposed method requires no additional model training, enabling effortless integration into existing V2V systems. Our approach uses Lidar-derived BV images with a Log-Gabor filter-based feature map for effective image matching despite image sparsity. To reduce errors from self-motion distortion, we also integrate object bounding boxes for finer alignment. The proposed method is rigorously evaluated on the V2V4Real dataset, currently the only real-world V2V dataset. Our approach demonstrates high pose estimation accuracy, outperforming an existing graph-matching method. It achieves translation and rotation errors of less than 1 m and 1°, respectively, in 80% of cases within a 70 m range between vehicles.
Furthermore, when the proposed framework is integrated into cooperative object detection models operating under severe pose error, the results show up to a 2x increase in Average Precision (AP) compared with the same models without pose recovery, with more pronounced improvements at short range.
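
To make the role of the relative pose concrete, the following is a minimal illustrative sketch, not the BB-Align implementation. It assumes a planar pose given as (tx, ty, yaw) and shows how a received 2D point is mapped into the receiver's frame, how a heading error displaces a shared object, and how translation and rotation errors of the kind reported above could be measured. The function names `transform_points` and `pose_error` and the example values are hypothetical.

```python
import math


def transform_points(points, pose):
    """Map 2D points from the sender's frame into the receiver's frame.

    pose = (tx, ty, yaw): the sender's position (metres) and heading
    (radians) expressed in the receiver's frame. Assumed planar model.
    """
    tx, ty, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate by yaw, then translate by (tx, ty).
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def pose_error(estimated, reference):
    """Return (translation error in metres, rotation error in degrees)."""
    dt = math.hypot(estimated[0] - reference[0], estimated[1] - reference[1])
    dyaw = estimated[2] - reference[2]
    # Wrap the angular difference to [-pi, pi] before taking its magnitude.
    dyaw = math.atan2(math.sin(dyaw), math.cos(dyaw))
    return dt, abs(math.degrees(dyaw))


# Hypothetical scenario: the sender is 20 m ahead of the receiver and
# reports an object 30 m in front of itself.
true_pose = (20.0, 0.0, 0.0)
noisy_pose = (20.0, 0.0, math.radians(5.0))  # 5 degree heading error
detected = [(30.0, 0.0)]

p_true = transform_points(detected, true_pose)[0]
p_noisy = transform_points(detected, noisy_pose)[0]

# Displacement of the fused object caused by the heading error alone.
misalignment = math.dist(p_true, p_noisy)
trans_err, rot_err = pose_error(noisy_pose, true_pose)
print(f"misalignment: {misalignment:.2f} m, "
      f"pose error: {trans_err:.2f} m, {rot_err:.2f} deg")
```

In this toy setting a heading error of a few degrees already shifts the object by a distance comparable to a vehicle's width, which illustrates why the abstract reports rotation error alongside translation error.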