Abstract: Vehicle-to-Vehicle (V2V) cooperative perception has become increasingly popular in autonomous driving because it overcomes inherent limitations of single-vehicle perception systems, such as limited range and susceptibility to occlusions. In a V2V system, vehicles in close proximity can share perception data. Because each vehicle collects its data from a different viewpoint, fusing the data requires accurate pose information (position and heading direction) to transform the received data into the receiving vehicle's viewpoint. However, pose errors, often caused by measurement noise or sensor failures, can lead to severe misalignment during data fusion, resulting in incorrect object detections and potentially hazardous decisions in autonomous driving systems. To address this challenge, we present BB-Align, a lightweight pose recovery framework that uses Lidar Bird's-eye View (BV) images and object bounding Boxes for relative pose estimation. Designed as a plug-and-play solution, the proposed method requires no additional model training, so it can be integrated into existing V2V systems with little effort. Our approach applies a Log-Gabor filter-based feature map to Lidar-derived BV images, which enables effective image matching despite image sparsity. To reduce errors from self-motion distortion, we also use object bounding boxes for finer alignment. The proposed method is evaluated on the V2V4Real dataset, currently the only real-world V2V dataset. Our approach demonstrates high pose estimation accuracy, outperforming an existing graph-matching method. It achieves translation and rotation errors of less than 1 m and 1°, respectively, in 80% of cases within a 70 m range between vehicles. Furthermore, when the proposed framework is integrated into cooperative object detection models under severe pose error, the results show up to a 2x increase in Average Precision (AP) compared to models without pose recovery, with more pronounced improvements at short range.
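
To illustrate why accurate relative pose matters for fusion, the following minimal sketch transforms one detected object from a sending vehicle's frame into the receiving vehicle's frame, first with the true pose and then with a noisy one. It is not BB-Align itself; the 2D (x, y, yaw) pose model, the helper names (`pose_to_matrix`, `relative_pose`, `transform_points`), and all numeric values are assumptions chosen only for illustration.

```python
import numpy as np

def pose_to_matrix(x, y, yaw):
    """3x3 homogeneous transform from a vehicle's frame to the world frame."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])

def relative_pose(ego_pose, sender_pose):
    """Transform that maps sender-frame points into the ego (receiver) frame."""
    return np.linalg.inv(pose_to_matrix(*ego_pose)) @ pose_to_matrix(*sender_pose)

def transform_points(T, points_xy):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of points."""
    homo = np.hstack([points_xy, np.ones((len(points_xy), 1))])
    return (T @ homo.T).T[:, :2]

# Hypothetical poses (x [m], y [m], yaw [rad]) in a shared world frame.
ego_pose = (0.0, 0.0, 0.0)
sender_pose = (20.0, 5.0, np.deg2rad(90.0))

# One object centre observed 10 m ahead of the sender, in the sender's frame.
obj_in_sender = np.array([[10.0, 0.0]])

# Fusion with the true relative pose.
T_true = relative_pose(ego_pose, sender_pose)
obj_true = transform_points(T_true, obj_in_sender)

# Fusion with an assumed pose error of 1 m in x and 2 degrees in yaw.
noisy_sender_pose = (21.0, 5.0, np.deg2rad(92.0))
T_noisy = relative_pose(ego_pose, noisy_sender_pose)
obj_noisy = transform_points(T_noisy, obj_in_sender)

# Distance between where the object is and where the noisy pose places it.
misalignment = float(np.linalg.norm(obj_noisy - obj_true))
print("true position in ego frame: ", obj_true)
print("noisy position in ego frame:", obj_noisy)
print("misalignment [m]:", misalignment)
```

In this toy setup the rotation part of the error grows with the object's distance from the sender, which is one reason a small heading error can still displace distant objects noticeably after fusion.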

AgentAlign: Misalignment-Adapted Multi-Agent Perception for Resilient Inter-Agent Sensor Correlations

BB-Align: A Lightweight Pose Recovery Framework for Vehicle-to-Vehicle Cooperative Perception

Align Before Collaborate: Mitigating Feature Misalignment for Robust Multi-agent Perception

Robust Long-Range Perception Against Sensor Misalignment in Autonomous Vehicles

An Extensible Framework for Open Heterogeneous Collaborative Perception

Collaborative Multimodal Fusion Network for Multiagent Perception

Bridging the Domain Gap for Multi-Agent Perception

Timealign: A multi-modal object detection method for time misalignment fusing in autonomous driving

A Spatial Alignment Framework Using Geolocation Cues for Roadside Multi-View Multi-Sensor Fusion

AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection

NLOS Dies Twice: Challenges and Solutions of V2X for Cooperative Perception

AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection

V2X-AHD: Vehicle-to-Everything Cooperation Perception via Asymmetric Heterogenous Distillation Network

Adaptive Communications in Collaborative Perception with Domain Alignment for Autonomous Driving

MACP: Efficient Model Adaptation for Cooperative Perception

Distance-Aware Attentive Framework for Multi-Agent Collaborative Perception in Presence of Pose Error

Deformable Feature Aggregation for Dynamic Multi-modal 3D Object Detection

A Collaborative Perception Network Based on Dynamic Multi-scale Fusion

Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment

R-ACP: Real-Time Adaptive Collaborative Perception Leveraging Robust Task-Oriented Communications