Abstract: Occlusion is a major challenge for LiDAR-based object detection methods. This challenge becomes safety-critical in urban traffic, where the ego vehicle must have reliable object detection to avoid collisions while its field of view is severely reduced by the large number of surrounding road users. Collaborative perception via Vehicle-to-Everything (V2X) communication, which leverages the diverse perspectives of connected agents at multiple locations to form a complete scene representation, is an appealing solution. State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach, in which Bird's-Eye View images of point clouds are exchanged. This consumes less bandwidth than communicating point clouds, as in early collaboration, and, thanks to deeper interaction among connected agents, yields higher detection performance than late collaboration, which fuses agents' outputs. While achieving strong performance, most mid-collaboration approaches are hindered in real-world deployment by overly complicated architectures, involving learnable collaboration graphs and autoencoder-based compressors/decompressors, and by unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior state-of-the-art methods while minimizing changes made to the single-vehicle detection models and relaxing unrealistic assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98% of the performance of an early-collaboration method while consuming only the bandwidth equivalent of a late-collaboration method.

CenterCoop: Center-Based Feature Aggregation for Communication-Efficient Vehicle-Infrastructure Cooperative 3D Object Detection

Slim-FCP: Lightweight-Feature-Based Cooperative Perception for Connected Automated Vehicles

Occlusion-Guided Multi-Modal Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

CoFF: Cooperative Spatial Feature Fusion for 3D Object Detection on Autonomous Vehicles

Vehicle-Infrastructure Cooperative 3D Object Detection via Feature Flow Prediction

SparseComm: an Efficient Sparse Communication Framework for Vehicle-Infrastructure Cooperative 3D Detection

Cooperative Perception for 3D Object Detection in Driving Scenarios using Infrastructure Sensors

F-Cooper: Feature based Cooperative Perception for Autonomous Vehicle Edge Computing System Using 3D Point Clouds

Enhancing 3D object detection through multi-modal fusion for cooperative perception

HEAD: A Bandwidth-Efficient Cooperative Perception Approach for Heterogeneous Connected and Autonomous Vehicles

HP3D-V2V: High-Precision 3D Object Detection Vehicle-to-Vehicle Cooperative Perception Algorithm

VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection

Keypoints-Based Deep Feature Fusion for Cooperative Vehicle Detection of Autonomous Driving

Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

SmartCooper: Vehicular Collaborative Perception with Adaptive Fusion and Judger Mechanism

Cooper: Cooperative Perception for Connected Autonomous Vehicles based on 3D Point Clouds

CoFormerNet: A Transformer-Based Fusion Approach for Enhanced Vehicle-Infrastructure Cooperative Perception

Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

Leveraging Temporal Contexts to Enhance Vehicle-Infrastructure Cooperative Perception

SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles