DiffCP: Ultra-Low Bit Collaborative Perception via Diffusion Model

Ruiqing Mao, Haotian Wu, Yukuan Jia, Zhaojun Nan, Yuxuan Sun, Sheng Zhou, Deniz Gündüz, Zhisheng Niu
2024-09-29
Abstract: Collaborative perception (CP) is emerging as a promising solution to the inherent limitations of stand-alone intelligence. However, current wireless communication systems are unable to support feature-level and raw-level collaborative algorithms due to their enormous bandwidth demands. In this paper, we propose DiffCP, a novel CP paradigm that utilizes a specialized diffusion model to efficiently compress the sensing information of collaborators. By incorporating both geometric and semantic conditions into the generative model, DiffCP enables feature-level collaboration with an ultra-low communication cost, advancing the practical implementation of CP systems. This paradigm can be seamlessly integrated into existing CP algorithms to enhance a wide range of downstream tasks. Through extensive experimentation, we investigate the trade-offs between communication, computation, and performance. Numerical results demonstrate that DiffCP can significantly reduce communication costs by 14.5-fold while maintaining the same performance as the state-of-the-art algorithm.
Computer Vision and Pattern Recognition, Machine Learning, Multiagent Systems
What problem does this paper attempt to address?
The main problem this paper addresses is the tension between the high bandwidth that Collaborative Perception (CP) in Intelligent Unmanned Systems (IUSs) demands and the limited capacity of existing wireless communication systems. Specifically:

1. **Background problems**:
   - A single-agent framework (such as an autonomous vehicle or an intelligent robot) is limited by sensor failures, limited sensing range, and environmental occlusion, which makes it difficult to meet safety and reliability requirements.
   - Although Device-to-Device (D2D) communication technologies (such as the sidelink in C-V2X networks) let agents share sensing information over wireless channels, achieving high-reliability, low-latency transmission remains challenging in densely deployed, highly mobile, and obstructed environments.
2. **Limitations of existing methods**:
   - Raw-data-level CP retains detailed information but requires a huge amount of bandwidth (for example, approximately 360 Mbps for a 64-line LiDAR and approximately 20 Mbps for a single HD camera), far exceeding the current C-V2X channel capacity.
   - Object-level CP reduces the bandwidth requirement (approximately 150 Kbps) by transmitting detection results, but it depends on each agent's individual detection capability, which limits overall performance.
   - Feature-level CP methods (such as F-Cooper and V2VNet) compress raw data for communication, but they still have limitations in performance or bandwidth efficiency.
3. **Solutions proposed in the paper**:
   - DiffCP, a novel CP paradigm based on a diffusion model, is proposed; it achieves feature-level collaboration within an object-level communication cost.
   - DiffCP conditions its generative model on geometric and semantic information to compress collaborators' perception information efficiently, significantly reducing communication cost while maintaining high performance.
   - In experiments, DiffCP maintains the same performance as the state-of-the-art algorithm while reducing communication cost 14.5-fold.

In summary, this paper aims to solve the high communication cost of existing CP methods in bandwidth-limited environments by introducing the diffusion-based DiffCP, thereby promoting the practical application and development of collaborative perception systems.
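To put the bandwidth figures quoted above side by side, the following is a minimal Python sketch of the arithmetic. It uses only the approximate rates mentioned in this summary (360 Mbps, 20 Mbps, 150 Kbps) and the reported 14.5-fold reduction. The function names and the 1 Mbps baseline are hypothetical, chosen purely for illustration; they are not taken from the paper.

```python
# Approximate rates quoted in the summary, in bits per second.
RAW_LIDAR_BPS = 360e6      # 64-line LiDAR
RAW_CAMERA_BPS = 20e6      # single HD camera
OBJECT_LEVEL_BPS = 150e3   # object-level CP
REDUCTION_FACTOR = 14.5    # reported reduction vs. the state-of-the-art algorithm


def ratio_to_object_level(rate_bps: float) -> float:
    """Return how many times larger a rate is than the object-level rate."""
    return rate_bps / OBJECT_LEVEL_BPS


def reduced_cost(baseline_bps: float, factor: float = REDUCTION_FACTOR) -> float:
    """Return the cost that remains after an N-fold reduction of a baseline."""
    return baseline_bps / factor


if __name__ == "__main__":
    print(f"LiDAR vs. object-level:  {ratio_to_object_level(RAW_LIDAR_BPS):.0f}x")
    print(f"Camera vs. object-level: {ratio_to_object_level(RAW_CAMERA_BPS):.1f}x")
    # HYPOTHETICAL baseline: 1 Mbps is a placeholder, not a figure from the paper.
    remaining_kbps = reduced_cost(1e6) / 1e3
    print(f"1 Mbps baseline after a 14.5-fold cut: {remaining_kbps:.1f} kbps")
```

By these figures, raw LiDAR data needs roughly 2400 times the object-level rate, which is the gap that feature-level compression has to close.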