Learning Distilled Collaboration Graph for Multi-Agent Perception

Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng, Wenjun Zhang
DOI: https://doi.org/10.48550/arXiv.2111.00643
2022-01-16
Abstract: To promote a better performance-bandwidth trade-off for multi-agent perception, we propose a novel distilled collaboration graph (DiscoGraph) to model trainable, pose-aware, and adaptive collaboration among agents. Our key novelties lie in two aspects. First, we propose a teacher-student framework to train DiscoGraph via knowledge distillation. The teacher model employs early collaboration with holistic-view inputs; the student model is based on intermediate collaboration with single-view inputs. Our framework trains DiscoGraph by constraining post-collaboration feature maps in the student model to match their correspondences in the teacher model. Second, we propose a matrix-valued edge weight in DiscoGraph. In such a matrix, each element reflects the inter-agent attention at a specific spatial region, allowing an agent to adaptively highlight the informative regions. During inference, we only need to use the student model, named the distilled collaboration network (DiscoNet). Thanks to the teacher-student framework, multiple agents sharing DiscoNet can collaboratively approach the performance of a hypothetical teacher model with a holistic view. Our approach is validated on V2X-Sim 1.0, a large-scale multi-agent perception dataset that we synthesized using CARLA and SUMO co-simulation. Our quantitative and qualitative experiments in multi-agent 3D object detection show that DiscoNet not only achieves a better performance-bandwidth trade-off than state-of-the-art collaborative perception methods, but also offers a more straightforward design rationale. Our code is available at https://github.com/ai4ce/DiscoNet.
Computer Vision and Pattern Recognition, Robotics
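The matrix-valued edge weight described in the abstract can be illustrated with a small, dependency-free sketch. It assumes that every agent's feature map has already been warped into the ego agent's frame (the pose-aware step), that a feature map is an H x W grid flattened into a list of cells, and that the weights are normalized with a softmax across agents independently at each cell. The scoring function `cell_score` is a hypothetical stand-in (a scaled dot product) for the learned edge encoder; it is not the paper's network.

```python
import math
from typing import List

Cell = List[float]        # one spatial cell: C channel values
FeatureMap = List[Cell]   # N cells (an H x W grid flattened row-major)


def cell_score(ego: Cell, other: Cell) -> float:
    """Hypothetical stand-in for a learned edge encoder: scaled dot product."""
    return sum(a * b for a, b in zip(ego, other)) / math.sqrt(len(ego))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def edge_weights(ego: FeatureMap, warped: List[FeatureMap]) -> List[List[float]]:
    """Matrix-valued edge weights: weights[j][n] is agent j's weight at cell n.

    `warped` holds every agent's feature map (including the ego's own), assumed
    already transformed into the ego frame so that cell n is the same location.
    Weights are normalized across agents separately at each cell.
    """
    n_cells = len(ego)
    weights = [[0.0] * n_cells for _ in warped]
    for n in range(n_cells):
        scores = [cell_score(ego[n], fmap[n]) for fmap in warped]
        for j, w in enumerate(softmax(scores)):
            weights[j][n] = w
    return weights


def fuse(ego: FeatureMap, warped: List[FeatureMap]) -> FeatureMap:
    """Per-cell weighted sum of all agents' features (ego's post-collaboration map)."""
    weights = edge_weights(ego, warped)
    n_channels = len(ego[0])
    fused = []
    for n in range(len(ego)):
        cell = [0.0] * n_channels
        for j, fmap in enumerate(warped):
            for c in range(n_channels):
                cell[c] += weights[j][n] * fmap[n][c]
        fused.append(cell)
    return fused


# Usage: two agents, two cells, two channels.
ego = [[1.0, 0.0], [0.0, 0.0]]
other = [[1.0, 0.0], [0.0, 2.0]]
print(fuse(ego, [ego, other]))  # -> [[1.0, 0.0], [0.0, 1.0]]
```

The design point the sketch tries to capture is that a scalar edge weight would apply one value to a neighbor's entire map, whereas a per-cell weight lets different spatial regions favor different agents, which is how an agent can highlight the regions where a neighbor is more informative.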
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses the trade-off between perception performance and communication bandwidth in multi-agent perception systems. Specifically, the authors propose a novel distilled collaboration graph (DiscoGraph) to model trainable, pose-aware, and adaptive collaboration among agents. The key innovations lie in two aspects:

1. **Teacher-Student Framework**: DiscoGraph is trained through knowledge distillation. The teacher model uses early collaboration with holistic-view inputs; the student model uses intermediate collaboration with single-view inputs. The framework trains DiscoGraph by constraining the post-collaboration feature maps in the student model to match the corresponding feature maps in the teacher model.
2. **Matrix-Valued Edge Weights**: DiscoGraph uses matrix-valued edge weights, in which each element reflects the inter-agent attention at a specific spatial region, allowing an agent to adaptively highlight informative regions.

Through these innovations, the paper aims to improve the performance of multi-agent perception while keeping communication bandwidth low. At inference time only the student model, called the distilled collaboration network (DiscoNet), is used. The approach is validated on the V2X-Sim 1.0 dataset, and the experiments show that DiscoNet achieves a better performance-bandwidth trade-off than existing collaborative perception methods while also offering a more straightforward design rationale.
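The teacher-student constraint in point 1 can be sketched as a distillation loss that compares, cell by cell, each agent's post-collaboration student feature map with the teacher's feature map for the same agent and region. This is a minimal sketch under stated assumptions: the comparison is taken to be a KL divergence between channel-wise softmax distributions at every cell, the direction of the KL term is an assumption, and the teacher maps are assumed to be already aligned with each student map. The helper names (`channel_softmax`, `distillation_loss`) are hypothetical and not taken from the DiscoNet code.

```python
import math
from typing import List

Cell = List[float]        # one spatial cell: C channel values
FeatureMap = List[Cell]   # N cells (an H x W grid flattened row-major)


def channel_softmax(cell: Cell) -> List[float]:
    """Turn one cell's channel vector into a probability distribution."""
    m = max(cell)
    exps = [math.exp(x - m) for x in cell]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """D_KL(p || q) for discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student: List[FeatureMap], teacher: List[FeatureMap]) -> float:
    """Sum over agents and cells of KL between channel-softmaxed feature vectors.

    student[i]: agent i's post-collaboration feature map from the student model.
    teacher[i]: the teacher's feature map for the same agent and region,
                assumed already aligned cell-by-cell with student[i].
    """
    loss = 0.0
    for s_map, t_map in zip(student, teacher):
        for s_cell, t_cell in zip(s_map, t_map):
            loss += kl_divergence(channel_softmax(s_cell), channel_softmax(t_cell))
    return loss


# Usage: one agent, two cells, two channels.
s = [[[1.0, 2.0], [0.5, 0.5]]]
t = [[[2.0, 1.0], [0.5, 0.5]]]
print(distillation_loss(s, s))  # -> 0.0
print(distillation_loss(s, t) > 0.0)  # -> True
```

In a training setup of this kind, a term like this would typically be added to the detection loss with a weighting coefficient, so that the student learns both to detect objects and to produce collaborative features resembling those of the holistic-view teacher; only the student is needed at inference.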