Yuchen Xia, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Yang Li, Xuanhan Zhu, Tianyou Luo, Siheng Chen, Jinglin Li
Abstract: Collaborative perception in autonomous driving significantly enhances the perception capabilities of individual agents. Immutable heterogeneity in collaborative perception, where agents have different and fixed perception networks, presents a major challenge due to the semantic gap in their exchanged intermediate features without modifying the perception networks. Most existing methods bridge the semantic gap through interpreters. However, they either require training a new interpreter for each new agent type, limiting extensibility, or rely on a two-stage interpretation via an intermediate standardized semantic space, causing cumulative semantic loss. To achieve both extensibility in immutable heterogeneous scenarios and low-loss feature interpretation, we propose PolyInter, a polymorphic feature interpreter. It contains an extension point through which emerging new agents can seamlessly integrate by overriding only their specific prompts, which are learnable parameters intended to guide the interpretation, while reusing PolyInter's remaining parameters. By leveraging polymorphism, our design ensures that a single interpreter is sufficient to accommodate diverse agents and interpret their features into the ego agent's semantic space. Experiments conducted on the OPV2V dataset demonstrate that PolyInter improves collaborative perception precision by up to 11.1% compared to SOTA interpreters, while comparable results can be achieved by training only 1.4% of PolyInter's parameters when adapting to new agents.
### What problem does this paper attempt to solve?
This paper addresses the challenges of **immutable heterogeneous collaborative perception**. In autonomous driving, vehicles from different manufacturers or of different models run perception networks whose architectures are both diverse and fixed. As a result, the intermediate features they exchange are highly heterogeneous and difficult for other vehicles to interpret. This heterogeneity is not only semantic: the features can also differ in size and distribution.
Existing methods typically bridge this gap with interpreters, but they suffer from two limitations:
1. **Poor scalability**: A new interpreter must be trained for each new agent type, which limits the scalability of the system.
2. **Cumulative semantic loss**: Two-stage interpretation, which first converts features into a standardized semantic space and then from that space into the target semantic space, accumulates semantic loss across the two stages.
To address these limitations, the paper proposes **PolyInter**, a polymorphic feature interpreter. Its main contributions are:
- **High scalability**: By inheriting the trained interpreter and adjusting only the specific prompts associated with new neighbor agents, the system can seamlessly integrate new types of agents.
- **Low semantic loss**: A Channel Selection Module and a Spatial Attention Module keep semantic loss low within a single-stage interpretation process.
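The polymorphism idea can be pictured with an ordinary object-oriented analogy. The sketch below is a toy under stated assumptions, not the paper's implementation: the shared interpreter network is reduced to a hypothetical scalar `shared_scale`, the prompts are short hypothetical tuples, and `AgentTypeA`, `AgentTypeB`, and `AgentTypeC` are made-up neighbor types. The base class holds the reused parameters and the interpretation logic, while each neighbor type overrides only `specific_prompt`, which plays the role of the extension point.

```python
from abc import ABC, abstractmethod


class NeighborInterpreter(ABC):
    """Toy polymorphic interpreter (hypothetical, not PolyInter's code).

    Class attributes stand in for parameters shared by every neighbor type:
    `shared_scale` for the interpreter network and `general_prompt` for the
    general prompt. Subclasses override only `specific_prompt`.
    """

    shared_scale = 0.5
    general_prompt = (0.1, 0.1, 0.1)

    @abstractmethod
    def specific_prompt(self):
        """Extension point: each neighbor type supplies its own prompt."""

    def interpret(self, feature):
        """Map a neighbor feature toward the ego space using toy arithmetic."""
        prompt = self.specific_prompt()
        if not (len(feature) == len(self.general_prompt) == len(prompt)):
            raise ValueError("feature and prompts must have the same length")
        return [
            self.shared_scale * (f + g + s)
            for f, g, s in zip(feature, self.general_prompt, prompt)
        ]


class AgentTypeA(NeighborInterpreter):
    def specific_prompt(self):
        return (0.2, 0.0, -0.2)


class AgentTypeB(NeighborInterpreter):
    def specific_prompt(self):
        return (-0.4, 0.4, 0.0)


# A newly arriving neighbor type reuses everything in the base class and
# overrides only its prompt; AgentTypeA and AgentTypeB are left untouched.
class AgentTypeC(NeighborInterpreter):
    def specific_prompt(self):
        return (0.0, -0.1, 0.3)


interpreters = {"A": AgentTypeA(), "B": AgentTypeB(), "C": AgentTypeC()}
feature = [1.0, 2.0, 3.0]
for name, interp in interpreters.items():
    print(name, [round(x, 3) for x in interp.interpret(feature)])
```

In the real model the shared part would be a trained network and the prompts learnable tensors; the analogy only captures which parts are reused across neighbor types and which are overridden per type.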
### How PolyInter works
PolyInter is trained in two stages:
1. **Base model training**: The existing agents jointly train the interpreter network, the general prompt, and a specific prompt for each neighbor agent. The interpreter network interprets the features of neighbor agents into the ego agent's semantic space, and the prompts guide this interpretation.
2. **Generalization**: When a new neighbor agent joins, only the specific prompts related to that agent are fine-tuned, while the interpreter network and all other parameters stay unchanged. The system can therefore adapt to new agent types without retraining the entire interpreter.
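The two stages differ only in which parameters are updated. The sketch below makes that selection explicit. The group names and sizes are hypothetical and chosen purely for illustration, so the printed fraction is not the paper's 1.4% figure; `ParamGroup`, `build_groups`, `trainable_groups`, and `trainable_fraction` are made-up helpers, not an API from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamGroup:
    name: str
    size: int  # number of scalar parameters (hypothetical)


def build_groups(neighbor_types, network_size=1_000_000, prompt_size=4_096):
    """Shared interpreter network + general prompt + one specific prompt per neighbor type."""
    groups = [
        ParamGroup("interpreter_network", network_size),
        ParamGroup("general_prompt", prompt_size),
    ]
    groups += [ParamGroup(f"specific_prompt/{t}", prompt_size) for t in neighbor_types]
    return groups


def trainable_groups(groups, stage, new_agent=None):
    """Return the parameter groups updated in a given training stage."""
    if stage == "base":
        # Stage 1: all groups are trained jointly by the existing agents.
        return list(groups)
    if stage == "generalization":
        # Stage 2: only the new neighbor's specific prompt is fine-tuned;
        # the interpreter network and all other prompts stay frozen.
        selected = [g for g in groups if g.name == f"specific_prompt/{new_agent}"]
        if not selected:
            raise ValueError(f"no specific prompt registered for {new_agent!r}")
        return selected
    raise ValueError(f"unknown stage {stage!r}")


def trainable_fraction(groups, selected):
    """Share of all parameters that receive updates."""
    return sum(g.size for g in selected) / sum(g.size for g in groups)


# Stage 1: two existing neighbor types.
base = build_groups(["type_a", "type_b"])
print("base:", trainable_fraction(base, trainable_groups(base, "base")))

# Stage 2: a new neighbor type joins; only its prompt is trained.
extended = build_groups(["type_a", "type_b", "type_c"])
new_only = trainable_groups(extended, "generalization", new_agent="type_c")
print("generalization:", [g.name for g in new_only],
      f"{trainable_fraction(extended, new_only):.2%}")
```

In a deep-learning framework, this selection would typically correspond to passing only the chosen tensors to the optimizer or disabling gradients for the frozen ones.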
In this way, PolyInter achieves efficient immutable heterogeneous collaborative perception, improving collaborative perception accuracy while reducing the number of parameters that must be fine-tuned.
### Experimental results
The experiments show that PolyInter outperforms existing state-of-the-art immutable heterogeneous feature interpreters (such as PnPDA and MPDA) on the OPV2V dataset. Specifically, PolyInter improves the average precision (AP) of collaborative perception by 7.9% at IoU = 0.5 and by 11.1% at IoU = 0.7, and it needs to train only about 1.4% of its parameters to achieve comparable results when adapting to new agents.
In conclusion, by proposing PolyInter, this paper addresses the scalability and semantic-loss problems of immutable heterogeneous collaborative perception, providing a more efficient and accurate solution for collaborative perception in autonomous driving.