Multi-Modal Virtual-Real Fusion based Transformer for Collaborative Perception

Hui Zhang, Guiyang Luo, Yuanzhouhan Cao, Yi Jin, Yidong Li
DOI: https://doi.org/10.1109/PAAP56126.2022.10010640
2022-01-01
Abstract: Automobile intelligence and networking have become an inevitable trend in the future development of the automotive industry. Existing intelligent and connected vehicles rely on single-agent intelligence for basic perception, which remains weak at accurate recognition and localization in complex traffic scenes, for example with small or distant objects. To tackle this issue, we propose a multi-modal virtual-real fusion Transformer for collaborative perception. Specifically, to exploit the complementary information in RGB images and LiDAR point clouds, we propose the multi-modal virtual-real fusion (MVRF) method, which generates virtual points to compensate for the lack of point information at sparse locations. Furthermore, a heterogeneous graph attention network (HGAN) is constructed to capture inter-agent interaction and adaptively incorporate multiple agents’ features. The HGAN contains a series of encoder layers, each of which has a heterogeneous inter-agent attention module and a multi-scale self-attention module; together these learn different relationships depending on agent type while simultaneously capturing global and local spatial attention. Extensive experiments demonstrate that the proposed method achieves superior performance compared with state-of-the-art methods.
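The abstract does not give the equations of the heterogeneous inter-agent attention module, so the following is only a minimal sketch of the general idea of type-dependent attention across agents, not the authors' implementation. It assumes each agent contributes a single feature vector and that every agent type (for example vehicle or roadside unit) has its own query, key, and value projection. The function name `heterogeneous_agent_attention`, the parameters `w_q`, `w_k`, `w_v`, and the integer type codes are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def heterogeneous_agent_attention(feats, types, w_q, w_k, w_v):
    """Fuse per-agent features with type-specific projections (illustrative).

    feats: (N, D) array, one feature vector per agent.
    types: length-N sequence of integer type ids in [0, T).
    w_q, w_k, w_v: (T, D, D) arrays, one projection matrix per agent type
        (hypothetical parameters, not taken from the paper).
    Returns the fused features (N, D) and the attention weights (N, N).
    """
    types = np.asarray(types)
    # Project each agent with the matrices that belong to its own type.
    q = np.einsum("nd,nde->ne", feats, w_q[types])
    k = np.einsum("nd,nde->ne", feats, w_k[types])
    v = np.einsum("nd,nde->ne", feats, w_v[types])
    # Scaled dot-product attention over all agents.
    scores = q @ k.T / np.sqrt(q.shape[1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Usage with random data: three agents, two agent types, 4-dim features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))
w_q, w_k, w_v = (rng.normal(size=(2, 4, 4)) for _ in range(3))
fused, weights = heterogeneous_agent_attention(feats, [0, 1, 0], w_q, w_k, w_v)
```

Each row of `weights` indicates how strongly one agent attends to every agent, itself included. Because the projections are indexed by type, two agents with identical features but different types can still produce different attention patterns, which is the sense in which the attention is heterogeneous in this sketch.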