Learning to Learn Multiview Detection by Camera-Aware Attention

Lin Chen, Xinyu Yuan, Hung-Min Hsu, Zhongwei Cheng
DOI: https://doi.org/10.1109/ICMEW63481.2024.10645396
2024-07-15
Abstract: The multiview detection task uses multiple camera views to mitigate occlusion, and its critical component is multiview aggregation. An aggregated ground-plane feature can be obtained by projecting the convolutional feature maps of multiple views onto the ground plane, but this aggregation fuses the information from all cameras with the same weight. However, directly using the information from all cameras is suboptimal, because object features undergo different degrees of occlusion depending on their positions and the corresponding camera perspectives. In this paper, we propose a novel meta-learning-based multiview detector, dubbed MetaMVDet, that adopts a newly introduced camera-aware attention to aggregate the multiview information. Our camera-aware attention aims to select reliable information from the different camera views to reduce the ambiguity caused by occlusion. We leverage 2D and 3D information simultaneously while maintaining 2D-3D multiview consistency to guide the learning of the multiview detection network. The proposed solution achieves state-of-the-art accuracy on two major multiview detection benchmarks.
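The abstract contrasts uniform fusion, where every camera contributes to a ground-plane location with the same weight, with camera-aware attention, where each camera is weighted per location. The following is a minimal sketch of that contrast only, under stated assumptions. It is not the MetaMVDet implementation. Features are assumed to be already projected onto a shared ground-plane grid. The per-camera score, a dot product with a vector `query`, is a hypothetical stand-in for whatever learned scoring the paper uses. The names `uniform_fusion`, `camera_aware_fusion`, and `fuse_ground_plane` are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_fusion(cam_feats: List[Vector]) -> Vector:
    # Baseline: every camera gets the same weight 1/C at this ground-plane cell.
    n = len(cam_feats)
    return [sum(vals) / n for vals in zip(*cam_feats)]


def camera_aware_fusion(cam_feats: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    # Hypothetical scoring: one scalar per camera, dot(query, camera feature).
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in cam_feats]
    # Normalize the scores across cameras so the weights sum to 1 at this cell.
    weights = softmax(scores)
    # Weighted sum of the camera features, channel by channel.
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, cam_feats))
        for d in range(len(query))
    ]
    return fused, weights


def fuse_ground_plane(grid_feats: List[List[Vector]], query: Vector) -> List[Vector]:
    # grid_feats[c][k] is the feature of ground-plane cell k projected from camera c.
    # Weights are recomputed per cell, so a camera can dominate one cell and not another.
    num_cells = len(grid_feats[0])
    return [
        camera_aware_fusion([cam[k] for cam in grid_feats], query)[0]
        for k in range(num_cells)
    ]


# Three cameras observing one cell; the third view is fully "occluded" (all zeros).
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(uniform_fusion(feats))
print(camera_aware_fusion(feats, [2.0, 0.0]))
```

With a zero `query`, every camera receives weight 1/C and the attention reduces to uniform fusion. A non-zero `query` lets cameras whose features score higher dominate at that cell; in a trained model those would ideally be the less-occluded views.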