Abstract: Weakly supervised learning has emerged as a compelling approach to object detection because it reduces the need for fully annotated labels during training. Recent works treat the detection task as a classification task, which tends to highlight only the most discriminative object parts. Moreover, fully supervised object detectors use dedicated modules (e.g., feature pyramid networks (FPN) and region proposal networks (RPN)) to localize target objects accurately, whereas well-designed localization modules for weakly supervised object detectors rarely exist. To address these challenges and gaps, we propose a region-aware continual contrastive fusion (R-CCF) module, which can be plugged into any off-the-shelf weak detector to improve detection performance by refining object locations. Specifically, a novel region association (RA) algorithm automatically queries the similarity between the most discriminative regions and their surrounding regions and then forms new rough object locations. Furthermore, we introduce an object integration (OI) constraint, comprising a class sub-constraint and a distance sub-constraint, to further refine the rough object locations produced by the RA algorithm and obtain accurate object regions. By integrating the R-CCF module into weakly supervised detector architectures and training end-to-end, we continually refine object locations by contrastively fusing the discriminative regions with surrounding patches. Extensive experiments demonstrate the effectiveness of the proposed method for weakly supervised object detection: integrating R-CCF into the state-of-the-art MIST [1] achieves 58.3% mAP on the PASCAL VOC 2007 benchmark, surpassing MIST by 0.2% absolute. Moreover, R-CCF built on OICR [2] and WSDDN [3] achieves 42.5% and 32.5% mAP on PASCAL VOC 2007, which is 1.3% and 2.1% higher than the respective baseline detectors. We also test the robustness of R-CCF on the PASCAL VOC 2012 dataset, where it clearly outperforms the baseline methods.
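The region association step described in the abstract (querying the similarity of the most discriminative region with its surrounding regions and forming a new rough object location) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, how "surrounding" is defined, or how regions are merged, so the choices below are assumptions made purely for illustration: cosine similarity on proposal features, box overlap (IoU) as the neighborhood test, and the enclosing box as the merged location. The function names and the thresholds `iou_min` and `sim_min` are hypothetical and are not taken from the paper.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def associate_regions(boxes, feats, scores, iou_min=0.1, sim_min=0.8):
    """Illustrative region association (assumed design, not the paper's algorithm).

    boxes:  (N, 4) proposal boxes, (x1, y1, x2, y2)
    feats:  (N, D) proposal features
    scores: (N,)   class scores from the weak detector
    Returns the merged rough box and the indices of the associated proposals.
    """
    # Seed: the most discriminative region, taken here as the top-scoring proposal.
    seed = int(np.argmax(scores))

    # Assumed similarity: cosine similarity between each proposal and the seed.
    unit = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit[seed]

    # Assumed neighborhood: proposals that overlap the seed box.
    near = iou_one_to_many(boxes[seed], boxes) >= iou_min

    # Associate proposals that are both nearby and similar; always keep the seed.
    keep = near & (sim >= sim_min)
    keep[seed] = True

    # Assumed merge rule: the smallest box enclosing all associated proposals.
    group = boxes[keep]
    merged = np.array([group[:, 0].min(), group[:, 1].min(),
                       group[:, 2].max(), group[:, 3].max()])
    return merged, np.flatnonzero(keep)
```

In this sketch a proposal joins the seed only if it passes both tests, so a visually similar but distant proposal (for example, a second instance of the same class) is not merged. The resulting box is only a rough location; in the abstract's description, the OI constraint is what refines it further.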

H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection

Cross-domain Object Detection by Local to Global Object-Aware Feature Alignment

Joint Feature-Level And Pixel-Level Domain Adaption For Object Detection In The Wild

HOMDA: High-Order Moment-Based Domain Alignment for Unsupervised Domain Adaptation

RFA-Net: Reconstructed Feature Alignment Network for Domain Adaptation Object Detection in Remote Sensing Imagery

Deeply Aligned Adaptation for Cross-domain Object Detection

R-CCF: region-aware continual contrastive fusion for weakly supervised object detection

AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection

HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

HOFA-Net: A High-order Feature Association Network for Dense Object Detection in Remote Sensing

Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation

Partial Alignment for Object Detection in the Wild

Multi-Level Domain Adaptive Learning For Cross-Domain Detection

Loose to compact feature alignment for domain adaptive object detection

Collaborative Learning for Weakly Supervised Object Detection

Multi-Granularity Alignment Domain Adaptation for Object Detection

MLFA: Toward Realistic Test Time Adaptive Object Detection by Multi-Level Feature Alignment

Weakly Aligned Feature Fusion for Multimodal Object Detection

iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection

Spatial Alignment for Unsupervised Domain Adaptive Single-Stage Object Detection

A Semantic Consistency Feature Alignment Object Detection Model Based on Mixed-Class Distribution Metrics