R-CCF: region-aware continual contrastive fusion for weakly supervised object detection
Yongqiang Zhang,Rui Tian,Yin Zhang,Zian Zhang,Yancheng Bai,Mingli Ding,Wangmeng Zuo
DOI: https://doi.org/10.1007/s10489-024-05403-3
IF: 5.3
2024-04-04
Applied Intelligence
Abstract: Weakly supervised learning has emerged as a compelling approach to object detection because it reduces the need for fully annotated labels during training. Recent works have treated the detection task as a classification task, which results in highlighting only the discriminative parts of an object. Moreover, fully supervised object detectors use dedicated modules (e.g., feature pyramid networks (FPN) and region proposal networks (RPN)) to accurately localize target objects, whereas comparably well-designed localization modules rarely exist for weakly supervised object detectors. To address these challenges and gaps, we propose a region-aware continual contrastive fusion (R-CCF) module, which can be plugged into any off-the-shelf weak detector to improve detection performance by refining object locations. Specifically, a novel region association (RA) algorithm is proposed to automatically query the similarities between the most discriminative regions and their surrounding regions, and then to form new rough object locations. Furthermore, we introduce an effective object integration (OI) constraint, comprising a class sub-constraint and a distance sub-constraint, to further refine the rough object locations from the RA algorithm and obtain accurate object regions. By integrating our R-CCF module into weakly supervised detector architectures and training end-to-end, we can continually refine object locations by contrastively fusing the discriminative regions with surrounding patches. Extensive experiments demonstrate the effectiveness of the proposed method in weakly supervised object detection and show that integrating R-CCF into the state-of-the-art MIST [1] achieves 58.3% mAP on the PASCAL VOC2007 benchmark, an absolute gain of 0.2% over MIST. Moreover, R-CCF built on OICR [2] and WSDDN [3] achieves 42.5% and 32.5% mAP on PASCAL VOC2007, which is 1.3% and 2.1% higher than the respective baseline detectors. We also test the robustness of R-CCF on the PASCAL VOC 2012 dataset, where R-CCF clearly outperforms the baseline methods.
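The abstract describes the region association step only at a high level, so the following is a minimal illustrative sketch of that general idea, not the authors' implementation. It assumes each proposal has a box, a feature vector, and a classification score. The function name `associate_regions`, the thresholds `iou_min` and `sim_min`, the use of IoU to define "surrounding" regions, cosine similarity as the similarity measure, and the enclosing-box merge are all hypothetical choices made for illustration. The OI constraint is not modelled here.

```python
import math


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def associate_regions(boxes, feats, scores, iou_min=0.1, sim_min=0.8):
    """Hypothetical region association sketch (assumed, not from the paper).

    1. Take the highest-scoring proposal as the most discriminative region.
    2. Keep surrounding proposals that overlap it (IoU >= iou_min) and have
       similar features (cosine >= sim_min).
    3. Return the box enclosing the seed and its associated proposals as a
       rough object location, plus the indices that were merged.
    """
    seed = max(range(len(boxes)), key=lambda i: scores[i])
    members = [seed]
    for i in range(len(boxes)):
        if i == seed:
            continue
        overlaps = iou(boxes[seed], boxes[i]) >= iou_min
        similar = cosine(feats[seed], feats[i]) >= sim_min
        if overlaps and similar:
            members.append(i)
    merged = (
        min(boxes[i][0] for i in members),
        min(boxes[i][1] for i in members),
        max(boxes[i][2] for i in members),
        max(boxes[i][3] for i in members),
    )
    return merged, members
```

In this sketch a proposal is merged only when it is both spatially adjacent to and visually similar to the seed, so a distant proposal or an overlapping one with dissimilar features leaves the rough location unchanged.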
computer science, artificial intelligence