H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection

Yunqiu Xu, Yifan Sun, Zongxin Yang, Jiaxu Miao, Yi Yang
DOI: https://doi.org/10.1109/cvpr52688.2022.01393
2022-01-01
Abstract: Cross-domain weakly supervised object detection (CD-WSOD) aims to adapt a detection model to a novel target domain using easily acquired image-level annotations. How the source and target domains are aligned is critical to CD-WSOD accuracy. Existing methods usually align only some of the detection components. In contrast, this paper argues that all detection components matter and proposes a Holistic and Hierarchical Feature Alignment (H2FA) R-CNN. H2FA R-CNN enforces two image-level alignments on the backbone features, as well as two instance-level alignments on the RPN and the detection head. This coarse-to-fine alignment hierarchy is in step with the detection pipeline, which processes the image-level features and then the instance-level features from bottom to top. Importantly, we devise a novel hybrid supervision method for learning the two instance-level alignments. It enables the RPN and the detection head to simultaneously receive weak supervision from the target domain and full supervision from the source domain. By combining all these feature alignments, H2FA R-CNN effectively mitigates the gap between the source and target domains. Experimental results show that H2FA R-CNN significantly improves cross-domain object detection accuracy and sets a new state of the art on popular benchmarks. Code and pre-trained models are available at https://github.com/XuYunqiu/H2FA_R-CNN.
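The abstract does not spell out the hybrid supervision loss. As a rough, hypothetical sketch (not the authors' code), one common way to let a detector head learn from both domains at once is to add a fully supervised source-domain loss, assumed to be computed elsewhere from box annotations, to a weak image-level loss on the target domain. Here per-proposal class probabilities are max-pooled into image-level scores (a multiple-instance-learning style assumption) and compared with the image-level labels via binary cross-entropy. The function names and the `weak_weight` parameter are illustrative only.

```python
import math
from typing import List, Sequence

# Hypothetical illustration only: NOT the H2FA R-CNN implementation.
# Max-over-proposals pooling is an assumed MIL-style choice.


def image_level_scores(proposal_probs: Sequence[Sequence[float]]) -> List[float]:
    """Pool per-proposal class probabilities into one score per class
    by taking the maximum over proposals."""
    num_classes = len(proposal_probs[0])
    return [max(p[c] for p in proposal_probs) for c in range(num_classes)]


def weak_image_loss(proposal_probs: Sequence[Sequence[float]],
                    image_labels: Sequence[int],
                    eps: float = 1e-6) -> float:
    """Mean binary cross-entropy between pooled scores and image-level labels,
    the only supervision assumed to be available in the target domain."""
    scores = image_level_scores(proposal_probs)
    total = 0.0
    for s, y in zip(scores, image_labels):
        s = min(max(s, eps), 1.0 - eps)  # clamp so log() stays finite
        total -= y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total / len(scores)


def hybrid_loss(source_box_loss: float,
                target_proposal_probs: Sequence[Sequence[float]],
                target_image_labels: Sequence[int],
                weak_weight: float = 1.0) -> float:
    """Add the fully supervised source-domain loss (given as a number here)
    to the weighted weak target-domain image-level loss."""
    return source_box_loss + weak_weight * weak_image_loss(
        target_proposal_probs, target_image_labels)


if __name__ == "__main__":
    # Two target proposals, two classes; the image contains class 0 only.
    probs = [[0.9, 0.1], [0.2, 0.3]]
    print(hybrid_loss(0.5, probs, [1, 0]))
```

Max pooling is chosen here only because it is the simplest way to turn proposal-level scores into an image-level prediction; other aggregations (for example, a softmax-weighted sum over proposals) would fit the same structure.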