Cross attention redistribution with contrastive learning for few shot object detection

Jianing Quan, Baozhen Ge, Lei Chen
DOI: https://doi.org/10.1016/j.displa.2022.102162
IF: 3.074
2022-04-01
Displays
Abstract: Few-shot object detection aims to learn to detect novel objects from only a few annotated samples. Most training frameworks fuse high-dimensional features with semantic information from the support images in order to learn to recognize and localize novel objects in the query images. Most prior works directly use a cross-correlation mechanism to integrate semantic information from support features. However, such operations introduce noise into the query features, which confuses the generation of region proposals and degrades the final localization precision. In this paper, we focus on thoroughly mining and integrating the support features that are conducive to generating region proposals, in order to further improve the stability and accuracy of the few-shot object detector. We propose a cross-attention redistribution (CAReD) module that adaptively integrates support features into query features, effectively removing harmful support features and enhancing the regional features of novel categories. In addition, to classify novel instances accurately, it is necessary to minimize the intra-class distance while maximizing the inter-class distance. To this end, this paper proposes a network training strategy based on contrastive learning, which better supervises the training of CAReD and, more importantly, effectively improves the classification precision for bounding boxes. Experiments on the Pascal VOC and MS-COCO datasets show that CAReD significantly improves upon two baseline detectors (+3.6% on the Pascal VOC benchmark and +4.4% on the MS-COCO benchmark), achieving state-of-the-art results under few-shot detection settings.
engineering, electrical & electronic; instruments & instrumentation; optics; computer science, hardware & architecture
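
The abstract's contrastive objective (pull same-class embeddings together, push different-class embeddings apart) can be illustrated with a generic supervised contrastive loss. The sketch below is not the paper's formulation: the function names, the temperature value, and the use of cosine similarity over plain feature vectors are all assumptions made for illustration, and it does not model the CAReD module or any detector.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def supervised_contrastive_loss(features, labels, temperature=0.2):
    """Generic supervised contrastive loss (illustrative, not the paper's loss).

    features: list of equal-length float vectors, e.g. per-box embeddings.
    labels:   list of class ids, one per feature vector.
    temperature: assumed hyperparameter; the abstract gives no value.
    Returns the mean loss over anchors that have at least one positive,
    or 0.0 when no anchor has a positive.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Labels that agree with the geometry give a lower loss than labels that do not.
    print(supervised_contrastive_loss(feats, [0, 0, 1, 1]))
    print(supervised_contrastive_loss(feats, [0, 1, 0, 1]))
```

The loss is small when embeddings of the same class are already close and embeddings of other classes are far away, which is the intra-class versus inter-class behavior the abstract describes. How such a term would be weighted against the detection losses is not stated in the abstract.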