Feature Fusion for Weakly Supervised Object Localization

Xu Tang, Yonghong Song, Yuanlin Zhang
DOI: https://doi.org/10.1109/cac.2018.8623227
2018-01-01
Abstract: Improving the precision of weakly supervised localization of multi-scale objects is a significant challenge in computer vision, especially for small objects. In this paper, we propose integrating a feature pyramid network (FPN) with a convolutional neural network (CNN) for weakly supervised object localization, where the FPN is built on the outputs of different layers of the CNN. We upsample the high-level maps by nearest-neighbor interpolation and fuse them with the low-level maps in the FPN to produce multi-scale fused maps that combine high resolution with strong semantics. We then produce class activation maps from each layer of the FPN and obtain multiple prediction scores by WILDCAT spatial pooling. To achieve more precise localization, we select the class activation map with the highest score across all multi-scale maps. Specifically, we use the maximum-response region of the class activation map for point-wise localization and the largest connected component above a threshold in the class activation map for bounding-box localization. Experiments on the PASCAL VOC and MS COCO datasets demonstrate that the proposed strategy is highly effective in improving the precision of weakly supervised object localization compared with several state-of-the-art weakly supervised methods.
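
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. All function names are hypothetical, and so are the pooling parameters (kmax, kmin, alpha) and the threshold. The sketch assumes that feature maps already share a channel count, that each pyramid level is half the resolution of the previous one, and that one 2-D class activation map per level is already available. The pooling follows the general WILDCAT form (mean of the highest responses plus a weighted mean of the lowest ones).

```python
from collections import deque

import numpy as np


def upsample_nearest(x, factor=2):
    """Nearest-neighbor upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_top_down(levels):
    """Fuse maps ordered fine to coarse by adding the upsampled coarser map.

    Assumption: equal channel counts and a 2x resolution step between levels.
    """
    fused = [levels[-1]]
    for low in reversed(levels[:-1]):
        fused.insert(0, low + upsample_nearest(fused[0]))
    return fused


def wildcat_pool(cam, kmax=3, kmin=3, alpha=0.7):
    """WILDCAT-style score: mean of the kmax highest responses
    plus alpha times the mean of the kmin lowest responses."""
    flat = np.sort(cam.ravel())
    return flat[-kmax:].mean() + alpha * flat[:kmin].mean()


def select_cam(cams, **pool_kwargs):
    """Return the index and map of the level with the highest pooled score."""
    scores = [wildcat_pool(c, **pool_kwargs) for c in cams]
    best = int(np.argmax(scores))
    return best, cams[best]


def point_localization(cam):
    """(row, col) of the maximum response in a 2-D class activation map."""
    r, c = np.unravel_index(np.argmax(cam), cam.shape)
    return int(r), int(c)


def largest_component_bbox(cam, threshold):
    """Bounding box (rmin, cmin, rmax, cmax) of the largest 4-connected
    region above the threshold, or None if no pixel exceeds it."""
    mask = cam > threshold
    seen = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    best = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, comp = deque([(int(r0), int(c0))]), []
        while queue:  # breadth-first flood fill of one component
            r, c = queue.popleft()
            comp.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < height and 0 <= nc < width
                        and mask[nr, nc] and not seen[nr, nc]):
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        if len(comp) > len(best):
            best = comp
    if not best:
        return None
    rows, cols = zip(*best)
    return min(rows), min(cols), max(rows), max(cols)
```

In use, one would call select_cam on the per-level class activation maps of a given class, then pass the selected map to point_localization or largest_component_bbox. A real system would also need learned lateral and classification layers, which this sketch omits.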