Weakly Supervised Object Localization Using Long-Range Semantic Foreground Activation

Lianxing Wang, Huaxiong Li
DOI: https://doi.org/10.1109/icpr56361.2022.9956339
2022-01-01
Abstract: Weakly supervised object localization (WSOL) is the challenging task of locating objects using only image-level supervision. Previous works based on CNN architectures get stuck on finding only the most discriminative parts. Transformer-based methods expand the localized areas, but they still fail to take full advantage of attention information. To address this problem, we propose the Long-range Semantic Foreground Activation (LSFA) method. We demonstrate that localization maps should be generated by attention parameters under the guidance of non-discriminative foreground features. In LSFA, a visual transformer model is first used to generate attention information. Then, a long-range dependency activation module is constructed to help the learned attention maps focus more on global information by weighting them with different parameters. Finally, a semantic foreground activation module is built that uses non-discriminative foreground regions learned from attention maps as indices to activate token semantic areas. Experiments on two benchmark datasets, CUB-200-2011 and ILSVRC, demonstrate the superiority of our LSFA method in comparison with other state-of-the-art WSOL approaches.
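The three stages named in the abstract (attention generation, weighting of attention maps, foreground-guided activation of token semantics) can be illustrated with a toy sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names, the per-layer weights, the thresholding rule, and the hand-written 3x3 token grid are all hypothetical and are not the LSFA implementation.

```python
# Toy illustration only. The weighting scheme, the threshold rule, and all
# names below are assumptions, not the method described in the paper.

def weighted_attention(attn_maps, weights):
    """Fuse per-layer attention maps (one float per patch token) using per-layer weights."""
    total = sum(weights)
    n_tokens = len(attn_maps[0])
    return [
        sum(w * m[i] for w, m in zip(weights, attn_maps)) / total
        for i in range(n_tokens)
    ]

def foreground_mask(fused, ratio=0.5):
    """Mark a token as foreground when its fused attention reaches ratio * max."""
    cutoff = ratio * max(fused)
    return [a >= cutoff for a in fused]

def activate(semantic, mask):
    """Keep token semantic scores on foreground tokens and zero out the rest."""
    return [s if m else 0.0 for s, m in zip(semantic, mask)]

def bounding_box(activated, width):
    """Tight box (x0, y0, x1, y1) in patch coordinates around non-zero tokens."""
    idx = [i for i, v in enumerate(activated) if v > 0]
    if not idx:
        return None
    xs = [i % width for i in idx]
    ys = [i // width for i in idx]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical attention maps from two transformer layers over a 3x3 patch grid.
attn_maps = [
    [0.0, 0.1, 0.0, 0.2, 0.9, 0.3, 0.0, 0.4, 0.1],
    [0.0, 0.0, 0.0, 0.5, 0.6, 0.6, 0.1, 0.5, 0.0],
]
weights = [1.0, 3.0]  # assumed: the second layer is weighted more heavily
semantic = [0.1, 0.2, 0.1, 0.3, 0.8, 0.2, 0.1, 0.4, 0.9]  # hypothetical token class scores

fused = weighted_attention(attn_maps, weights)
mask = foreground_mask(fused)
activated = activate(semantic, mask)
box = bounding_box(activated, width=3)
print(box)
```

In this toy setting the high semantic score at the bottom-right token is suppressed because the fused attention does not mark it as foreground, which is the kind of guidance the abstract attributes to foreground regions learned from attention maps.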