Convolution Tells Where to Look

Fan Xu, Lijuan Duan, Yuanhua Qiao, Ji Chen
DOI: https://doi.org/10.1007/978-3-030-88013-2_2
2021-01-01
Abstract: Many attention models have been introduced to boost the representational power of convolutional neural networks (CNNs). Most of them are self-attention models that generate an attention mask from the current features, such as spatial attention and channel attention models. However, these attention models may not achieve good results when the current features are the low-level features of a CNN. In this work, we propose a new lightweight attention unit, the feature difference (FD) module, which uses the difference between two feature maps to generate the attention mask. The FD module can be integrated into most state-of-the-art CNNs, such as ResNets and VGG, just by adding some shortcut connections, which introduces no additional parameters or layers. Extensive experiments show that the FD module improves the performance of the baseline on four benchmarks: CIFAR10, CIFAR100, ImageNet-1K, and PASCAL VOC. Notably, ResNet44 with the FD module (6.10% error) achieves better results than ResNet56 (6.24% error), while having 29% fewer parameters.
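The abstract says only that the attention mask is generated from the difference between two feature maps and that no parameters are added; it does not give the exact formula. The sketch below is therefore an illustration under stated assumptions, not the authors' implementation: it assumes two same-shaped single-channel feature maps, a sigmoid applied to their element-wise difference as the mask, and multiplicative gating of the later map. The function name `fd_attention` and the toy inputs are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, used here to squash differences into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fd_attention(earlier, later):
    """Hypothetical parameter-free attention from a feature difference.

    Assumption (not from the paper): mask = sigmoid(later - earlier),
    and the output is the later feature map scaled by that mask.
    Both inputs are 2D lists (height x width) of floats.
    """
    same_shape = len(earlier) == len(later) and all(
        len(a) == len(b) for a, b in zip(earlier, later)
    )
    if not same_shape:
        raise ValueError("feature maps must have the same shape")

    refined = []
    for row_e, row_l in zip(earlier, later):
        # Element-wise: a larger change between the two maps gives a larger mask value.
        refined.append([l * sigmoid(l - e) for e, l in zip(row_e, row_l)])
    return refined


# Toy 2x2 feature maps standing in for the outputs of two layers
# joined by a shortcut connection.
earlier = [[0.0, 1.0], [2.0, 0.5]]
later = [[0.0, 3.0], [2.0, -1.0]]
refined = fd_attention(earlier, later)
```

Positions where the two maps agree receive a mask value of 0.5, and positions where the later map responds more strongly are weighted closer to 1. The function holds no learnable weights, which matches the abstract's claim that the unit adds no parameters; any other detail of the real module would need to be taken from the paper itself.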