Delving deep into spatial pooling for squeeze-and-excitation networks

Xin Jin, Yanping Xie, Xiu-Shen Wei, Bo-Rui Zhao, Zhao-Min Chen, Xiaoyang Tan
DOI: https://doi.org/10.1016/j.patcog.2021.108159
IF: 8
2022-01-01
Pattern Recognition
Abstract:<p>Squeeze-and-Excitation (SE) blocks have demonstrated significant accuracy gains for state-of-the-art deep architectures by re-weighting channel-wise feature responses. The SE block is an architectural unit that integrates two operations: a squeeze operation that employs <em>global</em> average pooling to aggregate spatial convolutional features into a channel feature, and an excitation operation that learns instance-specific channel weights from the squeezed feature to re-weight each channel. In this paper, we revisit the squeeze operation in SE blocks and shed light on why and how to embed rich (both <em>global</em> and <em>local</em>) information into the excitation module at minimal extra cost. In particular, we introduce a simple but effective two-stage spatial pooling process: <em>rich descriptor extraction</em> and <em>information fusion</em>. The rich descriptor extraction step aims to obtain a set of diverse (<em>i.e.</em>, global and especially local) deep descriptors that contain more informative cues than global average pooling. The information fusion step then absorbs the information delivered by these descriptors, which helps the excitation operation return more accurate re-weighting scores in a data-driven manner. We validate the effectiveness of our method by extensive experiments on ImageNet for image classification and on MS-COCO for object detection and instance segmentation. In these experiments, our method achieves consistent improvements over SENets on all tasks, in some cases by a large margin.</p>
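The pipeline the abstract describes (squeeze, rich descriptor extraction, information fusion, excitation, channel re-weighting) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the abstract does not specify how local descriptors are extracted or fused, so the grid average pooling in `local_descriptors`, the per-channel weighted sum in `fuse`, and all names and weight shapes (`fusion_w`, `w1`, `w2`, `grid`) are assumptions made for illustration only. The two-layer ReLU/sigmoid excitation follows the usual SE design.

```python
import math

def squeeze_global(x):
    """Standard SE squeeze: global average pooling, one scalar per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]

def local_descriptors(x, grid=2):
    """Assumed local extraction: average-pool each channel onto a grid x grid map.

    Requires each channel to be at least grid x grid in size.
    """
    out = []
    for ch in x:
        h, w = len(ch), len(ch[0])
        cells = []
        for i in range(grid):
            r0, r1 = i * h // grid, (i + 1) * h // grid
            for j in range(grid):
                c0, c1 = j * w // grid, (j + 1) * w // grid
                vals = [ch[r][c] for r in range(r0, r1) for c in range(c0, c1)]
                cells.append(sum(vals) / len(vals))
        out.append(cells)
    return out

def fuse(global_desc, local_desc, fusion_w):
    """Assumed fusion: per-channel weighted sum over [global] + local descriptors."""
    fused = []
    for c, g in enumerate(global_desc):
        descriptors = [g] + local_desc[c]
        fused.append(sum(w * d for w, d in zip(fusion_w[c], descriptors)))
    return fused

def excite(z, w1, w2):
    """SE-style excitation: linear -> ReLU -> linear -> sigmoid (biases omitted)."""
    hidden = [max(0.0, sum(wi * zi for wi, zi in zip(row, z))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(wi * hi for wi, hi in zip(row, hidden))))
            for row in w2]

def se_block(x, fusion_w, w1, w2, grid=2):
    """Re-weight each channel of x (a list of C channels, each an H x W list of lists)."""
    z = fuse(squeeze_global(x), local_descriptors(x, grid), fusion_w)
    scales = excite(z, w1, w2)
    return [[[scales[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]

# Usage: two 4x4 channels, hand-picked (hypothetical) weights.
x = [[[float(r * 4 + c) for c in range(4)] for r in range(4)],
     [[1.0] * 4 for _ in range(4)]]
fusion_w = [[0.5, 0.125, 0.125, 0.125, 0.125]] * 2  # 1 global + 4 local weights
w1 = [[0.1, -0.2]]                                  # C=2 -> 1 hidden unit
w2 = [[0.3], [-0.4]]                                # 1 hidden unit -> C=2
y = se_block(x, fusion_w, w1, w2)
```

Setting each row of `fusion_w` to `[1, 0, 0, 0, 0]` makes the sketch fall back to the plain global-average-pooling squeeze of a standard SE block, which is one way to compare the two variants under otherwise identical weights.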
computer science, artificial intelligence; engineering, electrical & electronic