Abstract: Exploiting informative features from digital images with spatial redundancy is an inherent and longstanding challenge for vision learners. Because image pre-processing methods require task-specific customization and may cause unexpectedly poor performance when they remove redundancy, we instead explore improving the vision learners themselves to combat spatial redundancy during learning, which is a task-agnostic and robust solution. Among popular vision learners, vision transformers with self-attention can mitigate pixel redundancy by capturing global dependencies, whereas convolutional learners are confined to local context by their limited receptive field. To this end, building on an investigation of the inter-pixel spatial redundancy of images, we propose spectral norm attention (SNA), a novel yet efficient attention block that helps convolutional neural networks (CNNs) highlight informative features. SNA can be plugged seamlessly into off-the-shelf CNNs to suppress the contributions of redundant features by differentiating and weighting features globally. In particular, SNA performs singular value decomposition (SVD) on the intermediate features of each image within a mini-batch to obtain its spectral norm, i.e., the largest singular value. The features along the direction associated with the spectral norm are the most informative, while the other directions carry less discriminative information. Hence, we apply the rank-one approximation along the spectral norm direction as attention weights to enhance informative features. We also adopt the power iteration algorithm to approximate the spectral norm, which significantly reduces the matrix computation overhead during training and keeps inference speed on par with vanilla CNNs. We extensively evaluate SNA on four mainstream natural-image datasets to demonstrate its effectiveness and its advantages over its counterparts.
In addition, experimental results on image classification and object detection show that SNA brings larger gains on medical images with heavy redundancy than other state-of-the-art attention modules.
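
The two numerical ingredients named in the abstract, power iteration for the spectral norm and the rank-one approximation along its direction, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the fixed all-ones start vector, the iteration count, and the idea of flattening one image's feature map into a 2-D matrix are assumptions made for illustration; how SNA normalizes the rank-one matrix and applies it as attention weights is not specified in the abstract and is left out.

```python
import math


def matvec(A, x):
    """Return A @ x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matvec_t(A, y):
    """Return A^T @ y without building the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def normalize(v):
    """Scale v to unit Euclidean length (assumes v is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(A, num_iters=50):
    """Estimate the largest singular value of A and its singular vectors.

    Assumption: A is a non-zero 2-D matrix, e.g. one image's feature map
    flattened to (channels, height * width). The all-ones start vector keeps
    the sketch deterministic; a random start is the more common choice and
    avoids a start that is orthogonal to the leading singular vector.
    """
    v = normalize([1.0] * len(A[0]))
    for _ in range(num_iters):
        u = normalize(matvec(A, v))    # left singular vector estimate
        v = normalize(matvec_t(A, u))  # right singular vector estimate
    sigma = sum(ui * wi for ui, wi in zip(u, matvec(A, v)))  # u^T A v
    return sigma, u, v


def rank_one(sigma, u, v):
    """Build the rank-one matrix sigma * u * v^T."""
    return [[sigma * ui * vj for vj in v] for ui in u]


# Hypothetical 2 x 2 "feature matrix" that is exactly rank one.
features = [[2.0, 4.0], [1.0, 2.0]]
sigma, u, v = power_iteration(features)
approx = rank_one(sigma, u, v)  # sigma is approximately 5; approx reproduces features
```

Each iteration costs only two matrix-vector products, which is why power iteration is cheaper than a full SVD when only the leading singular triple is needed.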

A Generic Visualization Approach for Convolutional Neural Networks

Interpreting and Improving Attention From the Perspective of Large Kernel Convolution

Variational Structured Attention Networks for Deep Visual Representation Learning

Spatial Global Context Attention for Convolutional Neural Networks: an Efficient Method

Visualizing and Analyzing Convolution Neural Networks with Gradient Information

GAttANet: Global attention agreement for convolutional neural networks

Modelling attention control using a convolutional neural network designed after the ventral visual pathway

RetinotopicNet: An Iterative Attention Mechanism Using Local Descriptors with Global Context

A Visual Cortex-Attentive Deep Convolutional Neural Network for Digital Image Design

Learning 1D Causal Visual Representation with De-focus Attention Networks

Coupled Attention Framework of Convolutional Neural Network Based on Computer Intelligence

Visualization of Convolutional Neural Networks for Monocular Depth Estimation

MCA: Multidimensional Collaborative Attention in Deep Convolutional Neural Networks for Image Recognition

V-CNN: when Convolutional Neural Network Encounters Data Visualization

Convolutional Neural Network optimization via Channel Reassessment Attention module

How convolutional neural network see the world - A survey of convolutional neural network visualization methods

Information Bottleneck Approach to Spatial Attention Learning

Joint Spatial and Layer Attention for Convolutional Networks

Combating spatial redundancy with spectral norm attention in convolutional learners

Generating Self-Attention Activation Maps for Visual Interpretations of Convolutional Neural Networks

Learning Spatial-Channel Attention for Visual Tracking