Abstract: Visual Attention Networks (VAN) with Large Kernel Attention (LKA) modules have been shown to provide remarkable performance that surpasses Vision Transformers (ViTs) on a range of vision-based tasks. However, the depth-wise convolutional layer in these LKA modules incurs a quadratic increase in the computational and memory footprints with increasing convolutional kernel size. To mitigate these problems and to enable the use of extremely large convolutional kernels in the attention modules of VAN, we propose a family of Large Separable Kernel Attention modules, termed LSKA. LSKA decomposes the 2D convolutional kernel of the depth-wise convolutional layer into cascaded horizontal and vertical 1D kernels. In contrast to the standard LKA design, the proposed decomposition enables the direct use of the depth-wise convolutional layer with large kernels in the attention module, without requiring any extra blocks. We demonstrate that the proposed LSKA module in VAN can achieve performance comparable to the standard LKA module while incurring lower computational complexity and memory footprints. We also find that the proposed LSKA design biases the VAN more toward the shape of the object than the texture with increasing kernel size. Additionally, we benchmark the robustness of the LKA and LSKA in VAN, ViTs, and the recent ConvNeXt on five corrupted versions of the ImageNet dataset, which are largely unexplored in previous works. Our extensive experimental results show that the proposed LSKA module in VAN provides a significant reduction in computational complexity and memory footprints with increasing kernel size, while outperforming ViTs and ConvNeXt and providing performance similar to the LKA module in VAN on object recognition, object detection, semantic segmentation, and robustness tests.
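The decomposition described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation and covers only the separable-kernel idea for a single channel, not the full attention module. The helper names (`conv2d_valid`, `depthwise_weights_per_channel`), the example kernels, and the kernel size 35 are assumptions chosen for illustration. The sketch shows that cascading a horizontal 1 x k kernel and a vertical k x 1 kernel reproduces a k x k kernel that is their outer product, and that the per-channel weight count drops from k*k to 2*k. A general k x k kernel is not necessarily separable, so the cascade is a restriction of the 2D kernel and not an exact replacement for it.

```python
def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation of one channel, no padding ('valid' mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def depthwise_weights_per_channel(k, separable):
    """Weights per channel: k*k for a full 2D kernel, 2*k for a 1 x k plus k x 1 cascade."""
    return 2 * k if separable else k * k


# Hypothetical 1D kernels (illustrative values only).
horizontal = [[1, 2, 1]]        # shape 1 x k
vertical = [[1], [0], [-1]]     # shape k x 1

# The equivalent 2D kernel is the outer product of the two 1D kernels.
full = [[v[0] * h for h in horizontal[0]] for v in vertical]

# Small integer test image so the comparison below is exact.
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]

cascaded = conv2d_valid(conv2d_valid(image, horizontal), vertical)
direct = conv2d_valid(image, full)

print(cascaded == direct)
print(depthwise_weights_per_channel(35, separable=False),
      depthwise_weights_per_channel(35, separable=True))
```

The weight count grows quadratically with k for the full 2D kernel and linearly for the cascade, which is the scaling behaviour the abstract attributes to the LKA and LSKA depth-wise layers respectively.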

Spectral Leakage and Rethinking the Kernel Size in CNNs

Kernel Orthogonality does not necessarily imply a Decrease in Feature Map Redundancy in CNNs: Convolutional Similarity Minimization

Enhanced Convolutional Neural Tangent Kernels

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression

A Closer Look at Fourier Spectrum Discrepancies for CNN-generated Images Detection

Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations

Shift-ConvNets: Small Convolutional Kernel with Large Kernel Effects

An Adaptive Kernels Layer for Deep Neural Networks Based on Spectral Analysis for Image Applications

The Neural Tangent Link Between CNN Denoisers and Non-Local Filters

Translational symmetry in convolutions with localized kernels causes an implicit bias toward high frequency adversarial examples

Visual Analytics of Neuron Vulnerability to Adversarial Attacks on Convolutional Neural Networks

Adversarial amplitude swap towards robust image classifiers

Reverse engineering convolutional neural networks through side-channel information leaks

An Enhanced Convolutional Neural Network in Side-Channel Attacks and Its Visualization

Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers

Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN

Evaluating Adversarial Robustness in the Spatial Frequency Domain

Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches

Examining and Mitigating Kernel Saturation in Convolutional Neural Networks using Negative Images