Abstract: Learning a single static convolutional kernel in each convolutional layer is the common training paradigm of modern Convolutional Neural Networks (CNNs). Instead, recent research in dynamic convolution shows that learning a linear combination of $n$ convolutional kernels weighted by their input-dependent attentions can significantly improve the accuracy of light-weight CNNs, while maintaining efficient inference. However, we observe that existing works endow convolutional kernels with the dynamic property through only one dimension of the kernel space (the convolutional kernel number), while the other three dimensions (the spatial size, the input channel number and the output channel number of each convolutional kernel) are overlooked. Inspired by this, we present Omni-dimensional Dynamic Convolution (ODConv), a more generalized yet elegant dynamic convolution design, to advance this line of research. ODConv leverages a novel multi-dimensional attention mechanism with a parallel strategy to learn complementary attentions for convolutional kernels along all four dimensions of the kernel space at any convolutional layer. As a drop-in replacement for regular convolutions, ODConv can be plugged into many CNN architectures. Extensive experiments on the ImageNet and MS-COCO datasets show that ODConv brings solid accuracy boosts for various prevailing CNN backbones, both light-weight and large; e.g., on the ImageNet dataset it yields absolute top-1 improvements of 3.77%~5.71% for the MobileNetV2 family and 1.86%~3.72% for the ResNet family. Intriguingly, thanks to its improved feature learning ability, ODConv with even a single kernel can compete with or outperform existing dynamic convolution counterparts with multiple kernels, substantially reducing extra parameters. Furthermore, ODConv is also superior to other attention modules for modulating the output features or the convolutional weights.
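
To make the kernel-space view concrete, the following is a minimal sketch of how $n$ kernels might be aggregated once attentions along all four dimensions are available. It is an illustration under stated assumptions, not the authors' implementation: the function name `aggregate_kernels`, the argument names, and the nested-list layout `(n, c_out, c_in, k_h, k_w)` are hypothetical, and the attention values are taken as given inputs because the abstract does not describe how they are computed from the input features. Plain Python lists are used only to keep the sketch dependency-free.

```python
def aggregate_kernels(kernels, a_kernel, a_out, a_in, a_spatial):
    """Combine n kernels into one, scaling each along four dimensions.

    kernels   : n kernels, each indexed as [c_out][c_in][k_h][k_w]
    a_kernel  : n scalars, one per kernel (kernel-number dimension)
    a_out     : n lists of c_out scalars (output-channel dimension)
    a_in      : n lists of c_in scalars (input-channel dimension)
    a_spatial : n grids of k_h x k_w scalars (spatial dimension)

    All names and shapes here are assumptions made for illustration.
    """
    n = len(kernels)
    c_out = len(kernels[0])
    c_in = len(kernels[0][0])
    k_h = len(kernels[0][0][0])
    k_w = len(kernels[0][0][0][0])

    # Start from an all-zero kernel with the same shape as each input kernel.
    agg = [[[[0.0] * k_w for _ in range(k_h)]
            for _ in range(c_in)]
           for _ in range(c_out)]

    for i in range(n):
        for o in range(c_out):
            for c in range(c_in):
                for y in range(k_h):
                    for x in range(k_w):
                        # Each weight is scaled by all four attentions,
                        # then the scaled kernels are summed over i.
                        agg[o][c][y][x] += (
                            a_kernel[i]
                            * a_out[i][o]
                            * a_in[i][c]
                            * a_spatial[i][y][x]
                            * kernels[i][o][c][y][x]
                        )
    return agg


# Two kernels of shape (c_out=1, c_in=2, 1, 1) with hand-picked attentions.
kernels = [
    [[[[1.0]], [[2.0]]]],
    [[[[3.0]], [[4.0]]]],
]
a_kernel = [0.5, 0.5]
a_out = [[1.0], [1.0]]
a_in = [[1.0, 0.0], [1.0, 1.0]]   # first kernel ignores input channel 1
a_spatial = [[[1.0]], [[1.0]]]

combined = aggregate_kernels(kernels, a_kernel, a_out, a_in, a_spatial)
```

Setting the output-channel, input-channel and spatial attentions to 1 everywhere reduces this sketch to the one-dimensional case the abstract attributes to earlier dynamic convolution work, where only the per-kernel scalars vary with the input.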
