Abstract: The feature pyramid has been an efficient method for extracting features at different scales. Developments of this method mainly focus on aggregating contextual information at different levels while seldom touching the inter-level correlation in the feature pyramid. Early computer vision methods extracted scale-invariant features by locating the feature extrema in both the spatial and scale dimensions. Inspired by this, a convolution across the pyramid levels is proposed in this study, which is termed pyramid convolution and is a modified 3-D convolution. Stacked pyramid convolutions directly extract 3-D (scale and spatial) features and outperform other meticulously designed feature fusion modules. Based on the viewpoint of 3-D convolution, an integrated batch normalization that collects statistics from the whole feature pyramid is naturally inserted after the pyramid convolution. Furthermore, we also show that the naive pyramid convolution, together with the design of the RetinaNet head, is actually best suited to extracting features from a Gaussian pyramid, whose properties can hardly be satisfied by a feature pyramid. In order to alleviate this discrepancy, we build a scale-equalizing pyramid convolution (SEPC) that aligns the shared pyramid convolution kernel only at high-level feature maps. Being computationally efficient and compatible with the head design of most single-stage object detectors, the SEPC module brings a significant performance improvement ($>4$ AP increase on the MS-COCO2017 dataset) in state-of-the-art one-stage object detectors, and a light version of SEPC also yields a $\sim3.5$ AP gain with only around a 7% increase in inference time. The pyramid convolution also functions well as a stand-alone module in two-stage object detectors and is able to improve the performance by $\sim2$ AP. The source code can be found at https://github.com/jshilong/SEPC.
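To make the two central ideas of the abstract concrete, the following is a minimal sketch in pure Python of a convolution across pyramid levels followed by a normalization whose statistics are pooled over the whole pyramid. It is not the authors' implementation (that is in the repository linked above) and rests on several assumptions that the abstract does not state: single-channel feature maps, a scale kernel spanning three neighbouring levels, a stride-2 convolution to bring the finer neighbour down to the current resolution, nearest-neighbour upsampling for the coarser neighbour, and no learned affine parameters in the normalization. All function names (`conv2d`, `upsample_nearest`, `add_maps`, `pyramid_conv`, `integrated_batch_norm`) are hypothetical.

```python
def conv2d(x, k, stride=1):
    """3x3 zero-padded cross-correlation of a single-channel map (list of rows)."""
    h, w = len(x), len(x[0])
    out_h, out_w = (h + stride - 1) // stride, (w + stride - 1) // stride
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            for di in range(3):
                for dj in range(3):
                    r, c = i * stride + di - 1, j * stride + dj - 1
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the map
                        out[i][j] += k[di][dj] * x[r][c]
    return out


def upsample_nearest(x, out_h, out_w):
    """Nearest-neighbour resize (an assumed choice of interpolation)."""
    h, w = len(x), len(x[0])
    return [[x[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def add_maps(a, b):
    """Element-wise sum of two maps of the same size."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def pyramid_conv(levels, k_finer, k_same, k_coarser):
    """Convolve across scale: levels[0] is the finest map and each following
    level is assumed to be half the size of the previous one."""
    outputs = []
    for l, x in enumerate(levels):
        y = conv2d(x, k_same)
        if l > 0:
            # Finer neighbour: a stride-2 convolution brings it to this size.
            y = add_maps(y, conv2d(levels[l - 1], k_finer, stride=2))
        if l + 1 < len(levels):
            # Coarser neighbour: convolve, then upsample to this size.
            up = upsample_nearest(conv2d(levels[l + 1], k_coarser),
                                  len(x), len(x[0]))
            y = add_maps(y, up)
        outputs.append(y)
    return outputs


def integrated_batch_norm(levels, eps=1e-5):
    """Normalize every level with one mean and variance pooled over all levels."""
    values = [v for x in levels for row in x for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = (var + eps) ** -0.5
    return [[[(v - mean) * scale for v in row] for row in x] for x in levels]
```

A call such as `integrated_batch_norm(pyramid_conv(levels, k_finer, k_same, k_coarser))` on a three-level pyramid with side lengths 8, 4 and 2 returns three maps of the same sizes. The same three kernels are shared by every level, which is what makes the operation resemble a 3-D convolution whose third axis is scale.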

SIS: A new multi-scale convolutional operator

Lightweight Image Super-Resolution Network Using 3D Convolutional Neural Networks

Single-image Super-Resolution Via Selective Multi-Scale Network

CSINet: A Cross-Scale Interaction Network for Lightweight Image Super-Resolution

A channel-wise multi-scale network for single image super-resolution

Scale-pyramid dynamic atrous convolution for pixel-level labeling

Delving into the Scale Variance Problem in Object Detection

DSIC: Dynamic Sample-Individualized Connector for Multi-Scale Object Detection

A very lightweight and efficient image super-resolution network

Progressive Splitting and Upscaling Structure for Super-Resolution

Multi-scale strip-shaped convolution attention network for lightweight image super-resolution

Single Image Super‐resolution Based on Progressive Fusion of Orientation‐aware Features

Cross-scale collaborative network for single image super resolution

A Lightweight Multi-Scale Channel Attention Network for Image Super-Resolution

Activating More Information in Arbitrary-Scale Image Super-Resolution

A Single Super-Resolution Method Via Deep Cascade Network

Multi-Scale Implicit Transformer with Re-parameterize for Arbitrary-Scale Super-Resolution

Instance Scale Normalization for image understanding

CSFNet: a compact and efficient convolution-transformer hybrid vision model

Scale-Equalizing Pyramid Convolution for Object Detection

Implicit Grid Convolution for Multi-Scale Image Super-Resolution