Abstract: Successful visual recognition networks benefit from aggregating information across a wide range of scales. Previous research has investigated information fusion across connected layers or across multiple branches within a block, seeking to strengthen multi-scale representations. Despite their success, existing practices often allocate the neurons for each scale manually and keep the same ratio in every aggregation block of a network, which leads to suboptimal performance. In this paper, we propose to learn the neuron allocation for aggregating multi-scale information in the different building blocks of a deep network. The most informative output neurons in each block are preserved while the others are discarded, so neurons for multiple scales are allocated competitively and adaptively. Our scale aggregation network (ScaleNet) is constructed by repeating a scale aggregation (SA) block that concatenates feature maps at a wide range of scales. Feature maps for each scale are generated by a stack of downsampling, convolution and upsampling operations. Together, the data-driven neuron allocation and the SA block achieve strong representational power at considerably low computational cost. The proposed ScaleNet, obtained by replacing all 3x3 convolutions in ResNet with our SA blocks, achieves better performance than ResNet and its outstanding variants such as ResNeXt and SE-ResNet at the same computational complexity. On ImageNet classification, ScaleNets reduce the top-1 error rate of ResNets by an absolute 1.12 (101 layers) and 1.82 (50 layers). On COCO object detection with Faster RCNN, ScaleNet backbones improve the mmAP over ResNet backbones by an absolute 3.6 (101 layers) and 4.6 (50 layers). Code and models are released at https://github.com/Eli-YiLi/ScaleNet.
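
The competitive allocation described in the abstract can be illustrated with a minimal sketch. It assumes that each output neuron in a block already has a scalar importance score and that the block keeps a fixed budget of neurons. The abstract does not say how importance is measured, so the scores, the scale labels and the function name `allocate_neurons` below are hypothetical and are not taken from the released code.

```python
from collections import Counter


def allocate_neurons(importance_by_scale, keep):
    """Keep the `keep` highest-scoring neurons across all scales.

    importance_by_scale maps a scale label to a list of per-neuron
    importance scores (hypothetical values, not from the paper).
    Returns a dict mapping each scale to the number of neurons it keeps.
    """
    # Put every neuron into one pool so that all scales compete directly.
    pool = [
        (score, scale)
        for scale, scores in importance_by_scale.items()
        for score in scores
    ]
    # Highest score first; Python's sort is stable, so ties keep input order.
    pool.sort(key=lambda pair: pair[0], reverse=True)
    # Count how many of the surviving neurons belong to each scale.
    kept = Counter(scale for _, scale in pool[:keep])
    return {scale: kept.get(scale, 0) for scale in importance_by_scale}


# Hypothetical block: four scales, each starting with four neurons.
scores = {
    "1x": [0.9, 0.8, 0.1, 0.05],
    "2x": [0.7, 0.6, 0.5, 0.2],
    "4x": [0.4, 0.3, 0.02, 0.01],
    "8x": [0.95, 0.03, 0.02, 0.01],
}
allocation = allocate_neurons(scores, keep=8)
print(allocation)  # {'1x': 2, '2x': 3, '4x': 2, '8x': 1}
```

In this toy block the equal 4/4/4/4 split becomes 2/3/2/1 once half of the neurons are discarded, so each block can end up with its own ratio across scales instead of a manually fixed one.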

Set Aggregation Network as a Trainable Pooling Layer

DAR-Net: Dynamic Aggregation Network for Semantic Scene Segmentation

Maximal Independent Sets for Pooling in Graph Neural Networks

Building Detail-Sensitive Semantic Segmentation Networks with Polynomial Pooling

Stochastic Area Pooling for Generic Convolutional Neural Network

Balanced Mixture of SuperNets for Learning the CNN Pooling Architecture

Deep Generalized Max Pooling

Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree

Generic Convolutional Neural Network with Random Pooling Area

Adaptive Salience Preserving Pooling for Deep Convolutional Neural Networks

Deep Layer and Spatial Aggregation neural network for human pose estimation

LIP: Local Importance-Based Pooling

GSAPool: Gated Structure Aware Pooling for Graph Representation Learning

Detail-Preserving Pooling in Deep Networks

ReiPool: Reinforced Pooling Graph Neural Networks for Graph-Level Representation Learning

Hierarchical Graph Pooling with Self-Adaptive Cluster Aggregation

Pooling in Graph Convolutional Neural Networks

Data-Driven Neuron Allocation for Scale Aggregation Networks

Deep Scattering Network with Max-pooling

When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks

Wasserstein Pooling for Image Classification