Dynamic Parallel Pyramid Networks for Scene Recognition

Kai Liu,Seungbin Moon
DOI: https://doi.org/10.1109/tnnls.2021.3129227
IF: 14.255
2021-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract:Scene recognition is considered a challenging task of image recognition, mainly due to the presence of multiscale information of global layout and local objects in a given scene. Recent convolutional neural networks (CNNs) that can learn multiscale features have achieved remarkable progress in scene recognition. They have two limitations: 1) the receptive field (RF) size is fixed even though a scene may have large-scale variations and 2) they are computing and memory intensive, partially due to the representation of multiscales. To address these limitations, we propose a lightweight dynamic scene recognition approach based on a novel architectural unit, namely, a dynamic parallel pyramid (DPP) block, that can adaptively select RF size based on multiscale information from the input regarding channel dimensions. We encode multiscale features by applying different convolutional (CONV) kernels on different input tensor channels and then dynamically merge their output using a group attention mechanism followed by channel shuffling to generate the parallel feature pyramid. DPP can be easily incorporated with existing CNNs to develop new deep models, called DPP networks (DPP-Nets). Extensive experiments on large-scale scene image datasets, Places365 standard, Places365 challenge, the Massachusetts Institute of Technology (MIT) Indoor67, and Sun397 confirmed that the proposed method provides significant performance improvement compared with current state-of-the-art (SOTA) approaches. We also verified general applicability from compelling results on lightweight models of MobileNetV2 and ShuffleNetV2 on ImageNet-1k and small object centralized benchmarks on CIFAR-10 and CIFAR-100.
computer science, artificial intelligence, theory & methods,engineering, electrical & electronic, hardware & architecture
What problem does this paper attempt to address?