Abstract: Both high-level and high-resolution feature representations are of great importance in various visual understanding tasks. To acquire high-resolution feature maps with high-level semantic information, one common strategy is to adopt dilated convolutions in the backbone networks to extract high-resolution feature maps, as in the dilatedFCN-based methods for semantic segmentation. However, because many convolution operations are conducted on the high-resolution feature maps, such methods have high computational complexity and memory consumption. To balance performance and efficiency, there also exist encoder-decoder structures that gradually recover the spatial information by combining multi-level feature maps from a feature encoder, such as the FPN architecture for object detection and the U-Net for semantic segmentation. Although they are more efficient, existing encoder-decoder methods for semantic segmentation perform far below the dilatedFCN-based methods. In this paper, we propose a novel holistically-guided decoder that obtains high-resolution, semantic-rich feature maps from the multi-scale features of the encoder. The decoding is achieved via novel holistic codeword generation and codeword assembly operations, which take advantage of both the high-level and low-level features from the encoder. With the proposed holistically-guided decoder, we implement the EfficientFCN architecture for semantic segmentation and HGD-FPN for object detection and instance segmentation. The EfficientFCN achieves comparable or even better performance than state-of-the-art methods with only 1/3 of their computational costs for semantic segmentation on the PASCAL Context, PASCAL VOC, and ADE20K datasets. Meanwhile, the proposed HGD-FPN achieves $>2\%$ higher mean Average Precision (mAP) when integrated into several object detection frameworks with ResNet-50 encoding backbones.
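
The cost argument in the abstract can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not taken from the paper: the input size (512), the channel width (512), the output strides (8 for a dilated backbone, 32 for a plain strided one), and the helper names `feature_map_size` and `conv_macs` are all assumptions chosen for illustration. It only counts multiply-accumulates for a single 3x3 convolution layer and ignores everything else in a real network.

```python
def feature_map_size(image_size: int, output_stride: int) -> int:
    """Spatial side length of a feature map at the given output stride."""
    return image_size // output_stride


def conv_macs(height: int, width: int, c_in: int, c_out: int, kernel: int = 3) -> int:
    """Multiply-accumulates of one stride-1, same-padded convolution layer.

    Each output position of each output channel needs c_in * kernel * kernel
    multiply-accumulates, regardless of the dilation rate.
    """
    return height * width * c_in * c_out * kernel * kernel


# Hypothetical settings (assumptions, not values from the paper).
IMAGE_SIZE = 512
CHANNELS = 512

# Dilated backbone: the last stage keeps an output stride of 8.
side_dilated = feature_map_size(IMAGE_SIZE, 8)
dilated_macs = conv_macs(side_dilated, side_dilated, CHANNELS, CHANNELS)

# Plain strided backbone: the last stage runs at an output stride of 32.
side_strided = feature_map_size(IMAGE_SIZE, 32)
strided_macs = conv_macs(side_strided, side_strided, CHANNELS, CHANNELS)

ratio = dilated_macs // strided_macs

if __name__ == "__main__":
    print(f"OS=8 feature map:  {side_dilated}x{side_dilated}, MACs per layer: {dilated_macs}")
    print(f"OS=32 feature map: {side_strided}x{side_strided}, MACs per layer: {strided_macs}")
    print(f"Per-layer cost ratio (OS=8 / OS=32): {ratio}")
```

Under these assumptions, the same layer costs 16 times more at output stride 8 than at output stride 32, because the feature map has 4 times as many positions along each spatial axis. This is the kind of overhead that motivates recovering resolution in a decoder instead of keeping it throughout the backbone.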

DSNet: Multi-resolution Dense Encoder and Stack Decoder Network for Aerial Image Segmentation

Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation

High-Resolution Remote Sensing Image Semantic Segmentation Method Based on Improved Encoder-Decoder Convolutional Neural Network

Densely Based Multi-Scale and Multi-Modal Fully Convolutional Networks for High-Resolution Remote-Sensing Image Semantic Segmentation

DESENet: a bilateral network with detail-enhanced semantic encoder for real-time semantic segmentation

ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-high Resolution Segmentation

Efficient Multi-scale Network for Semantic Segmentation of Fine-Resolution Remotely Sensed Images

Aerial-BiSeNet: A real-time semantic segmentation network for high resolution aerial imagery

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Hierarchical Self-Attention Embedded Neural Network With Dense Connection for Remote-Sensing Image Semantic Segmentation

SSNet: A Novel Transformer and CNN Hybrid Network for Remote Sensing Semantic Segmentation

D-CANet: Diverse Class-Aware Coding and Decoding Structure Network for Semantic Segmentation of High-Resolution Remote Sensing Images

Dense Pyramid Network for Semantic Segmentation of High Resolution Aerial Imagery

Semantic Image Segmentation with Improved Position Attention and Feature Fusion

Multispectral Semantic Land Cover Segmentation From Aerial Imagery With Deep Encoder–Decoder Network

Semantic Segmentation of Aerial Imagery Via Split-Attention Networks with Disentangled Nonlocal and Edge Supervision

A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection

An Attention-Fused Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

Dual attention deep fusion semantic segmentation networks of large-scale satellite remote-sensing images

Dual-Path Geometry-Aware Network for Semantic Segmentation of High-Resolution Aerial Images

Scale-Aware Neural Network for Semantic Segmentation of Multi-Resolution Remote Sensing Images