Abstract: Both high-level and high-resolution feature representations are of great importance in various visual understanding tasks. To acquire high-resolution feature maps with high-level semantic information, one common strategy is to adopt dilated convolutions in the backbone networks, as in the dilatedFCN-based methods for semantic segmentation. However, because many convolution operations are conducted on the high-resolution feature maps, such methods have large computational complexity and memory consumption. To balance performance and efficiency, there also exist encoder-decoder structures that gradually recover the spatial information by combining multi-level feature maps from a feature encoder, such as the FPN architecture for object detection and the U-Net for semantic segmentation. Although they are more efficient, the performance of existing encoder-decoder methods for semantic segmentation is far from comparable with that of the dilatedFCN-based methods. In this paper, we propose a novel holistically-guided decoder that obtains high-resolution, semantic-rich feature maps from the multi-scale features of the encoder. The decoding is achieved via novel holistic codeword generation and codeword assembly operations, which take advantage of both the high-level and low-level features from the encoder. With the proposed holistically-guided decoder, we implement the EfficientFCN architecture for semantic segmentation and HGD-FPN for object detection and instance segmentation. The EfficientFCN achieves comparable or even better performance than state-of-the-art methods with only 1/3 of their computational costs for semantic segmentation on the PASCAL Context, PASCAL VOC, and ADE20K datasets. Meanwhile, the proposed HGD-FPN achieves $>2\%$ higher mean Average Precision (mAP) when integrated into several object detection frameworks with ResNet-50 encoding backbones.
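
The abstract names the two decoder operations, holistic codeword generation and codeword assembly, without giving their formulas. The sketch below is only one plausible reading and should not be taken as the authors' implementation. It assumes that each codeword is a softmax-weighted average of low-resolution feature vectors and that each high-resolution position is rebuilt as a linear combination of the codewords. The function names, the softmax weighting, and the toy shapes are all illustrative assumptions.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def generate_codewords(features, logits):
    """Assumed form of codeword generation (not taken from the paper).

    features: N x C low-resolution feature vectors (N spatial positions).
    logits:   N x K weighting scores, one column per codeword.
    Returns K codewords of length C, each a weighted average over all N
    positions, so every codeword summarizes the whole feature map.
    """
    n_codewords = len(logits[0])
    n_channels = len(features[0])
    codewords = []
    for k in range(n_codewords):
        weights = softmax([row[k] for row in logits])
        codewords.append([
            sum(w * f[c] for w, f in zip(weights, features))
            for c in range(n_channels)
        ])
    return codewords


def assemble(codewords, coefficients):
    """Assumed form of codeword assembly (not taken from the paper).

    coefficients: M x K assembly coefficients, one row per high-resolution
    position (M is typically larger than N).
    Returns M x C feature vectors, each a linear combination of codewords.
    """
    n_channels = len(codewords[0])
    return [
        [sum(a * cw[c] for a, cw in zip(coef, codewords))
         for c in range(n_channels)]
        for coef in coefficients
    ]


# Toy example: 2 low-resolution positions, 2 channels, 2 codewords,
# decoded to 3 high-resolution positions.
low_res = [[1.0, 0.0], [0.0, 1.0]]
scores = [[0.0, 0.0], [0.0, 0.0]]          # uniform weighting
words = generate_codewords(low_res, scores)
high_res = assemble(words, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Under these assumptions the expensive work happens on the small set of codewords rather than on full high-resolution maps, which is consistent with the efficiency argument in the abstract.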

Image Semantic Segmentation Based on Encoder-Decoder Network

Image Semantic Segmentation Based on Region and Deep Residual Network

High-Resolution Remote Sensing Image Semantic Segmentation Method Based on Improved Encoder-Decoder Convolutional Neural Network

Semantic Image Segmentation with Improved Position Attention and Feature Fusion

DESENet: a bilateral network with detail-enhanced semantic encoder for real-time semantic segmentation

Image Segmentation Using Encoder-Decoder with Deformable Convolutions

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

DARSegNet: A Real-Time Semantic Segmentation Method Based on Dual Attention Fusion Module and Encoder-Decoder Network

A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection.

Efficient Multi-scale Network for Semantic Segmentation of Fine-Resolution Remotely Sensed Images

LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation

Attention Guided Global Enhancement and Local Refinement Network for Semantic Segmentation

DSNet: Multi-resolution Dense Encoder and Stack Decoder Network for Aerial Image Segmentation

IIE-SegNet: Deep Semantic Segmentation Network With Enhanced Boundary Based on Image Information Entropy

An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation

Discriminative Features Reconstruction Network For Semantic Segmentation

Encoder-decoder with double spatial pyramid for semantic segmentation.

EFDCNet: Encoding Fusion and Decoding Correction Network for RGB-D Indoor Semantic Segmentation

Encoder- and Decoder-Based Networks Using Multiscale Feature Fusion and Nonlocal Block for Remote Sensing Image Semantic Segmentation

A Top-Down Manner-Based DCNN Architecture for Semantic Image Segmentation.

LMANet: A Lightweight Asymmetric Semantic Segmentation Network Based on Multi-Scale Feature Extraction