Abstract:Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense classification problems such as semantic segmentation. However, repeated subsampling operations like pooling or convolution striding in deep CNNs lead to a significant decrease in the initial image resolution. Here, we present RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. We carry out comprehensive experiments and set new state-of-the-art results on seven public datasets. In particular, we achieve an intersection-over-union score of 83.4 on the challenging PASCAL VOC 2012 dataset, which is the best reported result to date.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to effectively utilize feature information at different levels to generate high - quality segmentation results in high - resolution semantic segmentation tasks. Specifically, traditional convolutional neural networks (CNNs) perform well in tasks such as object recognition, but when dealing with dense classification problems such as semantic segmentation, due to multiple down - sampling operations (such as pooling or convolution strides), the initial image resolution drops significantly, thus losing a great deal of fine - grained image structure information. This limits the accuracy of the final prediction, especially the accuracy in boundary or detail parts. To solve this problem, the paper proposes ReﬁneNet, which is a multi - path refinement network. It aims to explicitly utilize all available information throughout the down - sampling process through long - distance residual connections to achieve high - resolution prediction. The core idea of ReﬁneNet is to directly refine the deep layers that capture high - level semantic features by fusing fine - grained features from early convolutional layers, thereby retaining and enhancing image detail information and improving the accuracy of segmentation. The main contributions of ReﬁneNet include: 1. Proposing a multi - path refinement network architecture that can utilize multi - level abstract features for high - resolution semantic segmentation. 2. Achieving effective end - to - end training through short - range and long - range residual connections. 3. Introducing a new network component - chained residual pooling, which can efficiently capture background context from large - range image regions. 4. Achieving the latest state - of - the - art performance on seven public datasets, especially on the PASCAL VOC 2012 dataset, reaching an intersection - over - union (IoU) score of 83.4, far surpassing the best method at that time, DeepLab. In summary, the goal of this paper is to solve the problem of detail loss caused by resolution reduction in traditional CNNs in high - resolution semantic segmentation tasks by designing a new network architecture, ReﬁneNet, thereby improving the accuracy and detail performance of segmentation.

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

RefineNet: Multi-Path Refinement Networks for Dense Prediction

Semantic Segmentation with Multi-Path Refinement and Pyramid Pooling Dilated-Resnet.

Sparse Refinement for Efficient High-Resolution Semantic Segmentation

A Deep Semantic Segmentation Network with Semantic and Contextual Refinements

Multilevel feature fusion dilated convolutional network for semantic segmentation

Light-Weight RefineNet for Real-Time Semantic Segmentation

Multi-stage Context Refinement Network for Semantic Segmentation

ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-high Resolution Segmentation

Refine for Semantic Segmentation Based on Parallel Convolutional Network with Attention Model.

Attention Guided Global Enhancement and Local Refinement Network for Semantic Segmentation

MFRNet: A Multipath Feature Refinement Network for Semantic Segmentation in High-Resolution Remote Sensing Images

STRNet: Semantic Segmentation Based on Small Target Refinement at Large Scale

RFNet: A Refinement Network for Semantic Segmentation

A Stagewise Refinement Model for Detecting Salient Objects in Images.

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

A Multistage Refinement Network for Salient Object Detection

Hi-ResNet: Edge Detail Enhancement for High-Resolution Remote Sensing Segmentation

High-performance Semantic Segmentation Using Very Deep Fully Convolutional Networks

Refinement Co‐supervision Network for Real‐time Semantic Segmentation

Discriminative Features Reconstruction Network For Semantic Segmentation