Abstract: With the advancement of remote-sensed imaging, large volumes of very high resolution land cover images can now be obtained. Automation of object recognition in these 2D images, however, is still a key issue. High intra-class variance and low inter-class variance in Very High Resolution (VHR) images hamper the accuracy of prediction in object recognition tasks. Most of the recent successful techniques in computer vision tasks are based on deep supervised learning. In this work, a deep Convolutional Neural Network (CNN) based on a symmetric encoder-decoder architecture with skip connections is employed for the 2D semantic segmentation of the most common land cover object classes: impervious surface, buildings, low vegetation, trees and cars. Atrous convolutions are employed to obtain a large receptive field in the proposed CNN model. Further, the CNN outputs are post-processed using a Fully Connected Conditional Random Field (FCRF) model to refine the CNN pixel label predictions. The proposed CNN-FCRF model achieves an overall accuracy of 90.5% on the ISPRS Vaihingen Dataset.
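
The overall accuracy quoted above is, in its usual definition, the fraction of pixels whose predicted label matches the reference label. The following is a minimal sketch of that computation on toy data, not the paper's evaluation code. The class index order, the function names `confusion_matrix` and `overall_accuracy`, and the tiny label maps are assumptions made for illustration.

```python
import numpy as np

# Assumed index order for the five classes named in the abstract.
CLASSES = ["impervious surface", "buildings", "low vegetation", "trees", "cars"]

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (reference class, predicted class) pair."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Encode each (true, pred) pair as a single integer, then histogram.
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def overall_accuracy(cm):
    """Correctly labelled pixels (diagonal) divided by all pixels."""
    return np.trace(cm) / cm.sum()

# Toy 2x3 label maps (hypothetical data): one pixel out of six is wrong.
reference = np.array([[0, 1, 1],
                      [2, 2, 4]])
predicted = np.array([[0, 1, 2],
                      [2, 2, 4]])

cm = confusion_matrix(reference, predicted, len(CLASSES))
oa = overall_accuracy(cm)
print(f"overall accuracy: {oa:.3f}")
```

Because overall accuracy weights every pixel equally, small classes such as cars contribute little to it; the off-diagonal entries of the confusion matrix show which classes are confused with each other.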

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the automation of object recognition in high-resolution remote-sensing images. Specifically, the author focuses on the task of 2D semantic segmentation in very-high-resolution (VHR) images. Such images are characterized by large intra-class variance and small inter-class variance, which makes accurate prediction difficult.

### Problem Background

With the progress of remote-sensing imaging technology, a large number of very-high-resolution land cover images can now be obtained. However, the automatic identification of objects in these 2D images remains a key issue. Especially in urban scenes, the visual/spectral characteristics of different objects are similar, while those of objects of the same type may vary greatly, which poses a challenge to the segmentation algorithm.

### Solution

To solve the above problems, the author proposes a deep convolutional neural network (CNN) based on a symmetric encoder-decoder architecture and uses atrous convolutions in the model to expand the receptive field. In addition, to further refine the pixel-label predictions, the author introduces a Fully Connected Conditional Random Field (FCRF) model that post-processes the output of the CNN.

### Main Contributions

1. **Model Architecture**: A CNN model based on a symmetric encoder-decoder architecture is proposed, which includes skip connections and expands the receptive field through atrous convolutions.
2. **Receptive Field Expansion**: Atrous convolutions are used instead of increasing the number of convolutional layers or filter sizes to expand the receptive field, so as to obtain larger context information while keeping the computational cost unchanged.
3. **Post-processing Optimization**: The FCRF model is used to post-process the output of the CNN to smooth noise and improve the segmentation boundary.
4. **Experimental Verification**: Experiments were carried out on the ISPRS Vaihingen data set, and the results show that this method achieves an overall accuracy of 90.5%.

Through these improvements, this paper aims to improve the accuracy of semantic segmentation in remote-sensing images, especially when dealing with images with high intra-class differences and low inter-class differences.
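
The receptive-field argument for atrous convolutions can be illustrated with a small NumPy sketch. It is a 1D toy example under stated assumptions, not the paper's network: the kernel sizes, the dilation rates `[1, 2, 4]`, and the helper names `receptive_field` and `atrous_conv1d` are all chosen here for illustration. For a stack of stride-1 convolutions, each layer with kernel size k and dilation d widens the receptive field by (k - 1) * d.

```python
import numpy as np

def receptive_field(kernel_sizes, dilations):
    """Receptive field (in samples) of a stack of stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

def atrous_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D atrous correlation: kernel taps sit `dilation` samples apart."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * dilation + 1   # input width covered by one output
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal is shorter than the dilated kernel span")
    out = np.zeros(n_out)
    for i in range(n_out):
        # Strided slice picks every `dilation`-th sample inside the span.
        out[i] = np.dot(signal[i : i + span : dilation], kernel)
    return out

# Three 3-tap layers: same number of weights, different dilation rates.
plain = receptive_field([3, 3, 3], [1, 1, 1])
atrous = receptive_field([3, 3, 3], [1, 2, 4])
print(plain, atrous)  # 7 15

# A dilated difference filter on a ramp: each output is s[i] - s[i + 4].
ramp = np.arange(10)
print(atrous_conv1d(ramp, [1, 0, -1], dilation=2))
```

Both stacks use nine weights in total, yet the dilated stack sees 15 input samples instead of 7, which is the sense in which atrous convolutions enlarge context without adding layers or widening filters.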

Encoder-Decoder based CNN and Fully Connected CRFs for Remote Sensed Image Segmentation

Parallel-Connected Residual Channel Attention Network for Remote Sensing Image Super-Resolution

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

High-Resolution Remote Sensing Image Semantic Segmentation Method Based on Improved Encoder-Decoder Convolutional Neural Network

Densely Based Multi-Scale and Multi-Modal Fully Convolutional Networks for High-Resolution Remote-Sensing Image Semantic Segmentation

LKASeg: Remote-Sensing Image Semantic Segmentation with Large Kernel Attention and Full-Scale Skip Connections

Assessing CNN and Semantic Segmentation Models for Coarse Resolution Satellite Image Classification in Subcontinental Scale Land Cover Mapping

Cascaded CNN and global–local attention transformer network-based semantic segmentation for high-resolution remote sensing image

Semantic Segmentation of High-Resolution Remote Sensing Images Using Multiscale Skip Connection Network

Hierarchical Self-Attention Embedded Neural Network With Dense Connection for Remote-Sensing Image Semantic Segmentation

An Attention-Fused Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

Semantic Labeling Of High Resolution Aerial Imagery And Lidar Data With Fine Segmentation Network

LAND COVER CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK WITH REMOTE SENSING DATA AND DIGITAL SURFACE MODEL

MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images

Encoder-Decoder With Cascaded CRFs for Semantic Segmentation

SSNet: A Novel Transformer and CNN Hybrid Network for Remote Sensing Semantic Segmentation

CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation

CCTNet: CNN and Cross-Shaped Transformer Hybrid Network for Remote Sensing Image Semantic Segmentation

An Object-Aware Network Embedding Deep Superpixel for Semantic Segmentation of Remote Sensing Images

Remote Sensing Image Semantic Segmentation Method Based on a Deep Convolutional Neural Network and Multiscale Feature Fusion

EFCNet: Ensemble Full Convolutional Network for Semantic Segmentation of High-Resolution Remote Sensing Images