Fine-Grained Road Scene Understanding from Aerial Images Based on Semisupervised Semantic Segmentation Networks

Rong Xiao, Yuze Wang, Chao Tao
DOI: https://doi.org/10.1109/lgrs.2021.3059708
IF: 5.343
2021-01-01
IEEE Geoscience and Remote Sensing Letters
Abstract: High-precision electronic maps must provide more detailed and accurate information than traditional maps. With the rapid development of high-resolution remote sensing technology, it has become possible to extract fine-grained road scene information, such as vehicles, road lines, zebra crossings, ground signs, and lane widths, from unmanned aerial vehicle (UAV) remote sensing images, which opens up opportunities for the automatic production of high-precision maps. Traditionally, remote sensing images are interpreted through manual visual inspection. The high cost and long lead time of this approach make it inefficient for updating large amounts of information. To address this problem, this letter models the fine-grained road scene understanding task as an image semantic segmentation problem and proposes a semisupervised fully convolutional neural network to extract the information efficiently and at low cost. Compared with a traditional supervised fully convolutional neural network, this method uses an integrated prediction technique to simultaneously optimize the standard supervised classification loss on labeled samples and an unsupervised consistency loss on unlabeled samples, and uses both to train the semantic segmentation network end to end. The method is designed to improve the classification accuracy of the semantic segmentation network and to alleviate overfitting when only a small number of labeled samples is available. To verify its effectiveness, we constructed an experimental data set and used it to examine how varying the number of unlabeled samples affects model performance. Experimental results show that our method can efficiently extract fine-grained road scene information, such as vehicles, road lines, zebra crossings, ground signs, and lane widths, with a small number of labeled samples.
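The combined objective described in the abstract can be illustrated with a minimal per-pixel sketch. This is not the authors' implementation: the abstract does not say how the integrated prediction is formed, so the sketch assumes an exponential moving average of past predictions as the consistency target, and all function names, the `alpha` smoothing factor, and the `weight` on the unsupervised term are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of class scores into probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Supervised classification loss for one labeled pixel."""
    return -math.log(max(probs[label], 1e-12))


def consistency(probs, target):
    """Mean squared difference between the current prediction and the target."""
    return sum((p - t) ** 2 for p, t in zip(probs, target)) / len(probs)


def update_ensemble(target, probs, alpha=0.6):
    """ASSUMPTION: the integrated prediction is an exponential moving average
    of earlier predictions; the abstract does not specify this."""
    return [alpha * t + (1 - alpha) * p for t, p in zip(target, probs)]


def semisupervised_loss(labeled, unlabeled, weight=1.0):
    """Sum of the supervised and the weighted unsupervised terms.

    labeled:   list of (logits, class_index) pairs
    unlabeled: list of (logits, ensemble_target) pairs
    weight:    hypothetical balance factor for the consistency term
    """
    sup = sum(cross_entropy(softmax(lg), y) for lg, y in labeled)
    sup /= max(len(labeled), 1)
    unsup = sum(consistency(softmax(lg), t) for lg, t in unlabeled)
    unsup /= max(len(unlabeled), 1)
    return sup + weight * unsup
```

In this sketch, only the labeled pixels contribute to the cross-entropy term, while the unlabeled pixels contribute by being pulled toward their own smoothed earlier predictions, which is one common way to obtain a training signal without annotations.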