Remote sensing image classification based on deep learning and conditional random fields
Meng Xia, Guo Cao, Guangya Wang, Yanfeng Shang
DOI: https://doi.org/10.11834/jig.170122
2017-01-01
Journal of Image and Graphics
Abstract:

Objective: Remote sensing image classification refers to the use of computers to analyze the spectral and spatial information of various land cover objects in remote sensing images, divide the feature space into non-overlapping subspaces, and assign each pixel to a specific subspace. In computer vision, this procedure aims to assign a predefined semantic label to each pixel in an image, a task also called "semantic segmentation." The rapid development of computer application technology, aerospace, and sensor technology in recent years has produced numerous methods for acquiring different types of remote sensing image data. As an important aspect of remote sensing technology, the classification of high-resolution remote sensing imagery has gained considerable attention. This study proposes a novel image classification method that combines a fully connected conditional random field (CRF) model with a convolutional neural network (CNN). The two models are merged to exploit their respective advantages and further improve classification accuracy for remote sensing images.

Method: On the one hand, most traditional classification methods rely on human experience to extract the features of training samples. After learning, they yield a single-layer feature without a hierarchical structure. These methods generally have shallow structures, and the features they produce are relatively simple. By contrast, deep learning, a new research direction in machine learning, transforms the feature representation of training samples from the original space into a new feature space layer by layer and automatically learns a hierarchical feature representation, which benefits both classification and feature visualization. In recent years, deep learning has achieved significant breakthroughs in computer vision applications such as visual recognition challenges, image classification, and object detection. As one of its representative models, the CNN has been widely used in pattern recognition because it avoids complex image preprocessing. In this study, we use a CNN in place of traditional classification methods to extract the essential features of the input image. On the other hand, traditional classification methods are based on the spectral statistical characteristics of pixels and are therefore also known as pixel-wise classification methods. They analyze the spectral information of each pixel individually by using a statistical learning algorithm, such as support vector machine (SVM), maximum likelihood classification, the minimum distance method, decision trees, or k-means clustering. Because they ignore the rich spatial contextual information of images, these methods typically produce high classification errors and low accuracy. To solve this problem, we turn to the probabilistic graphical model, one of the research hot spots in machine learning and pattern recognition. With this model, researchers can not only apply Bayesian probability theory but also use mature graph theory to handle contextual information. As a prominent representative of probabilistic graphical models, the CRF model was proposed by Lafferty et al. in 2001 for processing 1D sequence data. It can incorporate spatial contextual information in terms of both labels and observed data, and its distinguishing feature is that it models the posterior distribution directly and flexibly. The CRF model was first used mainly in natural language processing and speech recognition and was then successfully applied to image processing by Kumar and Hebert in 2003. Although considerable research has been conducted on CRF models, the conventional CRF still suffers from oversmoothing. Therefore, we add a regional restriction (RR) that enhances the consistency of the classification results within connected areas and protects the edge structure of land cover objects. In summary, the proposed method proceeds as follows. We preclassify the entire remote sensing image into land cover types with the CNN and use the resulting class membership probabilities as the unary potential of the CRF model. The pairwise potential of the CRF is defined as a linear combination of Gaussian kernels, which forms a fully connected neighborhood structure instead of the common four-neighbor or eight-neighbor structure. RR is also incorporated into the framework to promote the consistency of connected areas: we use the mean shift algorithm to obtain superpixels and correct the classification results by calculating the average posterior probability within each superpixel. A highly efficient approximate inference algorithm, namely, mean field inference, is derived for the final model.

Result: Experiments on three different remote sensing images demonstrate that the proposed classification framework achieves competitive quantitative and qualitative performance: it effectively alleviates salt-and-pepper classification noise, reduces oversmoothing, and protects the edge structure of land cover objects. Class accuracy, overall classification accuracy (OA), average classification accuracy (AA), and the kappa coefficient are used for the quantitative analysis. Compared with SVM, CNN, and the fully connected CRF, our method achieves significantly higher final accuracy: AA increases by 3.28 percentage points, OA by 3.22 percentage points, and the kappa coefficient by 5.07 percentage points.

Conclusion: Traditional classification methods have two shortcomings. First, insufficient feature extraction leads to inaccurate classification results. Second, pixel-based methods consider only the information of individual pixels and disregard the mutual influence of surrounding pixels. Combining a CNN with a CRF not only captures the essential characteristics of pixels but also takes the contextual information of the image into account, so our method achieves accurate classification results. Moreover, integrating RR protects the edge structure of land cover objects and yields satisfactory classification performance. The proposed method is accurate and effective and can be applied to remote sensing image classification.
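For reference, a fully connected CRF whose pairwise potential is a linear combination of Gaussian kernels is commonly written in the form below. This is the standard dense-CRF formulation, shown as an assumption: the abstract does not give the exact kernels, so the appearance and smoothness terms, the weights $w^{(1)}, w^{(2)}$, the bandwidths $\theta_\alpha, \theta_\beta, \theta_\gamma$, and the Potts compatibility $\mu$ are illustrative. Here $p_i$ is the position of pixel $i$, $I_i$ its spectral vector, and $Q_i$ its approximate posterior marginal under mean field inference.

```latex
E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j),
\qquad \psi_u(x_i) = -\log P_{\mathrm{CNN}}(x_i \mid \mathbf{I})

\psi_p(x_i, x_j) = \mu(x_i, x_j)\, k(\mathbf{f}_i, \mathbf{f}_j),
\qquad \mu(x_i, x_j) = [x_i \neq x_j]

k(\mathbf{f}_i, \mathbf{f}_j) =
  w^{(1)} \exp\!\Big(-\frac{\lVert p_i - p_j \rVert^2}{2\theta_\alpha^2}
                     -\frac{\lVert I_i - I_j \rVert^2}{2\theta_\beta^2}\Big)
+ w^{(2)} \exp\!\Big(-\frac{\lVert p_i - p_j \rVert^2}{2\theta_\gamma^2}\Big)

Q_i(l) = \frac{1}{Z_i} \exp\!\Big(-\psi_u(l)
  - \sum_{j \neq i} \sum_{l'} \mu(l, l')\, k(\mathbf{f}_i, \mathbf{f}_j)\, Q_j(l')\Big)
```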
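The following is a minimal, naive sketch of mean field inference for a fully connected CRF of the kind described in the abstract, written in pure Python so it runs without dependencies. It assumes a Potts label compatibility, unary potentials equal to the negative log of the CNN class-membership probabilities, and a pairwise kernel made of an appearance term (position plus spectral values) and a smoothness term (position only). The function name `dense_crf_mean_field` and the parameters `w_app`, `w_smooth`, `theta_alpha`, `theta_beta`, `theta_gamma`, and `n_iters` are hypothetical illustrations, not the authors' settings. Because every pixel is connected to every other pixel, each iteration costs O(N²), so this version only suits tiny patches; practical dense-CRF implementations use fast filtering instead.

```python
import math


def dense_crf_mean_field(probs, image, n_iters=5, w_app=1.0, w_smooth=1.0,
                         theta_alpha=3.0, theta_beta=10.0, theta_gamma=1.0):
    """Naive O(N^2) mean field inference for a fully connected CRF (sketch).

    probs: H x W x L nested lists of CNN class probabilities (unary source).
    image: H x W nested lists of per-pixel spectral tuples.
    Returns H x W x L approximate posterior marginals Q.
    """
    h, w = len(probs), len(probs[0])
    n_labels = len(probs[0][0])
    pixels = [(r, c) for r in range(h) for c in range(w)]
    n = len(pixels)
    # Unary potential: negative log of the CNN class-membership probability.
    unary = [[-math.log(max(probs[r][c][l], 1e-10)) for l in range(n_labels)]
             for r, c in pixels]

    def kernel(i, j):
        # Hypothetical kernel: appearance (position + spectrum) + smoothness (position).
        (ri, ci), (rj, cj) = pixels[i], pixels[j]
        d_pos = (ri - rj) ** 2 + (ci - cj) ** 2
        d_spec = sum((a - b) ** 2 for a, b in zip(image[ri][ci], image[rj][cj]))
        appearance = w_app * math.exp(-d_pos / (2 * theta_alpha ** 2)
                                      - d_spec / (2 * theta_beta ** 2))
        smoothness = w_smooth * math.exp(-d_pos / (2 * theta_gamma ** 2))
        return appearance + smoothness

    # Fully connected: every pair of distinct pixels gets a kernel weight.
    k = [[kernel(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]

    def normalize(energies):
        # Softmax of negative energies, shifted by the minimum for stability.
        m = min(energies)
        exps = [math.exp(-(e - m)) for e in energies]
        total = sum(exps)
        return [x / total for x in exps]

    q = [normalize(u) for u in unary]  # initialize Q from the unary alone
    for _ in range(n_iters):
        new_q = []
        for i in range(n):
            energies = []
            for l in range(n_labels):
                # Potts compatibility: sum_j k_ij * sum_{l' != l} Q_j(l').
                msg = sum(k[i][j] * (1.0 - q[j][l]) for j in range(n))
                energies.append(unary[i][l] + msg)
            new_q.append(normalize(energies))
        q = new_q  # synchronous (parallel) update of all marginals
    return [[q[r * w + c] for c in range(w)] for r in range(h)]
```

For example, on a 3 × 3 patch with uniform spectral values where the CNN favors class 0 everywhere except a weakly class-1 center pixel, the pairwise terms from the eight neighbors pull the center back to class 0, which is the kind of salt-and-pepper suppression the abstract reports.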
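The regional restriction step can be sketched in the same style. This sketch assumes the superpixel map has already been produced (the abstract uses mean shift; the segmentation itself is not implemented here) and interprets "correcting the classification results by calculating their average posterior probabilities" as assigning every pixel in a superpixel the class with the highest mean posterior over that superpixel. The function `region_restrict` and its input layout are hypothetical.

```python
from collections import defaultdict


def region_restrict(q, segments):
    """Relabel each superpixel by the argmax of its mean posterior (sketch).

    q: H x W x L posterior marginals (e.g. from the CRF).
    segments: H x W integer superpixel ids, assumed precomputed (e.g. by mean shift).
    Returns an H x W list of class labels.
    """
    h, w = len(q), len(q[0])
    n_labels = len(q[0][0])
    sums = defaultdict(lambda: [0.0] * n_labels)
    counts = defaultdict(int)
    # Accumulate posterior sums and pixel counts per superpixel.
    for r in range(h):
        for c in range(w):
            s = segments[r][c]
            counts[s] += 1
            for l in range(n_labels):
                sums[s][l] += q[r][c][l]
    # Each superpixel takes the class with the highest average posterior.
    seg_label = {}
    for s, total in sums.items():
        avg = [v / counts[s] for v in total]
        seg_label[s] = max(range(n_labels), key=lambda l: avg[l])
    return [[seg_label[segments[r][c]] for c in range(w)] for r in range(h)]
```

With `q = [[[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]]]` and `segments = [[0, 0, 1, 1]]`, the second pixel, whose own posterior favors class 1, is relabeled to class 0 because its superpixel as a whole favors class 0, giving `[[0, 0, 1, 1]]`.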
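The evaluation measures named in the Result part of the abstract can all be computed from a confusion matrix. This sketch assumes that class accuracy means producer's accuracy (correctly classified pixels of a class divided by the reference pixels of that class), that AA is the mean of the per-class accuracies over classes present in the reference data, and that the kappa coefficient is Cohen's kappa; the function name `accuracy_metrics` is hypothetical.

```python
def accuracy_metrics(y_true, y_pred, n_classes):
    """Return (per-class accuracy, OA, AA, kappa) for integer label sequences."""
    # conf[t][p] counts samples of reference class t predicted as class p.
    conf = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        conf[t][p] += 1
    n = sum(sum(row) for row in conf)
    row_tot = [sum(conf[i]) for i in range(n_classes)]
    col_tot = [sum(conf[i][j] for i in range(n_classes)) for j in range(n_classes)]
    # Per-class (producer's) accuracy; None for classes absent from the reference.
    class_acc = [conf[i][i] / row_tot[i] if row_tot[i] else None
                 for i in range(n_classes)]
    present = [a for a in class_acc if a is not None]
    oa = sum(conf[i][i] for i in range(n_classes)) / n
    aa = sum(present) / len(present)
    # Chance agreement expected from the row and column marginals.
    pe = sum(row_tot[i] * col_tot[i] for i in range(n_classes)) / (n * n)
    kappa = (oa - pe) / (1 - pe)
    return class_acc, oa, aa, kappa
```

For `y_true = [0, 0, 0, 1, 1, 1]` and `y_pred = [0, 0, 1, 1, 1, 1]`, OA and AA are both 5/6 while kappa is 2/3, lower than OA because it discounts the agreement expected by chance.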