Feature and Model Level Fusion of Pretrained CNN for Remote Sensing Scene Classification

Peijun Du, Erzhu Li, Junshi Xia, Alim Samat, Xuyu Bai
DOI: https://doi.org/10.1109/jstars.2018.2878037
IF: 4.715
2018-01-01
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Abstract: Convolutional neural networks (CNNs) have attracted tremendous attention in the remote sensing community due to their excellent performance in different domains. For remote sensing scene classification in particular, CNN-based methods have brought a major breakthrough. However, it is not feasible to design and train a new CNN model from scratch for remote sensing scene classification, as this usually requires a large number of training samples and high computational cost. To alleviate these limitations, some studies use pretrained CNN models as feature extractors to build feature representations of scene images for classification, and have achieved impressive results. In this scheme, how to construct the feature representation of a scene image from the pretrained CNN model becomes the key step. Existing studies have paid little attention to building more discriminative feature representations by exploiting the potential benefits of multilayer features from a single CNN model and of different feature representations from multiple CNN models. To this end, this paper presents a fusion strategy that builds the feature representation of scene images by integrating multilayer features of a single pretrained CNN model, and extends it to a framework of multiple CNN models. For these purposes, a multiscale improved Fisher kernel coding method is used to build feature representations of the scene images on convolutional layers, and a feature fusion approach based on two feature subspace learning methods [principal component analysis (PCA)/spectral regression kernel discriminant analysis and PCA/spectral regression kernel locality preserving projection] is proposed to construct the final fused features for scene classification.
For validation and comparison, the proposed approaches are evaluated on two challenging high-resolution remote sensing datasets and show competitive performance compared with existing state-of-the-art baselines such as fully trained CNN models, fine-tuned CNN models, and other related works.
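The multilayer fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes plain PCA followed by concatenation as a stand-in for the PCA/spectral regression subspace learning step, skips the Fisher kernel coding entirely, and uses random arrays in place of real encoded CNN features. The function names (`pca_reduce`, `fuse_layer_features`) and all dimensions are hypothetical.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def fuse_layer_features(layer_features, n_components):
    """Reduce each layer's features with PCA, then concatenate per image.

    Assumption: simple concatenation stands in for the paper's
    discriminative subspace learning (SRKDA / SRKLPP), which is not shown.
    """
    reduced = [pca_reduce(f, n_components) for f in layer_features]
    return np.hstack(reduced)


# Synthetic stand-ins for encoded features from three layers
# (one row per image; the sizes are hypothetical).
rng = np.random.default_rng(0)
n_images = 40
layer_features = [rng.normal(size=(n_images, d)) for d in (256, 512, 128)]

fused = fuse_layer_features(layer_features, n_components=16)
print(fused.shape)  # (40, 48)
```

In this sketch each layer contributes the same number of dimensions to the fused vector, so no single layer dominates by size alone. The same function could in principle be fed features from several pretrained models instead of several layers, which mirrors the model-level extension the abstract mentions.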