Scale-aware Deep Reinforcement Learning for High Resolution Remote Sensing Imagery Classification

Yinhe Liu, Yanfei Zhong, Sunan Shi, Liangpei Zhang
DOI: https://doi.org/10.1016/j.isprsjprs.2024.01.013
IF: 12.7
2024-01-01
ISPRS Journal of Photogrammetry and Remote Sensing
Abstract: Land-use/land-cover (LULC) classification of high spatial resolution (HSR) remote sensing imagery has improved substantially with deep learning techniques. However, current deep learning-based classification methods must divide remote sensing imagery into small, fixed-size image patches, primarily because of the computational constraints imposed by the large size of these images. This limits the receptive field of the classification network and hinders the handling of LULC objects at different scales. A key problem is how to automatically select the appropriate patch scale for different objects with a deep learning network. To address this challenge, a scale-aware classification network (SAN) based on deep reinforcement learning (DRL) is proposed. In SAN, the state of each image patch is represented by a reduced-resolution version of the HSR remote sensing image, referred to as a 'thumbnail', together with a positional encoding. The scale selection actions are performed by a scale control agent. A feature indexing module is also proposed to enhance the agent's ability to distinguish the location of the current image patch. The action switches the patch scale and the viewing area of the context branch of a two-branch classification network, which extracts and fuses the features of the multi-scale images. The SAN framework adjusts the network parameters to perform the appropriate scale selection action based on the mapping reward received for the selected scale. In this way, SAN is able to introduce more appropriate context by adjusting the scale of the network input through reinforcement learning, without the need for labeled scale selection samples. Experimental results on two publicly available datasets and a newly built dataset demonstrate that SAN outperforms previous LULC deep learning methods with fixed patches, particularly for large-scale mapping applications.
Compared with state-of-the-art approaches such as GLNet and WiCoNet, which combine global and local information for segmentation, and CascadePSP and MagNet, known for their progressive segmentation, SAN consistently achieves approximately 10% higher accuracy. The code for this research is openly available at http://rsidea.whu.edu.cn/resource_sharing.htm.
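The scale selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the RL algorithm, so a plain REINFORCE update on a softmax policy is assumed here, and the policy is stateless (the real agent conditions on the thumbnail and a positional encoding). The names `SCALES`, `context_box`, `select_scale`, and `reinforce_step` are hypothetical.

```python
import math
import random

# Hypothetical action space: context width as a multiple of the patch width.
SCALES = (1, 2, 4)


def context_box(top, left, patch, scale, height, width):
    """Return (top, left, size) of a context window `scale` times the patch
    width, centred on the patch and clamped to stay inside the image."""
    size = min(patch * scale, height, width)
    ctop = min(max(top + patch // 2 - size // 2, 0), height - size)
    cleft = min(max(left + patch // 2 - size // 2, 0), width - size)
    return ctop, cleft, size


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_scale(logits, rng=random):
    """Sample an action index from the softmax policy over SCALES."""
    probs = softmax(logits)
    action = rng.choices(range(len(SCALES)), weights=probs)[0]
    return action, probs


def reinforce_step(logits, action, probs, reward, lr=0.1):
    """One REINFORCE update (assumed algorithm). For a softmax policy,
    d log pi(action) / d logit_k = 1[k == action] - probs[k]."""
    return [
        x + lr * reward * ((1.0 if k == action else 0.0) - p)
        for k, (x, p) in enumerate(zip(logits, probs))
    ]
```

In use, the reward passed to `reinforce_step` would come from the mapping quality obtained with the chosen scale, so that scales yielding better classification become more likely over time; no labeled scale selection samples are needed for this update.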