Index Your Position: A Novel Self-Supervised Learning Method for Remote Sensing Images Semantic Segmentation

Dilxat Muhtar, Xueliang Zhang, Pengfeng Xiao
DOI: https://doi.org/10.1109/tgrs.2022.3177770
IF: 8.2
2022-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Learning effective visual representations without human supervision is a critical problem for semantic segmentation of remote sensing images (RSIs), where pixel-level annotations are difficult to obtain. Self-supervised learning (SSL), which learns useful representations by creating artificial supervised learning problems, has recently emerged as an effective way to learn from unlabelled data. Current SSL methods are generally trained on ImageNet through image-level prediction tasks. We argue that this is suboptimal for semantic segmentation of RSIs because it ignores the spatial position information between objects, which is critical for segmenting RSIs that typically contain multiple objects. In this study, we propose a novel self-supervised dense representation learning method, IndexNet, for the semantic segmentation of RSIs. On the one hand, considering the multiobject characteristics of RSIs, IndexNet learns pixel-level representations by tracking object positions, while remaining sensitive to changes in object position so that no mismatches occur. On the other hand, by combining image-level contrast and pixel-level contrast, IndexNet can learn spatiotemporally invariant features. Experimental results show that our method works better than ImageNet pretraining and outperforms state-of-the-art (SOTA) SSL methods. Code and pretrained models will be available at https://github.com/pUmpKin-Co/offical-IndexNet.
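The idea of combining image-level and pixel-level contrast mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `info_nce` and `combined_loss`, the temperature, and the `weight` parameter are all hypothetical, and the sketch assumes that pixel features from the two views have already been matched by position (row i of each array describes the same location), which stands in for the position-tracking step that IndexNet actually performs.

```python
import numpy as np


def info_nce(queries, keys, temperature=0.2):
    """InfoNCE loss; queries[i] and keys[i] are assumed to be a positive pair."""
    # L2-normalise so that dot products are cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; all other columns act as negatives.
    return float(-np.mean(np.diag(log_prob)))


def combined_loss(img_a, img_b, pix_a, pix_b, weight=0.5):
    """Weighted sum of an image-level and a pixel-level contrastive term.

    img_a, img_b: (batch, dim) global features of two augmented views.
    pix_a, pix_b: (num_pixels, dim) pixel features, assumed pre-matched by position.
    weight: hypothetical mixing coefficient, not taken from the paper.
    """
    image_term = info_nce(img_a, img_b)
    pixel_term = info_nce(pix_a, pix_b)
    return (1.0 - weight) * image_term + weight * pixel_term
```

In this sketch, correctly matched pixel pairs give a lower loss than mismatched ones, which is why the position matching assumed above matters for the pixel-level term.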