Weakly Supervised Semantic Segmentation in Aerial Imagery via Explicit Pixel-Level Constraints

Ruixue Zhou, Wenkai Zhang, Zhiqiang Yuan, Xuee Rong, Wenjie Liu, Kun Fu, Xian Sun
DOI: https://doi.org/10.1109/tgrs.2022.3224477
IF: 8.2
2022-12-10
IEEE Transactions on Geoscience and Remote Sensing
Abstract: In recent years, image-level weakly supervised semantic segmentation (WSSS) has developed rapidly in natural scenes because classification tags are easy to obtain. However, owing to complex backgrounds, multicategory scenes, and dense small targets in remote sensing (RS) images, relatively little research has been conducted in this field. To alleviate these problems in RS scenes, a self-supervised Siamese network based on an explicit pixel-level constraints framework is proposed, which greatly improves the quality of class activation maps and the localization accuracy in multicategory RS scenes. Specifically, this article introduces three novel components: 1) a pixel-soft classification loss, which imposes explicit constraints on pixels during image-level training; 2) a pixel global awareness module, which captures high-level semantic context and low-level pixel spatial information to improve the consistency and accuracy of RS object segmentation; and 3) a dynamic multiscale fusion module with a gating mechanism, which enhances feature representation and improves the localization accuracy of RS objects, particularly small and dense ones. Experiments on two RS challenge datasets show that these modules achieve new state-of-the-art results using only image-level labels, raising mean Intersection over Union (mIoU) to 36.79% on iSAID and 45.43% on ISPRS in the WSSS task. To the best of our knowledge, this is the first work to perform image-level WSSS on multiclass RS scenes.
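For reference, mIoU figures like those quoted above are conventionally computed per class as IoU = TP / (TP + FP + FN) and then averaged over classes. The sketch below is a generic illustration of that standard definition, not the authors' evaluation code. The function names, the use of a confusion matrix, and the `ignore_index=255` convention for unlabeled pixels are assumptions; the abstract does not state the exact protocol (for example, whether background is counted) used on iSAID or ISPRS.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Hypothetical helper: num_classes x num_classes counts, rows = ground truth, cols = prediction."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    valid = gt != ignore_index  # assumed convention: 255 marks unlabeled pixels, which are skipped
    # Encode each (gt, pred) pair as a single index, then count occurrences.
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN); mIoU averages over classes that occur."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c but labeled as another class
    fn = conf.sum(axis=1) - tp  # labeled as class c but predicted as another class
    denom = tp + fp + fn
    iou = np.full(conf.shape[0], np.nan)
    present = denom > 0  # classes absent from both prediction and ground truth stay NaN
    iou[present] = tp[present] / denom[present]
    return iou, np.nanmean(iou)

# Toy usage on a 2 x 3 label map with three classes.
gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
iou, miou = mean_iou(confusion_matrix(pred, gt, num_classes=3))
```

Building a confusion matrix first lets the counts be accumulated over many images before computing IoU once, which is the usual way dataset-level mIoU is reported rather than averaging per-image scores.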
What problem does this paper attempt to address?