Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation

Hao Zhu, Yan Zhu, Jiayu Xiao, Tianxiang Xiao, Yike Ma, Yucheng Zhang, Feng Dai
2024-12-05
Abstract: Automated crop mapping through Satellite Image Time Series (SITS) has emerged as a crucial avenue for agricultural monitoring and management. However, due to the low resolution and unclear parcel boundaries, annotating pixel-level masks is exceptionally complex and time-consuming in SITS. This paper embraces the weakly supervised paradigm (i.e., only image-level categories available) to liberate the crop mapping task from the exhaustive annotation burden. The unique characteristics of SITS give rise to several challenges in weakly supervised learning: (1) noise perturbation from spatially neighboring regions, and (2) erroneous semantic bias from anomalous temporal periods. To address the above difficulties, we propose a novel method, termed exploring space-time perceptive clues (Exact). First, we introduce a set of spatial clues to explicitly capture the representative patterns of different crops from the most class-relevant regions. In addition, we leverage the temporal-to-class interaction of the model to emphasize the contributions of pivotal clips, thereby enhancing the model's perception of crop regions. Building upon the space-time perceptive clues, we derive the clue-based CAMs to effectively supervise the SITS segmentation network. Our method demonstrates impressive performance on various SITS benchmarks. Remarkably, the segmentation network trained on Exact-generated masks achieves 95% of its fully supervised performance, showing the bright promise of the weakly supervised paradigm in crop mapping scenarios. Our code will be publicly available.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the following problem: when performing automated crop mapping on satellite image time series (SITS), how can the dependence on pixel-level annotation be reduced, given that low resolution and unclear parcel boundaries make such annotation complex and time-consuming? Specifically, the authors propose a weakly supervised learning method that relieves crop mapping of this cumbersome annotation burden.

### Detailed Explanation

1. **Problem Background**:
   - Automated crop mapping through satellite image time series (SITS) has become an important means for agricultural monitoring and management.
   - However, because SITS images have low resolution and unclear parcel boundaries, pixel-level annotation is very complex and time-consuming.
   - The weakly supervised paradigm (using only image-level class labels) can significantly reduce the annotation workload, but it faces the following challenges:
     - **Spatial Noise Perturbation**: noise interference from spatially neighboring regions.
     - **Temporal Semantic Deviation**: erroneous semantic bias caused by anomalous temporal periods.
2. **Solution**:
   - The authors propose a new method named Exact, which explores space-time perceptive clues to address the above challenges.
   - **Spatial Clues**: a set of spatial clues explicitly captures the representative patterns of different crops from the most class-relevant regions, thereby enhancing the model's perception of crop regions.
   - **Temporal-Perceptive Affinity Propagation**: the model's temporal-to-class interaction is used to emphasize the contributions of pivotal clips and suppress the influence of anomalous temporal periods.
3. **Main Contributions**:
   - **Apply weakly supervised learning to SITS crop mapping for the first time**, relying only on image-level class labels.
   - **Propose the Exact framework**, which reduces noise perturbation and corrects erroneous semantic bias by exploring space-time perceptive clues, providing reliable supervision for SITS segmentation.
   - **Experimental results show that** the SITS segmentation model trained on the pseudo-labels generated by Exact comes close to its fully supervised counterpart, reaching 95% of the fully supervised performance, a significant improvement for image-level weakly supervised techniques.

### Formula Summary

- **CAM Generation Formula**:

  \[
  M_k=\text{ReLU}\left(\sum_{i} w_k^i\cdot F(:,:,i)\right),\quad\forall k\in K
  \]

  where \(M_k\) is the CAM of the \(k\)-th class, \(w_k^i\) is the classifier weight of class \(k\) for the \(i\)-th channel, and \(F(:,:,i)\) is the \(i\)-th channel of the feature map \(F\).

- **Spatial Clue Clustering Update Formula**:

  \[
  p_k^{np}\leftarrow\alpha\, p_k^{np}+(1-\alpha)\frac{C_k^{np}Z_k}{\|C_k^{np}\|_1}
  \]

  where \(p_k^{np}\) is the \(np\)-th spatial clue of class \(k\), \(\alpha\) is the momentum coefficient, \(C_k^{np}\) holds the assignment weights of the embeddings to that clue, and \(Z_k\) is the set of dense embeddings belonging to class \(k\). The clue is thus moved, with momentum, toward the assignment-weighted mean of its embeddings.

- **Contrastive Loss Function**:

  \[
  L_{cbl}=\sum_k\sum_{nk}\left[\log\left(\sum_{p\in P^-}\exp\big(S(z_k^{nk},p)\big)\right)-S(z_k^{nk},p_k^{np})\right]
  \]

  where \(z_k^{nk}\) is the \(nk\)-th dense embedding of class \(k\), \(P^-\) is the set of negative samples, \(p_k^{np}\) is the positive clue paired with \(z_k^{nk}\), and \(S(\cdot,\cdot)\) denotes cosine similarity.

- **Temporal-Perceptive Affinity Propagation Loss**:

  \[
  L_{tap}=\sum_k\left|\tilde{M}_k - M_k\right|
  \]

  that is, an L1 distance between \(\tilde{M}_k\) and the CAM \(M_k\), summed over classes.

Through these methods, Exact significantly improves weakly supervised SITS segmentation, demonstrating its strong potential for crop mapping.
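The CAM formula and the L1-style TAP loss can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the feature map \(F\) is taken as channels-last with shape (H, W, C), the classifier weights as a (K, C) matrix whose row \(k\) holds the weights \(w_k^i\), and the function names `compute_cams` and `tap_l1_loss` are hypothetical. \(\tilde{M}_k\) is simply passed in as a precomputed map, since how Exact derives it through temporal-perceptive affinity propagation is not reproduced here, and reading \(|\cdot|\) as an element-wise absolute difference summed over all pixels is also an assumption.

```python
import numpy as np


def compute_cams(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Class activation maps M_k = ReLU(sum_i w_k^i * F(:, :, i)).

    features: (H, W, C) feature map F (channels-last, an assumption)
    weights:  (K, C) classifier weights; row k holds w_k^i for every channel i
    returns:  (K, H, W) array with one CAM per class
    """
    # Contract the channel axis i for every class k and pixel (h, w).
    cams = np.einsum("hwc,kc->khw", features, weights)
    return np.maximum(cams, 0.0)  # ReLU


def tap_l1_loss(refined: np.ndarray, cams: np.ndarray) -> float:
    """L_tap = sum_k |M~_k - M_k|, with |.| read as a summed element-wise L1 distance.

    refined: (K, H, W) maps M~_k, assumed to be given (hypothetical input)
    cams:    (K, H, W) CAMs M_k, e.g. from compute_cams
    """
    return float(np.abs(refined - cams).sum())
```

Using `einsum` keeps the channel contraction explicit: each pixel of \(M_k\) is the dot product of the weight vector for class \(k\) with that pixel's C-dimensional feature vector, and the ReLU then discards negative evidence.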
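The spatial-clue update and the clue-based contrastive loss can be sketched in NumPy for a single class \(k\) as follows. This is an illustrative reading, not the authors' implementation: the hard nearest-clue assignment used to build \(C_k\) (`hard_assign`), the choice of each embedding's assigned clue as its positive \(p_k^{np}\), and the set of negatives passed in are all assumptions, and every function name is hypothetical. The momentum update follows the formula term by term, and the loss keeps its stated form (log-sum-exp over negatives minus the positive similarity) without adding a temperature.

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity S between rows of a (N, D) and b (M, D)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def hard_assign(z_k: np.ndarray, clues_k: np.ndarray) -> np.ndarray:
    """Hypothetical assignment step (assumption): one-hot C_k of shape (Np, Nk).

    Each embedding z_k^{nk} is assigned to its most similar clue p_k^{np}.
    """
    nearest = cosine_sim(z_k, clues_k).argmax(axis=1)  # (Nk,)
    c_k = np.zeros((clues_k.shape[0], z_k.shape[0]))
    c_k[nearest, np.arange(z_k.shape[0])] = 1.0
    return c_k


def momentum_update(clues_k: np.ndarray, c_k: np.ndarray,
                    z_k: np.ndarray, alpha: float = 0.9) -> np.ndarray:
    """p_k^np <- alpha * p_k^np + (1 - alpha) * (C_k^np Z_k) / ||C_k^np||_1."""
    mass = np.abs(c_k).sum(axis=1, keepdims=True)  # ||C_k^np||_1, shape (Np, 1)
    means = (c_k @ z_k) / np.maximum(mass, 1e-12)  # assignment-weighted means
    updated = alpha * clues_k + (1.0 - alpha) * means
    # Clues that received no embeddings keep their previous value.
    return np.where(mass > 0, updated, clues_k)


def clue_contrastive_loss(z_k: np.ndarray, clues_k: np.ndarray,
                          c_k: np.ndarray, negatives: np.ndarray) -> float:
    """sum_nk [ log sum_{p in P^-} exp(S(z, p)) - S(z, p_k^np) ] for one class k."""
    pos_idx = c_k.argmax(axis=0)  # clue paired with each embedding (assumption)
    pos = cosine_sim(z_k, clues_k)[np.arange(z_k.shape[0]), pos_idx]
    neg = cosine_sim(z_k, negatives)  # (Nk, |P^-|)
    m = neg.max(axis=1)
    lse = m + np.log(np.exp(neg - m[:, None]).sum(axis=1))  # stable log-sum-exp
    return float((lse - pos).sum())
```

A clue that receives no embeddings in a batch is left unchanged rather than pulled toward a zero vector, because \(\|C_k^{np}\|_1 = 0\) would otherwise leave the update undefined.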