Semi-Supervised Contrastive Learning for Remote Sensing: Identifying Ancient Urbanization in the South Central Andes

Jiachen Xu, Junlin Guo, James Zimmer-Dauphinee, Quan Liu, Yuxuan Shi, Zuhayr Asad, D. Mitchell Wilkes, Parker VanValkenburgh, Steven A. Wernke, Yuankai Huo
DOI: https://doi.org/10.48550/arXiv.2112.06437
2023-04-15
Abstract: Archaeology has long faced fundamental issues of sampling and scalar representation. Traditionally, the local-to-regional-scale views of settlement patterns are produced through systematic pedestrian surveys. Recently, systematic manual survey of satellite and aerial imagery has enabled continuous distributional views of archaeological phenomena at interregional scales. However, such 'brute force' manual imagery survey methods are both time- and labor-intensive, as well as prone to inter-observer differences in sensitivity and specificity. The development of self-supervised learning methods offers a scalable learning scheme for locating archaeological features using unlabeled satellite and historical aerial images. However, archaeological features are generally only visible in a very small proportion relative to the landscape, while the modern contrastive-supervised learning approach typically yields an inferior performance on highly imbalanced datasets. In this work, we propose a framework to address this long-tail problem. As opposed to the existing contrastive learning approaches that treat the labelled and unlabeled data separately, our proposed method reforms the learning paradigm under a semi-supervised setting in order to utilize the precious annotated data (<7% in our setting). Specifically, the highly unbalanced nature of the data is employed as the prior knowledge in order to form pseudo negative pairs by ranking the similarities between unannotated image patches and annotated anchor images. In this study, we used 95,358 unlabeled images and 5,830 labelled images in order to solve the issues associated with detecting ancient buildings from a long-tailed satellite image dataset. From the results, our semi-supervised contrastive learning model achieved a promising testing balanced accuracy of 79.0%, which is a 3.8% improvement as compared to other state-of-the-art approaches.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses two main problems encountered in identifying ancient urbanization in the south-central Andean region:

1. **Inefficiency in Archaeological Feature Detection**: Traditional pedestrian field surveys and manual analysis of satellite and aerial images can provide detailed views of settlement patterns from local to regional scales, but they are time-consuming and labor-intensive, and they are prone to differences in sensitivity and specificity among observers. Moreover, because archaeological features occupy only a very small proportion of the landscape, manually identifying and labeling them is extremely inefficient; in this paper's setting, fewer than 7% of the images are annotated.
2. **Data Imbalance Problem**: Modern contrastive supervised learning methods perform poorly on highly imbalanced datasets, especially when positive samples (images containing archaeological features) are far fewer than negative samples. This imbalance biases the learned representation toward the majority class, which degrades recognition of the minority class (archaeological features).

To solve these problems, the authors propose a new semi-supervised contrastive learning framework that aims to improve the detection of ancient buildings by using a large amount of unlabeled data together with a small amount of precious labeled data. Specifically, the framework addresses the above challenges in the following ways:

- **Pseudo-negative Sample Generation**: Rank the similarities between unlabeled image patches and labeled anchor images, using the highly imbalanced nature of the data as prior knowledge, to form pseudo-negative pairs for training.
- **Combination of Self-supervised and Supervised Contrastive Loss**: Introduce the supervised contrastive loss (SupCon loss) and combine it with the self-supervised contrastive loss to make full use of the limited labeled data.
- **Effective Use of Large-scale Unlabeled Data**: The semi-supervised learning strategy makes maximum use of large-scale unlabeled image data while still improving model performance with only a small amount of labeled data.

Finally, the model achieved a balanced accuracy of 79.0% on the test set, a 3.8% improvement over other state-of-the-art methods, which indicates its effectiveness on remote-sensing image data with long-tailed distributions.

### Formula Summary

- **Self-supervised Contrastive Loss**:
\[ D(p_1, z_2)=-\frac{p_1}{\|p_1\|_2}\cdot\frac{z_2}{\|z_2\|_2} \]
\[ L = \frac{1}{2}D(p_1, z_2)+\frac{1}{2}D(p_2, z_1) \]
Here $D$ is the negative cosine similarity between two embedding vectors, and $L$ averages it over both orderings of the pair.
- **Supervised Contrastive Loss**:
\[ L_{\text{sup}}^{\text{out}}=\sum_{i\in I}L_{\text{sup}}^{\text{out},i}=\sum_{i\in I}-\frac{1}{|P(i)|}\sum_{p\in P(i)}\log\frac{\exp(z_i\cdot z_p / \tau)}{\sum_{a\in A(i)}\exp(z_i\cdot z_a / \tau)} \]
where $I$ indexes the samples in a batch, $P(i)$ is the set of samples sharing the label of anchor $i$ (excluding $i$ itself), $A(i)$ is the set of all samples other than $i$, and $\tau$ is a temperature.
- **Total Loss Function**:
\[ L_{\text{total}}=(e^{-v_1}\cdot\text{loss}_1+v_1)+(e^{-v_2}\cdot\text{loss}_2+v_2) \]
where $\text{loss}_1$ and $\text{loss}_2$ represent the self-supervised contrastive loss and the supervised contrastive loss respectively, and $v_1$ and $v_2$ are weight parameters automatically calculated according to the training performance.
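The three formulas in the summary above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. Several choices are assumptions that the summary does not specify: `neg_cosine` averages $D$ over a batch, `supcon_loss` expects L2-normalized embeddings and skips anchors that have no positives, the default `tau=0.1` is a placeholder, and the total loss is read as $e^{-v_i}\cdot\text{loss}_i + v_i$ (the reading under which a larger loss raises the total). All function names are hypothetical.

```python
import numpy as np

def neg_cosine(p, z):
    """D(p, z): negative cosine similarity, averaged over the batch (assumption)."""
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -np.mean(np.sum(p * z, axis=1))

def self_supervised_loss(p1, z1, p2, z2):
    """L = 1/2 D(p1, z2) + 1/2 D(p2, z1)."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss for embeddings z of shape (N, d), N >= 2.

    z is assumed to be L2-normalized; tau is a placeholder temperature.
    """
    n = z.shape[0]
    logits = z @ z.T / tau
    not_self = ~np.eye(n, dtype=bool)                       # A(i): everyone but i
    pos = (labels[:, None] == labels[None, :]) & not_self   # P(i): same label, not i
    # Numerically stable log-sum-exp over A(i).
    masked = np.where(not_self, logits, -np.inf)
    m = masked.max(axis=1, keepdims=True)
    log_denom = m + np.log(np.exp(masked - m).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom
    n_pos = pos.sum(axis=1)
    has_pos = n_pos > 0                                     # skip anchors with empty P(i)
    pos_sum = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(np.sum(-pos_sum[has_pos] / n_pos[has_pos]))

def total_loss(loss1, loss2, v1, v2):
    """Weighted sum, read as exp(-v_i) * loss_i + v_i (assumption)."""
    return (np.exp(-v1) * loss1 + v1) + (np.exp(-v2) * loss2 + v2)
```

In a training loop, `v1` and `v2` would be updated together with the network weights, so an automatic-differentiation framework would replace NumPy; the sketch only shows how the terms combine.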
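The pseudo-negative step described in the summary above ranks unlabeled patches by their similarity to labeled anchor images. The sketch below shows one way such a ranking could work; it is an assumption-laden illustration, not the paper's procedure. It assumes cosine similarity on precomputed embeddings, scores each unlabeled patch by its similarity to the closest anchor, and treats the least similar patches as pseudo negatives. The function name and the `keep_fraction` parameter are hypothetical.

```python
import numpy as np

def pseudo_negative_indices(unlabeled, anchors, keep_fraction=0.5):
    """Return indices of unlabeled embeddings least similar to any anchor.

    unlabeled: (n_unlabeled, d) embeddings of unannotated patches.
    anchors:   (n_anchors, d) embeddings of annotated positive patches.
    keep_fraction: hypothetical share of patches kept as pseudo negatives.
    """
    u = unlabeled / np.linalg.norm(unlabeled, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    sim = u @ a.T                    # cosine similarity, shape (n_unlabeled, n_anchors)
    score = sim.max(axis=1)          # similarity to the closest anchor
    order = np.argsort(score)        # ascending: least similar first
    k = max(1, int(keep_fraction * len(order)))
    return order[:k]

# Toy example: one anchor and four unlabeled patches.
anchors = np.array([[1.0, 0.0]])
unlabeled = np.array([[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [1.0, 0.05]])
selected = pseudo_negative_indices(unlabeled, anchors, keep_fraction=0.5)
```

Because archaeological features are rare, most unlabeled patches are true negatives, so keeping only the low-similarity end of the ranking is a conservative way to limit false negatives among the selected pairs.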