Deep Clustering of Remote Sensing Scenes through Heterogeneous Transfer Learning

Isaac Ray,Alexei Skurikhin
2024-09-06
Abstract:This paper proposes a method for unsupervised whole-image clustering of a target dataset of remote sensing scenes with no labels. The method consists of three main steps: (1) finetuning a pretrained deep neural network (DINOv2) on a labelled source remote sensing imagery dataset and using it to extract a feature vector from each image in the target dataset, (2) reducing the dimension of these deep features via manifold projection into a low-dimensional Euclidean space, and (3) clustering the embedded features using a Bayesian nonparametric technique to infer the number and membership of clusters simultaneously. The method takes advantage of heterogeneous transfer learning to cluster unseen data with different feature and label distributions. We demonstrate the performance of this approach outperforming state-of-the-art zero-shot classification methods on several remote sensing scene classification datasets.
Computer Vision and Pattern Recognition,Applications
What problem does this paper attempt to address?
The problem that this paper attempts to solve is to perform unsupervised clustering on remote - sensing images without labels. Specifically, the author proposes a method to perform unsupervised whole - image clustering on remote - sensing scene images in the target dataset through heterogeneous transfer learning. The target dataset has no labels, and many existing clustering methods require pre - specifying the number of clusters or neighborhoods, which is often difficult to achieve in practical applications, especially in applications where feature discovery is the goal. In addition, some current deep - clustering techniques need to train expensive deep neural network clustering layers for each target dataset, which increases the computational cost. To solve these problems, the author proposes a method consisting of three main steps: 1. **Fine - tuning the pre - trained deep neural network**: First, fine - tune the pre - trained deep neural network (such as DINOv2) on a labeled source remote - sensing image dataset, and then use this fine - tuned model to extract feature vectors from each image in the target dataset. 2. **Dimensionality reduction**: Embed these high - dimensional features into a low - dimensional Euclidean space through manifold projection techniques to preserve the structural relationships between features. 3. **Clustering**: Use a Bayesian non - parametric method (such as the Dirichlet process Gaussian mixture model DPGMM) to cluster the embedded features while inferring the number of clusters and membership relationships. The advantage of this method is that it does not require labeling of the target dataset and can adaptively determine the number of clusters, and is suitable for datasets with different feature and label distributions. Through experiments, the author has proven that the performance of this method on multiple remote - sensing scene classification datasets is better than that of existing zero - shot classification methods.