Generative ConvNet Foundation Model With Sparse Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation

Zhe Dong,Yanfeng Gu,Tianzhu Liu
DOI: https://doi.org/10.1109/tgrs.2023.3348479
IF: 8.2
2024-02-02
IEEE Transactions on Geoscience and Remote Sensing
Abstract:Foundation models offer a highly versatile and precise solution for intelligent interpretation of remote sensing images, thus greatly facilitating various remote sensing applications. Nevertheless, conventional remote sensing foundational models based on generative transformers neglect the consideration of multiscale features and frequency information, limiting their potential for dense prediction tasks in remote sensing scenarios. In this article, we make the first attempt to propose a generative convolutional neural network (ConvNet) foundation model tailored for remote sensing scenarios, which comprises two key components: First, a large dataset named GeoSense, containing approximately nine million diverse remote sensing images, is constructed to enhance the robustness and generalization of the foundation model during the pretraining phase. Second, a sparse modeling and low-frequency reconstruction (SMLFR) framework is designed for self-supervised representation learning of the ConvNet foundation model. Specifically, a sparse modeling strategy is proposed in masked image modeling (MIM), which allows ConvNet to process variable-length sequences by treating unmasked patches as voxels and sparsifying the encoder. In addition, a low-frequency reconstruction target is designed to guide the model's attention toward essential ground object features in remote sensing images, while mitigating unnecessary detail interference. To evaluate the general performance of our proposed foundation model, comprehensive experiments have been carried out on five datasets across three downstream tasks. Experimental results demonstrate that our method consistently achieves state-of-the-art performance across all the benchmark datasets and downstream tasks. The code and pretrained models will be available at https://github.com/HIT-SIRS/SMLFR.
imaging science & photographic technology,remote sensing,engineering, electrical & electronic,geochemistry & geophysics
What problem does this paper attempt to address?