Recent advances in the application of vision transformers to remote sensing image scene classification

Monika Kumari, Ajay Kaul
DOI: https://doi.org/10.1080/2150704X.2023.2234552
IF: 2.369
2023-07-12
Remote Sensing Letters
Abstract: Following their recent success in natural language processing and computer vision, transformer-based models have been investigated for remote sensing (RS) applications such as scene categorization. This review article provides an overview of recent developments in vision transformer (ViT)-based models for remote sensing image scene classification (RSISC). We first introduce the basic architecture of transformer models and their extensions to computer vision tasks. We then summarize the current state-of-the-art ViT-based models for RSISC, including their architectures, training strategies, and performance evaluation. We also discuss the challenges and limitations of existing ViT-based models. Finally, we outline potential future directions for developing transformer-based models for RS applications. The review aims to give a comprehensive analysis of the current state of the art and future research prospects for ViTs in RSISC, serving as a reference for researchers and practitioners in this field.
imaging science & photographic technology, remote sensing