PSCLI-TF: Position-Sensitive Cross-Layer Interactive Transformer Model for Remote Sensing Image Scene Classification

Daxiang Li, Runyuan Liu, Yao Tang, Ying Liu
DOI: https://doi.org/10.1109/lgrs.2024.3359415
IF: 5.343
2024-02-10
IEEE Geoscience and Remote Sensing Letters
Abstract: In remote sensing image (RSI) scene classification, mining scene semantics requires fully perceiving the multiscale local objects in an image and exploring their interdependencies. To this end, this letter designs a novel position-sensitive cross-layer interactive transformer (PSCLI-TF) model that improves the accuracy of RSI scene classification. First, ResNet50 is used as the backbone to extract multilayer feature maps of the RSI. Then, to make the model more sensitive to the positions of local objects in the RSI, a new position-sensitive cross-layer interactive attention (PSCLIA) mechanism is designed, and a PSCLI-TF encoder built on it performs layer-by-layer interactive fusion of the multilayer feature maps to obtain a multigranularity cross-layer fusion (CLF) feature of the RSI. Finally, a prototype-based self-supervised loss function (SELF) is constructed to alleviate the semantic gap problem of "large intraclass variance and small interclass variance" in RSI scene classification. Comparative experiments on three datasets (AID, NWPU, and UCM) indicate that the classification performance of PSCLI-TF is highly competitive with that of other state-of-the-art methods.
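The abstract describes the encoder only at a high level: multilayer ResNet50 feature maps are fused layer by layer through a position-sensitive cross-layer attention. The sketch below shows that general pattern under stated assumptions and is not the authors' PSCLIA. Deeper-layer tokens act as queries and the running fused tokens from shallower layers act as keys and values. Position codes are added to queries and keys so the attention weights depend on location. The choice of ResNet50 stages (conv3_x to conv5_x), the 1D sinusoidal codes over row-major positions, the random matrices standing in for learned weights, the residual connection, and the mean-pooled output are all hypothetical. The function names `sinusoidal_pos`, `cross_layer_attention`, and `fuse_layers` are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sinusoidal_pos(n, d):
    # hypothetical 1D sinusoidal codes for n row-major spatial positions (d must be even)
    pos = np.arange(n)[:, None]
    i = np.arange(d // 2)[None, :]
    angle = pos / (10000 ** (2 * i / d))
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

def cross_layer_attention(q_tokens, kv_tokens, wq, wk, wv):
    # position codes on queries and keys make the attention weights location-aware
    d = q_tokens.shape[1]
    q = (q_tokens + sinusoidal_pos(len(q_tokens), d)) @ wq
    k = (kv_tokens + sinusoidal_pos(len(kv_tokens), d)) @ wk
    v = kv_tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (N_query, N_key), rows sum to 1
    return q_tokens + attn @ v  # residual connection keeps the deeper-layer content

def fuse_layers(feature_maps, d, rng):
    # feature_maps: list of (C, H, W) arrays ordered shallow -> deep
    tokens = []
    for fm in feature_maps:
        c = fm.shape[0]
        proj = rng.standard_normal((c, d)) / np.sqrt(c)  # stand-in for a learned 1x1 projection
        tokens.append(fm.reshape(c, -1).T @ proj)  # (H*W, d) token sequence
    fused = tokens[0]
    for t in tokens[1:]:
        # random stand-ins for learned query/key/value weights
        wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
        fused = cross_layer_attention(t, fused, wq, wk, wv)
    return fused.mean(axis=0)  # pooled global descriptor of shape (d,)

rng = np.random.default_rng(0)
# random stand-ins shaped like ResNet50 conv3_x, conv4_x, conv5_x outputs for a 224x224 input
maps = [rng.standard_normal(s) for s in [(512, 28, 28), (1024, 14, 14), (2048, 7, 7)]]
descriptor = fuse_layers(maps, d=64, rng=rng)
print(descriptor.shape)  # (64,)
```

Using the deeper layer as the query side means each fusion step keeps the coarser, more semantic token grid while drawing finer detail from shallower layers. A real model would learn all weights and would likely use multi-head attention and a 2D position encoding.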
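The abstract does not give the formula for the prototype-based loss (SELF). As a hedged illustration of how class prototypes can shrink intraclass variance and enlarge interclass separation, the sketch below uses a generic prototype-contrastive loss in the style of prototypical networks, which is a swapped-in technique and not the paper's SELF. Each embedding is pulled toward its own class prototype and pushed away from the others through a softmax over cosine similarities. The name `prototype_loss`, the use of per-batch class means as prototypes, cosine similarity, and the temperature `tau` are all assumptions. Because this sketch uses class labels to form prototypes, it is a supervised stand-in. How the paper derives its self-supervision signal is not stated in the abstract.

```python
import numpy as np

def prototype_loss(emb, labels, tau=0.1):
    # emb: (N, d) features; labels: (N,) integer class ids (hypothetical interface)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalise embeddings
    classes = np.unique(labels)  # sorted class ids present in the batch
    protos = np.stack([emb[labels == c].mean(axis=0) for c in classes])  # class-mean prototypes
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    logits = emb @ protos.T / tau  # cosine similarity to every prototype, scaled by temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    target = np.searchsorted(classes, labels)  # column index of each sample's own prototype
    return -log_prob[np.arange(len(labels)), target].mean()  # cross-entropy to own prototype

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 10)
centers = rng.standard_normal((3, 16)) * 5
tight = centers[labels] + 0.1 * rng.standard_normal((30, 16))  # small intraclass variance
loose = centers[labels] + 5.0 * rng.standard_normal((30, 16))  # large intraclass variance
print(prototype_loss(tight, labels), prototype_loss(loose, labels))
```

Compact, well-separated clusters give a lower loss than scattered ones, so minimising a loss of this kind pushes features toward the "small intraclass, large interclass variance" regime the letter targets.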
Subject categories: Imaging Science & Photographic Technology; Remote Sensing; Engineering, Electrical & Electronic; Geochemistry & Geophysics
What problem does this paper attempt to address?