Remote Sensing Image Super-Resolution with Top-K Token Selective Transformer

Yi Xiao, Qiangqiang Yuan
DOI: https://doi.org/10.1109/igarss53475.2024.10642137
2024-01-01
Abstract: Transformer-based super-resolution (SR) methods have recently demonstrated promising performance, owing to their long-range, global aggregation capability. However, existing Transformers pose two critical challenges when applied to large-area earth observation scenes: (1) redundant token representation, since most tokens are irrelevant; (2) single-scale representation, which ignores scale correlation modeling of similar ground observation targets. To this end, this paper proposes to adaptively eliminate the interference of irrelevant tokens for a more compact self-attention calculation. Specifically, we devise a Residual Token Selective Group (RTSG) to capture the most crucial tokens by dynamically selecting the top-k keys, ranked by attention score, for each query. For better feature aggregation, a Multi-scale Feed-forward Layer (MFL) is developed to generate an enriched representation of multi-scale feature mixtures during the feed-forward process. In particular, multiple cascaded RTSGs form our final Top-k Token Selective Transformer (TTST) to achieve progressive representation. Extensive experiments on three remote sensing benchmarks demonstrate that our TTST performs favorably against state-of-the-art CNN-based and Transformer-based methods, both qualitatively and quantitatively.
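The top-k key selection described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' RTSG implementation: it assumes a single attention head, plain scaled dot-product scores, and a fixed k, and the function name `topk_selective_attention` and its parameters are hypothetical. For each query, only the k highest-scoring keys take part in the softmax; all other keys receive zero weight.

```python
import numpy as np


def topk_selective_attention(q, k, v, top_k):
    """Single-head attention where each query attends only to its top_k keys.

    Illustrative sketch only (hypothetical names, not the paper's code).
    q: (n_q, d), k: (n_k, d), v: (n_k, d_v).
    Returns (output of shape (n_q, d_v), weights of shape (n_q, n_k)).
    """
    n_k = k.shape[0]
    if not 1 <= top_k <= n_k:
        raise ValueError("top_k must be between 1 and the number of keys")

    # Scaled dot-product similarity between every query and every key.
    scores = q @ k.T / np.sqrt(q.shape[-1])

    # Indices of the top_k largest scores in each row (order within the
    # selected set does not matter for the mask).
    kth = n_k - top_k
    idx = np.argpartition(scores, kth, axis=-1)[:, kth:]

    # Boolean mask marking the selected keys for each query.
    mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)

    # Unselected keys get -inf so they contribute exactly zero after softmax.
    masked = np.where(mask, scores, -np.inf)
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(masked)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((4, 8))
    k = rng.standard_normal((16, 8))
    v = rng.standard_normal((16, 8))
    out, w = topk_selective_attention(q, k, v, top_k=4)
    print(out.shape, (w > 0).sum(axis=-1))
```

Setting `top_k` equal to the number of keys recovers ordinary dense softmax attention, which makes the sketch easy to sanity-check against a standard implementation.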