CITNet: Convolution Interaction Transformer Network for Hyperspectral and LiDAR Image Classification
Minhui Wang, Yaxiu Sun, Jianhong Xiang, Yu Zhong
DOI: https://doi.org/10.1109/tgrs.2024.3477965
IF: 8.2
2024-10-29
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Transformers, which treat an image as a sequence of image patches and learn robust global features from that sequence, are increasingly popular in computer vision. However, pure transformers are not entirely suitable for hyperspectral and light detection and ranging (LiDAR) image classification, because this task requires both robust global features and discriminative local features. Therefore, this article introduces a novel convolution interaction transformer network (CITNet) for jointly classifying hyperspectral and LiDAR images. The process begins with a carefully designed multiscale asymmetric depthwise convolution (MADC) module that exploits the local–global correlations of shallow features. On this basis, a novel local–global transformer (LGTM) is equipped with a local–global feed-forward (LGF) network to extract in-depth local–global joint features from the multimodal data. Then, an optimization convolution cross-attention (OCA) module, incorporating a convolutional layer, is developed to simulate the spatial relationships of semantic tokens. Finally, extensive experiments are conducted on the well-known Trento (TR), Augsburg (AU), MUUFL (MU), and Houston2013 (HU) datasets. The overall accuracy (OA) reaches 99.76%, 97.40%, 91.06%, and 99.90%, respectively. These values are 0.2%–1.66%, 0.32%–7.37%, 1.52%–12.71%, and 0.14%–93.79% higher than those of the state-of-the-art (SOTA) methods, demonstrating the effectiveness of CITNet in improving the joint classification accuracy of hyperspectral and LiDAR images.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
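The abstract reports results as overall accuracy (OA), which is conventionally the fraction of labelled test samples assigned the correct class. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `overall_accuracy` and the label lists are hypothetical and chosen only for illustration.

```python
def overall_accuracy(y_true, y_pred):
    """Return the fraction of samples whose predicted label matches the reference label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("at least one labelled sample is required")
    # Count the positions where prediction and reference agree.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for illustration only (not data from the paper).
reference = [0, 0, 1, 1, 2, 2, 2, 1]
predicted = [0, 0, 1, 2, 2, 2, 2, 1]

oa = overall_accuracy(reference, predicted)
print(f"OA = {oa:.2%}")  # 7 of 8 samples agree, so this prints "OA = 87.50%"
```

Under this definition, an OA of 99.76% on a dataset means that 99.76% of its labelled test pixels received the correct class.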