CapViT: Cross-context capsule vision transformers for land cover classification with airborne multispectral LiDAR data

Yongtao Yu, Tao Jiang, Junyong Gao, Haiyan Guan, Dilong Li, Shangbing Gao, E Tang, Wenhao Wang, Peng Tang, Jonathan Li
DOI: https://doi.org/10.1016/j.jag.2022.102837
IF: 7.5
2022-07-01
International Journal of Applied Earth Observation and Geoinformation
Abstract: Equipped with multiple laser scanner channels, multispectral light detection and ranging (MS-LiDAR) devices offer greater potential for earth observation tasks than their single-band counterparts, and they open up a competitive solution for land cover mapping. In this paper, we develop a cross-context capsule vision transformer (CapViT) for land cover classification with MS-LiDAR data. Specifically, the CapViT is structured as three streams of capsule transformer encoders, which are built by stacking capsule transformer (CapFormer) blocks, to exploit long-range global feature interactions at different context scales. These cross-context feature semantics are then fused to support accurate land cover type inference. In addition, the CapFormer block runs dual-path multi-head self-attention modules in parallel to capture both spatial token correlations and channel feature interdependencies, which contributes significantly to the semantic enhancement of the feature encodings. Consequently, with the semantically enhanced encodings improving the distinctiveness and quality of the feature representation, the land cover classification accuracy is effectively improved. The CapViT is evaluated on two MS-LiDAR datasets. Both quantitative assessments and comparative analyses demonstrate the competitive performance of the CapViT in land cover classification.
remote sensing
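The dual-path attention idea mentioned in the abstract (one path attending across spatial tokens, the other across feature channels) can be illustrated with a minimal NumPy sketch. This is not the authors' CapFormer block: it uses plain scalar features rather than capsules, random untrained weights, and a residual sum to fuse the two paths, all of which are assumptions made only for illustration. The function names (`multi_head_attention`, `dual_path_attention`) and the toy sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, num_heads):
    """Scaled dot-product self-attention over the rows of x, shape (n, d)."""
    n, d = x.shape
    head_dim = d // num_heads

    def split(w):
        # Project, then split the feature axis into heads: (heads, n, head_dim).
        return (x @ w).reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(wq), split(wk), split(wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, n, n)
    out = softmax(scores, axis=-1) @ v                     # (heads, n, head_dim)
    return out.transpose(1, 0, 2).reshape(n, d)            # merge heads

def dual_path_attention(tokens, spatial_w, channel_w, num_heads=2):
    # Spatial path: each row is a token, so attention relates tokens.
    spatial = multi_head_attention(tokens, *spatial_w, num_heads)
    # Channel path: transpose so each row is a channel, attend, transpose back.
    channel = multi_head_attention(tokens.T, *channel_w, num_heads).T
    # Fusion by residual sum is an assumption, not taken from the paper.
    return tokens + spatial + channel

# Toy input: 8 tokens with 4 channels each (hypothetical sizes).
rng = np.random.default_rng(0)
num_tokens, num_channels = 8, 4
tokens = rng.standard_normal((num_tokens, num_channels))
spatial_w = [0.1 * rng.standard_normal((num_channels, num_channels)) for _ in range(3)]
channel_w = [0.1 * rng.standard_normal((num_tokens, num_tokens)) for _ in range(3)]

out = dual_path_attention(tokens, spatial_w, channel_w)
print(out.shape)
```

In this sketch the channel path is simply the same attention routine applied to the transposed token matrix, so its attention map has one row per channel instead of one row per token. The output keeps the input shape, which is what allows such blocks to be stacked.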