ESDINet: Efficient Shallow-Deep Interaction Network for Semantic Segmentation of High-Resolution Aerial Images

Xiangrong Zhang, Zhenhang Weng, Peng Zhu, Xiao Han, Jin Zhu, Licheng Jiao
DOI: https://doi.org/10.1109/tgrs.2024.3351437
IF: 8.2
2024-02-10
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Semantic segmentation of high-resolution remote sensing images is essential in many fields. In practical applications, however, limited computational resources and complex network structures often prevent advanced semantic segmentation models from running efficiently, which has motivated research on lightweight models. Among lightweight semantic segmentation models, the two-branch architecture has been shown to perform well in both speed and accuracy. However, such architectures usually make insufficient use of shallow-layer information, so they cannot efficiently provide richer multiscale information to the two branches, and the lightweight modules they rely on struggle to extract global context information from the features. As a result, lightweight models still lag behind current state-of-the-art semantic segmentation models in accuracy. To address these problems, we propose a new lightweight dual-branch architecture, the efficient shallow-deep interaction network (ESDINet), which quickly extracts low-level spatial information and high-level semantic information from images through a detail branch and a semantic branch, respectively. Specifically, we construct an efficient dual-branch structure with distinct shallow and deep interactions to achieve multiscale information exchange. In addition, we optimize the semantic branch and propose a new linear attention block that improves its global perception. Extensive experiments show that our model achieves a good balance between segmentation accuracy and inference speed. In particular, ESDINet achieves 82.03% mean intersection over union (mIoU) on the Vaihingen test set, with an inference speed of 116 frames/s (FPS) on a single NVIDIA RTX 2080 Ti GPU.
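The abstract does not give the exact form of ESDINet's linear attention block. As a hedged illustration of why linear attention can add global context cheaply, the sketch below implements generic kernelized linear attention in the style of Katharopoulos et al. (2020), using the positive feature map elu(x) + 1. This is a swapped-in, generic technique, not the authors' block, and the function names (`linear_attention`, `quadratic_reference`) are hypothetical. The assumption is that the N rows of `q`, `k`, and `v` would be the flattened spatial positions of a feature map. Because the key/value summary is built once, the cost grows linearly in N rather than quadratically.

```python
import math
from typing import List

Matrix = List[List[float]]


def elu_plus_one(x: float) -> float:
    """Positive feature map phi(x) = elu(x) + 1 (assumed choice, not from the paper)."""
    return x + 1.0 if x > 0 else math.exp(x)


def feature_map(m: Matrix) -> Matrix:
    return [[elu_plus_one(val) for val in row] for row in m]


def linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Kernelized attention in O(N * d * d_v): summarize keys/values once, then query."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    d, d_v = len(phi_k[0]), len(v[0])
    # kv[a][b] = sum_j phi(k_j)[a] * v_j[b]: a d x d_v summary whose size is independent of N
    kv = [[0.0] * d_v for _ in range(d)]
    # z[a] = sum_j phi(k_j)[a]: shared normalizer
    z = [0.0] * d
    for k_row, v_row in zip(phi_k, v):
        for a in range(d):
            z[a] += k_row[a]
            for b in range(d_v):
                kv[a][b] += k_row[a] * v_row[b]
    out = []
    for q_row in phi_q:
        denom = sum(q_row[a] * z[a] for a in range(d))  # > 0 because phi > 0
        out.append([sum(q_row[a] * kv[a][b] for a in range(d)) / denom for b in range(d_v)])
    return out


def quadratic_reference(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """The same kernelized attention computed the O(N^2) way, for comparison."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    d_v = len(v[0])
    out = []
    for q_row in phi_q:
        w = [sum(x * y for x, y in zip(q_row, k_row)) for k_row in phi_k]
        s = sum(w)
        out.append([sum(wj * v_row[b] for wj, v_row in zip(w, v)) / s for b in range(d_v)])
    return out
```

Both functions return identical outputs up to rounding. Only the order of summation differs, which is what turns the N x N attention map into a fixed-size d x d_v summary.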
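The headline result is reported as mean intersection over union (mIoU). The sketch below shows the standard definition. Per-class IoU is TP / (TP + FP + FN), read off a confusion matrix, and mIoU averages it over classes. The helper names are hypothetical. The abstract does not state which Vaihingen classes are averaged (conventions differ, for example on whether the clutter class is excluded), so class selection is left to the caller.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[i][j] counts pixels whose ground-truth class is i and predicted class is j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average of per-class IoU = TP / (TP + FP + FN); classes absent from both gt and pred are skipped."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example with 3 classes and 6 flattened pixels
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(mean_iou(cm))  # approximately 0.5 (per-class IoU 1/3, 2/3, 1/2)
```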
Imaging Science & Photographic Technology; Remote Sensing; Engineering, Electrical & Electronic; Geochemistry & Geophysics
What problem does this paper attempt to address?