SeLT: Sonar Echo Image Recognition for Small Targets Using Lightweight Swin Transformer

Sijia Xia, Mengyang Hou, Yina Han, Ziyuan Xiao, Zihao Guo, Qingyu Liu, Yuanliang Ma
DOI: https://doi.org/10.1109/oceans51537.2024.10682372
2024-01-01
Abstract: Underwater sonar echo images that contain targets are relatively scarce, which usually limits recognition performance when a high-capacity (even state-of-the-art) network is used. To address this issue, we propose SeLT, a lightweight adaptation of Swin-T that uses lightweight feature extraction and feature coding modules. Specifically, we reduce the number of stacked Swin Transformer blocks and replace the MLP in each block with a lightweight channel attention module. This eases the requirements for training data and computing resources and greatly accelerates model training. Extensive experiments demonstrate that, compared to the original Swin-T, our model achieves higher recognition performance (a 1.9% increase in AUC) with fewer parameters (reduced by 71%) and lower computational complexity (reduced by 73%).
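To give a rough sense of why replacing the per-block MLP with channel attention shrinks the model, the sketch below counts parameters for a standard Swin-style MLP (expansion ratio 4) and for a squeeze-and-excitation (SE) style gate at the four Swin-T stage widths. The SE-style design and the reduction ratio of 16 are assumptions made for illustration only; the abstract does not specify the channel attention module SeLT actually uses, and the function names here are hypothetical. The numbers are not expected to reproduce the 71% figure, which also reflects the reduced block count.

```python
def mlp_params(channels: int, expansion: int = 4) -> int:
    """Parameters of a two-layer MLP (C -> expansion*C -> C), biases included."""
    hidden = channels * expansion
    return (channels * hidden + hidden) + (hidden * channels + channels)


def se_gate_params(channels: int, reduction: int = 16) -> int:
    """Parameters of an SE-style gate (C -> C/reduction -> C), biases included.

    ASSUMPTION: the SE layout and reduction=16 are illustrative choices,
    not the module described in the paper.
    """
    hidden = max(1, channels // reduction)
    return (channels * hidden + hidden) + (hidden * channels + channels)


if __name__ == "__main__":
    # Swin-T stage widths: C = 96, doubling at each of the four stages.
    for c in (96, 192, 384, 768):
        mlp = mlp_params(c)
        gate = se_gate_params(c)
        print(f"C={c:4d}  MLP={mlp:8d}  SE-style gate={gate:7d}  "
              f"ratio={gate / mlp:.3f}")
```

Under these assumptions the gate holds only a small fraction of the MLP's parameters at every stage width, which is consistent in direction with the parameter reduction the abstract reports.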