ACSTNet: An Attention Cross Stage Transformers Network for Small Object Detection in Remote Sensing Images
Yang LIU, Jialong WEI, Shujian ZHAO, Wenhua XIE, Niankuan CHEN, Jie LI, Xin CHEN, Kaixuan YANG, Yongwei LI, Zhen ZHAO
DOI: https://doi.org/10.1587/transfun.2023eap1130
2024-01-01
Abstract: Deep-learning-based object detection methods have recently achieved promising performance. However, these methods handle satellite images poorly, because small-sized objects in remote sensing images are difficult to detect. To address this issue, we propose a novel small object detection method based on YOLOX, named the Attention Cross Stage Transformers Network (ACSTNet). Specifically, a novel backbone network, the Multi-scale Cross Fusion Network (MCFNet), is constructed to capture long-range semantic dependencies between pixels and to increase depth-interaction information at different levels. Meanwhile, a new feature fusion layer is added to the upper feature output layer of dark3, allowing the model to retain as many low-level features of small objects as possible and to locate them more accurately. Furthermore, to address inaccurate feature extraction caused by overlapping and occluded dense objects, we propose an efficient channel and space normalized fusion attention mechanism (ECSNFAM). It is composed of channel attention, spatial attention, and batch normalization attention branches, and it uses a residual structure to make the attention mechanism more sensitive to small targets. Experiments on general remote sensing datasets show that the proposed method improves the mean Average Precision (mAP) by 1.2% and 1.4% on the DIOR and RSOD-DATA datasets, respectively, compared with YOLOX. The source code is available at https://github.com/Wei-JL/ACSTNet.git.
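The abstract names the three ECSNFAM branches and the residual structure but does not give their exact form. The NumPy sketch below is only a hypothetical illustration of how such a module could be wired. It assumes an ECA-style channel branch (global average pooling followed by a 1-D convolution across channels) and a simplified CBAM-like spatial branch, in which the usual 7×7 convolution is replaced by a plain sum of the channel-wise mean and max. The batch normalization branch is assumed to be NAM-style, weighting batch-normalized channels by |γ|/Σ|γ|. Fusion is assumed to average the three branches and add the input back. The names `ecsnfam_sketch`, `kernel`, `gamma`, and `beta` are invented for this example and are not taken from the authors' repository.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_branch(x, kernel):
    """Assumed ECA-style channel attention: per-channel GAP, then a 1-D conv across channels."""
    pooled = x.mean(axis=(2, 3))                       # (N, C)
    pad = len(kernel) // 2                             # odd kernel keeps C outputs
    padded = np.pad(pooled, ((0, 0), (pad, pad)))      # zero padding along channels
    mixed = np.stack([np.correlate(row, kernel, mode="valid") for row in padded])
    return x * sigmoid(mixed)[:, :, None, None]        # reweight each channel


def spatial_branch(x):
    """Simplified CBAM-like spatial attention (assumption: no 7x7 conv, just mean + max)."""
    desc = x.mean(axis=1) + x.max(axis=1)              # (N, H, W)
    return x * sigmoid(desc)[:, None, :, :]            # reweight each pixel


def bn_branch(x, gamma, beta, eps=1e-5):
    """Assumed NAM-style attention: BN scale factors |gamma| weight the normalized channels."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    bn = gamma[None, :, None, None] * (x - mean) / np.sqrt(var + eps) + beta[None, :, None, None]
    w = np.abs(gamma) / np.abs(gamma).sum()            # relative importance of each channel
    return x * sigmoid(bn * w[None, :, None, None])


def ecsnfam_sketch(x, kernel, gamma, beta):
    """Hypothetical fusion: average the three branch outputs, then add the input (residual)."""
    fused = (channel_branch(x, kernel) + spatial_branch(x) + bn_branch(x, gamma, beta)) / 3.0
    return x + fused


# Usage on a random (N, C, H, W) feature map.
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, size=(2, 4, 8, 8))
out = ecsnfam_sketch(x, kernel=np.array([0.25, 0.5, 0.25]), gamma=np.ones(4), beta=np.zeros(4))
print(out.shape)  # (2, 4, 8, 8)
```

Because every branch in this sketch multiplies the input by a sigmoid weight in (0, 1), the residual output stays between x and 2x for positive inputs. The residual path therefore preserves the original features, including those of small objects, while the attention branches only amplify them.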
Computer Science, Information Systems; Engineering, Electrical & Electronic; Hardware & Architecture