Target detection based on improved swin transformer and cascade RCNN

Yuchao Chen, Mingju Liang
DOI: https://doi.org/10.1117/12.3011400
2023-11-08
Abstract: This paper proposes two improvements. The first targets the high complexity and computational burden of the Swin Transformer backbone used for feature extraction. The second targets the feature mismatch caused by coupled detection in the Cascade R-CNN detection network, where classification and regression are handled jointly. First, a lightweight, pooling-based PoolFormer Block is introduced into the third and fourth stages of the Swin-T network to reduce its complexity, and a coordinate attention mechanism is then added to improve the feature extraction capability of this lightweight Swin-T network. Second, the classification and regression tasks in the Cascade R-CNN detection network are decoupled, which alleviates the feature mismatch between the two tasks and further improves detection performance. Experiments on the PASCAL VOC dataset show that the proposed approach improves the average detection precision by 2.2% over the baseline model.
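The PoolFormer Block is cheaper than a Swin attention block because its token mixer has no learned weights: it replaces self-attention with local average pooling. The sketch below illustrates that pooling token mixer on a single channel in pure Python. It assumes the formulation from the original PoolFormer (MetaFormer) work, `pool(x) - x` with a stride-1, 3×3 window whose border averages ignore padding. The abstract does not state the exact block configuration used in this paper, so `pool_size=3` and the hypothetical helper names `avg_pool_same` and `pooling_token_mixer` are illustrative assumptions rather than the authors' code.

```python
from typing import List

Grid = List[List[float]]  # one feature-map channel, indexed [row][col]


def avg_pool_same(x: Grid, pool_size: int = 3) -> Grid:
    """Stride-1 average pooling that keeps the spatial size.

    Windows at the border average only in-bounds values (assumed to match
    count_include_pad=False in PyTorch's AvgPool2d).
    """
    h, w = len(x), len(x[0])
    r = pool_size // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += x[ii][jj]
                        count += 1
            out[i][j] = total / count
    return out


def pooling_token_mixer(x: Grid, pool_size: int = 3) -> Grid:
    """PoolFormer-style token mixer (assumed form): pool(x) - x.

    The input is subtracted because the enclosing block adds x back through
    a residual connection, so the mixer only contributes neighbourhood mixing.
    No parameters are learned, unlike the Q/K/V projections of attention.
    """
    pooled = avg_pool_same(x, pool_size)
    return [[p - v for p, v in zip(prow, xrow)] for prow, xrow in zip(pooled, x)]


if __name__ == "__main__":
    feature = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    print(pooling_token_mixer(feature))
    # [[2.0, 1.5, 1.0], [0.5, 0.0, -0.5], [-1.0, -1.5, -2.0]]
```

In a real backbone the same operation runs per channel on batched tensors. Its cost grows linearly with the number of tokens and adds no parameters, which is consistent with the abstract's motivation for placing PoolFormer Blocks in the later Swin-T stages.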
Engineering, Computer Science