Shadow-Enhanced Self-Attention and Anchor-Adaptive Network for Video SAR Moving Target Tracking

Jinyu Bao, Xiaoling Zhang, Tianwen Zhang, Tianjiao Zeng, Zhenyu Yang, Xu Zhan, Jun Shi, Shunjun Wei
DOI: https://doi.org/10.1109/tgrs.2023.3260254
IF: 8.2
2023-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Video synthetic aperture radar (Video SAR) has drawn much attention because it can continuously observe and track moving targets. Rather than tracking a target directly, it is better to track its shadow, because the shadow has no location shift and its backscattering characteristic is stable. However, most current shadow tracking methods suffer both from false alarms, because their discrimination capacity is insufficient, and from missed detections, because their feature extraction capacity is limited in complicated environments. Therefore, we propose a shadow-enhanced self-attention and anchor-adaptive network (SE-SA-AAN) to achieve accurate moving target tracking for Video SAR. First, a preprocessing technique, sparse low-rank noise decomposition (SLRND), is proposed to enhance the salience of shadows and facilitate subsequent feature extraction. Second, a transformer self-attention mechanism (TSAM) is embedded in the parameter-shared backbone of the feature extraction network to concentrate on regions of interest and suppress clutter interference. The resulting representative features are then input to the detector and the tracker. The detector adds a semantic guided anchor-adaptive mechanism (SGAAM) to obtain optimized anchors that match the shadows' location and shape in each frame. Meanwhile, the tracker applies a Siamese network to track the trajectory of each shadow. Based on the detection and tracking results, data association is applied to achieve moving target tracking. Finally, experiments on Sandia National Laboratories (SNL) data demonstrate that SE-SA-AAN outperforms the state-of-the-art methods FairMOT, TransTrack, and CenterTrack by 6.4%, 7.8%, and 8.3% in multiple object tracking accuracy (MOTA), respectively.
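The headline comparison in the abstract is reported in MOTA, which directly penalizes the two failure modes the abstract names: false alarms (false positives) and missed detections (false negatives), plus identity switches. As a reference point, the sketch below computes the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames. It assumes per-frame error counts have already been produced by some upstream matching of tracker output to ground truth (not shown); the names FrameCounts and mota are hypothetical, and this is not the paper's own evaluation code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameCounts:
    """Per-frame error counts (hypothetical container, assumed to come from a
    prior ground-truth-to-hypothesis matching step that is not shown here)."""
    num_gt: int           # ground-truth objects present in this frame
    false_negatives: int  # ground-truth objects with no matched hypothesis (misses)
    false_positives: int  # hypotheses with no matched ground truth (false alarms)
    id_switches: int      # matched objects whose assigned track ID changed


def mota(frames: Iterable[FrameCounts]) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with every term summed over frames."""
    fn = fp = idsw = gt = 0
    for f in frames:
        fn += f.false_negatives
        fp += f.false_positives
        idsw += f.id_switches
        gt += f.num_gt
    if gt == 0:
        raise ValueError("MOTA is undefined when there are no ground-truth objects")
    # No lower clamp: MOTA can go negative when errors outnumber ground-truth objects.
    return 1.0 - (fn + fp + idsw) / gt


if __name__ == "__main__":
    # Two frames with 10 ground-truth shadows each: 1 miss, 3 false alarms, 1 ID switch.
    example = [FrameCounts(10, 1, 2, 0), FrameCounts(10, 0, 1, 1)]
    print(mota(example))  # 1 - 5/20
```

Because the error terms are summed before dividing, frames with many targets weigh more than sparse frames; a method that suppresses clutter-induced false alarms raises MOTA by exactly the number of removed false positives divided by the total ground-truth count.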