FLSTrack: focused linear attention swin-transformer network with dual-branch decoder for end-to-end multi-object tracking

Dafu Zu, Xun Duan, Guangqian Kong, Huiyun Long
DOI: https://doi.org/10.1007/s11760-024-03676-2
IF: 1.583
2024-12-04
Signal Image and Video Processing
Abstract: This study proposes FLSTrack, an end-to-end multi-object tracking algorithm that integrates Focused Linear Attention with a dual-branch decoder. The algorithm targets three limitations of current multi-object tracking methods: poor performance in complex scenarios, inadequate data association, and high computational complexity. First, the Swin Transformer is paired with a Focused Linear Attention module, which strengthens the network's ability to extract both local and global information while reducing computational cost. Second, a dual-branch decoder based on window attention is developed, with one branch dedicated to tracking and the other to detecting targets in image frames. To further increase speed, the complex re-identification (ReID) feature network is replaced with the BYTE data association method. To compensate for the appearance features lost by omitting the ReID network, the SIoU loss function is introduced, significantly improving target localization accuracy. Experiments on the MOT17, MOT20, DanceTrack, and KITTI datasets show superior performance. Moreover, with an inference speed nearing 30 FPS, the algorithm balances tracking accuracy with real-time performance.
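To illustrate the BYTE data association step mentioned in the abstract, the following is a minimal sketch of its two-stage idea: tracks are first matched against high-score detections, and the tracks left over are then matched against low-score detections instead of discarding those detections. It is not the FLSTrack implementation. The function names, the thresholds (`high`, `low`, `min_iou`), and the box format are assumptions made for this example, and a greedy IoU matcher stands in for the Hungarian assignment that BYTE normally uses.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, track_ids, det_ids, min_iou):
    """Pair tracks with detections by descending IoU.

    Assumption: greedy matching replaces Hungarian assignment to keep the
    sketch dependency-free.
    """
    pairs = sorted(
        ((iou(tracks[t], dets[d][0]), t, d) for t in track_ids for d in det_ids),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, t, d in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap even less
        if t in used_t or d in used_d:
            continue
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)
    unmatched_t = [t for t in track_ids if t not in used_t]
    unmatched_d = [d for d in det_ids if d not in used_d]
    return matches, unmatched_t, unmatched_d


def byte_associate(tracks, dets, high=0.6, low=0.1, min_iou=0.3):
    """Two-stage association; thresholds are illustrative, not from the paper.

    tracks: list of predicted track boxes.
    dets:   list of (box, score) detections for the current frame.
    Returns (matches, lost_track_ids, new_detection_ids).
    """
    high_ids = [i for i, (_, s) in enumerate(dets) if s >= high]
    low_ids = [i for i, (_, s) in enumerate(dets) if low <= s < high]
    # Stage 1: every track against the high-score detections.
    m1, left_tracks, new_dets = greedy_match(
        tracks, dets, list(range(len(tracks))), high_ids, min_iou
    )
    # Stage 2: leftover tracks against the low-score detections.
    m2, lost_tracks, _ = greedy_match(tracks, dets, left_tracks, low_ids, min_iou)
    return m1 + m2, lost_tracks, new_dets


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [
    ((1, 1, 11, 11), 0.9),    # confident detection of track 0
    ((21, 21, 31, 31), 0.3),  # low-score (e.g. occluded) detection of track 1
    ((50, 50, 60, 60), 0.8),  # confident detection with no existing track
]
matches, lost, new = byte_associate(tracks, dets)
```

In this toy frame, track 1 is recovered in the second stage from a detection that a single-threshold tracker would have dropped, while the unmatched high-score detection is returned as a candidate for a new track.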
engineering, electrical & electronic; imaging science & photographic technology