A Multi-target Tracking Method Based on Feature Enhancement

Yuxin Tan, Zhixiang Zhu
DOI: https://doi.org/10.23919/ccc63176.2024.10662000
2024-01-01
Abstract: In multi-target tracking, associating targets across neighboring frames to generate trajectories requires capturing fine-grained features that distinguish individual targets, so efficient fine-grained feature extraction is critical. Representing such features effectively requires both high resolution and accurate semantic information, which makes it important to raise feature resolution and enrich feature semantics. To this end, this paper proposes a feature-enhanced multi-target tracking method that combines a feature pyramid with an attention mechanism. First, a stream-aligned feature pyramid is introduced: it generates semantic streams between feature maps of different resolutions and uses them to align and aggregate multi-scale features, which improves feature extraction and, in turn, tracking accuracy. Second, an efficient multi-scale attention mechanism is introduced into the backbone network to further improve the model’s attention and trajectory association performance. In addition, the attention weights are smoothed to reduce the model’s excessive focus on individual local features and to strengthen its attention to the overall semantics. Experiments on the DanceTrack dataset, in which targets have extremely similar appearance, show that the proposed method improves HOTA and IDF1 by $0.7 \%$ and $1.1 \%$, respectively, over MeMOTR, which validates its effectiveness in multi-target tracking tasks.
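The abstract does not say how the attention weights are smoothed. As a rough illustration only, the sketch below assumes one common option, a temperature-scaled softmax, and shows how a higher temperature spreads weight away from a single dominant local feature. The function name, the scores, and the temperature values are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores, temperature=1.0):
    """Softmax over a list of scores; a temperature above 1 flattens the result."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw attention scores for four local features (not from the paper).
scores = [4.0, 1.0, 0.5, 0.2]

sharp = softmax(scores, temperature=1.0)   # unsmoothed weights
smooth = softmax(scores, temperature=3.0)  # smoothed weights (assumed technique)

print("unsmoothed:", [round(w, 3) for w in sharp])
print("smoothed:  ", [round(w, 3) for w in smooth])
```

In this sketch the smoothed weights still rank the features in the same order, but the largest weight shrinks and the remaining features receive a larger share, which matches the stated goal of reducing excessive attention to individual local features.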