DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking

Fei Xie, Zhongdao Wang, Chao Ma
DOI: https://doi.org/10.1109/cvpr52733.2024.01808
2024-01-01
Abstract: Existing Siamese or transformer trackers commonly pose visual object tracking as a one-shot detection problem, i.e., locating the target object in a single forward evaluation scheme. Despite the demonstrated success, these trackers may easily drift towards distractors with similar appearance due to the single forward evaluation scheme lacking self-correction. To address this issue, we cast visual tracking as a point set based denoising diffusion process and propose a novel generative learning based tracker, dubbed DiffusionTrack. Our DiffusionTrack possesses two appealing properties: 1) It follows a novel noise-to-target tracking paradigm that leverages multiple denoising diffusion steps to localize the target in a dynamic searching manner per frame. 2) It models the diffusion process using a point set representation, which can better handle appearance variations for more precise localization. One side benefit is that DiffusionTrack greatly simplifies the post-processing, e.g., removing the window penalty scheme. Without bells and whistles, our DiffusionTrack achieves leading performance over the state-of-the-art trackers and runs in real-time. The code is available at https://github.com/VISION-SJTU/DiffusionTrack.
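The noise-to-target idea in the abstract can be illustrated with a toy sketch: a set of 2D points starts as Gaussian noise and is refined over several denoising steps toward a target point set. This is not the paper's implementation. The cosine schedule, the deterministic DDIM-style update, and the names `alpha_bar`, `toy_denoiser`, `denoise_point_set`, and `mean_error` are all assumptions made for illustration, and `toy_denoiser` is a hand-written stand-in that is handed the target directly, where a real tracker would use a learned network conditioned on image features.

```python
import math
import random


def alpha_bar(t, num_steps):
    # Assumed cosine schedule: 1.0 at t=0 (clean), ~0.0 at t=num_steps (pure noise).
    return math.cos(0.5 * math.pi * t / num_steps) ** 2


def toy_denoiser(points, target, a_t):
    # Hypothetical stand-in for a learned network. It estimates the clean point
    # set with an error that shrinks as the noise level decreases.
    k = 0.1 * (1.0 - a_t)
    return [(tx + k * (px - tx), ty + k * (py - ty))
            for (px, py), (tx, ty) in zip(points, target)]


def denoise_point_set(target, num_steps=8, seed=0):
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one 2D point per target point.
    points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in target]
    trajectory = [points]
    for t in range(num_steps, 0, -1):
        a_t = alpha_bar(t, num_steps)
        a_prev = alpha_bar(t - 1, num_steps)
        clean = toy_denoiser(points, target, a_t)
        refined = []
        for (px, py), (cx, cy) in zip(points, clean):
            # Recover the noise implied by the clean estimate ...
            ex = (px - math.sqrt(a_t) * cx) / math.sqrt(1.0 - a_t)
            ey = (py - math.sqrt(a_t) * cy) / math.sqrt(1.0 - a_t)
            # ... then re-noise the estimate to the lower level t-1.
            refined.append((
                math.sqrt(a_prev) * cx + math.sqrt(1.0 - a_prev) * ex,
                math.sqrt(a_prev) * cy + math.sqrt(1.0 - a_prev) * ey,
            ))
        points = refined
        trajectory.append(points)
    return trajectory


def mean_error(points, target):
    # Mean Euclidean distance between corresponding points.
    return sum(math.dist(p, q) for p, q in zip(points, target)) / len(target)


if __name__ == "__main__":
    # Four points in normalized coordinates, standing in for a target's point set.
    target = [(0.1, -0.2), (0.5, -0.2), (0.5, 0.3), (0.1, 0.3)]
    trajectory = denoise_point_set(target)
    for step, pts in enumerate(trajectory):
        print(step, round(mean_error(pts, target), 4))
```

Running the script prints the mean point error after each step. In this toy setting each step gives the estimate another chance to correct itself, which is the contrast the abstract draws with a single forward evaluation.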