A Distractor-Aware Memory for Visual Object Tracking with SAM2

Jovana Videnovic, Alan Lukezic, Matej Kristan
2024-12-04
Abstract: Memory-based trackers are video object segmentation methods that form the target model by concatenating recently tracked frames into a memory buffer and localize the target by attending the current image to the buffered frames. While already achieving top performance on many benchmarks, it was the recent release of SAM2 that placed memory-based trackers into focus of the visual object tracking community. Nevertheless, modern trackers still struggle in the presence of distractors. We argue that a more sophisticated memory model is required, and propose a new distractor-aware memory model for SAM2 and an introspection-based update strategy that jointly addresses the segmentation accuracy as well as tracking robustness. The resulting tracker is denoted as SAM2.1++. We also propose a new distractor-distilled DiDi dataset to study the distractor problem better. SAM2.1++ outperforms SAM2.1 and related SAM memory extensions on seven benchmarks and sets a solid new state-of-the-art on six of them.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper aims to solve a key challenge in visual object tracking (VOT): **tracking failures caused by distractors**. Specifically, modern trackers perform poorly when faced with external distractors (such as other objects similar to the target) and internal distractors (such as similarities between parts of the target). These problems are particularly prominent when the target leaves and re-enters the field of view.

To address this challenge, the authors propose a new **Distractor-Aware Memory (DAM)** model and apply it to the existing memory-based tracking framework SAM2. By introducing this new memory model and update strategy, the authors hope to improve the robustness and accuracy of the tracker in complex scenarios.

### Main contributions

1. **New Distractor-Aware Memory (DAM)**:
   - Divide memory into two parts: Recent Appearance Memory (RAM) and Distractor-Resolution Memory (DRM). RAM is used to ensure the segmentation accuracy of the current frame, while DRM helps to distinguish between the target and distractors.
   - Propose a new update mechanism that uses the information output by SAM2 to update DRM, thus dealing with distractors more effectively.
2. **New Distractor Distillation Dataset (DiDi)**:
   - Semi-automatically extract sequences containing significant distractors from multiple benchmark datasets to form a new tracking dataset focused on distractors.
   - This dataset can better evaluate the performance of trackers in the presence of distractors.
3. **Performance improvement**:
   - On multiple standard bounding box and segmentation tracking benchmarks, SAM2.1++ significantly outperforms SAM2.1 and other related extensions and sets new state-of-the-art levels on six benchmarks.
   - In particular, on the VOT2022 benchmark, SAM2.1++ improves the Expected Average Overlap (EAO) metric by 12% compared to the existing state-of-the-art method.
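The memory split described in the first contribution can be illustrated with a small data structure. This is only a minimal sketch, not the authors' implementation: the class name `DistractorAwareMemory`, the method names, the fixed capacities, and the first-in-first-out eviction are all assumptions made for illustration, and the stored items stand in for whatever per-frame features SAM2 keeps in its memory.

```python
from collections import deque
from typing import Any, List


class DistractorAwareMemory:
    """Toy two-part memory (hypothetical; not the paper's code).

    RAM keeps recent target appearances; DRM keeps frames that
    helped tell the target apart from distractors.
    """

    def __init__(self, ram_size: int = 3, drm_size: int = 3) -> None:
        # Capacities are illustrative assumptions, not values from the paper.
        self.ram: deque = deque(maxlen=ram_size)
        self.drm: deque = deque(maxlen=drm_size)

    def add_recent(self, frame: Any) -> None:
        # Oldest recent-appearance entry is dropped once RAM is full.
        self.ram.append(frame)

    def add_distractor_case(self, frame: Any) -> None:
        # Oldest distractor-resolving entry is dropped once DRM is full.
        self.drm.append(frame)

    def frames(self) -> List[Any]:
        # The tracker would attend the current image to both parts together.
        return list(self.drm) + list(self.ram)


memory = DistractorAwareMemory(ram_size=2, drm_size=1)
for name in ["f0", "f1", "f2"]:
    memory.add_recent(name)
memory.add_distractor_case("d0")
print(memory.frames())
```

Keeping the two parts in separate buffers means a long run of ordinary frames can push out old RAM entries without evicting the frames stored for resolving distractors.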
### Core ideas of the solution

- **Distractor-Aware Memory (DAM)**: By dividing memory into RAM and DRM, different task requirements are handled respectively. RAM is responsible for storing the recent appearance of the target, ensuring segmentation accuracy; DRM contains critical distractor information, ensuring the robustness of tracking.
- **Intelligent update strategy**: Dynamically update RAM and DRM according to whether the target appears and whether there are distractors, avoiding invalid or incorrect memory updates.
- **New dataset (DiDi)**: Provide a more challenging test environment specifically for evaluating the performance of trackers in distractor scenarios.

Through these improvements, the method proposed in the paper shows higher robustness and accuracy in the face of complex scenarios, bringing important progress to the field of visual object tracking.
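The update strategy above gates memory writes on two conditions: whether the target is visible and whether distractors are present. One way this gating could look is sketched below. The exact rule is an assumption: the summary does not state how the two conditions combine, so the function `decide_updates`, the `UpdateDecision` type, and the specific mapping (no updates when the target is absent, DRM updated only when a distractor is present) are hypothetical.

```python
from typing import NamedTuple


class UpdateDecision(NamedTuple):
    update_ram: bool  # write the frame to Recent Appearance Memory
    update_drm: bool  # write the frame to Distractor-Resolution Memory


def decide_updates(target_present: bool, distractor_present: bool) -> UpdateDecision:
    """Hypothetical gating rule; the paper's actual criteria may differ."""
    if not target_present:
        # Assumed: skip all writes so an empty or wrong mask never enters memory.
        return UpdateDecision(update_ram=False, update_drm=False)
    # Assumed: RAM follows the visible target; DRM only stores distractor cases.
    return UpdateDecision(update_ram=True, update_drm=distractor_present)


for target, distractor in [(False, True), (True, False), (True, True)]:
    print(target, distractor, decide_updates(target, distractor))
```

Expressing the rule as a pure function of the two flags keeps the decision separate from how those flags are estimated, which in the paper relies on information output by SAM2.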