Abstract: State-of-the-art object-tracking algorithms currently face a severe threat from adversarial attacks, which can significantly undermine their performance. In this research, we introduce MUNet, a novel defensive model designed for visual tracking. The model generates defensive images that effectively counter attacks while keeping computational overhead low. To achieve this, we experiment with various configurations of MUNet models and find that even a minimal three-layer setup significantly improves tracking robustness when the target tracker is under attack. Each model is trained end-to-end on randomly paired images, which include both clean and adversarial noise images. This training separately uses a pixel-wise denoiser and a feature-wise defender. Our proposed models significantly enhance tracking performance both when the target tracker is attacked and when the target frame is clean. Additionally, MUNet can share its parameters across both the template and search regions. In experiments, the proposed models successfully defend against top attackers on six benchmark datasets: OTB100, LaSOT, UAV123, VOT2018, VOT2019, and GOT-10k. Results on all datasets show a significant improvement against all attackers, with a decline of less than 4.6% on every benchmark metric compared to the original tracker. Notably, our model can also enhance the tracking robustness of other black-box trackers.
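As a rough illustration of the two training signals named in the abstract, the sketch below compares a pixel-wise loss with a feature-wise loss on a toy one-dimensional "image". Everything here is an assumption made for illustration: the abstract does not state the actual loss functions, `toy_features` is a hypothetical stand-in for a tracker backbone, and mean squared error is simply a common choice for both terms.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_features(image):
    """Hypothetical feature extractor: differences between neighbouring pixels."""
    return [image[i + 1] - image[i] for i in range(len(image) - 1)]


def pixel_loss(defended, clean):
    """Pixel-wise term (assumed): how far the defended image is from the clean one."""
    return mse(defended, clean)


def feature_loss(defended, clean, extract=toy_features):
    """Feature-wise term (assumed): how far the extracted features drift apart."""
    return mse(extract(defended), extract(clean))


# A clean/adversarial pair, mimicking the paired training images in the abstract.
clean = [0.10, 0.40, 0.80, 0.30]
noise = [0.03, -0.03, 0.03, -0.03]  # small, alternating perturbation
adversarial = [c + n for c, n in zip(clean, noise)]

print("pixel loss  :", pixel_loss(adversarial, clean))
print("feature loss:", feature_loss(adversarial, clean))
```

In this toy case the same small perturbation produces a larger error in feature space than in pixel space, which is one reason a defense might supervise both levels rather than the pixels alone.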


### What problem does this paper attempt to solve?

This paper addresses the significant drop in performance that current state-of-the-art object tracking algorithms suffer under adversarial attacks. Specifically, it introduces a new defense model named MUNet (Multi-Model UNet), designed for adversarial defense in visual tracking tasks.

#### Background and Problem Description

In recent years, adversarial attacks have posed a serious threat to visual tracking models. These attacks add small but carefully designed perturbations (i.e., adversarial noise) to the input data, causing the model to fail and seriously degrading tracking accuracy. Common adversarial attack methods include FGSM, iFGSM, and miFGSM. These attacks can generate adversarial noise in real time, causing top-level trackers to fail completely.

#### Research Motivation

To meet this challenge, researchers have proposed a variety of adversarial defense measures, but in visual tracking these methods are often ineffective. The reason lies in the fundamental differences in network structure between classification models and visual tracking models. Visual tracking models require not only feature extraction but also functional modules such as template matching and region proposal networks. Simple denoising or feature enhancement therefore cannot effectively improve their robustness against adversarial attacks.

#### Solution

To solve these problems, the authors propose MUNet, a multi-model defense mechanism based on the UNet architecture. The model improves the robustness of visual tracking in the following ways:

1. **Pixel-wise Denoiser**: It removes adversarial noise from the input image so that the tracking model receives clean data.
2. **Feature-wise Defender**: It uses convolutional neural networks (CNNs) for feature extraction and denoising to further improve the robustness of the model.
3. **End-to-End Training**: Training on a large-scale dataset containing both clean images and images contaminated with adversarial noise enables the model to distinguish and process different types of input.

#### Experimental Results

The experimental results show that MUNet successfully resists top-level attackers on multiple benchmark datasets (OTB100, LaSOT, UAV123, VOT2018, VOT2019, and GOT-10k), with a performance decline of less than 4.6% on every benchmark metric relative to the original tracker. In addition, MUNet can transfer its defense capabilities to other top-level trackers (such as DiMP and DaSiam), showing good generalization ability.

### Summary

By introducing the MUNet model, this paper provides an effective adversarial defense mechanism that significantly improves the robustness and reliability of visual tracking models under adversarial attack. It offers new ideas and directions for future research, especially for ensuring the security and stability of models in critical application scenarios.
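To make the attack family named in the background concrete, here is a minimal sketch of FGSM and its iterative variant (iFGSM) against a toy logistic model. It is not the attack setup used in the paper: the model, the weights `w` and `b`, the input `x`, and the budget `eps` are all hypothetical, and a real tracker attack would backpropagate through the tracking network instead of using this closed-form gradient.

```python
import math


def loss_and_input_grad(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]  # dLoss/dx for logistic regression
    return loss, grad


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def clip(v, low, high):
    return min(high, max(low, v))


def fgsm(x, y, w, b, eps):
    """One step of size eps along the sign of the input gradient."""
    _, grad = loss_and_input_grad(x, y, w, b)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, grad)]


def ifgsm(x, y, w, b, eps, steps):
    """Several smaller steps, each projected back into the eps-ball around x."""
    alpha = eps / steps
    x_adv = list(x)
    for _ in range(steps):
        _, grad = loss_and_input_grad(x_adv, y, w, b)
        stepped = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, grad)]
        x_adv = [clip(clip(s, xi - eps, xi + eps), 0.0, 1.0)
                 for s, xi in zip(stepped, x)]
    return x_adv


# Hypothetical toy input ("pixels" in [0, 1]), label, and model parameters.
x, y = [0.2, 0.8, 0.5], 1
w, b = [1.5, -2.0, 0.5], 0.1
eps = 0.05

clean_loss, _ = loss_and_input_grad(x, y, w, b)
fgsm_loss, _ = loss_and_input_grad(fgsm(x, y, w, b, eps), y, w, b)
print("clean loss:", clean_loss, "loss after FGSM:", fgsm_loss)
```

Each input value moves by at most `eps`, yet the loss rises, which is the small-but-damaging perturbation the summary describes. miFGSM additionally accumulates a momentum term over the gradient across iterations; it is omitted here for brevity.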

Multi-Model UNet: An Adversarial Defense Mechanism for Robust Visual Tracking

