Multi-Model UNet: An Adversarial Defense Mechanism for Robust Visual Tracking

Wattanapong Suttapak, Jianfu Zhang, Haohuo Zhao, Liqing Zhang
DOI: https://doi.org/10.1007/s11063-024-11592-2
IF: 2.565
2024-04-02
Neural Processing Letters
Abstract: Currently, state-of-the-art object-tracking algorithms face a severe threat from adversarial attacks, which can significantly undermine their performance. In this research, we introduce MUNet, a novel defensive model designed for visual tracking. The model generates defensive images that effectively counter attacks while keeping computational overhead low. To achieve this, we experiment with various configurations of MUNet models and find that even a minimal three-layer setup significantly improves tracking robustness when the target tracker is under attack. Each model is trained end to end on randomly paired images, which include both clean and adversarially perturbed images. Training is carried out separately for a pixel-wise denoiser and a feature-wise defender. Our proposed models significantly enhance tracking performance whether the target tracker is attacked or the target frame is clean. Additionally, MUNet can share its parameters across both the template and search regions. In experiments, the proposed models successfully defend against top attackers on six benchmark datasets: OTB100, LaSOT, UAV123, VOT2018, VOT2019, and GOT-10k. Results on all datasets show a significant improvement against every attacker, with a decline of less than 4.6% in every benchmark metric compared with the original tracker. Notably, our model can also enhance tracking robustness for other black-box trackers.
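The abstract names three training ingredients:

- randomly paired clean and adversarial inputs;
- separate pixel-wise and feature-wise objectives;
- one set of parameters shared between the template and search regions.

The plain-Python sketch below shows how such objectives could be computed. It is an illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices made for this example:

- the flattened-patch `Image` type;
- the `toy_features` extractor, standing in for the tracker backbone;
- the pairing probability `p_adv`;
- the mean-absolute-error losses;
- the untrained `box_filter` placeholder for a trained MUNet.

```python
import random
from typing import Callable, List, Tuple

Image = List[float]  # assumption: a flattened patch with values in [0, 1]
Denoiser = Callable[[Image], Image]


def pixel_loss(restored: Image, clean: Image) -> float:
    """Pixel-wise objective: mean absolute error between restored and clean pixels."""
    return sum(abs(r - c) for r, c in zip(restored, clean)) / len(clean)


def toy_features(img: Image) -> List[float]:
    """Hypothetical frozen feature extractor standing in for the tracker backbone:
    a [1, -1] difference filter followed by ReLU."""
    return [max(0.0, img[i] - img[i + 1]) for i in range(len(img) - 1)]


def feature_loss(restored: Image, clean: Image,
                 features: Callable[[Image], List[float]] = toy_features) -> float:
    """Feature-wise objective: mean absolute error between feature maps."""
    fr, fc = features(restored), features(clean)
    return sum(abs(a - b) for a, b in zip(fr, fc)) / len(fc)


def sample_pair(clean: Image, adversarial: Image, rng: random.Random,
                p_adv: float = 0.5) -> Tuple[Image, Image]:
    """Random pairing (assumed scheme): the input is the adversarial image with
    probability p_adv, otherwise the clean image; the target is always clean."""
    source = adversarial if rng.random() < p_adv else clean
    return list(source), list(clean)


def training_loss(denoiser: Denoiser, clean: Image, adversarial: Image,
                  mode: str, rng: random.Random, p_adv: float = 0.5) -> float:
    """Loss for one randomly paired sample. The 'pixel' and 'feature' modes are
    used separately, mirroring the denoiser/defender split in the abstract."""
    inp, target = sample_pair(clean, adversarial, rng, p_adv)
    restored = denoiser(inp)
    if mode == "pixel":
        return pixel_loss(restored, target)
    if mode == "feature":
        return feature_loss(restored, target)
    raise ValueError(f"unknown mode: {mode!r}")


def defend_template_and_search(denoiser: Denoiser, template: Image,
                               search: Image) -> Tuple[Image, Image]:
    """One denoiser, hence one set of parameters, is applied to both inputs."""
    return denoiser(template), denoiser(search)


def box_filter(img: Image) -> Image:
    """Hypothetical untrained stand-in for MUNet: a 3-tap moving average."""
    out = []
    for i in range(len(img)):
        window = img[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.2, 0.2, 0.8, 0.8, 0.2, 0.2]
    noise = [0.03, -0.03, 0.03, -0.03, 0.03, -0.03]
    adversarial = [c + n for c, n in zip(clean, noise)]
    for mode in ("pixel", "feature"):
        print(mode, training_loss(box_filter, clean, adversarial, mode, rng))
    z, x = defend_template_and_search(box_filter, clean[:3], clean)
    print(len(z), len(x))
```

In an actual defense, the denoiser would be a trainable UNet whose weights are updated by gradient descent on one of these losses. The sketch only evaluates the objectives, which keeps it dependency-free.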
computer science, artificial intelligence
### What problem does this paper attempt to solve?

This paper addresses the sharp drop in performance that state-of-the-art object-tracking algorithms suffer under adversarial attacks. To counter it, the paper introduces MUNet (Multi-Model UNet), a defense model designed specifically for adversarial defense in visual tracking.

#### Background and Problem Description

In recent years, adversarial attacks have posed a serious threat to visual tracking models. These attacks make a model fail by adding small but carefully designed perturbations (adversarial noise) to the input data, which severely degrades tracking accuracy. Common attack methods include FGSM, iFGSM, and miFGSM. These attacks can generate adversarial noise in real time and can cause top-performing trackers to fail completely.

#### Research Motivation

Researchers have proposed a variety of adversarial defenses, but these methods are often ineffective for visual tracking. The reason lies in the fundamental differences in network structure between classification models and visual tracking models. A tracker needs not only feature extraction but also modules such as template matching and region proposal networks. As a result, simple denoising or feature enhancement does not effectively improve a tracker's robustness against adversarial attacks.

#### Solution

To address these problems, the authors propose MUNet, a multi-model defense mechanism based on the UNet architecture. It improves the robustness of visual tracking in the following ways:

1. **Pixel-wise Denoiser**: removes adversarial noise from the input image so that the tracking model receives input close to the clean image.
2. **Feature-wise Defender**: uses a convolutional neural network (CNN) for feature extraction and denoising, further improving robustness.
3. **End-to-End Training**: each model is trained end to end on randomly paired clean and adversarially perturbed images, which enables it to handle both types of input.

#### Experimental Results

The experimental results show that MUNet successfully defends against top attackers on six benchmark datasets (OTB100, LaSOT, UAV123, VOT2018, VOT2019, and GOT-10k). In every benchmark metric, performance declines by less than 4.6% compared with the original tracker. MUNet also transfers its defense to other top trackers (such as DiMP and DaSiam), showing good generalization.

### Summary

By introducing MUNet, this paper provides an effective adversarial defense mechanism that significantly improves the robustness and reliability of visual tracking models under adversarial attack. This is especially relevant to keeping models secure and stable in critical applications.
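The background section names FGSM, iFGSM, and miFGSM as common attacks. The plain-Python sketch below illustrates the three update rules, assuming "miFGSM" refers to momentum iterative FGSM (MI-FGSM). All three perturbations are bounded in the L-infinity norm by `eps`.

This is not how the paper's attacks were implemented. The target is a toy squared-score loss on a hypothetical linear model (`W`, `toy_loss`, `toy_grad`), which stands in for a real tracker's loss and its autograd gradient.

```python
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def sign(v: float) -> float:
    """Return +1.0, -1.0, or 0.0 according to the sign of v."""
    return float((v > 0) - (v < 0))


def project(x_adv: Vector, x_clean: Vector, eps: float) -> Vector:
    """Clip each pixel into the L-infinity ball of radius eps around x_clean,
    then into the valid pixel range [0, 1]."""
    return [
        min(1.0, max(0.0, min(c + eps, max(c - eps, a))))
        for a, c in zip(x_adv, x_clean)
    ]


def fgsm(x: Vector, grad_fn: GradFn, eps: float) -> Vector:
    """FGSM: a single step of size eps along the sign of the loss gradient."""
    g = grad_fn(x)
    return project([xi + eps * sign(gi) for xi, gi in zip(x, g)], x, eps)


def ifgsm(x: Vector, grad_fn: GradFn, eps: float, alpha: float, steps: int) -> Vector:
    """Iterative FGSM: repeated steps of size alpha, re-projected after each one."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        x_adv = project([a + alpha * sign(gi) for a, gi in zip(x_adv, g)], x, eps)
    return x_adv


def mifgsm(x: Vector, grad_fn: GradFn, eps: float, alpha: float,
           steps: int, mu: float = 1.0) -> Vector:
    """Momentum iterative FGSM: accumulate L1-normalised gradients with decay mu."""
    x_adv = list(x)
    momentum = [0.0] * len(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        l1 = sum(abs(gi) for gi in g) or 1.0  # guard against a zero gradient
        momentum = [mu * m + gi / l1 for m, gi in zip(momentum, g)]
        x_adv = project([a + alpha * sign(m) for a, m in zip(x_adv, momentum)], x, eps)
    return x_adv


# Hypothetical stand-in for a tracker loss: the squared score of a linear model.
W = [0.5, -1.0, 2.0, 0.25]


def toy_loss(x: Vector) -> float:
    score = sum(wi * xi for wi, xi in zip(W, x))
    return score * score


def toy_grad(x: Vector) -> Vector:
    score = sum(wi * xi for wi, xi in zip(W, x))
    return [2.0 * score * wi for wi in W]


if __name__ == "__main__":
    clean = [0.2, 0.4, 0.6, 0.8]
    eps = 0.03
    for name, x_adv in [
        ("FGSM", fgsm(clean, toy_grad, eps)),
        ("iFGSM", ifgsm(clean, toy_grad, eps, alpha=0.01, steps=10)),
        ("MI-FGSM", mifgsm(clean, toy_grad, eps, alpha=0.01, steps=10)),
    ]:
        print(name, round(toy_loss(clean), 4), "->", round(toy_loss(x_adv), 4))
```

The attacks differ in how they spend the same budget. FGSM takes one full-size step. iFGSM takes several smaller steps. MI-FGSM smooths the step direction with momentum. Because each step needs only a gradient evaluation, all three are cheap enough to run on every frame.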