Abstract: Enhancing the robustness of deep learning models, particularly in the realm of vision transformers (ViTs), is crucial for their real-world deployment. In this work, we provide a finetuning approach to enhance the robustness of vision transformers inspired by the concept of nullspace from linear algebra. Our investigation centers on whether a vision transformer can exhibit resilience to input variations akin to the nullspace property in linear mappings, implying that perturbations sampled from this nullspace do not influence the model's output when added to the input. Firstly, we show that for many pretrained ViTs, a non-trivial nullspace exists due to the presence of the patch embedding layer. Secondly, as nullspace is a concept associated with linear algebra, we demonstrate that it is possible to synthesize approximate nullspace elements for the non-linear blocks of ViTs employing an optimisation strategy. Finally, we propose a finetuning strategy for ViTs wherein we augment the training data with synthesized approximate nullspace noise. After finetuning, we find that the model demonstrates robustness to adversarial and natural image perturbations alike.
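The nullspace property described in the abstract can be illustrated with a minimal sketch. It assumes a toy linear patch embedding with random weights (not a pretrained ViT), and the names `nullspace_basis`, `embed`, `patch_dim`, and `embed_dim` are hypothetical choices made for this example. The embedding maps a 48-dimensional flattened patch to 32 dimensions, so at least 16 input directions must be mapped to zero.

```python
import numpy as np

def nullspace_basis(W, rtol=1e-10):
    """Return an orthonormal basis (as columns) of {v : W @ v = 0} via SVD."""
    _, s, vt = np.linalg.svd(W, full_matrices=True)
    tol = rtol * s.max()
    rank = int((s > tol).sum())
    # Rows of vt beyond the numerical rank span the nullspace of W.
    return vt[rank:].T

rng = np.random.default_rng(0)

# Toy sizes (assumption): a 4x4 RGB patch flattened to 48 values, 32-d embedding.
patch_dim, embed_dim = 48, 32
W = rng.standard_normal((embed_dim, patch_dim))
b = rng.standard_normal(embed_dim)

def embed(x):
    """Linear patch embedding: the first layer of the toy model."""
    return W @ x + b

basis = nullspace_basis(W)

# Any linear combination of basis vectors is nullspace noise.
noise = basis @ rng.standard_normal(basis.shape[1])

patch = rng.standard_normal(patch_dim)
max_change = np.abs(embed(patch + noise) - embed(patch)).max()

print("nullspace dimension:", basis.shape[1])
print("noise norm:", np.linalg.norm(noise))
print("largest change in embedding:", max_change)
```

Because every later layer only sees the embedding, noise drawn from this nullspace leaves the output of the whole toy model unchanged up to floating-point error, regardless of how large the noise is.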

What problem does this paper attempt to address?

This paper addresses how to make Vision Transformers (ViTs) more robust, so that their outputs remain stable under both adversarial and natural image perturbations. Specifically, the authors propose a fine-tuning method based on the concept of the null space from linear algebra to improve the resilience of ViTs to input changes.

### Problem Background

Deep learning models, especially ViTs, face adversarial and natural perturbations (such as JPEG compression, weather effects, and brightness adjustment) in practical deployment. These perturbations may lead to inaccurate outputs or over-confident predictions, so improving robustness is crucial for practical applications.

### Solution

Starting from the concept of the null space in linear algebra, the authors explore whether ViTs can be invariant to certain input perturbations in the way linear mappings are. The null space of a linear mapping is the set of vectors that the mapping sends to zero, so adding such a vector to the input leaves the output unchanged. Because the first layer of a ViT is a linear patch embedding, the authors find that a non-trivial null space exists for many pretrained models. They build on this in three steps:

1. **Identifying the non-trivial null space in ViTs**: By analyzing pretrained ViT models, the authors show that many of them do have a non-trivial null space.
2. **Synthesizing approximate null-space noise**: For the non-linear parts of ViTs, the authors use an optimization strategy to synthesize approximate null-space noise.
3. **Using null-space noise for data augmentation**: The authors propose a fine-tuning method that enhances robustness by adding the synthesized null-space noise to the training data.
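The second and third steps can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the authors' procedure: the model is a small `tanh` network with random weights rather than a ViT, the noise is a single vector shared across a batch, and projected gradient descent with a fixed noise norm `eps` is a stand-in for whatever optimisation strategy the paper uses. All names (`f`, `loss_and_grad`, `synthesize_noise`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-linear model (assumption): f(x) = V tanh(W x), random weights.
d_in, d_hid, d_out = 16, 24, 4
W = rng.standard_normal((d_hid, d_in)) / np.sqrt(d_in)
V = rng.standard_normal((d_out, d_hid)) / np.sqrt(d_hid)

def f(x):
    """Map a batch x of shape (n, d_in) to outputs of shape (n, d_out)."""
    return np.tanh(x @ W.T) @ V.T

def loss_and_grad(delta, x):
    """Mean squared output change caused by delta, and its gradient w.r.t. delta."""
    h = np.tanh((x + delta) @ W.T)            # hidden activations, (n, d_hid)
    r = h @ V.T - f(x)                        # output change, (n, d_out)
    loss = 0.5 * np.mean(np.sum(r ** 2, axis=1))
    g_pre = (r @ V) * (1.0 - h ** 2)          # backprop through tanh
    grad = (g_pre @ W).mean(axis=0)           # (d_in,)
    return loss, grad

def synthesize_noise(x, eps=1.0, lr=0.1, steps=300):
    """Projected gradient descent: keep ||delta|| = eps, minimise output change."""
    delta = rng.standard_normal(x.shape[1])
    delta *= eps / np.linalg.norm(delta)
    init_loss, _ = loss_and_grad(delta, x)
    best_loss, best = init_loss, delta.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(delta, x)
        delta = delta - lr * grad
        delta *= eps / np.linalg.norm(delta)  # project back onto the sphere
        loss, _ = loss_and_grad(delta, x)
        if loss < best_loss:                  # keep the best iterate seen
            best_loss, best = loss, delta.copy()
    return best, init_loss, best_loss

x = rng.standard_normal((8, d_in))
noise, init_loss, final_loss = synthesize_noise(x)

# Augmentation step: fine-tuning would train on the noisy inputs with the
# original labels, so the model learns to tolerate this perturbation.
x_augmented = x + noise

print("output change before optimisation:", init_loss)
print("output change after optimisation:", final_loss)
```

Fixing the norm of the noise is a design choice made here to rule out the trivial solution `delta = 0`; other constraints or penalty terms would serve the same purpose.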
### Main Contributions

- **Theoretical connection**: The authors connect the robustness of ViTs to the concept of the null space in linear algebra, and verify experimentally that expanding the approximate null space improves model robustness.
- **Comprehensive analysis**: The authors analyze the existence of a null space in ViTs in detail, including verifying its algebraic properties in the embedding layer and in the non-linear encoder layers.
- **Effective method**: The authors propose a data augmentation method that exploits and expands the model's approximate null space. It improves robustness on multiple benchmark datasets through fine-tuning alone, without modifying the architecture.

### Experimental Results

The experimental results show that this method improves the robustness of ViTs in multiple settings, with the strongest gains on adversarial attacks and out-of-distribution data. This supports both the effectiveness of null-space training and the authors' hypothesis that null-space tolerance is related to model robustness. The method thus offers a new perspective on, and a practical approach to, improving the robustness of ViTs.

Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Deeper Insights into the Robustness of ViTs towards Common Corruptions

On the Adversarial Robustness of Vision Transformers

Improving Robustness for Vision Transformer with a Simple Dynamic Scanning Augmentation

Improving Vision Transformers by Revisiting High-Frequency Components

Are Vision Transformers Robust to Patch Perturbations?

Denoising Vision Transformers

SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization

Discrete Representations Strengthen Vision Transformer Robustness

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

CF-ViT: A General Coarse-to-Fine Method for Vision Transformer

A Light Recipe to Train Robust Vision Transformers

Evaluating Robustness of Vision Transformers on Imbalanced Datasets (Student Abstract)

Improve Vision Transformers Training by Suppressing Over-smoothing

Auto-scaling Vision Transformers without Training

Towards Efficient Adversarial Training on Vision Transformers

Enhancing the robustness of vision transformer defense against adversarial attacks based on squeeze-and-excitation module

Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers

Improved Robustness of Vision Transformer via PreLayerNorm in Patch Embedding

Boosting Vanilla Lightweight Vision Transformers Via Re-parameterization

Zero-Shot Certified Defense against Adversarial Patches with Vision Transformers