Abstract: Deep neural networks that achieve remarkable performance in image classification have previously been shown to be easily fooled by tiny transformations such as a one-pixel translation of the input image. Two approaches have been proposed in recent years to address this problem. The first uses huge datasets together with data augmentation, in the hope that a highly varied training set will teach the network to be invariant. The second uses architectural modifications based on sampling theory to deal explicitly with image translations. In this paper, we show that these approaches still fall short in robustly handling 'natural' image translations that simulate a subtle change in camera orientation. Our findings reveal that a mere one-pixel translation can result in a significant change in the predicted image representation for approximately 40% of the test images in state-of-the-art models (e.g. open-CLIP trained on LAION-2B or DINO-v2), while models that are explicitly constructed to be robust to cyclic translations can still be fooled by one-pixel realistic (non-cyclic) translations 11% of the time. We present Robust Inference by Crop Selection: a simple method that can be proven to achieve any desired level of consistency, albeit at a modest cost in the model's accuracy. Importantly, we demonstrate that employing this method reduces the ability to fool state-of-the-art models with a one-pixel translation to less than 5%, while suffering only a 1% drop in classification accuracy. Additionally, we show that our method can be easily adjusted to deal with circular shifts as well. In this case we achieve 100% robustness to integer shifts with state-of-the-art accuracy, and with no need for any further training.
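The distinction between cyclic and realistic (non-cyclic) one-pixel translations can be illustrated with a minimal NumPy sketch. It is not the paper's evaluation code, and it does not implement Robust Inference by Crop Selection: the helper names (`cyclic_shift`, `crop_shift`, `flip_rate`) and the toy random "scene" are assumptions made for illustration only. A cyclic shift wraps pixels around the border, whereas a realistic shift moves a crop window inside a larger scene so that new content enters at one edge.

```python
import numpy as np

def cyclic_shift(image, dx):
    """Shift columns by dx pixels, wrapping pixels around the border."""
    return np.roll(image, shift=dx, axis=1)

def crop_shift(scene, top, left, size, dx):
    """Crop a size x size window from a larger scene, moved dx pixels right.

    Unlike a cyclic shift, new content enters at one edge, which mimics
    a small change in camera orientation.
    """
    return scene[top:top + size, left + dx:left + dx + size]

def flip_rate(predict, originals, shifted):
    """Fraction of image pairs whose predicted label changes after the shift.

    `predict` is a hypothetical callable mapping an image to a label.
    """
    changed = [predict(a) != predict(b) for a, b in zip(originals, shifted)]
    return float(np.mean(changed))

# Toy data: a random 8x8 "scene" stands in for a real photograph.
rng = np.random.default_rng(0)
scene = rng.random((8, 8))

base = crop_shift(scene, top=2, left=2, size=4, dx=0)   # original view
moved = crop_shift(scene, top=2, left=2, size=4, dx=1)  # realistic 1-pixel shift
wrapped = cyclic_shift(base, dx=-1)                     # cyclic 1-pixel shift

# The two shifted views agree everywhere except the last column:
# the cyclic version reuses the first column of `base`, while the
# realistic version shows scene content that was previously out of frame.
```

Given a classifier and pairs of original and shifted test images, `flip_rate` would estimate how often a one-pixel shift changes the prediction, which is the kind of consistency figure the abstract reports.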

Generalization to translation shifts: a study in architectures and augmentations

On the Generalization Effects of Linear Transformations in Data Augmentation

Quantifying Translation-Invariance in Convolutional Neural Networks

Adversarial Learning of General Transformations for Data Augmentation

Augmentation Invariant Training

Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations

Generalization Gap in Data Augmentation: Insights from Illumination

Augmentation-based Domain Generalization for Semantic Segmentation

Soft Augmentation for Image Classification

Improving generalization for geometric variations in images for efficient deep learning

Automatic Data Augmentation via Invariance-Constrained Learning

Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers

A data-centric approach to class-specific bias in image data augmentation

Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data

The Ultimate Combo: Boosting Adversarial Example Transferability by Composing Data Augmentations

Data Augmentation Can Improve Robustness

Untapped Potential of Data Augmentation: A Domain Generalization Viewpoint

Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance with Expanded Views

Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches

Data-Efficient Augmentation for Training Neural Networks

Learning to Augment via Implicit Differentiation for Domain Generalization