Abstract: Adversarial examples threaten to fool deep learning models into outputting erroneous predictions with high confidence. Optimization-based methods for constructing such samples have been extensively studied. While effective as attacks, they typically lack a clear interpretation of, and constraints on, their underlying generation process, which hinders us from leveraging the produced adversarial samples in the reverse direction, for model protection. Ideally, such samples would repair bugs in pre-trained models by serving as additional training data with strong attack ability, avoiding time-consuming full re-training from scratch. To address these issues, we first study the black-box behavior of previous optimization-based adversarial attacks and the intrinsic deficiency of neighborhood information in previous defenses. Then we introduce a new method dubbed FeaCP, which uses correctly predicted samples from disjoint classes to guide the generation of more explainable adversarial samples in the ambiguous region around the decision boundary, instead of in uncontrolled “blind spots”. It does so via a feature component-wise convex combination that takes the individual importance of feature ingredients into account. Our method incorporates the prior fact that, for well-separated samples, the path connecting them passes through the model’s decision boundary, which lies in a low-density region; adversarial examples are nevertheless spread through this region with high probability and thus have an impact on the final trained model. In our work, the path is constructed by the proposed inhomogeneous feature-wise convex interpolation rather than by operating at the sample-wise level, which limits the search space of FeaCP to an adaptive neighborhood. Finally, we provide detailed insights, extend our method to adversarial fine-tuning that uses the vicinity distribution to optimize the approximated decision boundary, and validate the significance of FeaCP for model performance.
The experimental results show that our method provides competitive performance on various datasets and networks.
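
To make the idea of feature-wise convex interpolation concrete, the following is a minimal sketch and not the FeaCP implementation. It assumes two feature vectors taken from correctly predicted samples of different classes, plus a per-component importance score. The function names, the min-max mapping from importance to coefficients, and the `low`/`high` bounds are all hypothetical choices made for illustration; the abstract does not specify how the coefficients are derived.

```python
from typing import List, Sequence


def importance_to_coeffs(
    importance: Sequence[float], low: float = 0.3, high: float = 0.7
) -> List[float]:
    """Map importance scores to per-component coefficients in [low, high].

    Hypothetical rule (min-max scaling); the source does not define this mapping.
    """
    lo, hi = min(importance), max(importance)
    if hi == lo:
        # All components equally important: use the midpoint everywhere.
        return [(low + high) / 2.0 for _ in importance]
    return [low + (high - low) * (s - lo) / (hi - lo) for s in importance]


def featurewise_interpolate(
    feat_a: Sequence[float], feat_b: Sequence[float], coeffs: Sequence[float]
) -> List[float]:
    """Convex combination applied per feature component.

    Component i of the result is coeffs[i] * feat_a[i] + (1 - coeffs[i]) * feat_b[i],
    so each component uses its own mixing coefficient instead of one
    sample-wide scalar.
    """
    if not (len(feat_a) == len(feat_b) == len(coeffs)):
        raise ValueError("feat_a, feat_b and coeffs must have the same length")
    if any(c < 0.0 or c > 1.0 for c in coeffs):
        raise ValueError("each coefficient must lie in [0, 1]")
    return [c * a + (1.0 - c) * b for a, b, c in zip(feat_a, feat_b, coeffs)]


# Illustrative usage with made-up feature values and importance scores.
feat_a = [1.0, 0.0, 2.0]      # features of a sample from class A
feat_b = [0.0, 4.0, 2.0]      # features of a sample from class B
importance = [0.0, 1.0, 0.5]  # hypothetical per-component importance
mixed = featurewise_interpolate(feat_a, feat_b, importance_to_coeffs(importance))
```

Because every coefficient lies in [0, 1], each mixed component stays between the corresponding components of the two inputs. This is one way to read the claim that the interpolation restricts the search to a neighborhood spanned by the two samples, rather than to an unconstrained perturbation.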

Adversarial Adaptive Neighborhood With Feature Importance-Aware Convex Interpolation
