Abstract: Although deep learning models have achieved unprecedented success, their vulnerability to adversarial attacks has attracted increasing attention, especially when they are deployed in security-critical domains. To address this challenge, numerous defense strategies, both reactive and proactive, have been proposed to improve robustness. Some of them, which work in the image feature space, cannot reach satisfactory results because of feature shift. Moreover, the features learned by a model are not directly related to its classification results. In contrast, we approach defense from inside the model and investigate neuron behaviors before and after attacks. We observe that attacks mislead the model by dramatically changing the neurons that contribute most and least to the correct label. Motivated by this observation, we introduce the concept of neuron influence and divide neurons into front, middle, and tail parts. On this basis, we propose neuron-level inverse perturbation (NIP), the first neuron-level reactive defense method against adversarial attacks. By strengthening front neurons and weakening those in the tail part, NIP can eliminate nearly all adversarial perturbations while maintaining high benign accuracy. It also adapts to perturbations of different sizes, especially larger ones. Comprehensive experiments on three datasets and six models show that NIP outperforms state-of-the-art baselines against eleven adversarial attacks. We further provide interpretable evidence via neuron activation and visualization for better understanding.

Impact Statement: Deep learning has attracted tremendous attention in many fields, but some studies have shown that deep learning models are vulnerable to adversarial attacks. Defense methods against these attacks have been developed in response. Some solutions based on simple image transformations are easy to implement but may be compromised by larger perturbations.
Methods based on the feature space have recently been proposed; they project or map the distribution of adversarial examples back to that of benign examples. However, they may fail because features shift during operations on adversarial examples. Besides, the feature distribution does not directly relate to classification. We propose NIP, a neuron-level defense against generic adversarial attacks, which bridges neuron behaviors and correct classifications from the perspective of the model's internals. By suppressing neurons exploited by attacks and enhancing class-relevant ones, it provides an attack-agnostic, input-aware, and more fine-grained solution for defense.

This research was supported by the National Natural Science Foundation of China under Grant No. 62072406 and the Zhejiang Provincial Natural Science Foundation under Grant No. LY19F020025.

R. Chen is with the College of Information Engineering at Zhejiang University of Technology, Hangzhou 310007, China (e-mail: 2112003149@zjut.edu.cn).
H. Jin is with the College of Information Engineering at Zhejiang University of Technology, Hangzhou 310007, China (e-mail: 2112003035@zjut.edu.cn).
J. Chen is with the Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China (e-mail: chenjinyin@zjut.edu.cn).
H. Zheng is with the College of Information Engineering at Zhejiang University of Technology, Hangzhou 310007, China (e-mail: haibinzheng320@gmail.com).
Y. Yue is with the Key Laboratory of Parallel and Distributed Computing, College of Computer, National University of Defense Technology, Changsha 410000, China (e-mail: yuyue@nudt.edu.cn).
S. Ji is with the College of Computer Science and Technology at Zhejiang University, Hangzhou 310007, China (e-mail: sji@zju.edu.cn).
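
The idea summarized in the abstract (score neurons by their influence, split them into front, middle, and tail parts, then strengthen the front and weaken the tail) can be illustrated with a small NumPy sketch. It is a toy under stated assumptions and not the authors' procedure: the text above does not define neuron influence, so the sketch assumes it is a neuron's additive contribution to one class logit in a linear output layer. The 20% split fractions, the scaling factor `alpha`, the helper names, and the example numbers are all hypothetical. The sketch also rescales activations directly, whereas NIP is described as an inverse perturbation.

```python
import numpy as np

def neuron_influence(activations, class_weights):
    # ASSUMPTION: influence = activation * weight, i.e. the neuron's additive
    # contribution to the chosen class logit of a linear output layer.
    return activations * class_weights

def partition_neurons(influence, front_frac=0.2, tail_frac=0.2):
    # Rank neurons by influence (descending) and split the ranking into
    # front (most influential), middle, and tail (least influential).
    # The fractions are hypothetical, not taken from the paper.
    order = np.argsort(-influence, kind="stable")
    n = influence.size
    n_front = int(round(front_frac * n))
    n_tail = int(round(tail_frac * n))
    front = order[:n_front]
    middle = order[n_front:n - n_tail]
    tail = order[n - n_tail:]
    return front, middle, tail

def reweight(activations, front, tail, alpha=0.5):
    # Strengthen front neurons and weaken tail neurons; alpha is hypothetical.
    out = activations.astype(float).copy()
    out[front] *= 1.0 + alpha
    out[tail] *= 1.0 - alpha
    return out

# Toy last-hidden-layer activations and the weights feeding one class logit.
activations = np.array([0.9, 0.1, 0.5, 0.0, 0.7, 0.3, 0.2, 0.8, 0.4, 0.6])
class_weights = np.array([1.0, -2.0, 0.5, 0.3, 1.5, -1.0, 0.2, -0.5, 0.1, 2.0])

influence = neuron_influence(activations, class_weights)
front, middle, tail = partition_neurons(influence)
adjusted = reweight(activations, front, tail)

logit_before = float(activations @ class_weights)
logit_after = float(adjusted @ class_weights)
print("front:", front.tolist(), "tail:", tail.tolist())
print("class logit before:", logit_before, "after:", logit_after)
```

In this toy setting the tail neurons contribute negatively to the chosen class, so amplifying the front and damping the tail both raise that class logit. A real defense would also need a rule for choosing the class at inference time, which the text above does not specify.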

NIP: Neuron-level Inverse Perturbation Against Adversarial Attacks.