Abstract: Machine learning (ML) and deep learning (DL) methods have become fundamental building blocks for a wide range of AI applications. Their popularity also exposes them to malicious attacks, which may raise severe security concerns. To understand the security properties of ML/DL methods, researchers have recently turned their focus to adversarial attack algorithms that can corrupt the model or the clean data owned by the victim with imperceptible perturbations. In this paper, we study the Label Flipping Attack (LFA) problem, where the attacker aims to degrade an ML/DL model's performance by flipping a small fraction of the labels in the training data. Prior art along this direction formulates the attack as a combinatorial optimization problem, which limits scalability to deep learning models. To this end, we propose a novel minimax problem that provides an efficient reformulation of the sample selection process in LFA. In the new optimization problem, the sample selection operation can be implemented with a single thresholding parameter. This leads to a novel training algorithm called Sample Thresholding. Since the objective function is differentiable and the model complexity does not depend on the sample size, we can apply Sample Thresholding to attack deep learning models. Moreover, since the victim's behavior is not predictable in a poisoning attack setting, we have to employ surrogate models to simulate the true model employed by the victim. We therefore provide a theoretical analysis of this surrogate paradigm. Specifically, we show that the performance gap between the true model employed by the victim and the surrogate model is small under mild conditions. On top of this paradigm, we extend Sample Thresholding to the crowdsourced ranking task, where labels collected from the annotators are vulnerable to adversarial attacks. Finally, experimental analyses on three real-world datasets demonstrate the efficacy of our method.
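
The idea of replacing a per-sample selection mask with a single threshold can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not give the minimax objective, so the code below simply assumes that each training sample already has a scalar score (hypothetically, how damaging it would be to flip its label) and that labels are binary. The function names `soft_mask`, `threshold_for_budget`, and `flip_labels` are illustrative only.

```python
import math


def soft_mask(scores, tau, temperature=0.1):
    """Smooth relaxation of the hard rule 1[score > tau].

    A sigmoid keeps the selection differentiable in tau, which is the
    property that makes a threshold usable inside gradient-based training.
    """
    return [1.0 / (1.0 + math.exp(-(s - tau) / temperature)) for s in scores]


def threshold_for_budget(scores, budget):
    """Pick tau so that `budget` scores lie strictly above it.

    Assumes the scores are distinct; ties at the cut would select fewer
    samples than the budget.
    """
    if budget <= 0:
        return max(scores)          # nothing is strictly above the maximum
    ranked = sorted(scores, reverse=True)
    if budget >= len(ranked):
        return min(scores) - 1.0    # everything is above this value
    # Midpoint between the last selected and the first unselected score.
    return 0.5 * (ranked[budget - 1] + ranked[budget])


def flip_labels(labels, scores, budget):
    """Flip the binary labels (0/1) of the samples whose score exceeds tau."""
    tau = threshold_for_budget(scores, budget)
    return [1 - y if s > tau else y for y, s in zip(labels, scores)]


if __name__ == "__main__":
    labels = [0, 1, 0, 1, 1]
    scores = [0.9, 0.1, 0.4, 0.8, 0.2]   # hypothetical per-sample scores
    print(flip_labels(labels, scores, budget=2))
```

The point of the sketch is the parameter count: a hard mask needs one binary decision per sample, whereas here the whole selection is governed by the single scalar `tau`, regardless of how many samples there are.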

LabelFool: A Trick In The Label Space

Fooling Neural Network Interpretations - Adversarial Noise to Attack Images

A Small Sticker is Enough: Spoofing Face Recognition Systems Via Small Stickers

Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only

F&F Attack: Adversarial Attack Against Multiple Object Trackers by Inducing False Negatives and False Positives

Rethinking Label Flipping Attack: From Sample Masking to Sample Thresholding

Label Poisoning is All You Need

Clean-label attack based on negative afterimage on neural networks

AdvFoolGen: Creating Persistent Troubles for Deep Classifiers

Misleading attention and classification: An adversarial attack to fool object detection models in the real world

Query-efficient label-only attacks against black-box machine learning models

Impart: An Imperceptible and Effective Label-Specific Backdoor Attack

Fast Adversarial Label-Flipping Attack on Tabular Data

Adversary-Aware Partial label learning with Label distillation

Defending Against Label-Only Attacks via Meta-Reinforcement Learning

Fooling the Textual Fooler via Randomizing Latent Representations

On Defending Against Label Flipping Attacks on Malware Detection Systems

GreedyFool: Multi-factor imperceptibility and its application to designing a black-box adversarial attack

Under-confidence Backdoors Are Resilient and Stealthy Backdoors

Transparency Attacks: How Imperceptible Image Layers Can Fool AI Perception

When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-$k$ Multi-Label Learning