Abstract: Nowadays, machine learning (ML) and deep learning (DL) methods have become fundamental building blocks for a wide range of AI applications. Their popularity also exposes them to malicious attacks, which raises serious security concerns. To understand the security properties of ML/DL methods, researchers have recently turned their attention to adversarial attack algorithms that can corrupt the victim's model or clean data with imperceptible perturbations. In this paper, we study the Label Flipping Attack (LFA) problem, in which the attacker aims to degrade an ML/DL model's performance by flipping a small fraction of the labels in the training data. Prior work in this direction formulates LFA as a combinatorial optimization problem, which limits its scalability to deep learning models. To address this, we propose a novel minimax problem that efficiently reformulates the sample selection process in LFA. In the new optimization problem, sample selection can be implemented with a single thresholding parameter, which leads to a novel training algorithm called Sample Thresholding. Since the objective function is differentiable and the model complexity does not depend on the sample size, Sample Thresholding can be applied to attack deep learning models. Moreover, since the victim's behavior is not predictable in a poisoning attack setting, we must employ surrogate models to simulate the true model used by the victim. We therefore provide a theoretical analysis of this surrogate paradigm. Specifically, we show that the performance gap between the victim's true model and the surrogate model is small under mild conditions. On top of this paradigm, we extend Sample Thresholding to the crowdsourced ranking task, where labels collected from annotators are vulnerable to adversarial attacks.
Finally, experimental analyses on three real-world datasets demonstrate the efficacy of our method.
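The abstract does not spell out the selection rule, so the following is only a minimal NumPy sketch of the general idea of replacing a combinatorial per-sample mask with a single threshold; it is not the authors' Sample Thresholding algorithm. It assumes binary labels in {-1, +1}, a least-squares linear classifier as the surrogate for the victim's unknown model, and a hypothetical flipping score (the surrogate margin, so confidently correct samples are flipped first). All function names are illustrative. A single threshold tau placed between the k-th and (k+1)-th largest score flips exactly k = floor(budget * n) labels when scores are distinct, and a sigmoid of (score - tau) / temperature gives a differentiable relaxation of that hard mask.

```python
import numpy as np


def fit_surrogate(X, y):
    """Least-squares linear surrogate (assumed stand-in for the victim's model)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w


def flip_scores(X, y, w):
    """Hypothetical score: surrogate margin y_i * f(x_i); large = confidently correct."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return y * (Xb @ w)


def select_threshold(scores, k):
    """Single threshold tau such that exactly k scores exceed it (assuming no ties)."""
    s = np.sort(scores)[::-1]           # scores in descending order
    if k <= 0:
        return s[0] + 1.0               # no score exceeds tau
    if k >= s.size:
        return s[-1] - 1.0              # every score exceeds tau
    return 0.5 * (s[k - 1] + s[k])      # halfway between k-th and (k+1)-th largest


def soft_mask(scores, tau, temperature):
    """Sigmoid relaxation of the hard mask 1[score > tau], written via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh((scores - tau) / (2.0 * temperature)))


def label_flip(X, y, budget):
    """Flip floor(budget * n) labels (y in {-1, +1}) selected by one threshold."""
    w = fit_surrogate(X, y)
    scores = flip_scores(X, y, w)
    k = int(budget * y.size)
    tau = select_threshold(scores, k)
    mask = scores > tau                 # the only selection parameter is tau
    y_poisoned = np.where(mask, -y, y)
    return y_poisoned, mask, tau, scores


# Usage on a synthetic, linearly separable toy problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)
y_poisoned, mask, tau, scores = label_flip(X, y, budget=0.1)
print(mask.sum())
```

Here the selection itself is governed by the single parameter tau rather than by n binary mask variables; a gradient-based attack would work with `soft_mask` instead of the hard mask. The margin score is an arbitrary placeholder and could be swapped for any per-sample criterion.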

Black-Box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack

D-DAE: Defense-Penetrating Model Extraction Attacks

B3: Backdoor Attacks Against Black-box Machine Learning Models

Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only

Towards Efficient Data Free Blackbox Adversarial Attack

Query-Efficient Label-Only Attacks Against Black-Box Machine Learning Models

Data-Free Hard-Label Robustness Stealing Attack

Policy-Driven Attack: Learning to Query for Hard-label Black-box Adversarial Examples

Efficient Label Contamination Attacks Against Black-Box Learning Models

Defending Against Label-Only Attacks via Meta-Reinforcement Learning

Rethinking Label Flipping Attack: From Sample Masking to Sample Thresholding

Box-Free Model Watermarks Are Prone to Black-Box Removal Attacks

Label Poisoning is All You Need

Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence

Efficient Data-Free Model Stealing with Label Diversity

Label-Only Model Inversion Attacks via Knowledge Transfer

Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking

Impart: An Imperceptible and Effective Label-Specific Backdoor Attack

Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring

Data Stealing Attacks against Large Language Models via Backdooring

Data-Free Adversarial Perturbations for Practical Black-Box Attack