Abstract: Machine learning (ML) and deep learning (DL) methods have become fundamental building blocks for a wide range of AI applications. Their popularity also exposes them to malicious attacks, which raises severe security concerns. To understand the security properties of ML/DL methods, researchers have recently turned their attention to adversarial attack algorithms that corrupt the victim's model or clean data with imperceptible perturbations. In this paper, we study the Label Flipping Attack (LFA) problem, where the attacker aims to degrade an ML/DL model's performance by flipping a small fraction of the labels in the training data. Prior art in this direction formulates the attack as a combinatorial optimization problem, which limits scalability to deep learning models. To this end, we propose a novel minimax problem that provides an efficient reformulation of the sample selection process in LFA. In the new optimization problem, the sample selection operation can be implemented with a single thresholding parameter. This leads to a novel training algorithm called Sample Thresholding. Since the objective function is differentiable and the model complexity does not depend on the sample size, we can apply Sample Thresholding to attack deep learning models. Moreover, since the victim's behavior is not predictable in a poisoning attack setting, we have to employ surrogate models to simulate the true model employed by the victim. To address this, we provide a theoretical analysis of such a surrogate paradigm. Specifically, we show that the performance gap between the true model employed by the victim and the surrogate model is small under mild conditions. On top of this paradigm, we extend Sample Thresholding to the crowdsourced ranking task, where labels collected from the annotators are vulnerable to adversarial attacks. Finally, experimental analyses on three real-world datasets demonstrate the efficacy of our method.
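
To make the idea of selecting samples with a single threshold concrete, the following is a minimal sketch and not the Sample Thresholding algorithm itself. It assumes binary labels, a flipping budget given as a fraction of the training set, and a hypothetical per-sample score (for example, a loss computed by a surrogate model); the function name `flip_by_threshold` and the scoring scheme are illustrative assumptions, and the minimax training that would learn the threshold is omitted.

```python
def flip_by_threshold(labels, scores, budget):
    """Flip binary labels whose score reaches a single threshold tau.

    labels: list of 0/1 training labels.
    scores: hypothetical per-sample scores (higher = more harmful to flip),
            e.g. produced by a surrogate model; an assumption of this sketch.
    budget: maximum fraction of labels the attacker may flip.
    Returns (poisoned_labels, tau).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    k = int(budget * len(labels))  # number of labels allowed to flip
    if k == 0:
        return list(labels), float("inf")  # nothing passes the threshold

    # tau is the k-th largest score, so selection reduces to one comparison.
    # With tied scores at tau, more than k samples could be selected;
    # this sketch assumes distinct scores.
    tau = sorted(scores, reverse=True)[k - 1]
    poisoned = [1 - y if s >= tau else y for y, s in zip(labels, scores)]
    return poisoned, tau


labels = [0, 1, 0, 1, 1]
scores = [0.1, 0.9, 0.4, 0.8, 0.2]
poisoned, tau = flip_by_threshold(labels, scores, budget=0.4)
print(poisoned, tau)
```

Here the per-sample decision is a comparison against one scalar, so the number of attack parameters does not grow with the sample size, in contrast to a formulation that keeps one binary mask variable per sample.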

Efficient Label Contamination Attacks Against Black-Box Learning Models

B3: Backdoor Attacks Against Black-box Machine Learning Models

Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only

Towards Efficient Data Free Blackbox Adversarial Attack

Query-efficient label-only attacks against black-box machine learning models

Rethinking Label Flipping Attack: From Sample Masking to Sample Thresholding

Label Poisoning is All You Need

Label-Only Model Inversion Attacks via Knowledge Transfer

Improving Query Efficiency of Black-Box Attacks via the Preference of Deep Learning Models

Defending Against Label-Only Attacks via Meta-Reinforcement Learning

Label Sanitization against Label Flipping Poisoning Attacks

Data Contamination Calibration for Black-box LLMs

Fast Adversarial Label-Flipping Attack on Tabular Data

A Black-Box Attack Algorithm Targeting Unlabeled Industrial AI Systems With Contrastive Learning

Latent Code Augmentation Based on Stable Diffusion for Data-free Substitute Attacks

Label-free Poisoning Attack Against Deep Unsupervised Domain Adaptation

Clean-label attack based on negative afterimage on neural networks

BadLabel: A Robust Perspective on Evaluating and Enhancing Label-Noise Learning

Leveraging Model Poisoning Attacks on License Plate Recognition Systems

Policy-Driven Attack: Learning to Query for Hard-label Black-box Adversarial Examples