Defending Against Label-Only Attacks via Meta-Reinforcement Learning
Dayong Ye, Tianqing Zhu, Kun Gao, Wanlei Zhou
DOI: https://doi.org/10.1109/tifs.2024.3357292
IF: 7.231
2024-02-14
IEEE Transactions on Information Forensics and Security
Abstract: Machine learning models are susceptible to a range of adversarial activities. These attacks are designed either to infer private information from the target model or to deceive it. For instance, an attacker may attempt to determine whether a given data example belongs to the model's training set (membership inference attacks) or craft adversarial examples that mislead the model into making incorrect predictions (adversarial example attacks). Numerous defense methods have been proposed to counter these attacks. However, these methods typically share two limitations. First, most are not designed to address label-only attacks, a recently emerged class of attacks that rely solely on the hard labels predicted by the target model. Second, they are often developed to mitigate one specific attack rather than a broad range of attacks. To address these limitations, this paper proposes a novel defense method that focuses on the most challenging setting, i.e., label-only attacks, and can handle various types of label-only attacks. The key idea is to strategically modify the target model's predicted labels using a meta-reinforcement learning technique, so that attackers receive incorrect labels while benign users continue to receive correct labels. Notably, the defender, i.e., the owner of the target model, can make effective decisions without knowledge of the attacker's behavior. The experimental results demonstrate that the proposed method is an effective defense against a range of attacks, including label-only model stealing, label-only membership inference, label-only model inversion, and label-only adversarial example attacks.
Computer Science, Theory & Methods; Engineering, Electrical & Electronic