LabelFool: A Trick In The Label Space

Yujia Liu, Ming Jiang, Tingting Jiang
DOI: https://doi.org/10.1109/IJCNN55064.2022.9892136
2022-01-01
Abstract: Adversarial attack methods can induce machine learning classifiers to make labeling errors. Current methods focus on errors in the image space, i.e., the imperceptibility of adversarial perturbations, so that attacks are not detected by humans. However, they overlook errors in the label space, i.e., the similarity between the wrong label and the true label. Humans can easily detect an attack if the wrong label differs greatly from the true label, for example, when a dog is mislabeled as a cat. In this paper, we propose a novel attack method called LabelFool, which attacks images with undetectable errors in both the label space and the image space. Given a classifier, for each input image, LabelFool first predicts the true label by estimating its probability distribution, then selects the label perceptually nearest to the predicted true label as the target label. LabelFool then generates the adversarial sample by moving the input image towards the classification boundary between the predicted true label and the target label. Subjective experiments on ImageNet and visual results on CASIA-WebFace show that LabelFool is less detectable in the label space than other attack methods. Moreover, LabelFool has low perceptibility in the image space while maintaining a high attack rate.
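The two-stage pipeline described in the abstract (pick a perceptually close target label, then push the image toward the boundary between the true and target classes) can be illustrated with a small NumPy sketch. This is not the authors' implementation; it rests on several assumptions. The true label is estimated simply as the argmax of the predicted probabilities (the paper's estimation procedure may be more involved). The perceptual label distances come from a hypothetical, hand-written matrix `label_dist`. The classifier is a toy linear model `W @ x + b`, and the boundary step is a DeepFool-style projection onto the hyperplane where the two logits are equal. The function names `select_target` and `step_to_boundary` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def select_target(probs, label_dist):
    """Assumption: the estimated true label is argmax(probs); the target is the
    label with the smallest distance to it in a hypothetical distance matrix."""
    true_label = int(np.argmax(probs))
    d = label_dist[true_label].astype(float).copy()
    d[true_label] = np.inf  # never pick the true label itself
    return true_label, int(np.argmin(d))

def step_to_boundary(x, W, b, src, tgt, overshoot=0.02):
    """DeepFool-style step for a linear classifier f(x) = W @ x + b.
    Assumes logit[src] > logit[tgt]; moves x along the normal of the
    hyperplane logit[tgt] == logit[src], slightly past it."""
    w = W[tgt] - W[src]                                # normal of the pairwise boundary
    g = (W[tgt] @ x + b[tgt]) - (W[src] @ x + b[src])  # negative while src wins
    r = (abs(g) / np.dot(w, w)) * w                    # minimal step to reach g == 0
    return x + (1.0 + overshoot) * r

# Toy 3-class, 2-D example (all values are illustrative, not from the paper).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
label_dist = np.array([[0, 1, 5], [1, 0, 5], [5, 5, 0]])  # class 1 is "close" to class 0

x = np.array([2.0, 1.0])
true_label, target = select_target(softmax(W @ x + b), label_dist)
x_adv = step_to_boundary(x, W, b, true_label, target)
print(true_label, target, int(np.argmax(W @ x_adv + b)))
```

With these toy values, class 0 wins on the clean input, class 1 is chosen as the nearest label, and after the step the classifier predicts class 1. In a real deep network the boundary is nonlinear, so an attack of this kind would repeat a linearized step until the prediction flips.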
What problem does this paper attempt to address?