Poster: Detecting Adversarial Examples Hidden under Watermark Perturbation via Usable Information Theory

Ziming Zhao, Zhaoxuan Li, Tingting Li, Zhuoxue Song, Fan Zhang, Rui Zhang
DOI: https://doi.org/10.1145/3576915.3624396
2023-01-01
Abstract: Image watermarking is a technique widely used for copyright protection. Recent studies show that an image watermark can be added to a clean image as a form of noise to fool deep learning models. However, previous adversarial example (AE) detection schemes tend to be ineffective because a watermark logo differs from typical noise perturbations. In this poster, we propose Themis, a novel AE detection method against watermark perturbation. Unlike prior methods, Themis neither modifies the protected classifier nor requires knowledge of the process used to generate AEs. Specifically, Themis leverages usable information theory to calculate a pointwise score, thereby discovering instances that may be watermark AEs. Empirical evaluations involving five different logo watermark perturbations demonstrate that the proposed scheme detects AEs efficiently and significantly outperforms five state-of-the-art (SOTA) detection methods (by over 15% in accuracy). The visualization results show that our detection metric separates AEs from non-AEs more distinctly. Meanwhile, Themis achieves a larger Area Under the Curve (AUC) in a threshold-resilient manner while introducing only about 0.04 s of overhead.
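To make the idea of a pointwise score from usable information theory more concrete, the following is a minimal sketch of pointwise V-information (PVI) as it is commonly defined in the usable-information literature: the log-probability a model assigns to a label when it sees the input, minus the log-probability assigned by a model that sees only an empty (null) input. This illustrates the general quantity only and is not necessarily how Themis computes its score. The function names, the example probabilities, and the rule that low scores are flagged are all assumptions made for illustration.

```python
import math


def pointwise_v_information(p_with_input: float, p_null: float) -> float:
    """PVI in bits: log2 g[x](y) - log2 g'[null](y).

    p_with_input: probability assigned to label y by a model given input x.
    p_null: probability assigned to label y by a model given a null input.
    Both arguments are hypothetical model outputs supplied by the caller.
    """
    if not (0.0 < p_with_input <= 1.0 and 0.0 < p_null <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.log2(p_with_input) - math.log2(p_null)


def flag_suspicious(scores: list[float], threshold: float) -> list[bool]:
    """Flag instances whose score falls below a threshold.

    Assumption for illustration: a low score marks an instance as
    a possible adversarial example. The abstract does not state this rule.
    """
    return [score < threshold for score in scores]


if __name__ == "__main__":
    # Hypothetical probabilities, not taken from the poster.
    scores = [
        pointwise_v_information(0.5, 0.25),
        pointwise_v_information(0.125, 0.25),
    ]
    print(scores)
    print(flag_suspicious(scores, threshold=0.0))
```

In such a setup, the threshold would be chosen on held-out data; an AUC computed over all thresholds summarizes how well the score separates the two groups without committing to a single cut-off.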