CASN: Class-Aware Score Network for Textual Adversarial Detection.

Rong Bao, Rui Zheng, Liang Ding, Qi Zhang, Dacheng Tao
DOI: https://doi.org/10.18653/v1/2023.acl-long.40
2023-01-01
Abstract: Adversarial detection aims to detect adversarial samples that threaten the security of deep neural networks, an essential step toward building robust AI systems. Density-based estimation is widely considered an effective technique: it explicitly models the distribution of normal data and identifies adversarial samples as outliers. However, these methods suffer significant performance degradation when the adversarial samples lie close to the non-adversarial data manifold. To address this limitation, we propose a score-based generative method that implicitly models the data distribution. Our approach uses the gradient of the log-density of the data distribution and calculates the distribution gap between adversarial and normal samples through multi-step iterations of Langevin dynamics. In addition, we use supervised contrastive learning to guide the gradient estimation with label information, which avoids collapsing to a single data manifold and better preserves the anisotropy of the differently labeled data distributions. Experimental results on three text classification tasks under four advanced attack algorithms show that our approach significantly improves on previous detection methods (+15.2 F1 on average over the previous SOTA).
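
The score-based idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the learned, class-aware score network is replaced here by the analytic score of an isotropic Gaussian, and the detection statistic (how far a sample travels under Langevin updates) is an assumed stand-in for the distribution gap the abstract mentions. The names `gaussian_score` and `langevin_displacement`, and all parameter values, are hypothetical.

```python
import math
import random


def gaussian_score(x, mean, std):
    """Score (gradient of the log-density) of an isotropic Gaussian.

    Assumption: a real detector would use a learned score network instead.
    """
    return [-(xi - mi) / (std ** 2) for xi, mi in zip(x, mean)]


def langevin_displacement(x, mean, std, steps=50, step_size=0.05,
                          noise_scale=0.0, rng=None):
    """Run Langevin updates starting at x and return the distance travelled.

    Update rule: x <- x + step_size * score(x) + noise_scale * sqrt(2 * step_size) * z,
    with z drawn from a standard normal. noise_scale defaults to 0.0 so the
    sketch is deterministic; set it to 1.0 for standard Langevin dynamics.
    """
    rng = rng or random.Random(0)
    current = list(x)
    for _ in range(steps):
        grad = gaussian_score(current, mean, std)
        current = [
            c + step_size * g
            + noise_scale * math.sqrt(2 * step_size) * rng.gauss(0.0, 1.0)
            for c, g in zip(current, grad)
        ]
    return math.dist(x, current)


# A sample near the data mean moves little; a far-away sample moves a lot.
normal_shift = langevin_displacement([0.3, -0.4], [0.0, 0.0], 1.0)
outlier_shift = langevin_displacement([3.0, 4.0], [0.0, 0.0], 1.0)
print(normal_shift < outlier_shift)
```

Under these assumptions, thresholding the displacement would flag samples that sit far from the high-density region. The class-aware part of the method, where label information guides the score estimate, is not modeled in this sketch.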