Geometric Universality of Adversarial Examples in Deep Learning

Haosheng Zou, Hang Su, Tianyu Pang, Jun Zhu
2018-01-01
Abstract: We consider the problem of adversarial examples in deep learning and attempt to provide geometric insights into their universality. Specifically, we define adversarial directions and prove results towards the universality of adversarial examples under few theoretical assumptions. Our results draw attention to the fully-connected layer used as the last layer of most neural networks, which may be prone to adversarial examples and demands further research. A longer version with full proofs and discussions is provided with the submission email and also here.

Consider the softmax regression layer at the end of many popular neural networks for visual classification tasks (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2016; He et al., 2016), and the hidden space formed by the inputs to that layer. Denote this hidden space by $H \subseteq \mathbb{R}^l$ and let $h \in H$ be the input vector, where $l$ is the number of neurons in the final hidden layer and $m$ is the number of classes. We define the softmax function $S: \mathbb{R}^m \to \mathbb{R}^m$ by $S(z)_i = \exp(z_i) / \sum_{j=1}^{m} \exp(z_j)$, where $z$ is the vector of logits. The overall softmax layer can then be written $S(W^T h + b)$. The neural network classifier first maps an input image $x$ to the hidden representation $h$ through a complex multi-layer non-linear function $g: X \to H$, $h = g(x)$, and then performs softmax regression to obtain the predicted label $y = \arg\max_{i \in [m]} S(W^T h + b)_i$. We only show results for the case $H = \mathbb{R}^l$ here.
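The final-layer computation described above can be illustrated with a minimal NumPy sketch. It assumes a weight matrix W of shape (l, m) and a bias b of shape (m,), which is what the expression W^T h + b requires. The sizes, random values, and the function names `softmax` and `predict` are hypothetical choices made for illustration and do not come from the paper.

```python
import numpy as np


def softmax(z):
    """Softmax over the last axis, shifted by max(z) for numerical stability."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)


def predict(W, b, h):
    """Apply the softmax layer S(W^T h + b) to a hidden vector h.

    W: (l, m) weights, b: (m,) bias, h: (l,) hidden representation.
    Returns the predicted class index and the class probabilities.
    """
    logits = W.T @ h + b
    probs = softmax(logits)
    return int(np.argmax(probs)), probs


# Hypothetical sizes: l = 8 hidden neurons, m = 3 classes.
rng = np.random.default_rng(0)
l, m = 8, 3
W = rng.normal(size=(l, m))
b = rng.normal(size=m)
h = rng.normal(size=l)  # stands in for h = g(x); g itself is not modelled here

label, probs = predict(W, b, h)
print(label, probs)
```

Because softmax is strictly increasing in each logit, the predicted label equals the arg max of the logits W^T h + b themselves, so the decision regions in the hidden space are determined by the affine map alone.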