Adversarial Robustness under Long-Tailed Distribution: Supplementary Material

Tong Wu, Ziwei Liu, Qingqiu Huang, Yu Wang, Dahua Lin
2021-01-01
Abstract: We adopt WideResNet-34-10 as the model architecture. The initial learning rate is 0.1 and is divided by 10 at epochs 60 and 75, for 80 epochs in total. For all methods, we evaluate the model from the last epoch, without early stopping. We use SGD with momentum and a weight decay of 2×10⁻⁴. All experiments in the main paper use a batch size of 64. Adversarial training uses a maximal perturbation of 8/255 and a step size of 2/255 (0.031 and 0.0078 in the implementation). The inner maximization runs for 5 iterations; the effect of the number of PGD steps in AT is studied in Sec. B.2.

Several hyper-parameters are involved, of which those controlling the margins or the boundary adjustment are the most critical. Specifically, we adopt m0 = 0.1 for CIFAR-10-LT and m0 ∈ {0.2, 0.3} for CIFAR-100-LT, depending on the desired emphasis (i.e., the trade-off between natural and robust accuracy). Setting τb − τm = 1.2 in Eqn. 10 generally yields good results through training-stage re-balancing, while τb − τm = 0 with τp = 1.5 also works well through pure boundary adjustment at inference time. The optimal value of τp depends mainly on τb − τm; the ablation study gives detailed comparisons. The remaining hyper-parameters are less sensitive and have relatively little impact on performance: we adopt s = 10 and γ ∈ {1/32, 1/16}, and set α = 6 and α = 3 in Eqn. 12 for CIFAR-10-LT and CIFAR-100-LT, respectively.
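The training setup described above can be summarized in a short Python sketch. It is illustrative only and is not the authors' code: `TrainConfig`, `METHOD_HPARAMS`, `lr_at_epoch`, `sign` and `pgd_linf` are hypothetical names. The sketch assumes 0-indexed epochs, with each learning-rate drop taking effect from the milestone epoch onward. It implements a plain L∞ PGD ascent on flat lists of pixel values in [0, 1] without random initialization, because the text does not say whether a random start is used. The SGD momentum coefficient is not given in the text and is therefore omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Training hyper-parameters stated in the text (hypothetical container)."""
    arch: str = "WideResNet-34-10"
    epochs: int = 80
    base_lr: float = 0.1
    lr_milestones: Tuple[int, ...] = (60, 75)  # LR is divided by lr_decay at each milestone
    lr_decay: float = 10.0
    weight_decay: float = 2e-4                 # SGD momentum value is not stated, so omitted
    batch_size: int = 64
    eps: float = 0.031                         # approx. 8/255, L-inf perturbation budget
    step_size: float = 0.0078                  # approx. 2/255, PGD step size
    pgd_steps: int = 5                         # inner-maximization iterations


# Method-specific values from the text. The symbols are defined in the main paper
# (Eqn. 10 and Eqn. 12), which is not reproduced here. Tuples list candidate values.
METHOD_HPARAMS: Dict[str, dict] = {
    "CIFAR-10-LT": {"m0": (0.1,), "s": 10, "gamma": (1 / 32, 1 / 16), "alpha": 6},
    "CIFAR-100-LT": {"m0": (0.2, 0.3), "s": 10, "gamma": (1 / 32, 1 / 16), "alpha": 3},
}


def lr_at_epoch(cfg: TrainConfig, epoch: int) -> float:
    """Step decay with 0-indexed epochs: one division by lr_decay per milestone reached."""
    n_decays = sum(epoch >= m for m in cfg.lr_milestones)
    return cfg.base_lr / (cfg.lr_decay ** n_decays)


def sign(v: float) -> int:
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_linf(x: List[float], grad_fn: Callable[[List[float]], List[float]],
             eps: float, step_size: float, steps: int) -> List[float]:
    """L-inf PGD ascent from the clean input (no random start), inputs in [0, 1]."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        # Ascend the loss along the sign of its gradient.
        x_adv = [xa + step_size * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project onto the eps-ball around the clean input x ...
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        # ... and onto the valid pixel range [0, 1].
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv
```

With 5 steps of size 0.0078, the accumulated step (0.039) exceeds the 0.031 budget, so under a constant gradient each coordinate ends up on the boundary of the ε-ball, or at the edge of [0, 1] if that is reached first. In an actual training loop, `grad_fn` would return the gradient of the network's loss with respect to the input, obtained by automatic differentiation.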