A Closer Look at the Adversarial Robustness of Deep Equilibrium Models

Zonghan Yang, Tianyu Pang, Yang Liu
2023-06-02
Abstract: Deep equilibrium models (DEQs) refrain from the traditional layer-stacking paradigm and turn to find the fixed point of a single layer. DEQs have achieved promising performance on different applications with featured memory efficiency. At the same time, the adversarial vulnerability of DEQs raises concerns. Several works propose to certify robustness for monotone DEQs. However, limited efforts are devoted to studying empirical robustness for general DEQs. To this end, we observe that an adversarially trained DEQ requires more forward steps to arrive at the equilibrium state, or even violates its fixed-point structure. Besides, the forward and backward tracks of DEQs are misaligned due to the black-box solvers. These facts cause gradient obfuscation when applying the ready-made attacks to evaluate or adversarially train DEQs. Given this, we develop approaches to estimate the intermediate gradients of DEQs and integrate them into the attacking pipelines. Our approaches facilitate fully white-box evaluations and lead to effective adversarial defense for DEQs. Extensive experiments on CIFAR-10 validate the adversarial robustness of DEQs competitive with deep networks of similar sizes.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the vulnerability of Deep Equilibrium Models (DEQs) to adversarial attacks. DEQs perform well across applications and are memory-efficient, but their sensitivity to adversarial examples has raised concerns. Existing research focuses mainly on certified robustness for monotone DEQs, while the empirical robustness of general DEQs has received little attention.

The paper observes two obstacles. First, an adversarially trained DEQ needs more forward solver steps to reach its equilibrium state and may even violate its fixed-point structure. Second, because the solvers are treated as black boxes, the forward and backward trajectories of a DEQ are misaligned. Together these cause gradient obfuscation, so off-the-shelf attacks cannot reliably evaluate or adversarially train DEQs. To address this, the authors develop methods for estimating the intermediate gradients of DEQs and integrate them into the attack pipeline, which enables fully white-box evaluation and leads to more effective adversarial defenses. Experiments on CIFAR-10 indicate that the resulting DEQs are competitive in adversarial robustness with deep networks of similar size.

Specifically, the main contributions of the paper include:

1. **Identifying challenges**: A detailed analysis of the challenges in training robust DEQs, including the convergence of black-box solvers and the misalignment between forward and backward trajectories.
2. **Intermediate gradient estimation**: Two methods for estimating the intermediate gradients of DEQs are proposed: the Simultaneous Adjoint method and the Unrolling method.
3. **White-box attack and defense**: A white-box attack based on intermediate gradients is designed, and defense strategies that use intermediate states, such as early exit and state aggregation, are proposed.
4. **Experimental verification**: Extensive experiments on the CIFAR-10 dataset verify the effectiveness of the proposed methods and demonstrate the competitiveness of DEQs in terms of adversarial robustness.

In summary, this paper aims to improve the reliability and security of DEQs in practical applications by improving how their adversarial robustness is evaluated and defended.
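
To make the idea of an intermediate gradient concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a scalar "layer" `f(z, x) = tanh(w*z + u*x + b)` with hypothetical parameters `w`, `u`, `b` (chosen so the iteration is contractive), runs plain fixed-point iteration instead of a black-box solver, and differentiates through the unrolled steps to obtain a gradient at every intermediate state. The helper names `solve_with_unrolled_grad` and `implicit_grad` are illustrative only.

```python
import math


def f(z, x, w=0.5, u=1.0, b=0.0):
    """Toy scalar layer (assumption): one application of the DEQ transformation."""
    return math.tanh(w * z + u * x + b)


def solve_with_unrolled_grad(x, steps=30, w=0.5, u=1.0, b=0.0):
    """Iterate z_{k+1} = f(z_k, x) from z_0 = 0.

    Returns every intermediate state z_k together with dz_k/dx, computed by
    applying the chain rule through the unrolled iterations.
    """
    z, dz_dx = 0.0, 0.0
    states, grads = [], []
    for _ in range(steps):
        z_next = f(z, x, w, u, b)
        sech2 = 1.0 - z_next ** 2        # tanh'(a) = 1 - tanh(a)^2
        dz_dx = sech2 * (w * dz_dx + u)  # derivative of one unrolled step
        z = z_next
        states.append(z)
        grads.append(dz_dx)
    return states, grads


def implicit_grad(z_star, x, w=0.5, u=1.0, b=0.0):
    """dz*/dx at the fixed point, from the implicit function theorem."""
    sech2 = 1.0 - f(z_star, x, w, u, b) ** 2
    return sech2 * u / (1.0 - sech2 * w)


if __name__ == "__main__":
    states, grads = solve_with_unrolled_grad(x=0.3)
    print("fixed-point residual:", abs(f(states[-1], 0.3) - states[-1]))
    print("gradient after 1 step:", grads[0])
    print("gradient after 30 steps:", grads[-1])
    print("implicit gradient:", implicit_grad(states[-1], 0.3))
```

In this toy setting the unrolled gradient at an early state differs from the gradient at equilibrium and approaches the implicit-differentiation value as the iteration converges. An attack that only sees the equilibrium gradient therefore has no direct signal about early states, which is the gap that intermediate gradient estimates are meant to fill when a defense exits early or aggregates intermediate states.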