Are Transformer-Based Models More Robust Than CNN-based Models?

Zhendong Liu,Shuwei Qian,Changhong Xia,Chongjun Wang
DOI: https://doi.org/10.1016/j.neunet.2023.12.045
IF: 7.8
2024-01-01
Neural Networks
Abstract:As the deployment of artificial intelligence (AI) models in real-world settings grows, their open-environment robustness becomes increasingly critical. This study aims to dissect the robustness of deep learning models, particularly comparing transformer-based models against CNN-based models. We focus on unraveling the sources of robustness from two key perspectives: structural and process robustness. Our findings suggest that transformer-based models generally outperform convolution-based models in robustness across multiple metrics. However, we contend that these metrics may not wholly represent true model robustness, such as the mean of corruption error. To better understand the underpinnings of this robustness advantage, we analyze models through the lens of Fourier transform and game interaction. From our insights, we propose a calibrated evaluation metric for robustness against real-world data, and a blur-based method to enhance robustness performance. Our approach achieves state-of-the-art results, with mCE scores of 2.1% on CIFAR-10-C, 12.4% on CIFAR-100-C, and 24.9% on TinyImageNet-C.
What problem does this paper attempt to address?