DynaSlim: Dynamic Slimming for Vision Transformers.

Da Shi, Jingsheng Gao, Ting Liu, Yuzhuo Fu
DOI: https://doi.org/10.1109/icme55011.2023.00251
2023-01-01
Abstract: Vision transformers (ViTs) have achieved strong performance on various vision tasks. However, their high computational and memory costs hinder deployment on edge devices. Existing compression methods impose a static constraint between accuracy and efficiency during sparsification. Such static constraints limit sparsification efficiency, and their initialization relies heavily on human expertise. We propose DynaSlim, a dynamic slimming strategy for ViTs that adapts the accuracy-efficiency constraint during sparsification. We first introduce fine-grained, adjustable sparsity weights, which act as scaling factors between accuracy and efficiency, for multiple dimensions: input tokens, Multihead Self-Attention (MSA), and Multilayer Perceptron (MLP). We then use heuristic search to set these non-differentiable factors and combine the search with regularization-based sparsification to obtain the optimal sparsified model. Finally, we compress and retrain the sparsified model under various budgets to obtain the resulting submodels. Experiments show that DynaSlim outperforms previous state-of-the-art methods under different budgets. For example, we reduce both the parameters and FLOPs of DeiT-B by 39% while increasing its accuracy by 1.9% on ImageNet-1K. Moreover, we demonstrate the transferability of our compressed models on several downstream datasets.
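
The interplay between per-dimension sparsity weights and an outer search can be illustrated with a minimal sketch. The abstract does not give the loss form or the search algorithm, so everything below is an assumption made for illustration: the penalty is taken to be a weighted mean absolute value of hypothetical mask values per dimension, and simple hill climbing stands in for the paper's unspecified heuristic search. All names (`sparsity_penalty`, `total_loss`, `heuristic_search`) are hypothetical.

```python
import random

# The three sparsified dimensions named in the abstract.
DIMS = ("tokens", "msa", "mlp")


def sparsity_penalty(masks, weights):
    """Assumed regularizer: sum over dimensions of w_d * mean(|mask_d|)."""
    return sum(
        weights[d] * sum(abs(m) for m in masks[d]) / len(masks[d])
        for d in DIMS
    )


def total_loss(task_loss, masks, weights):
    """Assumed objective: task loss plus the weighted sparsity penalty."""
    return task_loss + sparsity_penalty(masks, weights)


def mutate(weights, rng, scale=0.5):
    """Propose new sparsity weights by perturbing each one multiplicatively."""
    return {
        d: max(0.0, w * (1.0 + rng.uniform(-scale, scale)))
        for d, w in weights.items()
    }


def heuristic_search(evaluate, init_weights, steps=50, seed=0):
    """Hill climbing (a stand-in, not the paper's search): keep a candidate
    only if it lowers the score returned by `evaluate`."""
    rng = random.Random(seed)
    best = dict(init_weights)
    best_score = evaluate(best)
    for _ in range(steps):
        candidate = mutate(best, rng)
        score = evaluate(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score
```

In a real pipeline, `evaluate` would presumably run a short sparsification phase with the candidate weights and score the resulting accuracy-efficiency trade-off; because that score is not differentiable with respect to the weights, a gradient-free search of this kind is one plausible fit.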