Automatic Hyper-Parameter Search for Vision Transformer Pruning

Jun Feng, Shuai Zhao, Liangying Peng, Sichen Pan, Hao Chen, Zhongxu Li, Gongwu Ke, Gaoli Wang, Youqun Long
DOI: https://doi.org/10.1109/prai59366.2023.10332058
2023-01-01
Abstract: In recent years, the high computational cost of the popular Vision Transformer (ViT) has made it difficult to deploy on lightweight devices. As a result, many pruning techniques have been developed to reduce the size and complexity of ViT models. However, most of these techniques prune the model as a whole without considering the differences among its internal modules; specifically, they apply a uniform pruning ratio to all modules. In our work, we observe that using different pruning ratios for the Multi-Head Self-Attention (MHSA) and Feed-Forward Network (FFN) modules can improve the compression performance of ViT. Accordingly, we propose a new compression algorithm that applies a distinct pruning ratio to each of these modules and automatically searches for the optimal pruning ratios. To further improve the precision of this algorithm, we introduce an enhanced approach that uses iterative pruning and binary search to identify the optimal parameters at a finer granularity, thereby minimizing the model's accuracy loss during pruning. We evaluated our approach on two commonly used datasets, CIFAR-10 and Mini-ImageNet, and compared it with the state-of-the-art (SOTA) method CP-ViT, which uses a fixed pruning ratio. At nearly the same pruned-model accuracy, our method substantially reduced FLOPs: on CIFAR-10, our pruned model required only 56.91% of the FLOPs of the fixed-ratio method. These results show that our method can reduce model complexity more effectively while maintaining accuracy.
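The abstract does not spell out the search procedure, so the following Python sketch only illustrates the general idea under stated assumptions. It uses binary search to find, for each module separately, the largest pruning ratio whose accuracy drop stays within a tolerance, and it splits that tolerance evenly between MHSA and FFN. The accuracy function `toy_accuracy`, the even tolerance split, and the simplified FLOPs model in `block_flops` are hypothetical and are not taken from the paper. A real search would prune, fine-tune, and evaluate the ViT at every step.

```python
"""Sketch: separate MHSA/FFN pruning ratios found by binary search.

Hypothetical assumptions (not from the paper): a toy accuracy model stands in
for evaluating a pruned ViT, and a simplified per-block FLOPs model is used.
"""
from typing import Callable


def block_flops(n: int, d: int, mhsa_prune: float, ffn_prune: float,
                mlp_ratio: int = 4) -> float:
    """Approximate multiply-accumulates of one ViT encoder block.

    Assumes pruning shrinks the attention inner width and the FFN hidden
    width proportionally; LayerNorm, softmax, and bias costs are ignored.
    """
    d_attn = d * (1.0 - mhsa_prune)
    hidden = mlp_ratio * d * (1.0 - ffn_prune)
    # QKV projection + (QK^T and attention @ V) + output projection
    mhsa = 3 * n * d * d_attn + 2 * n * n * d_attn + n * d_attn * d
    # Two linear layers: d -> hidden -> d
    ffn = 2 * n * d * hidden
    return mhsa + ffn


def max_ratio(drop: Callable[[float], float], tol: float,
              lo: float = 0.0, hi: float = 0.95, eps: float = 1e-3) -> float:
    """Largest ratio in [lo, hi] (to within eps) with drop(ratio) <= tol.

    Requires drop() to be non-decreasing in the ratio and drop(lo) <= tol.
    """
    if drop(hi) <= tol:
        return hi
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if drop(mid) <= tol:
            lo = mid  # acceptable accuracy loss: try pruning more
        else:
            hi = mid  # too much accuracy loss: prune less
    return lo


def toy_accuracy(mhsa_prune: float, ffn_prune: float) -> float:
    """Hypothetical stand-in: here FFN is assumed more robust than MHSA."""
    return 0.95 - 0.20 * mhsa_prune ** 2 - 0.05 * ffn_prune ** 2


def search(tol: float = 0.01) -> tuple[float, float]:
    """Search each module's ratio with the other module left unpruned."""
    base = toy_accuracy(0.0, 0.0)
    pa = max_ratio(lambda r: base - toy_accuracy(r, 0.0), tol / 2)
    pf = max_ratio(lambda r: base - toy_accuracy(0.0, r), tol / 2)
    return pa, pf


if __name__ == "__main__":
    pa, pf = search(0.01)
    base = toy_accuracy(0.0, 0.0)
    uniform = max_ratio(lambda r: base - toy_accuracy(r, r), 0.01)
    # Token count and width roughly match a ViT-Small on 224x224 input.
    print("MHSA/FFN ratios:", round(pa, 3), round(pf, 3))
    print("separate FLOPs:", block_flops(197, 384, pa, pf))
    print("uniform FLOPs: ", block_flops(197, 384, uniform, uniform))
```

With this toy model, the FFN receives a larger pruning ratio than MHSA, and the resulting block needs fewer FLOPs than a uniform ratio that allows the same total accuracy drop. This mirrors the paper's motivation for module-specific ratios, but it says nothing about the paper's actual search procedure or reported numbers.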