Fast Hybrid Search for Automatic Model Compression
Guilin Li, Lang Tang, Xiawu Zheng
DOI: https://doi.org/10.3390/electronics13040688
IF: 2.9
2024-02-08
Electronics
Abstract: Neural network pruning has been widely studied for model compression and acceleration, with the aim of facilitating model deployment in resource-limited scenarios. Conventional methods either require domain knowledge to manually design the pruned model architecture and pruning algorithm, or rely on AutoML to search for the pruned model architecture while still pruning all layers with a single pruning algorithm. However, many pruning algorithms have been proposed, and they differ in the criterion they use to measure filter importance. We therefore propose a hybrid search method that searches for the pruned model architecture and the pruning algorithm at the same time, automatically finding the pruning ratio and pruning algorithm for each convolution layer. To make the search more efficient, we divide it into two phases. First, we search a huge space using adaptive batch normalization, a fast but relatively inaccurate model evaluation method. Second, we search based on the first-phase results and evaluate models by fine-tuning, which is more accurate. The proposed hybrid search method is thus efficient and achieves a clear improvement in performance over current state-of-the-art methods, including AMC, MetaPruning, and ABCPruner. For example, when pruning MobileNet, we achieve a 59.8% test accuracy on ImageNet with only 49 M FLOPs, which is 2.6% higher than MetaPruning.
engineering, electrical & electronic; computer science, information systems; physics, applied
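
The two-phase hybrid search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes plain random sampling as a stand-in for whatever search strategy the paper uses, and it omits any FLOPs budget. The criterion names in `CRITERIA`, the ratio grid in `RATIOS`, and both toy evaluators are hypothetical placeholders: in practice the cheap evaluator would correspond to adaptive batch normalization and the accurate one to fine-tuning.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical search space: the abstract does not list the criteria or ratios.
CRITERIA = ["l1_norm", "l2_norm", "geometric_median"]
RATIOS = [0.0, 0.25, 0.5, 0.75]

LayerChoice = Tuple[float, str]   # (pruning ratio, pruning criterion)
Candidate = List[LayerChoice]     # one choice per convolution layer


def sample_candidate(num_layers: int, rng: random.Random) -> Candidate:
    """Draw one (ratio, criterion) pair for every convolution layer."""
    return [(rng.choice(RATIOS), rng.choice(CRITERIA)) for _ in range(num_layers)]


def two_phase_search(
    num_layers: int,
    cheap_eval: Callable[[Candidate], float],
    accurate_eval: Callable[[Candidate], float],
    num_samples: int = 200,
    top_k: int = 5,
    seed: int = 0,
) -> Candidate:
    """Phase 1 ranks many candidates with a fast proxy score;
    phase 2 re-scores only the shortlist with the slower, more accurate one."""
    rng = random.Random(seed)
    pool = [sample_candidate(num_layers, rng) for _ in range(num_samples)]
    # Phase 1: keep the top_k candidates under the cheap proxy.
    shortlist = sorted(pool, key=cheap_eval, reverse=True)[:top_k]
    # Phase 2: pick the best shortlisted candidate under the accurate score.
    return max(shortlist, key=accurate_eval)


# Toy evaluators (assumptions for illustration only, not real model accuracy).
def toy_cheap_eval(candidate: Candidate) -> float:
    # The proxy only sees the ratios and prefers values near 0.5.
    return -sum(abs(ratio - 0.5) for ratio, _ in candidate)


def toy_accurate_eval(candidate: Candidate) -> float:
    # The accurate score additionally rewards one particular criterion.
    bonus = sum(0.1 for _, criterion in candidate if criterion == "l2_norm")
    return toy_cheap_eval(candidate) + bonus


best = two_phase_search(4, toy_cheap_eval, toy_accurate_eval)
print(best)
```

Encoding each layer as a (ratio, criterion) pair is what makes the search "hybrid": architecture and pruning algorithm are chosen jointly rather than fixing one algorithm for all layers. Restricting the expensive evaluator to a short list is the source of the speed-up the abstract attributes to the two-phase design.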