Differential Weight Quantization for Multi-Model Compression

Wenhong Duan, Zhenhua Liu, Chuanmin Jia, Shanshe Wang, Siwei Ma, Wen Gao
DOI: https://doi.org/10.1109/tmm.2022.3208530
IF: 7.3
2022-01-01
IEEE Transactions on Multimedia
Abstract: Low bit-width quantization can effectively reduce the storage and computational costs of deep neural networks. Existing quantization methods are commonly designed for single-model compression. In multi-model compression scenarios, however, multiple models for the same task or for similar tasks must be compressed together, for example when compressing image super-resolution models for different scales or when transferring different models in multimedia applications. Single-model quantization methods do not consider the correlations among the weights of different models, which limits further compression in these scenarios. To exploit this potential, we propose a novel quantization scheme for multi-model compression, namely differential weight quantization (DWQ), which focuses on the weight increment between the target model and the reference model. Specifically, DWQ consists of increment computation, increment quantization, and fine-tuning, and uses the reference model to guide the subsequent quantization of the target model. Because the weights of different models are correlated, the distribution of the weight increment is more concentrated than that of the original weights, so the increment can be represented with fewer bits and a higher compression ratio can be achieved. Moreover, a progressive training method is proposed to accelerate convergence and reduce quantization loss in the DWQ framework. Extensive experiments on multiple tasks validate the effectiveness of DWQ when built on weight-sharing and parameterized clipping activation (PACT) quantization technologies. The proposed framework achieves a 2x improvement in compression and a 30% reduction in computational complexity with comparable performance on popular multimedia tasks.
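The increment computation and increment quantization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes plain symmetric uniform quantization for the weight-sharing and PACT quantizers used in the paper, omits fine-tuning and progressive training, and uses synthetic weights. All function names (`uniform_quantize`, `dwq_encode`, `dwq_decode`) and the chosen noise levels are hypothetical.

```python
import random

def uniform_quantize(values, bits):
    """Symmetric uniform quantization: returns integer codes and the step size."""
    levels = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit signed codes
    max_abs = max(abs(v) for v in values)
    step = max_abs / levels if max_abs > 0 else 1.0
    codes = [max(-levels, min(levels, round(v / step))) for v in values]
    return codes, step

def dequantize(codes, step):
    return [c * step for c in codes]

def dwq_encode(target, reference, bits):
    """Quantize the increment (target - reference) instead of the target itself."""
    increment = [t - r for t, r in zip(target, reference)]
    return uniform_quantize(increment, bits)

def dwq_decode(codes, step, reference):
    """Rebuild the target weights from the reference and the quantized increment."""
    return [r + d for r, d in zip(reference, dequantize(codes, step))]

def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Synthetic, correlated weights (an assumption made for illustration only):
# the target is the reference plus a small perturbation.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(1000)]
target = [r + rng.gauss(0.0, 0.05) for r in reference]

bits = 4
codes, step = dwq_encode(target, reference, bits)
recovered = dwq_decode(codes, step, reference)

# Baseline: quantize the target weights directly at the same bit width.
direct_codes, direct_step = uniform_quantize(target, bits)
direct = dequantize(direct_codes, direct_step)

err_dwq = max_error(target, recovered)
err_direct = max_error(target, direct)
print("max error, increment quantization:", err_dwq)
print("max error, direct quantization:   ", err_direct)
```

Because the increment spans a much narrower range than the weights themselves, the same number of bits yields a smaller step size, which is the effect the abstract attributes to the more concentrated increment distribution.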