Research on Model Compression for Embedded Platform through Quantization and Pruning

Xiao Hu, Hao Wen
DOI: https://doi.org/10.1088/1742-6596/2078/1/012047
2021-11-01
Journal of Physics: Conference Series
Abstract: Artificial intelligence has gone through decades of development. Although the technology is not yet mature, it has already been applied in many fields. With the rapid growth of IoT technology in 2019, interest in artificial intelligence reached a new peak; the development of IoT can be said to have driven the development of artificial intelligence once again. However, traditional deep learning models are complex and redundant. IoT hardware cannot afford the time and resource costs of models that were originally run on GPUs, so model compression that does not greatly reduce accuracy is well suited to this setting. In this paper, we experiment with two techniques for model compression: pruning and quantization. Using these methods, we obtain a markedly simpler model while keeping accuracy relatively close to that of the original.
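To make the two techniques named in the abstract concrete, the following is a minimal sketch in plain Python. The abstract does not say which pruning criterion or quantization scheme the authors used, so magnitude-based pruning and symmetric 8-bit linear quantization are assumptions chosen for illustration, and the function names and sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Assumption: magnitude pruning; the paper's actual criterion is not stated.
    """
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric linear quantization of floats to integers in [-127, 127].

    Assumption: one scale per tensor; the paper's actual scheme is not stated.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical weights for a single small layer.
weights = [0.82, -0.05, 0.31, -1.27, 0.02, 0.64, -0.11, 0.09]

pruned = magnitude_prune(weights, sparsity=0.5)  # half of the weights become 0.0
q, scale = quantize_int8(pruned)                 # 8-bit integers plus one float scale
restored = dequantize(q, scale)                  # approximation used at inference time
```

In this sketch, pruning reduces the number of nonzero weights, and quantization stores each remaining weight in 8 bits instead of 32. The rounding error per weight is bounded by half the scale, which is one way to see why accuracy can stay close to the original.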