Neural Network Compression Framework for Fast Model Inference

Alexander Kozlov, Ivan Lazarevich, Vasily Shamporov, Nikolay Lyalyushkin, Yury Gorbachev
DOI: https://doi.org/10.1007/978-3-030-80129-8_17
2021-01-01
Abstract: We present a new PyTorch-based framework for neural network compression with fine-tuning, named Neural Network Compression Framework (NNCF) (https://github.com/openvinotoolkit/nncf). It leverages recent advances in various network compression methods and implements several of them, namely quantization, sparsity, filter pruning and binarization. These methods produce more hardware-friendly models that can be run efficiently on general-purpose hardware computation units (CPU, GPU) or on specialized deep learning accelerators. We show that the implemented methods and their combinations can be successfully applied to a wide range of architectures and tasks to accelerate inference while preserving the original model's accuracy. The framework can be used together with the supplied training samples, or as a standalone package that integrates into existing training code with minimal adaptations.
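To make the four method families named in the abstract concrete, here is a minimal sketch of their textbook forms applied to a flat list of weights. This is not NNCF's code or API. All function names are hypothetical, and the specific choices are assumptions made for illustration: symmetric per-tensor 8-bit quantization, magnitude-based sparsity, sign binarization scaled by the mean absolute weight, and filter pruning by L2 norm. NNCF applies such transformations inside a model and then fine-tunes it. The sketch only shows what each transformation does to the numbers.

```python
import math
from typing import List


def fake_quantize(weights: List[float], num_bits: int = 8) -> List[float]:
    """Symmetric per-tensor uniform quantization followed by dequantization (illustrative)."""
    if not weights:
        return []
    q_max = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / q_max                     # real value represented by one integer step
    # Round to the nearest integer level, clamp to [-q_max, q_max], map back to floats.
    return [max(-q_max, min(q_max, round(w / scale))) * scale for w in weights]


def magnitude_sparsify(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the given fraction of weights with the smallest magnitudes (assumed criterion)."""
    k = int(len(weights) * sparsity)            # number of weights to zero out
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def binarize(weights: List[float]) -> List[float]:
    """Replace each weight by sign(w) * alpha, alpha = mean |w| (assumed scaling scheme)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def prune_filters(filters: List[List[float]], ratio: float) -> List[int]:
    """Return indices of filters to keep, dropping the fraction with the smallest L2 norm."""
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    n_drop = int(len(filters) * ratio)
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    dropped = set(order[:n_drop])
    return [i for i in range(len(filters)) if i not in dropped]
```

For example, `magnitude_sparsify([0.5, -0.02, 1.0, -0.75, 0.01, 0.3], 0.5)` zeroes the three smallest-magnitude entries and returns `[0.5, 0.0, 1.0, -0.75, 0.0, 0.0]`. Quantization and binarization keep every weight but restrict it to a small set of values. Sparsity and filter pruning remove weights outright, and filter pruning does so in whole filters, which shrinks the layer's actual shape.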