Guaranteed Quantization Error Computation for Neural Network Model Compression

Wesley Cooke, Zihao Mo, Weiming Xiang
2023-04-27
Abstract: Neural network model compression techniques can address the computation issue of deep neural networks on embedded devices in industrial systems. The guaranteed output error computation problem for neural network compression with quantization is addressed in this paper. A merged neural network is built from a feedforward neural network and its quantized version to produce the exact output difference between the two neural networks. Then, optimization-based methods and reachability analysis methods are applied to the merged neural network to compute the guaranteed quantization error. Finally, a numerical example is presented to validate the applicability and effectiveness of the proposed approach.
Machine Learning, Artificial Intelligence, Neural and Evolutionary Computing
What problem does this paper attempt to address?
The paper addresses the problem of computing a guaranteed bound on the output error introduced when a neural network is compressed by quantization. Specifically, it proposes a method for calculating the output error between a feedforward neural network and its quantized version. The two networks are merged into a single network whose output is the exact difference between their outputs. Optimization-based methods and reachability analysis methods are then applied to this merged network to compute the guaranteed quantization error. The authors validate the applicability and effectiveness of the method on a numerical example and compare the memory size of the model before and after quantization. Future research will extend the approach to more complex architectures, such as convolutional neural networks. In summary, the goal of the study is to provide an effective and verifiable way to assess the performance loss caused by quantization during neural network compression.
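The merged-network idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a ReLU feedforward network with a linear output layer, a hypothetical uniform rounding scheme (`quantize`, with an arbitrary step size), and it replaces the paper's optimization-based and reachability methods with plain interval bound propagation, which is a coarse over-approximation. All function names, layer sizes, and the input box are illustrative assumptions.

```python
import numpy as np

def quantize(w, step=2.0 ** -4):
    # Hypothetical quantizer: round each parameter to a uniform grid.
    return np.round(w / step) * step

def forward(layers, x):
    # ReLU on hidden layers, linear output layer (an assumption of this sketch).
    for i, (W, b) in enumerate(layers):
        x = W @ x + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def merge(layers, qlayers):
    # Build one network whose output equals f(x) - f_q(x).
    n = len(layers)
    assert n >= 2, "sketch assumes at least one hidden layer"
    merged = []
    for i, ((W, b), (Wq, bq)) in enumerate(zip(layers, qlayers)):
        if i == 0:
            # Both branches read the same input: stack the rows.
            Wm, bm = np.vstack([W, Wq]), np.concatenate([b, bq])
        elif i == n - 1:
            # Output layer subtracts the quantized branch from the original.
            Wm, bm = np.hstack([W, -Wq]), b - bq
        else:
            # Hidden layers run side by side: block-diagonal weights.
            Wm = np.block([
                [W, np.zeros((W.shape[0], Wq.shape[1]))],
                [np.zeros((Wq.shape[0], W.shape[1])), Wq],
            ])
            bm = np.concatenate([b, bq])
        merged.append((Wm, bm))
    return merged

def interval_bounds(layers, lo, hi):
    # Propagate an input box [lo, hi] through the network with interval arithmetic.
    for i, (W, b) in enumerate(layers):
        Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
        lo, hi = Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b
        if i < len(layers) - 1:
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

# Illustrative 2-5-4-1 network with random parameters.
rng = np.random.default_rng(0)
sizes = [2, 5, 4, 1]
layers = [(rng.standard_normal((m, n)), rng.standard_normal(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
qlayers = [(quantize(W), quantize(b)) for W, b in layers]
merged = merge(layers, qlayers)

# Bound the output difference over the input box [-1, 1]^2.
lo, hi = interval_bounds(merged, np.full(2, -1.0), np.full(2, 1.0))
print("error lies within", lo, hi)
```

Because the merged network reproduces the output difference exactly, any sound bound on its output over an input set is a bound on the quantization error over that set. The interval bounds here are typically loose, and floating-point rounding is ignored, so a tighter or rigorously verified bound would need the kind of optimization or reachability tooling the paper describes.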