Magnificent Minified Models

Rich Harang, Hillary Sanders
2023-06-17
Abstract: This paper concerns itself with the task of taking a large trained neural network and 'compressing' it to be smaller by deleting parameters or entire neurons, with minimal decreases in the resulting model accuracy. We compare various methods of parameter and neuron selection: dropout-based neuron damage estimation, neuron merging, absolute-value-based selection, random selection, and OBD (Optimal Brain Damage). We also compare a variation on the classic OBD method that slightly outperformed all other parameter and neuron selection methods in our tests with substantial pruning, which we call OBD-SD. We compare these methods against quantization of parameters. We also compare these techniques (all applied to a trained neural network) with neural networks trained from scratch (random weight initialization) on various pruned architectures. Our results are only barely consistent with the Lottery Ticket Hypothesis, in that fine-tuning a parameter-pruned model does slightly better than retraining a similarly pruned model from scratch with randomly initialized weights. For neuron-level pruning, retraining from scratch did much better in our experiments.
Machine Learning
What problem does this paper attempt to address?
The paper studies how to compress a large trained neural network by deleting parameters or entire neurons while losing as little accuracy as possible. It compares several methods for selecting which parameters or neurons to remove: dropout-based neuron damage estimation, neuron merging, absolute-value-based selection, random selection, and Optimal Brain Damage (OBD). It also introduces a variation on OBD, called OBD-SD, and compares all of these pruning methods against parameter quantization. Under substantial pruning, OBD-SD slightly outperformed all other selection methods in the authors' tests. For neuron-level pruning, however, retraining the pruned architecture from scratch did much better in the authors' experiments. For parameter-level pruning, the results are only barely consistent with the Lottery Ticket Hypothesis: fine-tuning the pruned model did slightly better than retraining a similarly pruned model from scratch with randomly initialized weights.

In summary, the paper aims to address the following issues:

1. **How to effectively compress neural networks**: By removing parameters or entire neurons to reduce model size while minimizing the negative impact on model accuracy.
2. **Comparison of different pruning methods**: Evaluating the effectiveness of various pruning methods and determining best practices.
3. **Performance comparison between pruning and retraining**: Verifying whether fine-tuning a pruned model is superior to retraining the model from scratch.
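To make three of the selection methods named above more concrete, here is a minimal sketch of parameter-level pruning by absolute value, by random selection, and by the classic OBD saliency (one half of the diagonal Hessian entry times the squared weight). This is not the authors' code. The helper names (`mask_from_scores`, `magnitude_mask`, `random_mask`, `obd_mask`, `apply_mask`), the flat-list weight representation, and the example weights and Hessian values are all assumptions made for illustration. The OBD-SD variant is not specified in this summary, so it is not sketched.

```python
import random


def mask_from_scores(scores, prune_fraction):
    """Return a keep-mask that drops the lowest-scoring fraction of entries."""
    n_prune = int(round(prune_fraction * len(scores)))
    # Indices ordered from least to most important.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    drop = set(order[:n_prune])
    return [i not in drop for i in range(len(scores))]


def magnitude_mask(weights, prune_fraction):
    """Absolute-value selection: small |w| is assumed to matter least."""
    return mask_from_scores([abs(w) for w in weights], prune_fraction)


def random_mask(weights, prune_fraction, seed=0):
    """Random selection baseline: drop a uniformly random subset."""
    rng = random.Random(seed)
    n_prune = int(round(prune_fraction * len(weights)))
    drop = set(rng.sample(range(len(weights)), n_prune))
    return [i not in drop for i in range(len(weights))]


def obd_mask(weights, hessian_diag, prune_fraction):
    """Classic OBD saliency: 0.5 * h_kk * w_k^2 for each parameter k.

    hessian_diag is assumed to be a precomputed estimate of the diagonal
    of the loss Hessian; computing it is outside the scope of this sketch.
    """
    saliency = [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]
    return mask_from_scores(saliency, prune_fraction)


def apply_mask(weights, mask):
    """Zero out pruned weights; kept weights are unchanged."""
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


# Hypothetical toy values, chosen only to show where the methods disagree.
weights = [0.9, -0.05, 0.3, -1.2, 0.01, 0.4]
hessian_diag = [0.1, 40.0, 0.1, 0.1, 0.1, 0.1]

pruned_by_magnitude = apply_mask(weights, magnitude_mask(weights, 0.5))
pruned_by_obd = apply_mask(weights, obd_mask(weights, hessian_diag, 0.5))
pruned_at_random = apply_mask(weights, random_mask(weights, 0.5))
```

In this toy case, absolute-value selection removes the weight `-0.05` because it is small, while the OBD criterion keeps it because its assumed curvature is large. In practice the mask would be followed by fine-tuning the surviving weights or, as the paper also tests, by retraining the pruned architecture from a random initialization.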