Enhancing Accuracy and Parameter-Efficiency of Neural Representations for Network Parameterization

Hongjun Choi, Jayaraman J. Thiagarajan, Ruben Glatt, Shusen Liu
2024-06-29
Abstract: In this work, we investigate the fundamental trade-off regarding accuracy and parameter efficiency in the parameterization of neural network weights using predictor networks. We present a surprising finding that, when recovering the original model accuracy is the sole objective, it can be achieved effectively through the weight reconstruction objective alone. Additionally, we explore the underlying factors for improving weight reconstruction under parameter-efficiency constraints, and propose a novel training scheme that decouples the reconstruction objective from auxiliary objectives such as knowledge distillation, which leads to significant improvements compared to state-of-the-art approaches. Finally, these results pave the way for more practical scenarios, where one needs to achieve improvements on both model accuracy and predictor network parameter-efficiency simultaneously.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper examines the trade-off between accuracy and parameter efficiency when the weights of a neural network are represented by a predictor network. Its central finding is that the accuracy of the original model can be recovered using the weight reconstruction objective alone, and that repeating the reconstruction process over multiple rounds improves performance further. Building on this, the authors propose a training scheme that separates training into a reconstruction stage and a knowledge distillation stage, so that the two learning objectives are decoupled rather than optimized jointly. According to the paper, this scheme significantly outperforms existing methods and improves the parameter efficiency of the predictor network while maintaining high accuracy. The paper also reports that training with the reconstruction loss alone can yield a network that performs better than the original model, that multiple rounds of reconstruction allow higher compression rates without sacrificing performance, and that distilling from a high-capacity teacher network further improves the balance between compression and performance. Experimental results on multiple datasets and network architectures support these findings.
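
The two-stage idea described above can be illustrated with a small toy sketch. This is not the authors' implementation: it assumes a linear "original model" with 32 weights, a hypothetical predictor that expresses those weights as a combination of 8 cosine basis functions of the weight index, and a squared-error match on model outputs as a simple stand-in for knowledge distillation. All names (`basis_matrix`, `predict_weights`, `descend`, and so on), the learning rates, and the step counts are illustrative assumptions. Stage 1 fits the predictor to the weights only; stage 2 then starts from that result and optimizes the output-matching objective on its own.

```python
import math
import random


def basis_matrix(n, k):
    """Hypothetical index encoding: row i holds k cosine features of weight index i."""
    return [[math.cos(math.pi * f * (i + 0.5) / n) for f in range(k)] for i in range(n)]


def predict_weights(theta, phi):
    """Predictor output: one reconstructed weight per row of the basis matrix."""
    return [sum(t * b for t, b in zip(theta, row)) for row in phi]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def recon_loss_and_grad(theta, w, phi):
    """Stage-1 objective: mean squared error between predicted and original weights."""
    n = len(w)
    err = [a - b for a, b in zip(predict_weights(theta, phi), w)]
    loss = sum(e * e for e in err) / n
    grad = [sum(2.0 * err[i] * phi[i][j] for i in range(n)) / n for j in range(len(theta))]
    return loss, grad


def distill_loss_and_grad(theta, w, phi, inputs):
    """Stage-2 objective (stand-in for distillation): match the original model's outputs."""
    w_hat = predict_weights(theta, phi)
    m = len(inputs)
    loss, grad = 0.0, [0.0] * len(theta)
    for x in inputs:
        gap = dot(w_hat, x) - dot(w, x)  # student output minus teacher output
        loss += gap * gap / m
        for j in range(len(theta)):
            grad[j] += 2.0 * gap * sum(x[i] * phi[i][j] for i in range(len(w))) / m
    return loss, grad


def descend(theta, loss_and_grad, lr, steps):
    """Plain gradient descent on a single objective."""
    for _ in range(steps):
        _, grad = loss_and_grad(theta)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


rng = random.Random(0)
n, k = 32, 8  # 32 original weights represented by 8 predictor parameters
phi = basis_matrix(n, k)
w = [math.sin(3.0 * i / n) + 0.1 * rng.gauss(0.0, 1.0) for i in range(n)]
inputs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(64)]

# Stage 1: reconstruction objective only.
theta0 = [0.0] * k
recon_before, _ = recon_loss_and_grad(theta0, w, phi)
theta1 = descend(theta0, lambda th: recon_loss_and_grad(th, w, phi), lr=0.5, steps=50)
recon_after, _ = recon_loss_and_grad(theta1, w, phi)

# Stage 2: output-matching objective only, starting from the stage-1 solution.
distill_before, _ = distill_loss_and_grad(theta1, w, phi, inputs)
theta2 = descend(theta1, lambda th: distill_loss_and_grad(th, w, phi, inputs), lr=0.002, steps=50)
distill_after, _ = distill_loss_and_grad(theta2, w, phi, inputs)

print("reconstruction loss:", recon_before, "->", recon_after)
print("output-matching loss:", distill_before, "->", distill_after)
```

Keeping the two objectives in separate loops, rather than summing them into one loss, is the point of the sketch: each stage has its own step size and stopping rule, and the second stage inherits the weights recovered by the first.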