Compressed models are NOT miniature versions of large models

Rohit Raj Rai, Rishant Pal, Amit Awekar
2024-07-18
Abstract: Large neural models are often compressed before deployment. Model compression is necessary for many practical reasons, such as inference latency, memory footprint, and energy consumption. Compressed models are assumed to be miniature versions of corresponding large neural models. However, we question this belief in our work. We compare compressed models with corresponding large neural models using four model characteristics: prediction errors, data representation, data distribution, and vulnerability to adversarial attack. We perform experiments using the BERT-large model and its five compressed versions. For all four model characteristics, compressed models significantly differ from the BERT-large model. Even among compressed models, they differ from each other on all four model characteristics. Apart from the expected loss in model performance, there are major side effects of using compressed models to replace large neural models.
Machine Learning, Information Retrieval
What problem does this paper attempt to address?
The paper asks whether compressed models can be treated as miniature versions of the large neural models they are derived from. The authors challenge this common assumption by comparing compressed models with their corresponding large models on four characteristics: prediction errors, data representation, data distribution, and vulnerability to adversarial attacks. The experiments use the BERT-large model and five compressed versions of it. On all four characteristics, the compressed models differ significantly from BERT-large, and they also differ from one another. The main contribution is therefore to show that replacing a large model with a compressed one has side effects beyond the expected loss in performance, which calls for more careful analysis of model behavior in practical applications.
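To make the first characteristic, prediction errors, concrete, the sketch below shows one simple way two models' mistakes could be compared. It is an illustration only: the text above does not say which measure the paper uses, so the Jaccard overlap, the helper names `error_set` and `error_overlap`, and the toy labels and predictions are all assumptions.

```python
from typing import Sequence, Set


def error_set(predictions: Sequence[int], labels: Sequence[int]) -> Set[int]:
    """Return the indices of the examples a model classified incorrectly."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return {i for i, (p, y) in enumerate(zip(predictions, labels)) if p != y}


def error_overlap(errors_a: Set[int], errors_b: Set[int]) -> float:
    """Jaccard overlap of two error sets (assumed measure, not the paper's).

    1.0 means both models make exactly the same mistakes; 0.0 means they
    share none.
    """
    union = errors_a | errors_b
    if not union:
        return 1.0  # neither model makes any error, so they trivially agree
    return len(errors_a & errors_b) / len(union)


# Toy data, invented for illustration; not taken from the paper.
labels = [0, 1, 1, 0, 1, 0]
large_preds = [0, 1, 0, 0, 1, 1]       # hypothetical large-model outputs
compressed_preds = [1, 1, 0, 0, 0, 1]  # hypothetical compressed-model outputs

large_errors = error_set(large_preds, labels)
compressed_errors = error_set(compressed_preds, labels)

overlap = error_overlap(large_errors, compressed_errors)
# Examples the large model got right but the compressed model got wrong.
new_errors = compressed_errors - large_errors

print(f"overlap={overlap:.2f}, new errors at indices {sorted(new_errors)}")
```

If a compressed model were a true miniature of the large model, one would expect its error set to largely contain the large model's errors, with few mistakes on examples the large model handles correctly. A non-empty `new_errors` set is the kind of side effect the summary refers to.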