Task-Agnostic Structured Pruning of Speech Representation Models

Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan
2023-07-09
Abstract: Self-supervised pre-trained models such as Wav2vec2, Hubert, and WavLM have been shown to significantly improve many speech tasks. However, their large memory and strong computational requirements hinder their industrial applicability. Structured pruning is a hardware-friendly model compression technique but usually results in a larger loss of accuracy. In this paper, we propose a fine-grained attention head pruning method to compensate for the performance degradation. In addition, we also introduce the straight through estimator into the L0 regularization to further accelerate the pruned model. Experiments on the SUPERB benchmark show that our model can achieve comparable performance to the dense model in multiple tasks and outperforms the Wav2vec 2.0 base model on average, with 72% fewer parameters and 2 times faster inference speed.
Audio and Speech Processing, Computation and Language, Sound
What problem does this paper attempt to address?
This paper aims to reduce the parameter count and speed up inference of self-supervised pre-trained speech representation models while preserving their performance, so that their high memory and computational requirements no longer hinder industrial deployment. Specifically, it proposes a fine-grained attention head pruning method to compensate for the performance degradation caused by structured pruning, and introduces the straight-through estimator (STE) into L0 regularization to further accelerate the pruned model. On the SUPERB benchmark, the pruned model performs comparably to the dense model on multiple tasks and outperforms the Wav2vec 2.0 base model on average, with 72% fewer parameters and twice the inference speed.
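To make the combination of L0 regularization and a straight-through estimator more concrete, the following is a minimal sketch of one common formulation. It is not necessarily the paper's exact method, which this summary does not spell out. It assumes the hard-concrete gate often used for L0 regularization, with illustrative constants GAMMA, ZETA, and BETA and a 0.5 threshold. All function names (soft_gate, ste_gate, expected_l0) are hypothetical. The forward pass emits an exactly binary gate, so a pruned structure can be removed outright, while the gradient is taken from the underlying continuous gate.

```python
import math

# Stretch limits and temperature of the hard-concrete gate.
# These are assumed, illustrative values, not taken from the paper.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_gate(log_alpha, u):
    """Continuous gate for noise u in (0, 1); returns (clipped, s, stretched)."""
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    stretched = s * (ZETA - GAMMA) + GAMMA      # stretch to (GAMMA, ZETA)
    clipped = min(1.0, max(0.0, stretched))     # clip to [0, 1]
    return clipped, s, stretched


def ste_gate(log_alpha, u, threshold=0.5):
    """Straight-through gate: binary forward value, continuous-gate gradient."""
    clipped, s, stretched = soft_gate(log_alpha, u)
    hard = 1.0 if clipped > threshold else 0.0  # value used in the forward pass
    # Surrogate gradient d(clipped)/d(log_alpha); zero where clipping is active.
    if 0.0 < stretched < 1.0:
        grad = (ZETA - GAMMA) * s * (1.0 - s) / BETA
    else:
        grad = 0.0
    return hard, grad


def expected_l0(log_alpha):
    """Probability that the gate is non-zero; summing over gates gives the penalty."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


# A gate that is kept and a gate that is pruned, both at the noise median.
kept = ste_gate(log_alpha=0.5, u=0.5)
pruned = ste_gate(log_alpha=-0.5, u=0.5)
print(kept, pruned, expected_l0(0.0))
```

In this sketch a pruned gate still receives a non-zero surrogate gradient, so its log_alpha can recover during training even though the forward pass has already masked the corresponding structure to exactly zero.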