SparseWAV: Fast and Accurate One-Shot Unstructured Pruning for Large Speech Foundation Models

Tianteng Gu, Bei Liu, Hang Shao, Yanmin Qian
DOI: https://doi.org/10.21437/interspeech.2024-607
2024-01-01
Abstract: Self-supervised speech representation learning has shown remarkable capability in automatic speech recognition. However, it requires substantial computation and storage capacity. Pruning is an effective method for model compression. In this work, we propose SparseWAV, a fast and accurate unstructured pruning framework designed for large speech foundation models, which can efficiently remove unimportant parameters without sacrificing performance. It adaptively determines the sparsity ratio for each weight matrix within pre-trained models and updates the remaining parameters to compensate for the eliminated ones. Experiments on LibriSpeech demonstrate that the proposed method can remove 80% of the parameters of pre-trained large speech foundation models with negligible performance loss. Compared to previous works, our resulting models achieve up to 30% improvement in performance with a similar number of parameters. Meanwhile, the time consumed by the compression algorithm is reduced by up to 1080x.
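The compensation step described in the abstract ("updates the remaining parameters to compensate for the eliminated ones") can be illustrated with a minimal sketch. The abstract does not state SparseWAV's exact update rule, so the code below assumes a greedy Optimal Brain Surgeon (OBS)-style update for a single row of one weight matrix: each removal picks the weight with the smallest saliency w_q^2 / [H^-1]_qq, where H = XX^T + λI is built from calibration inputs X, and then shifts the surviving weights by -(w_q / [H^-1]_qq) H^-1[:, q]. The sparsity ratio is passed in as a fixed argument; the paper's adaptive per-matrix allocation is not reproduced. The helper names (`invert`, `hessian`, `obs_prune_row`, `recon_error`) are hypothetical, and the pure-Python linear algebra is only meant for toy sizes.

```python
"""Toy greedy OBS-style pruning of one weight row with compensation.

Assumption: an illustration of inverse-Hessian compensation in general,
not the SparseWAV algorithm. All helper names are hypothetical.
"""
import random
from typing import List, Set, Tuple

Vector = List[float]
Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (toy sizes only)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def hessian(x: Matrix, damp: float = 1e-2) -> Matrix:
    """H = X X^T + damp * I; x[i][s] is input feature i on calibration sample s."""
    n = len(x)
    h = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    for i in range(n):
        h[i][i] += damp  # damping keeps H invertible
    return h


def obs_prune_row(w: Vector, x: Matrix, sparsity: float,
                  damp: float = 1e-2) -> Tuple[Vector, Set[int]]:
    """Zero round(sparsity * len(w)) weights, re-optimizing survivors after each removal."""
    w = w[:]
    n = len(w)
    h_inv = invert(hessian(x, damp))
    pruned: Set[int] = set()
    for _ in range(int(round(sparsity * n))):
        alive = [i for i in range(n) if i not in pruned]
        # Saliency: increase in (damped) squared output error if q is removed
        # and the remaining weights are optimally adjusted.
        q = min(alive, key=lambda i: w[i] ** 2 / h_inv[i][i])
        scale = w[q] / h_inv[q][q]
        # Compensation: delta_w = -scale * H^-1[:, q] (this also drives w[q] to 0).
        for i in alive:
            w[i] -= scale * h_inv[i][q]
        w[q] = 0.0  # remove floating-point residue
        pruned.add(q)
        # Downdate H^-1 so it becomes the inverse Hessian of the surviving weights.
        col = [h_inv[i][q] for i in range(n)]
        row = h_inv[q][:]
        d = h_inv[q][q]
        for i in range(n):
            for j in range(n):
                h_inv[i][j] -= col[i] * row[j] / d
    return w, pruned


def recon_error(w: Vector, w_hat: Vector, x: Matrix) -> float:
    """Squared output error of the pruned row on the calibration samples."""
    return sum(
        sum((w[i] - w_hat[i]) * x[i][s] for i in range(len(w))) ** 2
        for s in range(len(x[0]))
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n_in, n_samples = 10, 64
    # Correlated inputs (shared component) make compensation worthwhile.
    base = [rng.gauss(0, 1) for _ in range(n_samples)]
    x = [[0.7 * base[s] + 0.3 * rng.gauss(0, 1) for s in range(n_samples)]
         for _ in range(n_in)]
    w = [rng.gauss(0, 1) for _ in range(n_in)]
    w_obs, pruned = obs_prune_row(w, x, sparsity=0.8)
    # Baseline: same mask, survivors left untouched.
    w_naive = [0.0 if i in pruned else w[i] for i in range(n_in)]
    print("pruned:", len(pruned), "of", n_in)
    print("error with compensation:   ", recon_error(w, w_obs, x))
    print("error without compensation:", recon_error(w, w_naive, x))
```

Comparing the two printed errors shows why compensation matters: for the same pruning mask, re-optimizing the surviving weights should not increase the squared output error on the calibration data (up to floating-point rounding) relative to simply zeroing the pruned weights.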