Speech Enhancement with Multi-granularity Vector Quantization.

Xiaoying Zhao, Qiushi Zhu, Jie Zhang, Yeping Zhou, Peiqi Liu
DOI: https://doi.org/10.1109/apsipaasc58517.2023.10317485
2023-01-01
Abstract: Neural network based speech enhancement (SE) has developed rapidly in the last decade. Meanwhile, self-supervised pre-trained models and vector quantization (VQ) have achieved excellent performance on many speech-related tasks, but they remain less explored for SE. Since utilizing a VQ module to discretize noisy speech representations has been shown to benefit speech denoising, in this work we study the impact of using VQ at different layers with different numbers of codebooks. Different VQ modules indeed enable the extraction of multi-granularity speech features. Through an attention mechanism, the contextual features extracted by a pre-trained model are fused with the local features extracted by the encoder, such that both global and local information are preserved to reconstruct the enhanced speech. Experimental results on the Valentini dataset show that the proposed model can improve SE performance, and the impact of the choice of pre-trained model is also revealed.
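
The two mechanisms named in the abstract, VQ with several codebooks and attention-based fusion of contextual and local features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the group-wise codebook layout, the residual form of the fusion, and the names `quantize` and `attention_fuse` are all assumptions made for illustration.

```python
import numpy as np


def quantize(features, codebooks):
    """Nearest-neighbour VQ with G codebooks (assumed group-wise layout).

    features:  (T, D) frame-level representations
    codebooks: (G, K, D // G), i.e. G codebooks of K entries each
    Returns the quantized features (T, D) and the chosen indices (T, G).
    """
    T, D = features.shape
    G, K, d = codebooks.shape
    assert D == G * d, "feature dimension must split evenly across codebooks"
    groups = features.reshape(T, G, d)
    # Squared Euclidean distance from every sub-vector to every entry: (T, G, K)
    dists = ((groups[:, :, None, :] - codebooks[None, :, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=-1)
    # Look up the selected entry of each codebook for each frame: (T, G, d)
    quantized = codebooks[np.arange(G)[None, :], indices]
    return quantized.reshape(T, D), indices


def attention_fuse(local, context):
    """Scaled dot-product attention with local features as queries.

    local:   (T, D) encoder features
    context: (S, D) features from a pre-trained model (keys and values)
    The residual addition is an assumption, not taken from the paper.
    """
    scores = local @ context.T / np.sqrt(local.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return local + weights @ context
```

Using more codebooks (larger `G`) splits each frame into finer sub-vectors, which is one plausible reading of how the number of codebooks controls the granularity of the discretized representation.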