BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation

Zihan Zhang, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie
2024-06-10
Abstract: Audio packet loss is an inevitable problem in real-time speech communication. A band-split packet loss concealment network (BS-PLCNet) targeting full-band signals was recently proposed. Although it achieved superior performance in the ICASSP 2024 PLC Challenge, BS-PLCNet is a large model with a high computational complexity of 8.95G FLOPS. This paper presents its updated version, BS-PLCNet 2, which reduces computational complexity and further improves performance. Specifically, to compensate for the missing future information, in the wide-band module we design a dual-path encoder structure (with non-causal and causal paths) and leverage an intra-model knowledge distillation strategy to distill the future information from the non-causal teacher to the causal student. Moreover, we introduce a lightweight post-processing module after packet loss restoration to recover speech distortions and remove residual noise in the audio signal. With only 40% of the original parameters of BS-PLCNet, BS-PLCNet 2 brings a 0.18 PLCMOS improvement on the ICASSP 2024 PLC Challenge blind set, achieving state-of-the-art performance on this dataset.
Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the degradation of speech quality caused by audio packet loss in real-time voice communication. It proposes BS-PLCNet 2, an improved version of the Band-Split Packet Loss Concealment Network (BS-PLCNet), which aims to improve performance while reducing computational complexity. BS-PLCNet 2 does this in three ways:

1. **Optimization of network structure**: The network structure of the original BS-PLCNet is streamlined, in particular by using depth-wise separable convolutions to reduce computational complexity.
2. **Intra-model knowledge distillation**: A dual-path convolution structure (with non-causal and causal paths) is introduced. Through intra-model knowledge distillation, the knowledge of the non-causal path is transferred to the causal path, so that future information is exploited without increasing computational complexity at inference.
3. **Two-stage design with a post-processing module**: After the lost packets are restored, a lightweight post-processing module recovers speech distortions and removes residual noise, further improving speech quality.

With these changes, BS-PLCNet 2 achieves a higher PLCMOS score than the original BS-PLCNet on the blind test set of the ICASSP 2024 PLC Challenge, while computational complexity and parameter count are reduced by approximately 61.9% and 60%, respectively. This makes BS-PLCNet 2 better suited to real-time applications while maintaining high performance.
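
The first two ideas above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the channel count (64), kernel size (3), single-channel toy signal, and the mean-squared-error distillation loss are all assumptions chosen for illustration, and the function names are hypothetical. The sketch shows why a depth-wise separable convolution needs fewer weights than a standard one, and how a non-causal path (which sees future samples) can act as a training target for a causal path (which does not).

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard 1-D convolution (bias ignored)."""
    return c_in * c_out * k


def separable_conv_params(c_in, c_out, k):
    """Weight count of a depth-wise separable 1-D convolution:
    one k-tap filter per input channel, then a 1x1 point-wise conv."""
    return c_in * k + c_in * c_out


def conv1d(x, w, causal):
    """Single-channel 1-D convolution with same-length output.

    causal=True  pads only on the left, so output[t] depends on x[..t].
    causal=False pads on both sides, so output[t] also sees future samples.
    """
    k = len(w)
    left = k - 1 if causal else (k - 1) // 2
    right = 0 if causal else k - 1 - left
    padded = [0.0] * left + list(x) + [0.0] * right
    return [sum(w[j] * padded[t + j] for j in range(k)) for t in range(len(x))]


def distill_loss(student, teacher):
    """Assumed distillation loss: MSE between student and teacher features."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# Hypothetical layer size: 64 -> 64 channels, kernel size 3.
standard = conv_params(64, 64, 3)
separable = separable_conv_params(64, 64, 3)

# Toy signal passed through both paths with the same (assumed) weights.
x = [1, 2, 3, 4, 5]
w = [1, 1, 1]
student_feat = conv1d(x, w, causal=True)    # causal student path
teacher_feat = conv1d(x, w, causal=False)   # non-causal teacher path
loss = distill_loss(student_feat, teacher_feat)

print(standard, separable)   # weight counts of the two layer types
print(student_feat, teacher_feat, loss)
```

In a setup like this, only the causal path would be kept at inference time, which is consistent with the summary's claim that future information is exploited without adding inference cost; the non-causal path exists only to provide a training target.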