A Light-weighted Two-stage Framework for Speech Enhancement

Zhuangqi Chen, Pingjian Zhang
DOI: https://doi.org/10.1109/ijcnn52387.2021.9533590
2021-01-01
Abstract: Deep learning based speech enhancement has made great progress in recent years. However, balancing high accuracy against the complexity of speech enhancement models remains a challenging problem. To address this issue, we propose a novel speech enhancement framework that consists of a two-stage training module and a co-worker based speech enhancement network (Co-worker-SENet). In the training phase, we first train a teacher model to extract basic features. Then, a student model learns from the teacher during some early steps and further refines the features. Both the teacher and the student models are Co-worker-SENets, in which a stack of feature extraction (FE) blocks is used to learn and refine the features. Each FE block consists of multiple cheap workers that extract features independently; those features are then fused to form the output of the FE block. We conduct extensive experiments on the commonly used VoiceBank-Demand dataset. The experimental results show that the two-stage training framework effectively improves on the performance of the one-stage method and performs comparably to other state-of-the-art approaches while using only simple and cheap operations.
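The abstract does not specify what the workers compute, how their outputs are fused, or how the student is tied to the teacher, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each cheap worker is a moving average of a different width, that fusion is an element-wise mean, and that teacher guidance is a mean-squared feature-matching term applied only for the first `guided_steps` training steps. All names here (`moving_average_worker`, `fe_block`, `student_loss`, `guided_steps`, `weight`) are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Worker = Callable[[Sequence[float]], Vector]


def moving_average_worker(width: int) -> Worker:
    """Build a cheap worker: a same-length moving average (assumed operation)."""
    half = width // 2

    def worker(frame: Sequence[float]) -> Vector:
        n = len(frame)
        out = []
        for i in range(n):
            # Clip the window at the edges so the output keeps the input length.
            lo, hi = max(0, i - half), min(n, i + half + 1)
            out.append(sum(frame[lo:hi]) / (hi - lo))
        return out

    return worker


def fe_block(frame: Sequence[float], workers: Sequence[Worker]) -> Vector:
    """Run every worker independently on the same input, then fuse the results.

    Fusion by element-wise mean is an assumption made for this sketch.
    """
    outputs = [w(frame) for w in workers]
    return [sum(vals) / len(outputs) for vals in zip(*outputs)]


def student_loss(
    task_loss: float,
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    step: int,
    guided_steps: int,
    weight: float = 1.0,
) -> float:
    """Add a teacher feature-matching term only during the early steps.

    The hard cutoff at `guided_steps` and the MSE distance are assumptions.
    """
    if step >= guided_steps:
        return task_loss
    mse = sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)
    return task_loss + weight * mse


if __name__ == "__main__":
    workers = [moving_average_worker(1), moving_average_worker(3)]
    features = [0.0, 1.0, 0.0, 1.0]
    # A "stack" of FE blocks: feed each block's fused output into the next.
    for _ in range(2):
        features = fe_block(features, workers)
    print(features)
```

Keeping each worker this simple reflects the emphasis on cheap operations: capacity comes from running several workers side by side and stacking blocks, not from any single expensive layer.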