GTCRN: A Speech Enhancement Model Requiring Ultralow Computational Resources

Xiaobin Rong, Tianchi Sun, Xu Zhang, Yuxiang Hu, Changbao Zhu, Jing Lu
DOI: https://doi.org/10.1109/icassp48485.2024.10448310
2024-01-01
Abstract: While modern deep learning-based models have significantly outperformed traditional methods in speech enhancement, they often require a large number of parameters and extensive computational power, making them impractical to deploy on edge devices in real-world applications. In this paper, we introduce the Grouped Temporal Convolutional Recurrent Network (GTCRN), which incorporates grouping strategies to efficiently simplify a competitive model, DPCRN. Additionally, it leverages subband feature extraction modules and temporal recurrent attention modules to enhance its performance. Remarkably, the resulting model demands ultralow computational resources, featuring only 23.7K parameters and 39.6 MMACs per second. Experimental results show that our proposed model not only surpasses RNNoise, a typical lightweight model with a similar computational burden, but also achieves competitive performance compared with recent baseline models that require significantly more computational resources.
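The abstract credits grouping for much of the reduction from DPCRN to a model of roughly 23.7K parameters. The sketch below shows why grouping shrinks a layer, assuming the grouped layers are grouped 2-D convolutions and grouped GRUs of the kind common in lightweight speech models. The helper names and the channel/hidden sizes (16, groups of 2) are hypothetical illustrations, not GTCRN's actual configuration, and the GRU count follows PyTorch's convention of two bias vectors per gate set.

```python
def conv_params(c_in, c_out, kernel, groups=1, bias=True):
    """Parameter count of a 2-D convolution with `groups` groups.

    Each group maps c_in/groups input channels to c_out/groups output channels,
    so the weight count is divided by `groups` relative to a dense convolution.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    kh, kw = kernel
    weights = groups * (c_in // groups) * (c_out // groups) * kh * kw
    return weights + (c_out if bias else 0)


def gru_params(input_size, hidden_size):
    """Parameter count of one PyTorch-style GRU layer.

    Three gates: weight_ih (3H x I), weight_hh (3H x H), bias_ih and bias_hh (3H each).
    """
    h, i = hidden_size, input_size
    return 3 * (h * i + h * h + 2 * h)


def grouped_gru_params(input_size, hidden_size, groups):
    """Hypothetical grouped GRU: features split into `groups` independent smaller GRUs."""
    assert input_size % groups == 0 and hidden_size % groups == 0
    return groups * gru_params(input_size // groups, hidden_size // groups)


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from the paper).
    print("dense conv 16->16, 3x3:  ", conv_params(16, 16, (3, 3)))
    print("grouped conv, G=2:       ", conv_params(16, 16, (3, 3), groups=2))
    print("dense GRU 16->16:        ", gru_params(16, 16))
    print("grouped GRU, G=2:        ", grouped_gru_params(16, 16, 2))
```

With these sizes the convolution drops from 2320 to 1168 parameters and the GRU from 1632 to 864, roughly a factor of G in both cases. The trade-off is that groups no longer exchange information inside the layer; lightweight designs typically compensate with a channel shuffle or a cheap pointwise layer between grouped layers.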