Lite-RTSE: Exploring a Cost-Effective Lite DNN Model for Real-Time Speech Enhancement in RTC Scenarios

Xingwei Liang, Lu Zhang, Zhiyong Wu, Ruifeng Xu
DOI: https://doi.org/10.1109/lsp.2023.3330124
2023-01-01
IEEE Signal Processing Letters
Abstract: The noise reduction performance of DNN-based monaural speech enhancement (SE) methods has improved significantly in recent years, but model complexity has also increased several times over. It is therefore highly desirable to explore more ‘cost-effective’ speech enhancement methods that suit a wider range of hardware platforms. In this letter, we investigate low-cost model design strategies and propose a lite real-time speech enhancement (Lite-RTSE) model. The model achieves efficient speech enhancement by leveraging low-dimensional long short-term memory (LSTM) units and a novel multi-order convolution block. A two-stage ‘masking + residual’ scheme for complex spectrum reconstruction contributes to better quality and intelligibility of the enhanced speech. Experimental results show that the Lite-RTSE model achieves competitive speech denoising performance compared with state-of-the-art SE models, while containing only 1.56 M parameters and requiring 0.55 G multiply-accumulate operations per second (MAC/S).
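The ‘masking + residual’ idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the form of the mask or the residual, so the sketch assumes a complex-valued mask applied per frequency bin, followed by an additive complex residual. The function names (`apply_complex_mask`, `add_residual`, `reconstruct`) and all numeric values are hypothetical; in a real model the mask and residual would be predicted by the network.

```python
# Illustrative sketch only. Assumptions (not stated in the abstract):
#   - stage 1 multiplies each noisy STFT bin by a complex mask value,
#   - stage 2 adds a complex residual to the masked estimate.
# The mask and residual values below are made up for the example.

def apply_complex_mask(noisy, mask):
    """Stage 1: element-wise complex multiplication of spectrum bins by a mask."""
    return [x * m for x, m in zip(noisy, mask)]

def add_residual(coarse, residual):
    """Stage 2: refine the masked estimate by adding a complex residual."""
    return [c + r for c, r in zip(coarse, residual)]

def reconstruct(noisy, mask, residual):
    """Run both stages: masking first, then the residual correction."""
    return add_residual(apply_complex_mask(noisy, mask), residual)

# One STFT frame with three frequency bins (hypothetical values).
noisy_frame = [1 + 1j, 2 - 1j, 0.5j]
mask = [0.5 + 0j, 1j, 1 + 0j]
residual = [0.25j, -1 + 0j, 0j]

enhanced = reconstruct(noisy_frame, mask, residual)
print(enhanced)
```

Under these assumptions, the mask can only rescale and rotate energy that is already present in a bin, while the residual term can add back components that masking alone cannot recover. That is one plausible reading of why the two stages are combined.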