Incorporation of a modified temporal cepstrum smoothing in both signal-to-noise ratio and speech presence probability estimation for speech enhancement
Dahan Wang, Zhongshu Hou, Yuxiang Hu, Changbao Zhu, Jing Lu, Jingdong Chen
DOI: https://doi.org/10.1121/10.0026223
2024-06-01
Abstract: Numerous advanced and lightweight signal processing methods have been presented for single-channel speech enhancement (SE), and how to combine, integrate, and balance them efficiently deserves careful study. This paper proposes a more effective and less resource-intensive SE system that focuses on integrating and adapting several such approaches, in particular temporal cepstrum smoothing (TCS). First, a more robust fundamental frequency estimator is employed within TCS, mitigating the performance limitations caused by the inaccuracy of the original estimator. Second, a harmonic enhancement mechanism is introduced to recover weak harmonic components. The modified TCS is incorporated into the a posteriori speech presence probability estimation, which refines the unbiased minimum mean square error noise power spectral density estimator, and it is also used for the a priori signal-to-noise ratio estimation. Moreover, the log-spectral amplitude estimator is enhanced by applying both super-Gaussian speech priors and speech presence uncertainty. Experimental evaluations demonstrate that the proposed method improves speech quality while keeping computational and storage requirements modest. Furthermore, the proposed system performs comparably to several baseline systems based on lightweight deep neural networks.
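For readers unfamiliar with TCS, the following is a minimal sketch of the basic idea only, not the modified TCS proposed in the paper: each frame's log power spectrum is transformed to the cepstral domain, every cepstral (quefrency) bin is recursively smoothed over time with its own smoothing constant, and the result is transformed back. The function name, the parameters `alpha`, `pitch_bins`, and `alpha_pitch`, and the default values are illustrative assumptions; the bias compensation usually applied after log-domain smoothing, the fundamental frequency estimator, and the harmonic enhancement described in the abstract are all omitted.

```python
import numpy as np

def temporal_cepstrum_smoothing(power_spec, alpha, pitch_bins=None,
                                alpha_pitch=0.2, floor=1e-10):
    """Basic temporal cepstrum smoothing (illustrative sketch).

    power_spec : (frames, bins) one-sided power spectra, bins = n_fft // 2 + 1
    alpha      : (bins,) smoothing constant per quefrency bin, each in [0, 1)
    pitch_bins : optional (frames,) integer quefrency index of the pitch peak,
                 0 meaning "no pitch detected" (hypothetical convention)
    alpha_pitch: smaller constant used at the pitch bin so the pitch peak
                 is smoothed less (hypothetical default)
    """
    n_frames, n_bins = power_spec.shape
    n_fft = 2 * (n_bins - 1)
    smoothed = np.empty_like(power_spec, dtype=float)
    prev = None
    for t in range(n_frames):
        # Cepstrum = inverse DFT of the log power spectrum (real and symmetric).
        log_spec = np.log(np.maximum(power_spec[t], floor))
        ceps = np.fft.irfft(log_spec, n=n_fft)[:n_bins]

        # Per-frame smoothing constants, relaxed at the pitch quefrency.
        a = np.array(alpha, dtype=float)
        if pitch_bins is not None and pitch_bins[t] > 0:
            a[pitch_bins[t]] = alpha_pitch

        # First-order recursive smoothing over time in each quefrency bin.
        sm = ceps if prev is None else a * prev + (1.0 - a) * ceps
        prev = sm

        # Rebuild the symmetric cepstrum and return to the power domain.
        full = np.concatenate([sm, sm[-2:0:-1]])
        smoothed[t] = np.exp(np.fft.rfft(full).real)
    return smoothed
```

In a typical configuration, `alpha` would be small for low quefrencies (which carry the spectral envelope) and close to 1 for high quefrencies (which mostly carry estimation noise), with the pitch bin exempted so that voiced harmonics are preserved. With `alpha` set to all zeros the function returns its input unchanged, which is a convenient sanity check.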