Abstract: Single-channel speech enhancement is a widely studied problem in speech processing and related fields. Traditional research has focused on improving the data structure, an approach that relies heavily on training datasets. This paper proposes and tests a method that improves traditional speech enhancement algorithms using features of speech and hearing. The method is built on the fundamental frequency (F0) and harmonic features of speech, and it emphasizes the fundamental frequency (EFF) in a manner similar to the neural lateral inhibition mechanism of the human auditory system. Because it does not depend on training data, it scales well and can be easily embedded in traditional algorithms. The experiments use a Chinese speech library, an English speech library, and a variety of noise types. In these tests, the method improves the speech enhancement performance of the original algorithm, especially at low SNR. Because EFF is an additional processing step, however, at high SNR the gain in speech intelligibility, as measured by STOI scores, is sometimes small compared with the gain in perceptual quality. Auditory perception indices such as PESQ scores improve significantly, and WSS scores decrease as desired. After the algorithm is embedded into DNN-based or SNMF-based speech enhancement algorithms, the PESQ improvement increases by about 5% on average and WSS scores decrease, while the STOI improvement at high SNR suffers only a limited negative impact. Although EFF is independent of the model's training process, it improves PESQ and STOI scores and lowers WSS scores. This suggests that the fundamental frequency is an important feature in speech processing that affects speech quality, and that it should be actively introduced as a feature in speech processing.
The proposed algorithm is tested on both the Chinese and English speech datasets for extensive evaluation. The results show significant improvement compared to traditional algorithms.
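
To make the idea of emphasizing the fundamental frequency concrete, the following is a minimal sketch and not the authors' EFF algorithm. It assumes F0 can be estimated from the autocorrelation peak of a single voiced frame, and that "emphasis" means applying a fixed gain to the spectral bins near F0. The function names `estimate_f0` and `emphasize_f0` and the parameters `gain_db` and `half_width_hz` are hypothetical, chosen only for illustration. The abstract does not specify how harmonics or lateral inhibition are modeled, so they are left out here.

```python
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 in Hz from the autocorrelation peak of one voiced frame."""
    frame = frame - np.mean(frame)
    # Keep only non-negative lags of the autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag

def emphasize_f0(frame, fs, gain_db=6.0, half_width_hz=20.0):
    """Boost bins within half_width_hz of the estimated F0 (illustrative values)."""
    f0 = estimate_f0(frame, fs)
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    gain = np.ones_like(freqs)
    gain[np.abs(freqs - f0) <= half_width_hz] = 10.0 ** (gain_db / 20.0)
    # Return to the time domain with the same frame length.
    return np.fft.irfft(spectrum * gain, n=len(frame)), f0

# Usage on a synthetic voiced frame: a 200 Hz fundamental plus one harmonic.
fs = 16000
t = np.arange(1600) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
emphasized, f0 = emphasize_f0(frame, fs)
```

In a full enhancement pipeline, a step like this would run frame by frame on the output of the base algorithm (for example a DNN-based or SNMF-based enhancer), with overlap-add and a voicing decision that this sketch omits.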
