Abstract: In recent years, speech recognition technology has become increasingly common. Speech quality and intelligibility are critical for convenient and accurate information transmission in speech recognition. Speech processing systems used to communicate or store speech are usually designed for environments without background noise. In real-world environments, however, interference in the form of background noise and channel noise drastically reduces the performance of speech recognition systems, resulting in imprecise information transfer and fatiguing the listener. When the input or output signals of a communication system are affected by noise, speech enhancement techniques aim to improve its performance. To ensure that the text produced from speech is correct, the external noise in the speech audio must be reduced. Reducing this noise is difficult because the speech may consist of isolated words, continuous speech, or spontaneous speech. Several typical speech enhancement algorithms have gained considerable attention in automatic speech recognition. However, these algorithms work well only on simple and continuous audio signals. This study therefore proposes a hybridized speech recognition algorithm to improve speech recognition accuracy. Non-linear spectral subtraction, a well-known speech enhancement algorithm, is optimized with the Hidden Markov Model and tested on 6660 medical speech transcription audio files and 1440 audio files from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The performance of the proposed model is compared with that of several typical speech enhancement algorithms: the iterative signal enhancement algorithm, subspace-based speech enhancement, and non-linear spectral subtraction.
The proposed cascaded hybrid algorithm achieved minimum word error rates of 9.5% and 7.6% for medical speech and RAVDESS speech, respectively. Cascading the speech enhancement and speech-to-text conversion architectures yields higher speech recognition accuracy. The evaluation results support incorporating the proposed method into real-time automatic speech recognition applications in medicine, where the terminology involved is highly complex.
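
The results above are reported as word error rate (WER). As a minimal sketch of how this metric is conventionally computed, the snippet below uses the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example sentences are illustrative assumptions; the abstract does not state which text normalization or scoring tool the study used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: transcripts are lowercased and split on whitespace. The
    study's actual text normalization is not described in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions are needed to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word out of five reference words.
wer = word_error_rate("the patient has a fever", "the patient has fever")
print(f"WER = {wer:.1%}")
```

Under this definition, a WER of 9.5% corresponds to roughly one word error per ten to eleven reference words. Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.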
