Abstract: Speech recognition technology has become increasingly common in recent years. Speech quality and intelligibility are critical to the convenience and accuracy of information transmission in speech recognition. Speech processing systems used to communicate or store speech are usually designed for environments without background noise. In real-world environments, however, interference in the form of background noise and channel noise drastically reduces the performance of speech recognition systems, resulting in imprecise information transfer and listener fatigue. Speech enhancement techniques aim to improve the performance of communication systems whose input or output signals are affected by noise. To ensure that the text produced from speech is correct, the external noise in the speech audio must be reduced. This is difficult because the speech may consist of single words, continuous speech, or spontaneous speech. Several typical speech enhancement algorithms have gained considerable attention in automatic speech recognition; however, they work well only on simple and continuous audio signals. This study therefore proposes a hybridized algorithm to improve speech recognition accuracy. Non-linear spectral subtraction, a well-known speech enhancement algorithm, is optimized with the Hidden Markov Model and tested on 6660 medical speech transcription audio files and 1440 audio files from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The performance of the proposed model is compared with that of typical speech enhancement algorithms, such as the iterative signal enhancement algorithm, subspace-based speech enhancement, and non-linear spectral subtraction. The proposed cascaded hybrid algorithm achieved minimum word error rates of 9.5% and 7.6% for medical speech and RAVDESS speech, respectively. Cascading the speech enhancement and speech-to-text conversion architectures yields higher speech recognition accuracy. The evaluation results support incorporating the proposed method into real-time automatic speech recognition applications in medicine, where the terminology is highly complex.
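The word error rate quoted above is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming simple whitespace tokenization. The function name `word_error_rate` is illustrative, and the scoring tool actually used in the study is not stated in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition, a hypothesis that drops one word from a five-word reference scores 0.2, so a reported rate of 9.5% corresponds to roughly one word-level error per ten to eleven reference words.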
