An approach for speech enhancement with dysarthric speech recognition using optimization based machine learning frameworks
Jolad, Bhuvaneshwari
DOI: https://doi.org/10.1007/s10772-023-10019-y
2023-02-22
International Journal of Speech Technology
Abstract: Dysarthric speech is speech that is noisy or distorted at the source. Effective speech enhancement is required to achieve higher communication quality in the presence of non-stationary noise. Owing to the irregular speech rate of people with dysarthria, understanding their speech is a particularly difficult task, and generic recognition systems do not perform well on it. Hence, this paper proposes a Fractional Competitive Crow Search Algorithm-based Speech Enhancement Generative Adversarial Network (FCCSA-SEGAN) for enhancing the speech signal. First, at the pre-processing stage, noise is removed from the speech signal using the spectral subtraction method. The pre-processed signal is then passed to the speech enhancement stage, where signal quality is improved by the Speech Enhancement Generative Adversarial Network (SEGAN), which is trained by the developed FCCSA. The proposed FCCSA is obtained by incorporating Fractional Calculus (FC) into the Competitive Crow Search Algorithm (CCSA), which is itself a hybridization of the Crow Search Algorithm (CSA) and the Competitive Swarm Optimizer (CSO). Next, features such as the Multiple Kernel Weighted Mel Frequency Cepstral Coefficient (MKMFCC), Linear Prediction Cepstral Coefficient (LPCC), spectral flux, spectral crest, spectral centroid, and pitch chroma are extracted. Moreover, to increase the dimensionality of the signal samples, noise is added to the original signal in a data augmentation phase. Finally, speech recognition is performed using a Competitive Crow Search Algorithm-based Hierarchical Attention Network (CCSA-based HAN). The proposed method is evaluated on the UA speech database, where it obtains an accuracy, sensitivity, and specificity of 0.930, 0.933, and 0.934, respectively. The proposed speech enhancement approach attains a higher Perceptual Evaluation of Speech Quality (PESQ) of 3.14 and a lower Root Mean Square Error (RMSE) of 0.022.
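As an illustration of the signal-processing steps the abstract names, the following is a minimal NumPy sketch of magnitude spectral subtraction, three of the listed spectral features (centroid, crest, flux), and RMSE. It is not the authors' implementation: the function names, frame length, hop size, spectral floor, and the assumption that the first few frames contain noise only are all hypothetical choices made for this example. The SEGAN, FCCSA, MKMFCC, and HAN components are not sketched here.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split x into overlapping Hann-windowed frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)


def spectral_subtraction(x, noise_frames=5, frame_len=256, hop=128, floor=0.01):
    """Basic magnitude spectral subtraction (illustrative parameters).

    Assumption: the first `noise_frames` frames contain noise only, so their
    average magnitude spectrum serves as the noise estimate.
    """
    frames = frame_signal(x, frame_len, hop)
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract the noise estimate; keep a small spectral floor so that
    # magnitudes never become negative.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)

    # Resynthesize with the noisy phase and overlap-add the frames.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(hop * (len(clean_frames) - 1) + frame_len)
    for i, frame in enumerate(clean_frames):
        out[i * hop : i * hop + frame_len] += frame
    return out


def spectral_features(x, sr, frame_len=256, hop=128):
    """Per-frame spectral centroid (Hz), spectral crest, and spectral flux."""
    mag = np.abs(np.fft.rfft(frame_signal(x, frame_len, hop), axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    eps = 1e-12  # guards against division by zero on silent frames
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + eps)
    crest = mag.max(axis=1) / (mag.mean(axis=1) + eps)
    # Flux compares consecutive frames, so it has one fewer entry.
    flux = np.sqrt((np.diff(mag, axis=0) ** 2).sum(axis=1))
    return centroid, crest, flux


def rmse(a, b):
    """Root mean square error between two equal-length signals."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))
```

In this sketch the enhanced signal is slightly shorter than the input because trailing samples that do not fill a whole frame are dropped, so a reference signal would need to be trimmed to the same length before calling `rmse`.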