Residual Convolutional Neural Network-Based Dysarthric Speech Recognition

Kumar, Niraj
DOI: https://doi.org/10.1007/s13369-024-08919-5
IF: 2.807
2024-03-28
Arabian Journal for Science and Engineering
Abstract: People with dysarthric speech have difficulty communicating with other people and with voice-based smart devices. This paper presents a spatial residual convolutional neural network (RCNN)-based dysarthric speech recognition (DSR) system intended to improve communication for these individuals. The RCNN model is simplified to an optimal number of layers. The system takes a speaker-adaptive approach: it uses transfer learning to leverage knowledge learned from healthy individuals, and a new data augmentation technique to address voice hoarseness in patients. The dysarthric speech is preprocessed with a novel voice cropping technique, based on erosion and dilation, that removes unnecessary pauses and hiccups in the time domain. Compared with previously reported results, isolated word recognition accuracy improved by nearly 8.16% for patients with very low intelligibility and by 4.74% for patients with low intelligibility. The proposed DSR system achieves the lowest word error rate, 24.09%, on the UASpeech dysarthric speech dataset of 15 dysarthric speakers.
multidisciplinary sciences
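The abstract describes the voice cropping step only as "based on erosion and dilation methods", so the following is a minimal sketch of one way such a step could work, not the authors' procedure. It assumes a frame-energy threshold produces a binary voice-activity mask, applies a morphological opening (erosion followed by dilation) to discard short bursts, and keeps only the frames that remain active. The function names, the non-overlapping framing, and the `frame_len`, `threshold`, and `radius` values are all hypothetical choices made for illustration.

```python
def frame_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame.

    Trailing samples that do not fill a whole frame are ignored.
    """
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def erode(mask, radius):
    """Keep a frame only if every frame within `radius` is active.

    Positions outside the signal count as inactive.
    """
    n = len(mask)
    return [
        all(0 <= j < n and mask[j] for j in range(i - radius, i + radius + 1))
        for i in range(n)
    ]


def dilate(mask, radius):
    """Mark a frame active if any frame within `radius` is active."""
    n = len(mask)
    return [
        any(mask[j] for j in range(max(0, i - radius), min(n, i + radius + 1)))
        for i in range(n)
    ]


def crop_voice(samples, frame_len=4, threshold=0.01, radius=1):
    """Drop silent frames and short isolated bursts (assumed approach)."""
    energy = frame_energy(samples, frame_len)
    mask = [e > threshold for e in energy]
    # Opening: erosion removes active runs shorter than 2 * radius + 1 frames,
    # dilation then restores the surviving runs to their original extent.
    mask = dilate(erode(mask, radius), radius)
    cropped = []
    for i, active in enumerate(mask):
        if active:
            cropped.extend(samples[i * frame_len:(i + 1) * frame_len])
    return cropped


# Toy signal: silence, a one-frame burst, silence, three frames of speech, silence.
signal = [0, 0, 0, 0, 0.9, 0.9, 0, 0, 0, 0, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0, 0]
print(crop_voice(signal, frame_len=2, threshold=0.01, radius=1))
# prints [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
```

In this toy signal the one-frame burst is shorter than the opening window and is removed along with the silent frames, while the three-frame speech segment survives intact. For real audio, array-based routines such as `scipy.ndimage.binary_erosion` and `scipy.ndimage.binary_dilation` perform the same mask operations more efficiently.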