Abstract: Dysarthria, a speech disorder often caused by neurological damage, impairs patients' control of the vocal muscles, making their speech unclear and communication difficult. Recently, voice-driven methods have been proposed to improve the speech intelligibility of patients with dysarthria. However, most of these methods require large speech corpora from both the patient and the target speaker, which are burdensome to record. This study proposes a data augmentation-based voice conversion (VC) system that reduces the recording burden on the speaker. The proposed system, dysarthria voice conversion 3.1 (DVC 3.1), uses a data augmentation approach that combines text-to-speech and a StarGAN-VC architecture to synthesize large target and patient-like corpora. An objective evaluation using the Google automatic speech recognition (Google ASR) system and a subjective listening test were used to assess the speech intelligibility benefits of DVC 3.1 under free-talk conditions. The DVC system without data augmentation (DVC 3.0) was used for comparison. In the objective evaluation, DVC 3.1 improved the Google ASR results for the two dysarthria patients by approximately [62.4%, 43.3%] compared to unprocessed dysarthria speech and by approximately [55.9%, 57.3%] compared to the DVC 3.0 system. In the subjective evaluation, DVC 3.1 increased the speech intelligibility of the two patients by approximately [54.2%, 22.3%] compared to unprocessed dysarthria speech and by approximately [63.4%, 70.1%] compared to the DVC 3.0 system. These results suggest that DVC 3.1 has significant potential to improve the speech intelligibility of patients with dysarthria and to enhance the quality of their verbal communication.
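
The abstract does not state how the Google ASR output was scored. As an illustration only, the sketch below assumes a word-level accuracy score derived from the word error rate between a reference transcript and an ASR transcript. The function names `word_error_rate` and `word_accuracy` and the example sentences are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: a plain lowercase, whitespace-tokenized comparison.
    The study's actual scoring procedure is not given in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Hypothetical intelligibility proxy: 1 - WER, clamped at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    reference = "turn on the light"
    # Made-up ASR transcripts, used only to exercise the functions.
    print(word_accuracy(reference, "turn of the night"))
    print(word_accuracy(reference, "turn on the light"))
```

Under this assumed metric, the same reference sentences would be scored against ASR transcripts of the unprocessed speech, the DVC 3.0 output, and the DVC 3.1 output, and the per-patient scores compared.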

VoiceFixer: Toward General Speech Restoration With Neural Vocoder

VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration

AudioVSR: Enhancing Video Speech Recognition with Audio Data

MaskSR: Masked Language Model for Full-band Speech Restoration

Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction

VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion

Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules

Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

Improving Model Stability and Training Efficiency in Fast, High Quality Expressive Voice Conversion System

Restoring degraded speech via a modified diffusion model

LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders

High-Resolution Speech Restoration with Latent Diffusion Model

Improving the Efficiency of Dysarthria Voice Conversion System Based on Data Augmentation

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration

Joint Semantic Knowledge Distillation and Masked Acoustic Modeling for Full-band Speech Restoration with Improved Intelligibility

Non-Parallel Voice Conversion with Autoregressive Conversion Model and Duration Adjustment

Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech

Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

Preserving background sound in noise-robust voice conversion via multi-task learning

Speech Reconstruction from Silent Lip and Tongue Articulation by Diffusion Models and Text-Guided Pseudo Target Generation