Advancing Voice Biometrics for Dysarthria Speakers Using Multitaper LFCC and Voice Conversion Data Augmentation

Shinimol Salim, Waquar Ahmad
DOI: https://doi.org/10.1109/tifs.2024.3484661
IF: 7.231
2024-11-08
IEEE Transactions on Information Forensics and Security
Abstract: Patients with dysarthria and physical impairments face challenges with traditional user interfaces. An Automatic Speaker Verification (ASV) system can improve accessibility for these patients by replacing complex authentication methods and enabling voice biometrics in various applications. This study focuses on enhancing accessibility for patients with dysarthria through such an ASV system. A novel low-variance Multitaper Linear Frequency Cepstral Coefficients (MTLFCC) feature is proposed, and an ASV system for patients with dysarthria is implemented within a DNN framework using voice conversion data augmentation. An extensive analysis is conducted, using the Thomson multitaper method, to compare various multitaper techniques and taper weight choices for verifying speakers with dysarthria. The study also examines voice conversion with a cycle-consistent generative adversarial network (CycleGAN), which modifies the acoustic attributes of control speech so that it becomes perceptually similar to dysarthric speech, and assesses its implications for dysarthric ASV. Furthermore, system performance is analyzed across different severity levels of dysarthria to gain insight into how the selected multitaper parameters influence the outcomes. This study pioneers the use of MTLFCC features for ASV in the context of dysarthria, offering a novel approach to improving accessibility for this group.
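The abstract does not spell out how an MTLFCC vector is computed. A minimal sketch of one common recipe for multitaper linear-frequency cepstra follows, assuming: Thomson (DPSS/Slepian) tapers obtained from the standard symmetric tridiagonal eigenproblem, a weighted average of the tapered periodograms, a bank of triangular filters spaced linearly in Hz, a log, and an orthonormal DCT-II. The function names (`dpss_tapers`, `linear_filterbank`, `mtlfcc`, ...), the settings NW = 3, K = 6, 20 filters, 13 coefficients and 25 ms / 10 ms framing, and the two weighting options (`"uniform"` and concentration-ratio `"eigen"`) are illustrative assumptions, not the parameters or weight choices evaluated in the paper.

```python
import numpy as np


def dpss_tapers(n, nw=3.0, k=6):
    """Thomson (DPSS) tapers and their concentration ratios (assumed recipe)."""
    w = nw / n  # half-bandwidth in cycles/sample
    idx = np.arange(n)
    # Symmetric tridiagonal matrix whose top eigenvectors are the DPSS tapers.
    diag = ((n - 1 - 2 * idx) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = idx[1:] * (n - idx[1:]) / 2.0
    t = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(t)            # eigenvalues in ascending order
    tapers = vecs[:, ::-1][:, :k].T        # k most concentrated tapers, (k, n)
    # Concentration ratio v^T A v, A[m, p] = sin(2*pi*w*(m-p)) / (pi*(m-p)).
    d = idx[:, None] - idx[None, :]
    a = 2 * w * np.sinc(2 * w * d)
    ratios = np.einsum("kn,nm,km->k", tapers, a, tapers)
    return tapers, ratios


def linear_filterbank(n_filters, nfft, fs):
    """Triangular filters with edges equally spaced in Hz (not mel)."""
    bins = np.fft.rfftfreq(nfft, d=1.0 / fs)
    edges = np.linspace(0.0, fs / 2.0, n_filters + 2)
    fb = np.zeros((n_filters, bins.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rise = (bins - lo) / (mid - lo)
        fall = (hi - bins) / (hi - mid)
        fb[i] = np.clip(np.minimum(rise, fall), 0.0, None)
    return fb


def dct_ii(x, n_ceps):
    """Orthonormal DCT-II, keeping the first n_ceps coefficients."""
    n = x.shape[-1]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_ceps, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mtlfcc(signal, fs=16000, frame_ms=25, hop_ms=10, n_filters=20,
           n_ceps=13, nw=3.0, k=6, weighting="uniform"):
    """Hypothetical MTLFCC extractor; returns an (n_frames, n_ceps) array."""
    flen = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    nfft = 1 << (flen - 1).bit_length()   # next power of two >= flen
    tapers, ratios = dpss_tapers(flen, nw, k)
    if weighting == "uniform":
        weights = np.full(k, 1.0 / k)       # every taper counts equally
    else:                                   # "eigen": weight by concentration
        weights = ratios / ratios.sum()
    fb = linear_filterbank(n_filters, nfft, fs)
    n_frames = 1 + (len(signal) - flen) // hop
    feats = np.empty((n_frames, n_ceps))
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + flen]
        # One periodogram per taper, then a weighted average across tapers.
        spectra = np.abs(np.fft.rfft(tapers * frame, n=nfft, axis=1)) ** 2
        power = weights @ spectra
        log_energy = np.log(fb @ power + 1e-10)
        feats[i] = dct_ii(log_energy, n_ceps)
    return feats


rng = np.random.default_rng(0)
x = rng.standard_normal(16000)  # 1 s of noise standing in for speech
print(mtlfcc(x).shape)          # (98, 13)
```

Averaging K roughly uncorrelated tapered periodograms reduces the variance of the spectrum estimate compared with a single windowed periodogram, which is presumably the sense in which MTLFCC is described as low-variance. Switching `weighting` from `"uniform"` to `"eigen"` down-weights the higher-order tapers, whose lower concentration ratios indicate more spectral leakage.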
Computer Science, Theory & Methods; Engineering, Electrical & Electronic