Abstract: Commercial automatic speech recognition (ASR) systems underperform for d/Deaf and hard‐of‐hearing (d/Dhh) individuals, especially those with "low" and "medium" speech intelligibility classification, prelingual onset of hearing loss, and sign language as their primary communication mode. There is a need for ASR systems ethically trained on heterogeneous d/Dhh speech data.

Objective: To evaluate the performance of commercial ASR systems on d/Dhh speech.

Methods: A corpus of 850 audio files of d/Dhh and normal hearing (NH) speech from the University of Memphis Speech Perception Assessment Laboratory was tested on four speech‐to‐text application programming interfaces (APIs): Amazon Web Services, Microsoft Azure, Google Chirp, and OpenAI Whisper. We quantified the Word Error Rate (WER) of API transcriptions for 24 d/Dhh and nine NH participants and performed subgroup analyses by speech intelligibility classification (SIC), hearing loss (HL) onset, and primary communication mode.

Results: Mean WER averaged across APIs was about 10 times higher for the d/Dhh group (52.6%) than for the NH group (5.0%). APIs performed significantly worse for the "low" and "medium" SIC groups (85.9% and 46.6% WER, respectively) than for the "high" SIC group (9.5% WER, comparable to the NH group). APIs performed significantly worse for speakers with prelingual HL than for those with postlingual HL (80.5% and 37.1% WER, respectively). APIs performed significantly worse for speakers who primarily communicate in sign language (70.2% WER) than for speakers who use both oral and sign language communication (51.5%) or oral communication only (19.7%).

Conclusion: Commercial ASR systems underperform for d/Dhh individuals, especially those with "low" and "medium" SIC, prelingual onset of HL, and sign language as their primary communication mode. This contrasts with Big Tech companies' promises of accessibility, indicating the need for ASR systems ethically trained on heterogeneous d/Dhh speech data.

Level of Evidence: 3. Laryngoscope, 2024
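
The WER metric reported above can be illustrated with a minimal sketch. The abstract does not describe the study's scoring pipeline, so the text normalization here (lowercasing and whitespace tokenization) and the function name `word_error_rate` are assumptions for illustration only. The sketch uses the standard definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumed normalization: lowercase, split on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.1%}")  # 33.3%
```

Because insertions are counted against the reference length, WER can exceed 100% when a transcription contains many spurious words.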

Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets

On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech

Towards a Single ASR Model That Generalizes to Disordered Speech

PDAssess: A Privacy-preserving Free-speech Based Parkinson's Disease Daily Assessment System

An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing

Consistency Based Unsupervised Self-training For ASR Personalisation

Disordered Speech Recognition Considering Low Resources and Abnormal Articulation

Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition

Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility

Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

Hypernetworks for Personalizing ASR to Atypical Speech

Personalized Speech Classification Scheme for the Smart Speaker Accessibility Improvement of the Speech-Impaired people

Quantification of Automatic Speech Recognition System Performance on d/Deaf and Hard of Hearing Speech

Speech Audiometry at Home: Automated Listening Tests via Smart Speakers With Normal-Hearing and Hearing-Impaired Listeners

Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation

Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition

Personalized Speech Recognition for Children with Test-Time Adaptation

Towards Automatic Data Augmentation for Disordered Speech Recognition