Abstract: Assessing the intelligibility of dysarthric speech, which is characterized by intricate speaking rhythms, presents formidable challenges. Current techniques for objectively testing speech intelligibility are burdensome and subjective, and they struggle in particular with dysarthric utterances. To address these problems, we conduct an ablation analysis across speakers with speech impairment. We use a unified approach that incorporates both auditory and visual elements to improve the classification of dysarthric spoken utterances, built on two distinct extractive transformer-based approaches. First, we apply SepFormer to refine the speech signal, prioritizing signal clarity. Next, we convert the enhanced audio samples into log mel spectrograms and feed them into a Swin transformer, a visual classification model pre-trained on 14 million annotated images from ImageNet. The scores from the pre-trained Swin transformer are then used as input to a deep bidirectional long short-term memory with gated recurrent unit (deep BiLSTM-GRU) model, which classifies the spoken utterances. We evaluate the proposed deep BiLSTM-GRU model on the EasyCall speech corpus, covering cognitive characteristics across sets of 10 to 20 spoken utterances delivered by both healthy individuals and individuals with dysarthria. For 20 utterances, the model reaches an accuracy of 98.56% for male speakers, 95.11% for female speakers, and 97.64% for male and female speakers combined. Across the scenarios tested, our approach achieves higher accuracy than other contemporary methods, without requiring data augmentation.
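
The log mel spectrogram step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the frame size, hop length, number of mel bands, or sample rate, so every parameter value and function name below is an assumption chosen for readability, and the naive DFT is used only to keep the example dependency-free. A practical pipeline would more likely rely on an audio library such as librosa or torchaudio.

```python
import math

def hz_to_mel(f):
    # Common HTK-style mel mapping (an assumption; the abstract does not specify one).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters over the n_fft // 2 + 1 non-negative frequency bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 points equally spaced on the mel scale, mapped back to Hz.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive O(n^2) DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append(re * re + im * im)
    return spectrum

def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8, eps=1e-10):
    """Return a list of frames, each a list of n_mels log mel energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel_energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([10.0 * math.log10(e + eps) for e in mel_energies])
    return frames

# Usage: a synthetic 1 kHz tone stands in for an enhanced speech sample.
sr = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(512)]
spec = log_mel_spectrogram(tone, sr)
print(len(spec), "frames x", len(spec[0]), "mel bands")
```

The resulting frames-by-bands array is the kind of two-dimensional representation that can be treated as an image and passed to a vision model, which is the role the abstract assigns to the Swin transformer.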

Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency

Coh-Metrix Model-Based Automatic Assessment of Interpreting Quality

Towards automatic assessment of spontaneous spoken English

Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring

Automated speech scoring of dialogue response by Japanese learners of English as a foreign language

A Machine Learning Assessment System for Spoken English Based on Linear Predictive Coding

Mixtures of Deep Neural Experts for Automated Speech Scoring

EvalYaks: Instruction Tuning Datasets and LoRA Fine-tuned Models for Automated Scoring of CEFR B2 Speaking Assessment Transcripts

A Computer-Assisted Tool for Automatically Measuring Non-Native Japanese Oral Proficiency

Investigation of the effects of automatic scoring technology on human raters' performances in L2 speech proficiency assessment

Auto-scoring of Student Speech: Proprietary vs. Open-source Solutions

Automatic Scoring on English Passage Reading Quality

SpeechLMScore: Evaluating speech generation using speech language model

Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees

Multilingual Speech Evaluation: Case Studies on English, Malay and Tamil

BERT-Based Automatic Scoring Model for Speech-Oriented Text Modality

Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring

Impact of ASR Performance on Free Speaking Language Assessment

Multi-Modal Multi-Scale Speech Expression Evaluation In Computer-Assisted Language Learning

A deep learning approach to dysarthric utterance classification with BiLSTM-GRU, speech cue filtering, and log mel spectrograms

ENHANCING SUBJECTIVE ANSWER EVALUATION THROUGH MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING