Addressing Cold Start Problem for End-to-end Automatic Speech Scoring

Jungbae Park, Seungtaek Choi

2023-06-26

Abstract: Integrating automatic speech scoring/assessment systems has become a critical aspect of second-language speaking education. With advances in self-supervised learning, end-to-end speech scoring approaches have shown promising results. However, this study highlights a significant decrease in the performance of speech scoring systems in new question contexts and identifies this as an item-level cold start problem. Building on this finding, the paper seeks to alleviate the problem with the following methods: 1) prompt embeddings, 2) question context embeddings using BERT or CLIP models, and 3) choice of the pretrained acoustic model. Experiments are conducted on TOEIC speaking test datasets collected from English-as-a-second-language (ESL) learners and rated by professional TOEIC speaking evaluators. The results demonstrate that the proposed framework not only exhibits robustness in a cold-start environment but also outperforms the baselines for known content.

Computation and Language, Sound, Audio and Speech Processing

What problem does this paper attempt to address?

The paper primarily aims to address the cold start problem in automatic speech scoring systems. Specifically:

1. **Research Background**: Automatic speech scoring systems are increasingly used in second-language speaking education, including by English-as-a-second-language (ESL) learners. However, the performance of existing end-to-end systems declines significantly in new question contexts, which the paper identifies as an item-level cold start problem.
2. **Problem Definition**: When new questions or content are added to a speech scoring system, the system cannot score responses to them effectively, and performance drops. This issue is particularly prominent in scenarios like the TOEIC speaking test, where each question has specific contextual requirements.
3. **Solution**:
   - A scoring strategy is proposed to validate the system's performance on unknown content.
   - Several methods are introduced to mitigate the cold start problem:
     - **Prompt Embeddings**: Used to capture the specific requirements of each question.
     - **Question Context Embeddings**: BERT or CLIP models are used to extract text or image features from the question context.
     - **Selection of Pre-trained Acoustic Models**: Experiments show that using pre-trained models with language understanding capabilities (such as Whisper) can significantly improve performance in cold start environments.
4. **Experimental Results**: Experiments on the TOEIC speaking test dataset show that the proposed framework not only performs well in cold start environments but also outperforms baseline methods on known content.

In summary, the paper aims to improve the performance of automatic speech scoring systems on new questions by introducing the methods above, thereby mitigating the cold start problem.
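To make the item-level cold start setting concrete, the following is a minimal sketch of how scored responses could be split so that held-out questions never appear in training. It is an illustration under stated assumptions, not the authors' evaluation protocol: the function name `cold_start_split`, the record fields (`response_id`, `question_id`, `score`), and the 20% hold-out fraction are all hypothetical.

```python
import random


def cold_start_split(responses, held_out_fraction=0.2, seed=0):
    """Split responses by question so that 'cold' questions are unseen in training.

    `responses` is a list of dicts with a hypothetical "question_id" field.
    Returns (train, cold_test, cold_questions).
    """
    # Collect the distinct questions and shuffle them reproducibly.
    question_ids = sorted({r["question_id"] for r in responses})
    rng = random.Random(seed)
    rng.shuffle(question_ids)

    # Hold out a fraction of questions (at least one) as the cold-start set.
    n_cold = max(1, round(len(question_ids) * held_out_fraction))
    cold_questions = set(question_ids[:n_cold])

    # Every response follows its question into exactly one side of the split.
    train = [r for r in responses if r["question_id"] not in cold_questions]
    cold_test = [r for r in responses if r["question_id"] in cold_questions]
    return train, cold_test, cold_questions


# Toy data: 20 responses spread over 5 hypothetical questions.
responses = [
    {"response_id": i, "question_id": f"q{i % 5}", "score": i % 4}
    for i in range(20)
]
train, cold_test, cold_questions = cold_start_split(responses)
```

A model trained on `train` and evaluated on `cold_test` is measured only on questions it has never seen, which is the condition under which the summary above reports a performance drop. A conventional random split over responses would mix the same questions into both sides and hide that effect.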
