Addressing Cold Start Problem for End-to-end Automatic Speech Scoring

Jungbae Park, Seungtaek Choi
2023-06-26
Abstract: Integrating automatic speech scoring/assessment systems has become a critical aspect of second-language speaking education. With advances in self-supervised learning, end-to-end speech scoring approaches have shown promising results. However, this study highlights a significant drop in the performance of speech scoring systems in new question contexts and identifies it as an item-level cold start problem. Having identified this cold-start phenomenon, the paper seeks to alleviate it through the following methods: 1) prompt embeddings, 2) question context embeddings using BERT or CLIP models, and 3) choice of the pretrained acoustic model. Experiments are conducted on TOEIC speaking test datasets collected from English-as-a-second-language (ESL) learners and rated by professional TOEIC speaking evaluators. The results demonstrate that the proposed framework not only exhibits robustness in a cold-start environment but also outperforms the baselines for known content.
Computation and Language, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
The paper primarily aims to address the cold start problem in automatic speech scoring, also called automatic speech assessment (ASA). Specifically:

1. **Research Background**: With globalization and the growth of online education, automatic speech evaluation systems are increasingly used by English-as-a-second-language (ESL) learners. However, the performance of existing systems declines significantly in new question contexts, which the paper identifies as a cold start problem at the item level.
2. **Problem Definition**: When new questions or content are added to a speech scoring system, the system cannot score responses to them effectively, and performance drops. This issue is particularly prominent in settings like the TOEIC speaking test, where each question has specific contextual requirements.
3. **Solution**:
   - An evaluation strategy is proposed to measure the system's performance on unseen content.
   - Several methods are introduced to mitigate the cold start problem:
     - **Prompt Embeddings**: Used to capture the specific requirements of each question.
     - **Question Context Embeddings**: BERT or CLIP models are used to extract features from the question's text or image.
     - **Selection of Pre-trained Acoustic Models**: Experiments show that pre-trained models with language understanding capabilities (such as Whisper) can significantly improve performance in cold start environments.
4. **Experimental Results**: Experiments on the TOEIC speaking test dataset show that the proposed framework is robust in cold start environments and also outperforms baseline methods on known content.

In summary, this paper aims to improve the performance of automatic speech scoring systems on new questions by introducing the methods above, thereby mitigating the cold start problem.
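
To make the item-level cold start setting concrete, the following is a minimal sketch of one way to hold out whole questions, so that the evaluation set contains only questions the model never saw during training. It is an illustration under stated assumptions, not the paper's actual protocol: the function name `cold_start_split`, the record fields `question_id`, `audio`, and `score`, and the 20% hold-out fraction are all hypothetical.

```python
import random
from collections import defaultdict


def cold_start_split(responses, held_out_fraction=0.2, seed=0):
    """Split responses so that held-out questions never appear in training.

    `responses` is a list of dicts with a "question_id" key (hypothetical schema).
    Returns (train, cold), where `cold` contains only unseen questions.
    """
    # Group responses by the question they answer.
    by_question = defaultdict(list)
    for response in responses:
        by_question[response["question_id"]].append(response)

    # Shuffle question IDs deterministically, then hold out a fraction of them.
    question_ids = sorted(by_question)
    rng = random.Random(seed)
    rng.shuffle(question_ids)
    n_held_out = max(1, round(len(question_ids) * held_out_fraction))
    cold_ids = set(question_ids[:n_held_out])

    # Splitting by question (not by response) is what makes the test set "cold".
    train = [r for q in question_ids if q not in cold_ids for r in by_question[q]]
    cold = [r for q in question_ids if q in cold_ids for r in by_question[q]]
    return train, cold


# Toy data: 20 responses spread over 5 questions (illustrative values only).
responses = [
    {"question_id": f"q{i % 5}", "audio": f"resp_{i}.wav", "score": i % 4}
    for i in range(20)
]
train, cold = cold_start_split(responses)
train_questions = {r["question_id"] for r in train}
cold_questions = {r["question_id"] for r in cold}
print(len(train), len(cold), train_questions & cold_questions)
```

A random split over individual responses would place responses to the same question in both sets, which measures performance on known content only. Comparing scores on `cold` against a split of that kind is one way to expose the performance gap the paper describes.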