An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen
2024-04-12
Abstract: Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar performance compared to traditional methods. However, SSL-based ASA systems are faced with at least three data-related challenges: limited annotated data, uneven distribution of learner proficiency levels and non-uniform score intervals between different CEFR proficiency levels. To address these challenges, we explore the use of two novel modeling strategies: metric-based classification and loss reweighting, leveraging distinct SSL-based embedding features. Extensive experimental results on the ICNALE benchmark dataset suggest that our approach can outperform existing strong baselines by a sizable margin, achieving a significant improvement of more than 10% in CEFR prediction accuracy.
Sound, Artificial Intelligence, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses three data-related challenges in Automated Speaking Assessment (ASA):

1. **Limited labeled data**: Existing ASA systems lack sufficient labeled data, which restricts model training and generalization.
2. **Uneven distribution of learner proficiency**: Learners are very unevenly distributed across CEFR (Common European Framework of Reference for Languages) levels, so models perform poorly on rare levels.
3. **Non-uniform score intervals between CEFR levels**: For example, the gap between B2 and B1 is not equal to the gap between B1 and A2. This non-uniformity is difficult for traditional regression methods to handle.

To address these challenges, the authors propose two modeling strategies:

- **Metric-based classification**: Prototypical Networks, combined with different similarity functions (such as cosine similarity and squared Euclidean distance), alleviate the data imbalance problem and handle the non-uniform score intervals between CEFR levels.
- **Loss reweighting**: The loss function is reweighted according to the frequency distribution of CEFR levels and its reciprocal, so that the model pays more attention to rare levels.

The experimental results show that these strategies significantly improve CEFR prediction accuracy on the ICNALE benchmark dataset, by more than 10% over existing strong baselines. In the best configuration, the W2V-PT(SED)+LW model improves accuracy from 77.88% to 92.63%.
The paper also explores the impact of different initialization methods on model performance, and further verifies the effectiveness and robustness of the proposed methods through confusion matrices, classification performance for learners with different native languages, and embedding visualization.
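
The two strategies summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-D vectors stand in for SSL embeddings, the three-level label set is illustrative, and all function names are hypothetical. The summary does not give the paper's exact weighting formula, so the inverse-frequency rule `n / (k * count)` used here is an assumption. In a real system the encoder would be trained end-to-end, whereas here the embeddings are fixed.

```python
import math
from collections import Counter


def squared_euclidean(u, v):
    """Squared Euclidean distance (SED) between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def class_prototypes(embeddings, labels):
    """One prototype per class: the mean of that class's embeddings."""
    counts = Counter(labels)
    sums = {}
    for vec, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def class_probabilities(query, prototypes, metric="sed"):
    """Softmax over negative SED (or over cosine similarity) to each prototype."""
    if metric == "sed":
        logits = {lab: -squared_euclidean(query, p) for lab, p in prototypes.items()}
    else:
        logits = {lab: cosine_similarity(query, p) for lab, p in prototypes.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {lab: math.exp(z - top) for lab, z in logits.items()}
    total = sum(exps.values())
    return {lab: e / total for lab, e in exps.items()}


def inverse_frequency_weights(labels):
    """Assumed weighting rule: n / (k * count), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {lab: n / (k * c) for lab, c in counts.items()}


def weighted_nll(probs, true_label, weights):
    """Negative log-likelihood of the true class, scaled by its class weight."""
    return -weights[true_label] * math.log(probs[true_label])


# Toy, imbalanced training set (hypothetical data, not from ICNALE).
train_x = [[0.0, 0.0], [1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [3.0, 3.0], [3.2, 2.8]]
train_y = ["A2", "B1", "B1", "B1", "B2", "B2"]

prototypes = class_prototypes(train_x, train_y)
weights = inverse_frequency_weights(train_y)
probs = class_probabilities([0.9, 1.1], prototypes, metric="sed")
predicted = max(probs, key=probs.get)
loss = weighted_nll(probs, "B1", weights)
```

Because each class is represented by a prototype rather than a point on a numeric scale, the classifier makes no assumption that adjacent CEFR levels are equally spaced. The class weights scale the per-example loss so that an error on the single A2 example costs more than an error on one of the three B1 examples.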