CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues

Rena Gao, Jingxuan Wu, Carsten Roever, Xuetong Wu, Jing Wu, Long Lv, Jey Han Lau
2024-08-29
Abstract: We develop CNIMA (Chinese Non-Native Interactivity Measurement and Automation), a Chinese-as-a-second-language labelled dataset with 10K dialogues. We annotate CNIMA using an evaluation framework, originally introduced for English-as-a-second-language dialogues, that assesses micro-level features (e.g. backchannels) and macro-level interactivity labels (e.g. topic management), and we test the framework's transferability from English to Chinese. We find the framework robust across languages and reveal both universal and language-specific relationships between micro-level and macro-level features. Next, we propose an approach to automate the evaluation and find strong performance, creating a new tool for automated second language assessment. Our system can be adapted to other languages easily because it uses large language models and therefore does not require large-scale annotated training data.
Computation and Language
What problem does this paper attempt to address?
The paper addresses three gaps in second language (SL) dialogue assessment:

1. **Insufficient Dialogue Datasets**: Existing SL assessment methods focus mainly on written language proficiency. Datasets that target the linguistic features specific to spoken dialogue, especially open-domain conversation, are lacking.
2. **Insufficient Interactivity Assessment**: Current assessment systems (such as TOEFL and PTE Academic) primarily measure grammatical accuracy, pronunciation, and lexical richness. They rarely assess interactivity in dialogue, such as topic management, social role performance, and how conversations are opened and closed.
3. **Lack of Automated Assessment Tools**: Although some studies have proposed frameworks for assessing SL dialogue interactivity, the assessment process itself is not automated, and most predictive models rely on manually annotated micro-features.

To address these gaps, the authors built CNIMA (Chinese Non-Native Interactivity Measurement and Automation), a dataset of dialogues involving non-native speakers of Chinese, and proposed a fully automated method, based on this dataset, for evaluating the interactivity of learners of Chinese as a second language in open-domain dialogues.

The study also tested whether an assessment framework previously proposed for English SL dialogues applies to Chinese, and it identified both universal and language-specific relationships between micro-level features and macro-level interactivity labels. Experiments showed that the system could reliably predict the overall quality score of a dialogue and that it adapts well across languages.