Dawen Zhang, Thong Hoang, Shidong Pan, Yongquan Hu, Zhenchang Xing, Mark Staples, Xiwei Xu, Qinghua Lu, Aaron Quigley
Abstract: Language tests measure a person's ability to use a language in terms of listening, speaking, reading, or writing. Such tests play an integral role in academic, professional, and immigration domains, with entities such as educational institutions, professional accreditation bodies, and governments using them to assess candidate language proficiency. Recent advances in Artificial Intelligence (AI) and the discipline of Natural Language Processing have prompted language test providers to explore AI's potential applicability within language testing, leading to transformative activity patterns surrounding language instruction and learning. However, with concerns over AI's trustworthiness, it is imperative to understand the implications of integrating AI into language testing. This knowledge will enable stakeholders to make well-informed decisions, thus safeguarding community well-being and testing integrity. To understand the concerns and effects of AI usage in language tests, we conducted interviews and surveys with English test-takers. To the best of our knowledge, this is the first empirical study aimed at identifying the implications of AI adoption in language tests from a test-taker perspective. Our study reveals test-taker perceptions and behavioral patterns. Specifically, we identify that AI integration may enhance perceptions of fairness, consistency, and availability. Conversely, it might incite mistrust regarding reliability and interactivity aspects, subsequently influencing the behaviors and well-being of test-takers. These insights provide a better understanding of potential societal implications and assist stakeholders in making informed decisions concerning AI usage in language testing.
Computers and Society, Artificial Intelligence, Computation and Language, Human-Computer Interaction
What problem does this paper attempt to address?
This paper examines how the use of artificial intelligence (AI) in language testing affects test-takers. Specifically, it explores, from the test-taker's perspective, how AI-based testing bears on trust, transparency, consistency, and interpretability. These factors are crucial both to the wider adoption of AI in language testing and to the benefits it may bring to test-takers. Through empirical analysis, the researchers aim to show how AI in English language testing affects test-takers and to offer insights that help researchers and developers integrate AI models into their evaluation systems more effectively.
### Research Background and Related Work
1. **Language Testing**
- Language tests evaluate an individual's ability to use a language in listening, speaking, reading, or writing. These tests play an important role in academic, professional, and immigration settings.
- Common English language tests include TOEFL, IELTS, PTE, and the Duolingo English Test (DET), which are widely recognized and used globally.
2. **Automated Scoring Systems**
- Automated scoring systems use AI technology to evaluate candidates' performance, aiming to improve the accuracy, reliability and efficiency of scoring.
- For example, ETS's e-rater system is used to evaluate GRE essays, while PTE relies entirely on AI for scoring.
3. **Application of AI in Language Testing**
- Although AI has been applied in language testing with some success, there is relatively little research on how far such systems are trusted, or on their transparency, consistency, and interpretability.
- These factors are crucial for ensuring the fairness and reliability of AI systems, especially in scenarios involving high-risk decisions.
### Methodology
1. **Framework Overview**
- The research is divided into three stages: planning and preparation, interviews, and online surveys.
- In the planning and preparation stage, questionnaires were designed, including demographic questions and open-ended questions, to collect the background and experience of test-takers.
- In the interview stage, in-depth conversations with test-takers were held to obtain their views on the application of AI in language testing.
- In the online survey stage, data were collected from a broader sample to further verify the interview results.
2. **Interviews**
- The interviewees included test-takers from different countries and backgrounds who had taken different language tests, such as TOEFL, IELTS, PTE, and DET.
- The interviews covered the test-takers' views on the fairness, consistency, and reliability of AI scoring systems.
3. **Online Surveys**
- Survey participants were recruited through channels such as social media and language schools, and a total of 99 valid questionnaires were collected.
- The survey covered test-takers' background information, their views on different language tests, and their attitudes towards the application of AI in language testing.
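
As an illustration only, closed-ended survey answers of this kind are often summarized as the share of respondents who agree with each statement. The sketch below assumes a five-point agreement scale (1 = strongly disagree, 5 = strongly agree), which this summary does not specify. The function name `agreement_share`, the statement labels, and the sample values are hypothetical placeholders, not the study's instrument or data.

```python
from collections import Counter


def agreement_share(responses, agree_levels=(4, 5)):
    """Return the fraction of responses that fall in `agree_levels`.

    `responses` is a list of integer scale points; the five-point
    coding (1-5) is an assumption made for this sketch.
    """
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    agreeing = sum(counts[level] for level in agree_levels)
    return agreeing / len(responses)


# Invented placeholder answers, NOT results from the paper.
sample = {
    "statement_1": [5, 4, 4, 3, 2, 5, 4, 1],
    "statement_2": [3, 2, 4, 2, 3, 1, 4, 2],
}

for statement, answers in sample.items():
    print(f"{statement}: {agreement_share(answers):.1%} agree")
```

Reporting a single agreement share per statement keeps the comparison simple, at the cost of hiding how answers spread across the other scale points.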
### Results
1. **Test-takers' Concerns**
- **Fairness and Consistency**: Most test-takers believe that AI scoring systems have improved the fairness and consistency of language tests. However, some test-takers pointed out that AI may be biased against certain accents or pronunciations, resulting in unfair scoring.
- **Reliability**: Some test-takers are worried about the reliability of AI scoring systems, particularly those who often receive low scores.
- **Fairness of Human Scoring Systems**: In contrast, test-takers are more critical of the fairness of human scoring, believing that the emotions and preferences of human scorers may affect it.
2. **Impact of AI in Language Testing**
- **Positive Impact**: Many test-takers believe that AI scoring systems have improved the fairness and consistency of tests and reduced human errors.
- **Negative Impact**: Some test-takers are worried about the reliability and interactivity of AI scoring systems, believing that these concerns may affect their mental state and performance.
### Conclusion
Through empirical analysis, this research reveals how the application of AI in language testing affects test-takers, as seen from their own perspective. The results offer developers and researchers insights for applying AI in language testing more effectively while ensuring its fairness, reliability, and transparency.