Abstract: The recent release of very large language models such as PaLM and GPT-4 has made an unprecedented impact in the popular media and public consciousness, giving rise to a mixture of excitement and fear as to their capabilities and potential uses, and shining a light on natural language processing research which had not previously received so much attention. These developments offer great promise for education technology, and in this paper we look specifically at the potential for incorporating large language models in AI-driven language teaching and assessment systems. We consider several research areas and also discuss the risks and ethical considerations surrounding generative AI in education technology for language learners. Overall, we find that larger language models offer improvements over previous models in text generation, opening up routes toward content generation which had not previously been plausible. For text generation, however, they must be prompted carefully, and their outputs may need to be reshaped before they are ready for use. For automated grading and grammatical error correction, tasks whose progress is measured on well-known benchmarks, early investigations indicate that large language models on their own do not improve on state-of-the-art results according to standard evaluation metrics. For grading, it appears that linguistic features established in the literature should still be used for best performance; for error correction, the models may offer alternative feedback styles that existing evaluation methods do not measure sensitively. In all cases, more experimentation is needed on including large language models in education technology for language learners, in order to properly understand and report on their capacities and limitations, and to ensure that foreseeable risks such as misinformation and harmful bias are mitigated.

Evaluating large language models in analysing classroom dialogue

Analyzing Large Language Models for Classroom Discussion Assessment

A Large Language Model Approach to Educational Survey Feedback Analysis

Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course

LLM4DS: Evaluating Large Language Models for Data Science Code Generation

Practical and Ethical Challenges of Large Language Models in Education: A Systematic Scoping Review

A Closer Look into Using Large Language Models for Automatic Evaluation

Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education

Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis

Spoken Language Intelligence of Large Language Models for Language Learning

From Voices to Validity: Leveraging Large Language Models (LLMs) for Textual Analysis of Policy Stakeholder Interviews

Exploring the Dialogue Comprehension Ability of Large Language Models

An Evaluation of Large Language Models in Bioinformatics Research

On the application of Large Language Models for language teaching and assessment technology

Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts

LLMEval: A Preliminary Study on How to Evaluate Large Language Models

An Examination of the Use of Large Language Models to Aid Analysis of Textual Data

From Text to Insight: Leveraging Large Language Models for Performance Evaluation in Management

Large Language Models as Data Preprocessors