Abstract: Educational technology innovations leveraging large language models (LLMs) have shown the potential to automate the laborious process of generating and analysing textual content. While various innovations have been developed to automate a range of educational tasks (e.g., question generation, feedback provision, and essay grading), there are concerns regarding the practicality and ethicality of these innovations. Such concerns may hinder future research and the adoption of LLM-based innovations in authentic educational contexts. To address this, we conducted a systematic scoping review of 118 peer-reviewed papers published since 2017 to pinpoint the current state of research on using LLMs to automate and support educational tasks. The findings revealed 53 use cases for LLMs in automating educational tasks, grouped into nine main categories: profiling/labelling, detection, grading, teaching support, prediction, knowledge representation, feedback, content generation, and recommendation. We also identified several practical and ethical challenges, including low technological readiness, lack of replicability and transparency, and insufficient privacy and beneficence considerations. The findings were summarised into three recommendations for future studies: updating existing innovations with state-of-the-art models (e.g., GPT-3/4), embracing the initiative of open-sourcing models/systems, and adopting a human-centred approach throughout the developmental process. As the intersection of AI and education is continuously evolving, the findings of this study can serve as an essential reference point for researchers, allowing them to leverage the strengths, learn from the limitations, and uncover potential research opportunities enabled by ChatGPT and other generative AI models.

Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights

Leveraging Large Language Models for Multiple Choice Question Answering

Generating AI Literacy MCQs: A Multi-Agent LLM Approach

Automated Educational Question Generation at Different Bloom's Skill Levels using Large Language Models: Strategies and Evaluation

Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT

Math Multiple Choice Question Generation via Human-Large Language Model Collaboration

A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education

The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models

Comparison of Large Language Models for Generating Contextually Relevant Questions

Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education

Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students

Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications

Practical and Ethical Challenges of Large Language Models in Education: A Systematic Scoping Review

Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models

Towards AI-Assisted Multiple Choice Question Generation and Quality Evaluation at Scale: Aligning with Bloom’s Taxonomy

Can multiple-choice questions really be useful in detecting the abilities of LLMs?

AI-TA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs

Comparison of Large Language Models in Generating Machine Learning Curricula in High Schools

Leveraging GenAI for an Intelligent Tutoring System for R: A Quantitative Evaluation of Large Language Models

Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education

Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions