Abstract: This paper assesses the potential for the large language models (LLMs) GPT-4 and GPT-3.5 to aid in deriving insight from education feedback surveys. Exploration of LLM use cases in education has focused on teaching and learning, with less exploration of capabilities in education feedback analysis. Survey analysis in education involves goals such as finding gaps in curricula or evaluating teachers, often requiring time-consuming manual processing of textual responses. LLMs have the potential to provide a flexible means of achieving these goals without specialized machine learning models or fine-tuning. We demonstrate a versatile approach to such goals by treating them as sequences of natural language processing (NLP) tasks including classification (multi-label, multi-class, and binary), extraction, thematic analysis, and sentiment analysis, each performed by an LLM. We apply these workflows to a real-world dataset of 2500 end-of-course survey comments from biomedical science courses, and evaluate a zero-shot approach (i.e., requiring no examples or labeled training data) across all tasks, reflecting education settings, where labeled data is often scarce. By applying effective prompting practices, we achieve human-level performance on multiple tasks with GPT-4, enabling workflows necessary to achieve typical goals. We also show the potential of inspecting LLMs' chain-of-thought (CoT) reasoning for providing insight that may foster confidence in practice. Moreover, this study features development of a versatile set of classification categories, suitable for various course types (online, hybrid, or in-person) and amenable to customization. Our results suggest that LLMs can be used to derive a range of insights from survey text.
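The abstract describes each analysis goal as a chain of zero-shot NLP tasks performed by an LLM, with the model's chain-of-thought available for inspection. Below is a minimal sketch of one such step, zero-shot multi-label classification, under several assumptions. `complete` is a hypothetical stand-in for whatever function sends a prompt to a model and returns its text; no vendor API is assumed. The five categories are invented for illustration and are not the paper's category set. The "Reasoning:"/"Labels:" output convention and the names `build_prompt`, `parse_response`, and `classify` are illustrative choices, not the authors' prompts.

```python
import re
from typing import Callable

# Hypothetical category scheme for illustration; not the paper's categories.
CATEGORIES = [
    "course content",
    "assessments",
    "instructor",
    "course logistics",
    "learning resources",
]


def build_prompt(comment: str, categories: list[str]) -> str:
    """Zero-shot multi-label prompt: category names and instructions, no labeled examples."""
    options = "\n".join(f"- {c}" for c in categories)
    return (
        "You are analyzing end-of-course survey comments.\n"
        "Assign every category below that applies to the comment "
        "(zero, one, or several).\n"
        f"Categories:\n{options}\n\n"
        "First think step by step after 'Reasoning:'. Then give the final "
        "answer after 'Labels:' as a comma-separated list, or 'none'.\n\n"
        f'Comment: "{comment}"'
    )


def parse_response(text: str, categories: list[str]) -> tuple[str, set[str]]:
    """Split model output into its chain-of-thought and the set of valid labels."""
    # Greedy first group, so the answer is taken from the LAST 'Labels:' marker.
    match = re.search(r"Reasoning:(.*)Labels:(.*)", text, flags=re.DOTALL | re.IGNORECASE)
    if match is None:
        return text.strip(), set()  # unparseable output: keep the text for inspection
    reasoning, answer = match.group(1).strip(), match.group(2)
    allowed = {c.lower(): c for c in categories}
    # Discard anything that is not a predefined category (including 'none').
    labels = {
        allowed[item.strip().lower()]
        for item in answer.split(",")
        if item.strip().lower() in allowed
    }
    return reasoning, labels


def classify(comment: str, complete: Callable[[str], str]) -> tuple[str, set[str]]:
    """One zero-shot classification call; `complete` maps a prompt to model text."""
    return parse_response(complete(build_prompt(comment, CATEGORIES)), CATEGORIES)


if __name__ == "__main__":
    # Stand-in for a real model call that returns a fixed response.
    def fake_llm(prompt: str) -> str:
        return (
            "Reasoning: The comment praises the lecturer and criticizes exam timing.\n"
            "Labels: instructor, assessments"
        )

    reasoning, labels = classify("Great lectures, but the exams felt rushed.", fake_llm)
    print(reasoning)
    print(sorted(labels))  # ['assessments', 'instructor']
```

Keeping the reasoning text separate from the parsed labels is what allows it to be logged and reviewed alongside each decision, and restricting output to the predefined categories keeps downstream counts consistent. Other tasks in a workflow (binary filtering, sentiment, extraction) could reuse the same prompt-then-parse pattern.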

What problem does this paper attempt to address?

This paper addresses the difficulty of analyzing qualitative data in educational feedback surveys. Specifically, the authors explore the potential of large language models (LLMs), particularly GPT-4 and GPT-3.5, for analyzing such surveys. Traditional educational feedback analysis relies mainly on quantitative data, while qualitative feedback such as student comments, though more valuable, is very time-consuming and labor-intensive to analyze. In addition, manual coding of large numbers of open-ended survey responses or student comments often fails to yield specific suggestions for improvement, and consistent coding is hard to maintain across large educational datasets. The paper notes that although automated methods such as machine learning models have been applied to qualitative data analysis in recent years, they usually require substantial technical resources, fine-tuning on pre-labeled data, separate models for individual natural language processing tasks, or specialized software, which has kept most educators from adopting them widely.

The central question of the research is therefore: have LLMs developed to the point where they can be applied effectively to the range of tasks involved in survey analysis? To answer it, the study poses the following research questions:

- Research Question 1 (RQ1): Can LLMs perform multiple unstructured text analysis tasks on educational survey responses, including multi-label classification, multi-class classification, binary classification, extraction, inductive thematic analysis, and sentiment analysis?
- Research Question 2 (RQ2): Can the chain of thought of LLMs (i.e., the intermediate steps by which they arrive at answers) be captured to provide some transparency and thereby build confidence in practical use? Can examples of potential uses be demonstrated?
- Research Question 3 (RQ3): With a zero-shot approach used for all tasks (no manually labeled examples provided), which reflects the situation in many educational settings, can performance comparable to human labeling be achieved?

Through these questions, the authors evaluate the feasibility and quality of LLMs for analyzing educational feedback surveys, in particular whether they can reach human-level performance without pre-labeled data.
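RQ3 asks whether zero-shot LLM output can match manual labeling. One common way to quantify this, not necessarily the measure the paper reports, is chance-corrected agreement such as Cohen's kappa: compute it between the LLM and a human coder, then compare it with the agreement between two human coders on the same comments. The sketch below is plain Python; the function names `cohen_kappa` and `per_label_kappa` are illustrative, and the example labels are invented, not data from the study.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability that both raters independently pick the same label.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; returning 1.0 is a convention of this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def per_label_kappa(
    a: Sequence[set[str]], b: Sequence[set[str]], categories: Sequence[str]
) -> dict[str, float]:
    """Multi-label case: treat each category as its own binary present/absent decision."""
    return {
        c: cohen_kappa([c in s for s in a], [c in s for s in b]) for c in categories
    }


if __name__ == "__main__":
    # Invented sentiment labels for six comments (illustrative only).
    human = ["pos", "neg", "neg", "neu", "pos", "pos"]
    llm = ["pos", "neg", "neu", "neu", "pos", "neg"]
    print(round(cohen_kappa(human, llm), 3))  # 0.5
```

For binary and multi-class tasks each sequence holds one label per comment; for multi-label tasks, `per_label_kappa` compares each category separately. An LLM-versus-human kappa close to the human-versus-human kappa on the same comments is one reasonable reading of "comparable to human labeling", though how close counts as comparable remains a judgment call.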

A Large Language Model Approach to Educational Survey Feedback Analysis
