A Large Language Model Approach to Educational Survey Feedback Analysis

Michael J. Parker, Caitlin Anderson, Claire Stone, YeaRim Oh
DOI: https://doi.org/10.1007/s40593-024-00414-0
2024-06-27
Abstract: This paper assesses the potential for the large language models (LLMs) GPT-4 and GPT-3.5 to aid in deriving insight from education feedback surveys. Exploration of LLM use cases in education has focused on teaching and learning, with less exploration of capabilities in education feedback analysis. Survey analysis in education involves goals such as finding gaps in curricula or evaluating teachers, often requiring time-consuming manual processing of textual responses. LLMs have the potential to provide a flexible means of achieving these goals without specialized machine learning models or fine-tuning. We demonstrate a versatile approach to such goals by treating them as sequences of natural language processing (NLP) tasks including classification (multi-label, multi-class, and binary), extraction, thematic analysis, and sentiment analysis, each performed by an LLM. We apply these workflows to a real-world dataset of 2500 end-of-course survey comments from biomedical science courses, and evaluate a zero-shot approach (i.e., requiring no examples or labeled training data) across all tasks, reflecting education settings, where labeled data is often scarce. By applying effective prompting practices, we achieve human-level performance on multiple tasks with GPT-4, enabling workflows necessary to achieve typical goals. We also show the potential of inspecting LLMs' chain-of-thought (CoT) reasoning for providing insight that may foster confidence in practice. Moreover, this study features development of a versatile set of classification categories, suitable for various course types (online, hybrid, or in-person) and amenable to customization. Our results suggest that LLMs can be used to derive a range of insights from survey text.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the difficulty of analyzing qualitative data from educational feedback surveys. Specifically, the authors explore the potential of large language models (LLMs), in particular GPT-4 and GPT-3.5, for this kind of analysis. Traditional educational feedback analysis relies mainly on quantitative data. Qualitative feedback such as student comments is more valuable but very time-consuming and labor-intensive to analyze. In addition, manual coding and analysis of large numbers of open-ended survey responses or student comments often fails to yield specific suggestions for improvement, and consistent coding is hard to maintain across large educational datasets.

The paper points out that automated methods such as machine learning models have been used for qualitative data analysis in recent years, but these methods usually require substantial technical resources, fine-tuning on pre-labeled data, separate models for each natural language processing task, or specialized software, which makes them difficult for most educators to adopt.

The main question of the research is therefore: have large language models developed to the point where they can be applied effectively to the various tasks involved in survey analysis? To answer it, the paper poses the following research questions:

- Research Question 1 (RQ1): Can large language models perform multiple unstructured text analysis tasks on educational survey responses, including multi-label classification, multi-class classification, binary classification, extraction, thematic analysis, and sentiment analysis?
- Research Question 2 (RQ2): Can the chain of thought of large language models (i.e., the intermediate steps by which they arrive at answers) be captured to provide a degree of transparency and thereby build confidence in practical use? Can examples of potential uses be demonstrated?
- Research Question 3 (RQ3): With a zero-shot approach (no manually labeled examples) used for all tasks, which mirrors the situation in many educational settings, can performance comparable to human labeling be achieved?

Through these questions, the authors aim to evaluate the feasibility and quality of large language models for analyzing educational feedback surveys, especially whether they can reach human-level performance without pre-labeled data.
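To make the zero-shot, multi-label setting in the research questions more concrete, the following is a minimal Python sketch, not the authors' implementation. The category names, the prompt wording, the JSON reply format, and the helpers `build_prompt`, `parse_response`, `classify`, and `fake_llm` are all assumptions introduced for illustration. The model call is replaced by a stub so the example runs offline.

```python
import json
from typing import Callable, Dict, List

# Hypothetical category labels for illustration; not the paper's actual scheme.
CATEGORIES: List[str] = ["course content", "instructor", "assessment", "platform"]


def build_prompt(comment: str, categories: List[str]) -> str:
    """Build a zero-shot prompt: instructions and label set only, no labeled examples."""
    label_list = "\n".join(f"- {c}" for c in categories)
    return (
        "You are analyzing end-of-course survey feedback.\n"
        "Assign every category that applies to the comment (zero or more).\n"
        f"Categories:\n{label_list}\n\n"
        f"Comment: {comment}\n\n"
        'Reply with JSON only: {"reasoning": "<brief step-by-step rationale>", '
        '"labels": ["<category>", ...]}'
    )


def parse_response(raw: str, categories: List[str]) -> Dict[str, object]:
    """Parse the model reply, keeping the reasoning and dropping unknown labels."""
    data = json.loads(raw)
    allowed = set(categories)
    labels = [label for label in data.get("labels", []) if label in allowed]
    return {"reasoning": str(data.get("reasoning", "")), "labels": labels}


def classify(
    comment: str,
    llm: Callable[[str], str],
    categories: List[str] = CATEGORIES,
) -> Dict[str, object]:
    """Run one comment through prompt construction, the model, and parsing."""
    return parse_response(llm(build_prompt(comment, categories)), categories)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption); returns a fixed JSON reply."""
    return json.dumps(
        {
            "reasoning": "Mentions lecture delivery and quiz timing.",
            "labels": ["instructor", "assessment", "pricing"],
        }
    )


result = classify("The lectures were clear but the quizzes felt rushed.", fake_llm)
print(result["labels"])     # ['instructor', 'assessment'] ("pricing" is not an allowed label)
print(result["reasoning"])  # the captured rationale, kept for later inspection
```

Asking for the rationale alongside the labels is one simple way to keep the model's intermediate reasoning available for review, in the spirit of RQ2. Filtering labels against the allowed set guards against a reply that names a category outside the scheme.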