A Comparative Analysis of Student Performance Predictions in Online Courses using Heterogeneous Knowledge Graphs

Thomas Trask, Dr. Nicholas Lytle, Michael Boyle, Dr. David Joyner, Dr. Ahmed Mubarak
2024-05-19
Abstract: As online courses become the norm in the higher-education landscape, investigations into student performance between students who take online vs on-campus versions of classes become necessary. While attention has been given to looking at differences in learning outcomes through comparisons of students' end performance, less attention has been given to comparing students' engagement patterns between different modalities. In this study, we analyze a heterogeneous knowledge graph consisting of students, course videos, formative assessments and their interactions to predict student performance via a Graph Convolutional Network (GCN). Using students' performance on the assessments, we attempt to determine a useful model for identifying at-risk students. We then compare the models generated between 5 on-campus and 2 fully-online MOOC-style instances of the same course. The model developed achieved a 70-90% accuracy of predicting whether a student would pass a particular problem set based on content consumed, course instance, and modality.
Computers and Society, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses how predictions of student performance differ between online courses and traditional on-campus courses. Specifically, the researchers analyze a Heterogeneous Knowledge Graph (HKG) containing students, course videos, formative assessments, and their interactions, and use a Graph Convolutional Network (GCN) to predict students' academic performance and identify students who may be at risk. The study focuses on comparing student participation patterns and performance prediction models for the same course across modalities (online and on-campus).

### Research Background

As online education has become the norm in higher education, it is necessary to study the differences in student performance between online courses and traditional on-campus courses. Previous studies have focused on differences in students' final grades, but relatively few have examined students' participation patterns across learning modalities. This paper constructs a heterogeneous knowledge graph containing students, course videos, formative assessments, and their interactions, and uses a GCN to predict students' performance and to identify at-risk students who may need early intervention.

### Research Methods

1. **Data Sources**: The research data comes from the GTX1301 "Introduction to Python" course at Georgia Tech, which is offered in both online and on-campus versions. The data set includes clickstream data, student feature matrices, video/assessment feature matrices, course content edge matrices, student-content edge matrices, and student-page edge matrices.
2. **Model Construction**: The researchers used PyTorch Geometric (PyG) to construct a GCN that contains two GraphSAGE convolution (SAGEConv) layers and a ReLU activation function. After the input node data is processed by standard embedding and linear transformation, the dot product between user and page nodes is calculated to predict whether a student will pass a particular page.
3. **Model Training**: The GCN model is trained with 64 hidden layers and 4 output layers. The data is divided into an 80% training set, a 10% validation set, and a 10% test set. Training uses PyTorch's binary cross-entropy loss function and the Adam optimizer.

### Research Results

- **Prediction Accuracy**: The GCN model achieved AUC scores of 58% to 90% in predicting whether students will pass a particular set of questions.
- **Modality Differences**: There are significant differences between the prediction models for online courses and on-campus courses. The AUC score of the on-campus course in 2021 reached 90%, while the AUC score in 2022 was 82%. The lower AUC score of the on-campus course in 2022 may be due to the lack of data in the fall semester of 2022.
- **Repeatability and Transferability**: When training on a single course instance, the AUC score of the GCN model fluctuates greatly. This may be due to a floor effect when dividing the data, especially in on-campus courses with a small number of users. In addition, differences in data shapes between course instances lead to poor transferability of the model between courses.

### Conclusions and Limitations

- **Conclusions**: The study extends the previous clickstream GCN model by adding student-assessment interactions, but mainly focuses on one course. Future research needs to examine different courses, topics, and degree paths.
- **Limitations**: There is limited understanding of the demographic characteristics of students participating in these courses. Future plans include developing more powerful graph network implementations that include more demographic information. In addition, a scalable method needs to be developed to infer the relationships between different course contents to improve the transferability of the model.

### Formula Display

- **Dot Product Calculation**:
  \[ \text{Edge Prediction}=\text{dot}(u, p) \]
  where \(u\) is the user node and \(p\) is the page node.
- **Loss Function**:
  \[ \text{Loss}=\text{BCEWithLogitsLoss}(\hat{y}, y) \]
  where \(\hat{y}\) is the model's raw prediction (a logit) and \(y\) is the true label.
- **AUC Calculation**
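
The quantities named above (the dot-product edge score, the binary cross-entropy loss on logits, the AUC metric, and the 80/10/10 split) can be illustrated with a minimal plain-Python sketch. This is not the authors' PyTorch Geometric code: the function names, the toy embeddings, and the fixed seed are all hypothetical, and because the AUC formula is cut off in the summary, the sketch assumes the standard pairwise-ranking definition of ROC AUC.

```python
import math
import random


def edge_logit(u, p):
    """Dot product between a student embedding u and a page embedding p."""
    if len(u) != len(p):
        raise ValueError("embeddings must have the same length")
    return sum(a * b for a, b in zip(u, p))


def bce_with_logits(logits, labels):
    """Mean binary cross-entropy computed directly on raw logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def roc_auc(scores, labels):
    """Assumed definition: chance that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))


def split_80_10_10(items, seed=0):
    """Shuffle and split into train/validation/test (80%/10%/remainder)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # hypothetical fixed seed
    n_train = int(0.8 * len(shuffled))
    n_val = int(0.1 * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


if __name__ == "__main__":
    # Toy student and page embeddings (made-up numbers, not course data).
    students = {"s1": [0.5, 1.0], "s2": [-1.0, 0.2]}
    page = [1.0, 0.5]
    logits = [edge_logit(vec, page) for vec in students.values()]
    labels = [1, 0]  # 1 = passed the page, 0 = did not
    print("loss:", bce_with_logits(logits, labels))
    print("auc:", roc_auc(logits, labels))
```

With a small on-campus cohort, the 10% validation and test slices produced by a split like this contain only a handful of students, which is one way to see why the reported AUC can fluctuate between runs.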