Integrating behavior analysis with machine learning to predict online learning performance: A scientometric review and empirical study

Jin Yuan, Xuelan Qiu, Jinran Wu, Jiesi Guo, Weide Li, You-Gan Wang
2024-03-28
Abstract: Interest in predicting online learning performance with machine learning (ML) algorithms has been steadily increasing. We first conducted a scientometric analysis to provide a systematic review of research in this area. The findings show that most existing studies apply ML methods without considering learning behavior patterns, which may compromise the accuracy and precision of their predictions. This study proposes an integration framework that blends learning behavior analysis with ML algorithms to enhance the prediction accuracy of students' online learning performance. Specifically, the framework identifies distinct learning patterns among students by employing clustering analysis and implements various ML algorithms to predict performance within each pattern. For demonstration, the integration framework is applied to a real dataset from edX and distinguishes two learning patterns, namely low autonomy students and motivated students. The results show that the framework yields nearly perfect prediction performance for low autonomy students and satisfactory performance for motivated students. Additionally, this study compares the prediction performance of the integration framework with that of directly applying ML methods without learning behavior analysis, using comprehensive evaluation metrics. The results consistently demonstrate the superiority of the integration framework over the direct approach, particularly when integrated with the best-performing XGBoost method. Moreover, the framework significantly improves prediction accuracy for the motivated students and for the worst-performing random forest method. This study also evaluates the importance of various learning behaviors within each pattern using LightGBM with SHAP values. The implications of the integration framework and the results for online education practice and future research are discussed.
Computers and Society, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address two main issues in predicting online learning performance:

1. **Deficiencies in Existing Research Methods**: Most existing research does not consider students' learning behavior patterns when applying machine learning (ML) methods, which can reduce the accuracy and precision of predictions. For example, many studies apply ML algorithms directly to learning datasets without analyzing students' behavioral characteristics.
2. **Improving Prediction Performance**: To enhance the accuracy of online learning performance predictions, this paper proposes a framework that combines learning behavior analysis with ML algorithms. Specifically, the framework identifies different learning patterns through cluster analysis and applies various ML algorithms within each pattern for prediction.

### Solution

To address the above issues, the paper proposes an integrated framework that includes the following steps:

1. **Cluster Analysis**: Use the K-means algorithm to perform cluster analysis on online learning behaviors to identify different learning patterns or categories.
2. **ML Algorithm Application**: Apply various ML algorithms (such as logistic regression, decision trees, random forests, K-nearest neighbors, multilayer perceptron, support vector classifier, and XGBoost) within each identified learning pattern to predict students' online learning performance.
3. **Performance Evaluation**: Compare the prediction performance of the integrated framework with the direct application of ML methods (without learning behavior analysis) using comprehensive evaluation metrics (such as accuracy, precision, and recall).

### Experimental Results

- **Low Autonomy Students**: The framework's prediction performance for low autonomy students is nearly perfect.
- **Motivated Students**: The framework's prediction performance for motivated students is satisfactory.
- **Overall Comparison**: Compared to the direct application of ML methods, the integrated framework shows significant advantages in prediction performance, especially when combined with the best-performing XGBoost method.

### Conclusion

This study not only improves the accuracy of online learning performance predictions but also enhances the understanding of students' learning behaviors, providing valuable insights for online education practice and future research.
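The three steps listed under Solution (cluster the learners, predict within each pattern, then evaluate) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pure-Python K-means with a deterministic farthest-first initialization, uses a small K-nearest-neighbors classifier as a stand-in for the full set of ML algorithms, and runs on a hypothetical toy dataset whose two features (say, videos watched and forum posts) are invented for illustration. All function and variable names are likewise hypothetical.

```python
import math
from collections import Counter


def nearest_index(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, max_iter=100):
    """Plain K-means; farthest-first initialization keeps the toy example deterministic."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = [nearest_index(p, centroids) for p in points]
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[i])
        if updated == centroids:
            break
        centroids = updated
    return centroids


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows."""
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def predict_by_pattern(train_X, train_y, test_X, n_patterns=2):
    """Step 1: cluster behaviors. Step 2: predict using only same-pattern training rows."""
    centroids = kmeans(train_X, n_patterns)
    groups = {i: ([], []) for i in range(n_patterns)}
    for x, y in zip(train_X, train_y):
        gx, gy = groups[nearest_index(x, centroids)]
        gx.append(x)
        gy.append(y)
    preds = []
    for x in test_X:
        gx, gy = groups[nearest_index(x, centroids)]
        if not gx:  # fall back to all rows if a pattern has no training data
            gx, gy = train_X, train_y
        preds.append(knn_predict(gx, gy, x))
    return preds


def evaluate(y_true, y_pred):
    """Step 3: accuracy, precision and recall for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy data: (videos_watched, forum_posts) -> 1 = passed, 0 = did not pass.
TRAIN_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0),
           (8.0, 8.0), (9.0, 8.0), (10.0, 8.0), (8.0, 12.0), (9.0, 12.0), (10.0, 12.0)]
TRAIN_Y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
TEST_X = [(0.5, 0.5), (9.0, 11.5), (9.0, 8.5), (1.5, 1.5)]
TEST_Y = [0, 1, 0, 0]

if __name__ == "__main__":
    predictions = predict_by_pattern(TRAIN_X, TRAIN_Y, TEST_X)
    print(predictions, evaluate(TEST_Y, predictions))
```

To mirror the paper's comparison, the same `evaluate` function could be applied to predictions from a classifier trained on all rows at once (the direct approach). In practice, a library implementation of K-means and of classifiers such as XGBoost would likely replace these hand-written stand-ins.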