ClickTree: A Tree-based Method for Predicting Math Students' Performance Based on Clickstream Data

Narjes Rohani, Behnam Rohani, Areti Manataki
2024-03-02
Abstract: The prediction of student performance and the analysis of students' learning behavior play an important role in enhancing online courses. By analysing a massive amount of clickstream data that captures student behavior, educators can gain valuable insights into the factors that influence academic outcomes and identify areas of improvement in courses. In this study, we developed ClickTree, a tree-based methodology, to predict student performance in mathematical assignments based on students' clickstream data. We extracted a set of features, including problem-level, assignment-level and student-level features, from the extensive clickstream data and trained a CatBoost tree to predict whether a student successfully answers a problem in an assignment. The developed method achieved an AUC of 0.78844 in the Educational Data Mining Cup 2023 and ranked second in the competition. Furthermore, our results indicate that students encounter more difficulties with problem types in which they must select a subset of answers from a given set, as well as with problems on the subject of Algebra II. Additionally, students who performed well in answering end-unit assignment problems engaged more with in-unit assignments and answered more problems correctly, while those who struggled had a higher tutoring request rate. The proposed method can be utilized to improve students' learning experiences, and the above insights can be integrated into mathematical courses to enhance students' learning outcomes.
Computers and Society, Human-Computer Interaction, Machine Learning, Applications
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper aims to address the issue of predicting student performance in mathematics assignments. Specifically, the researchers developed a tree-based method called ClickTree, which utilizes clickstream data to predict student performance in math assignments. By analyzing a large amount of clickstream data, educators can gain insights into the factors affecting student performance and identify areas in the curriculum that need improvement.

### Research Background

In recent years, a large amount of log data on student interactions has been collected from online courses. This data provides researchers with valuable information to analyze student behavior and its impact on academic performance. By analyzing clickstream data, educators can gain a deeper understanding of students' learning habits, navigation patterns, and engagement levels. This knowledge helps in timely intervention and support for students who may be struggling or not actively participating. Additionally, clickstream data analysis can help educators identify effective learning patterns and resources, thereby influencing student performance.

### Research Methods

1. **Data Description**:
   - The dataset comes from the Educational Data Mining Cup 2023 (EDMcup 2023) and includes student clickstream data from the ASSISTments online learning platform.
   - The dataset includes information about courses, assignments, questions, and tutoring.
   - The dataset records 56,577 unit assignments and 57,361 questions from 36,296 students.
2. **Feature Extraction**:
   - Various types of features were extracted from the clickstream data, including question-level, assignment-level, and student-level features.
   - Features include counts of various operations (such as starting an assignment, completing an assignment, etc.), average operation counts, weighted average operation counts, and more.
3. **Performance Predictor**:
   - A CatBoost classifier was used to predict student scores in unit-end assignments.
   - CatBoost is an efficient gradient boosting method that can handle categorical features without suffering from the curse of dimensionality and target leakage issues.

### Research Results

1. **Which types of questions and subjects are more difficult for students?**
   - The average scores for "Exact Match (case insensitive)", "Select All That Apply", and "Ordering" question types were lower, indicating that these types of questions are more difficult for students.
2. **What are the different behavior patterns between successful and struggling students?**
   - Successful students showed higher engagement in unit assignments and answered more questions correctly, while struggling students had a higher rate of seeking help.
3. **Can AI use clickstream data to accurately predict student scores in unit-end assignments?**
   - The ClickTree method achieved an AUC score of approximately 79% in the EDMcup 2023 competition, ranking second, demonstrating its potential in predicting student performance in math courses.
4. **What are the most important features for predicting student scores?**
   - Feature importance analysis identified the most important features for predicting student scores.

### Conclusion

The study successfully predicted student performance in math assignments by developing the ClickTree method using clickstream data. The research results not only help improve students' online learning experience but also provide valuable insights for teachers to design more effective teaching methods and support strategies.
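
### Illustrative Sketch

The feature-extraction and evaluation steps summarized under Research Methods and Research Results can be illustrated with a small, standard-library-only sketch. It is not the authors' pipeline: the log rows, the action names, and the helpers `count_actions` and `auc` are hypothetical, and the prediction scores are hand-written placeholders rather than output from a CatBoost model. It only shows how per-student action counts might be derived from a clickstream log and how an AUC value like the one reported could be computed from labels and scores.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream rows: (student_id, action).
# The action names are placeholders, not the real ASSISTments vocabulary.
LOG = [
    ("s1", "assignment_started"),
    ("s1", "correct_response"),
    ("s1", "correct_response"),
    ("s2", "assignment_started"),
    ("s2", "hint_requested"),
    ("s2", "wrong_response"),
]


def count_actions(log):
    """Return {student_id: Counter(action -> count)} for a list of log rows."""
    counts = defaultdict(Counter)
    for student, action in log:
        counts[student][action] += 1
    return counts


def auc(labels, scores):
    """Pairwise AUC: the fraction of (positive, negative) pairs in which
    the positive example has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    features = count_actions(LOG)
    print(features["s1"]["correct_response"])  # 2
    # Placeholder labels (1 = answered correctly) and model scores.
    print(auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2]))  # 0.75
```

The pairwise form of AUC is quadratic in the number of examples, which is fine for a toy illustration; a sorting-based implementation or an established library routine would be the usual choice on a dataset of the size described above.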