Novel Feature-Based Difficulty Prediction Method for Mathematics Items Using XGBoost-Based SHAP Model

Xifan Yi, Jianing Sun, Xiaopeng Wu
DOI: https://doi.org/10.3390/math12101455
IF: 2.4
2024-05-09
Mathematics
Abstract: The level of difficulty of mathematical test items is a critical aspect for evaluating test quality and educational outcomes. Accurately predicting item difficulty during test creation is thus significantly important for producing effective test papers. This study used more than ten years of content and score data from China's Henan Provincial College Entrance Examination in Mathematics as an evaluation criterion for test difficulty, and all data were obtained from the Henan Provincial Department of Education. Based on the framework established by the National Center for Education Statistics (NCES) for test item assessment methodology, this paper proposes a new framework containing eight features considering the uniqueness of mathematics. Next, this paper proposes an XGBoost-based SHAP model for analyzing the difficulty of mathematics tests. By coupling the XGBoost method with the SHAP method, the model not only evaluates the difficulty of mathematics tests but also analyzes the contribution of specific features to item difficulty, thereby increasing transparency and mitigating the "black box" nature of machine learning models. The model has a high prediction accuracy of 0.99 for the training set and 0.806 for the test set. With the model, we found that parameter-level features and reasoning-level features are significant factors influencing the difficulty of subjective items in the exam. In addition, we divided senior secondary mathematics knowledge into nine units based on Chinese curriculum standards and found significant differences in the distribution of the eight features across these different knowledge units, which can help teachers place different emphasis on different units during the teaching process.
In summary, our proposed approach significantly improves the accuracy of item difficulty prediction, which is crucial for intelligent educational applications such as knowledge tracking, automatic test item generation, and intelligent paper generation. These results provide tools that are better aligned with and responsive to students' learning needs, thus effectively informing educational practice.
What problem does this paper attempt to address?
The paper aims to address the problem of predicting the difficulty of math exam questions. Specifically:

1. **Proposed a new feature extraction method**: Based on the National Center for Education Statistics (NCES) test item evaluation framework, the paper proposes a new framework containing 8 characteristics to assess the difficulty of math questions. These characteristics take into account the uniqueness of the mathematics discipline.
2. **Developed an XGBoost-based SHAP model**: To improve the prediction accuracy and interpretability of the model, the paper combines the XGBoost algorithm with the SHAP method to analyze the impact of each characteristic on the difficulty of the questions. This method not only predicts the difficulty of the questions but also reveals the contribution of each characteristic to the difficulty prediction, thereby enhancing the transparency of the model.
3. **Analyzed the distribution of characteristics in different knowledge units**: The paper further divides the subjective math questions of the Chinese college entrance examination into 9 knowledge units and studies how the 8 characteristics are distributed across these units. This can help teachers place different emphasis on different units during the teaching process.

In summary, the main purpose of this study is to develop an efficient and interpretable model to predict the difficulty of math questions and to improve the test design process through feature importance analysis, thereby enhancing the effectiveness of intelligent education applications such as knowledge point tracking and automatic test generation.
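The contribution analysis described above rests on Shapley values, which split the gap between one item's predicted difficulty and a baseline prediction among the input features. The sketch below illustrates that idea only and is not the paper's pipeline. It swaps the trained XGBoost model for a small hand-written scoring function, computes exact Shapley values by enumerating feature subsets (the SHAP library uses a faster tree-specific algorithm), and uses three features rather than eight. The feature names, coefficients, baseline values, and item values are all hypothetical; only "parameter-level" and "reasoning-level" are named in the abstract.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; the paper's full list of eight features is not given here.
FEATURES = ["parameter_level", "reasoning_level", "computation_level"]

# Hypothetical "reference item" whose values stand in for a feature that is left out.
BASELINE = {"parameter_level": 1.0, "reasoning_level": 1.0, "computation_level": 1.0}


def toy_difficulty(x):
    """Stand-in for a trained model. NOT XGBoost: a small function with one interaction term."""
    return (0.10 * x["parameter_level"]
            + 0.15 * x["reasoning_level"]
            + 0.05 * x["parameter_level"] * x["reasoning_level"]
            + 0.02 * x["computation_level"])


def coalition_value(item, subset):
    """Model output when only the features in `subset` take the item's values."""
    mixed = {f: (item[f] if f in subset else BASELINE[f]) for f in FEATURES}
    return toy_difficulty(mixed)


def shapley_values(item):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = coalition_value(item, set(subset) | {f})
                without_f = coalition_value(item, set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi


# Hypothetical item scored on the three illustrative features.
item = {"parameter_level": 3.0, "reasoning_level": 4.0, "computation_level": 2.0}
phi = shapley_values(item)
base = toy_difficulty(BASELINE)
prediction = toy_difficulty(item)

for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
print(f"baseline {base:.3f} + contributions {sum(phi.values()):.3f} = prediction {prediction:.3f}")
```

The per-feature contributions add up exactly to the difference between the item's prediction and the baseline prediction, which is the property that makes a per-feature reading of item difficulty possible. In this toy setup the interaction term is shared between the two features that take part in it.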