Abstract: Machine-learning methods are now widely used across many fields, yet relatively few studies have applied them to detecting test fraud. This study provides a technical review of Extreme Gradient Boosting (XGBoost), a recently developed state-of-the-art algorithm, and investigates its utility in detecting examinees with potential item preknowledge. The investigation uses a real data set that includes examinees who engaged in fraudulent testing behavior, such as illegally obtaining live test content before the exam. Four XGBoost models were trained on different sets of input features: (a) dichotomous item responses only, (b) nominal item responses only, (c) dichotomous item responses and response times, and (d) nominal item responses and response times. The predictive performance of each model was evaluated using the area under the receiver operating characteristic curve and several classification measures, including the false-positive rate, true-positive rate, and precision. For comparison, results from two person-fit statistics on the same data set are also provided. The results indicated that XGBoost successfully distinguished honest test takers from fraudulent test takers with item preknowledge. In particular, its classification performance was reasonably good when response times and item responses were both taken into account.
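
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 0.5 flagging threshold, and the toy labels and scores are all assumptions made for illustration. Here a label of 1 stands for an examinee with item preknowledge, and each score stands for a model's predicted probability of preknowledge.

```python
from typing import Dict, Sequence


def classification_measures(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """False-positive rate, true-positive rate, and precision at one threshold.

    The threshold of 0.5 is an illustrative assumption, not a value from the study.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold  # examinee flagged as having preknowledge
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "fpr": fp / (fp + tn) if (fp + tn) else nan,
        "tpr": tp / (tp + fn) if (tp + fn) else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 examinees with preknowledge, 5 honest examinees.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

print(classification_measures(labels, predicted))
print(auc(labels, predicted))
```

The pairwise AUC is quadratic in the number of examinees, which is acceptable for a sketch; for a large-scale data set a rank-based computation or an established library routine would be the more practical choice.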

A Machine-Learning-Based Approach for Detecting Item Preknowledge in Computerized Adaptive Testing

Detecting Item Preknowledge Using a Predictive Checking Method

Multimodal Data Fusion to Detect Preknowledge Test-Taking Behavior Using Machine Learning

Detecting Item Preknowledge Using Revisits With Speed and Accuracy

Detecting Preknowledge Cheating via Innovative Measures: A Mixture Hierarchical Model for Jointly Modeling Item Responses, Response Times, and Visual Fixation Counts

Sequential Detection of Compromised Items Using Response Times in Computerized Adaptive Testing

Detection of Item Preknowledge Using Response Times

Detecting Aberrant Behavior and Item Preknowledge: A Comparison of Mixture Modeling Method and Residual Method

Detection of Item Preknowledge Using Likelihood Ratio Test and Score Test

Detecting Examinees With Item Preknowledge in Large-Scale Testing Using Extreme Gradient Boosting (XGBoost)

Robustness of Computer Adaptive Tests to the Presence of Item Preknowledge: A Simulation Study

A Mixture Response Model for Identifying Item Preknowledge

Assessing Preknowledge Cheating via Innovative Measures: A Multiple-Group Analysis of Jointly Modeling Item Responses, Response Times, and Visual Fixation Counts

Using Item Scores and Distractors to Detect Item Compromise and Preknowledge

Two New Models for Item Preknowledge

Comparing the Performance of Eight Item Preknowledge Detection Statistics

Monte Carlo Detection of Examinees With Item Preknowledge

A Robust Computerized Adaptive Testing Approach in Educational Question Retrieval

Are Exam Questions Known in Advance? Using Local Dependence to Detect Cheating

Concurrent Use of Response Time and Response Accuracy for Detecting Examinees with Item Preknowledge

Survey of Computerized Adaptive Testing: A Machine Learning Perspective