Abstract: Educational and Psychological Measurement, ahead of print.

In educational assessment, cut scores are often set through a standard-setting process conducted by a panel of subject matter experts. This study investigates how several factors affect classification accuracy. It uses receiver operating characteristic (ROC) analysis to provide statistical and theoretical evidence for when a cut score needs to be refined. The factors examined are the sample distribution relative to the cut score, the prevalence of the positive event, and the cost ratio. Responses to 40 items were simulated for examinees from four sample distributions. The prevalence and the cost ratio of false negatives to false positives were also manipulated to examine their effects on classification accuracy. The optimal cut score was identified using the Youden index J. The results showed that the evaluation criterion tended to pull the optimal cut score toward the mode of the proficiency distribution. The optimal cut score also shifted with the prevalence of the positive event and with the cost ratio. Given the item parameters and sample distributions used in the simulation, raising the operational cut score improved classification when passing the exam is a low-prevalence event in the population. When passing is a high-prevalence event, the cut score should be lowered to achieve optimality. As the cost ratio increases, the optimal cut score suggested by the evaluation criterion decreases. In three of the four sample distributions examined, raising the cut score improved classification regardless of the cost ratio when the prevalence in the population was 50%. The study thus provides statistical evidence for deciding when a cut score should be refined for policy reasons.
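The evaluation criterion named in the abstract is Youden's J. The abstract also manipulates prevalence and the false-negative/false-positive cost ratio. A short Python/NumPy sketch can show how these pieces fit together. Several parts of it are assumptions and do not come from the study:

- It uses a 2PL model with hypothetical item parameters.
- It sets a hypothetical "truly passing" threshold on θ.
- It applies a cost-weighted generalization of J that picks the cut maximizing sensitivity - m × (1 - specificity), where m = (1 - p) / (p × cost ratio).

That weighting is the standard minimum-expected-cost rule and not necessarily the one the authors used. With p = 0.5 and equal costs, m = 1 and the rule reduces to J.

```python
import numpy as np


def simulate_total_scores(theta, a, b, rng):
    """Simulate 0/1 responses under a 2PL model and return each examinee's total score.

    Assumption: the 2PL model and the item parameters passed in are illustrative;
    the abstract does not specify the IRT model or parameters used in the study.
    """
    # Probability of a correct response: examinees in rows, items in columns
    p_correct = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    responses = rng.random(p_correct.shape) < p_correct
    return responses.sum(axis=1)


def roc_points(scores, truth, cuts):
    """Sensitivity and specificity for each candidate cut, classifying 'score >= cut' as pass."""
    sens = np.array([np.mean(scores[truth] >= c) for c in cuts])
    spec = np.array([np.mean(scores[~truth] < c) for c in cuts])
    return sens, spec


def optimal_cut(scores, truth, prevalence=0.5, cost_ratio=1.0):
    """Return (cut, criterion value) maximizing sens - m * (1 - spec).

    Assumption: m = (1 - prevalence) / (prevalence * cost_ratio), with
    cost_ratio = cost(false negative) / cost(false positive). With the
    defaults (prevalence 0.5, equal costs) m = 1 and the criterion is
    Youden's J = sensitivity + specificity - 1.
    """
    m = (1.0 - prevalence) / (prevalence * cost_ratio)
    # Every integer cut from the lowest observed score to one above the highest
    cuts = np.arange(scores.min(), scores.max() + 2)
    sens, spec = roc_points(scores, truth, cuts)
    criterion = sens - m * (1.0 - spec)
    best = int(np.argmax(criterion))  # first (lowest) cut among ties
    return int(cuts[best]), float(criterion[best])


# Illustrative run with hypothetical settings (not the study's design)
rng = np.random.default_rng(2024)
n_items = 40
a = rng.uniform(0.8, 2.0, n_items)   # hypothetical discriminations
b = rng.normal(0.0, 1.0, n_items)    # hypothetical difficulties
theta = rng.normal(0.0, 1.0, 5000)   # one assumed proficiency distribution
truth = theta >= 0.5                 # hypothetical "truly passing" threshold on theta
scores = simulate_total_scores(theta, a, b, rng)

j_cut, j_value = optimal_cut(scores, truth)                    # Youden's J
fn_costly_cut, _ = optimal_cut(scores, truth, cost_ratio=3.0)  # false negatives 3x as costly
rare_pass_cut, _ = optimal_cut(scores, truth, prevalence=0.2)  # passing assumed rare
print(j_cut, round(j_value, 3), fn_costly_cut, rare_pass_cut)
```

In this sketch, raising the cost ratio or the prevalence lowers m, so the selected cut can only stay the same or move down. A lower prevalence raises m, so the cut can only stay the same or move up. This mirrors the directions reported in the abstract but says nothing about the magnitudes the study found.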

Assessing Item Fit Using Expected Score Curve Under Restricted Recalibration

Applying Unidimensional and Multidimensional Item Response Theory Models in Testlet-Based Reading Assessment

A two‐step item bank calibration strategy based on 1‐bit matrix completion for small‐scale computerized adaptive testing

Practical Significance of Item Misfit in Educational Assessments

New Robust Scale Transformation Methods in the Presence of Outlying Common Items

Detecting uniform differential item functioning for continuous response computerized adaptive testing

The Impact of Item Preknowledge on Scaling and Equating: Item Response Theory True and Observed Score Equating Methods

Modeling Item-Level Heterogeneous Treatment Effects With the Explanatory Item Response Model: Leveraging Large-Scale Online Assessments to Pinpoint the Impact of Educational Interventions

Online Item Calibration for Q-Matrix in CD-CAT

Methods for online calibration of Q-matrix and item parameters for polytomous responses in cognitive diagnostic computerized adaptive testing

Evaluating Robust Scale Transformation Methods with Multiple Outlying Common Items under IRT True Score Equating

Efficiency of computerized adaptive testing with a cognitively designed item bank

Improved Scoring of the Center for Epidemiologic Studies Depression Scale - Revised: An Item Response Theory Analysis

Using ROC Analysis to Refine Cut Scores Following a Standard Setting Process

Nonlinear Sequential Designs for Logistic Item Response Theory Models with Applications to Computerized Adaptive Tests

Redefining Item Response Models for Small Samples

Flexible Bayesian modelling in dichotomous item response theory using mixtures of skewed item curves

Improvement and application of back random response detection: Based on cumulative sum and change point analysis

Item Quality Control in Educational Testing: Change Point Model, Compound Risk, and Sequential Detection

Using Bayesian item response theory for multicohort repeated measure design to estimate individual latent change scores