Abstract: Differential item functioning (DIF) is an important issue in large-scale standardized testing. DIF refers to an unexpected difference in item performance among groups of equally proficient examinees, usually classified by ethnicity or gender. Its presence could seriously affect the validity of inferences drawn from a test. Various statistical methods have been proposed to detect and estimate DIF. This dissertation addresses DIF analysis in the context of computerized adaptive testing (CAT), whose item selection algorithm adapts to the ability level of each individual examinee. In a CAT, a DIF item may be more consequential and more detrimental because fewer items are administered than in a traditional paper-and-pencil test and because the remaining sequence of items presented to examinees depends in part on their responses to the DIF item. Consequently, an efficient, stable and flexible method to detect and estimate DIF in CAT becomes necessary and increasingly important. We propose simultaneous implementations of online calibration and DIF testing. The idea is to perform online calibration of an item of interest separately in the focal and reference groups. Under any specific parametric IRT model, we can use the (online) estimated latent traits as covariates and fit a nonlinear regression model to each of the two groups. Because the regression uses the estimated rather than the true latent traits, the fit has to adjust for the covariate “measurement errors”. It turns out that this situation fits nicely into the framework of nonlinear errors-in-variables modelling, which has been extensively studied in the statistical literature. We develop two bias-correction methods using asymptotic expansion and conditional score theory. After correcting the bias caused by measurement error, one can perform a significance test to detect DIF by comparing the parameter estimates for the different groups.
This dissertation also discusses some general techniques to handle measurement error modelling with different IRT models, including the three-parameter normal ogive model and polytomous response models. Several methods of estimating DIF are studied as well. Large sample properties are established to justify the proposed methods. Extensive simulation studies show that the resulting methods perform well in terms of Type-I error rate control, accuracy in estimating DIF and power against both unidirectional and crossing DIF.
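
The group-wise regression and significance test described above can be illustrated with a minimal sketch. It is not the dissertation's procedure: it assumes a two-parameter logistic response curve, fits it naively by maximum likelihood on the estimated traits, and omits the two bias corrections for measurement error entirely. The function names (`simulate_group`, `fit_item`, `wald_dif`), the slope-intercept parameterisation, and the simulated noise level are all assumptions made for illustration.

```python
import math
import random


def simulate_group(n, a, d, noise_sd, rng):
    """Simulate n examinees: noisy trait estimates and 0/1 item responses."""
    theta_hat, y = [], []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)                   # true latent trait
        p = 1.0 / (1.0 + math.exp(-(a * theta + d)))  # assumed 2PL-type curve
        y.append(1 if rng.random() < p else 0)
        theta_hat.append(theta + rng.gauss(0.0, noise_sd))  # trait estimate
    return theta_hat, y


def _score_and_info(a, d, thetas, ys):
    """Score vector and information matrix of the logistic log-likelihood."""
    g_a = g_d = h_aa = h_ad = h_dd = 0.0
    for t, y in zip(thetas, ys):
        p = 1.0 / (1.0 + math.exp(-(a * t + d)))
        w = p * (1.0 - p)
        g_a += (y - p) * t
        g_d += y - p
        h_aa += w * t * t
        h_ad += w * t
        h_dd += w
    return (g_a, g_d), (h_aa, h_ad, h_dd)


def fit_item(thetas, ys, iters=25):
    """Naive Newton-Raphson fit of (slope a, intercept d) for one group.

    Returns the estimates and their covariance as (var_a, cov_ad, var_d).
    No measurement-error correction is applied here.
    """
    a, d = 1.0, 0.0
    for _ in range(iters):
        (g_a, g_d), (h_aa, h_ad, h_dd) = _score_and_info(a, d, thetas, ys)
        det = h_aa * h_dd - h_ad * h_ad
        a += (h_dd * g_a - h_ad * g_d) / det
        d += (h_aa * g_d - h_ad * g_a) / det
    _, (h_aa, h_ad, h_dd) = _score_and_info(a, d, thetas, ys)
    det = h_aa * h_dd - h_ad * h_ad
    return (a, d), (h_dd / det, -h_ad / det, h_aa / det)


def wald_dif(est_r, cov_r, est_f, cov_f):
    """Wald test of equal (a, d) in the reference and focal groups (2 df)."""
    da, dd = est_f[0] - est_r[0], est_f[1] - est_r[1]
    s_aa = cov_r[0] + cov_f[0]
    s_ad = cov_r[1] + cov_f[1]
    s_dd = cov_r[2] + cov_f[2]
    det = s_aa * s_dd - s_ad * s_ad
    w = (da * da * s_dd - 2.0 * da * dd * s_ad + dd * dd * s_aa) / det
    p_value = math.exp(-w / 2.0)  # chi-square survival function for 2 df
    return w, p_value


if __name__ == "__main__":
    rng = random.Random(2024)
    ref = simulate_group(2000, 1.0, 0.0, 0.3, rng)   # reference group
    foc = simulate_group(2000, 1.0, -0.5, 0.3, rng)  # focal group, shifted d
    stat, p = wald_dif(*fit_item(*ref), *fit_item(*foc))
    print(f"Wald statistic = {stat:.2f}, p-value = {p:.4g}")
```

Because the trait estimates carry error, the naive slope estimates in this sketch are attenuated in both groups; that bias is what the asymptotic-expansion and conditional-score corrections are meant to remove before the test is applied.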

Detecting uniform differential item functioning for continuous response computerized adaptive testing

Statistical Detection and Estimation of Differential Item Functioning in Computerized Adaptive Testing

Detecting Differential Item Functioning Using Response Time

Effect of Differential Item Functioning on Computer Adaptive Testing Under Different Conditions

A Comparison of Differential Item Functioning Detection Methods in Cognitive Diagnostic Models

Examining Differential Item Functioning In A Computer-Based English Proficiency Test

Investigating Differential Item Functioning Across Interaction Variables in Listening Comprehension Assessment

Exploring the Evidence to Interpret Differential Item Functioning via Response Process Data

Efficiency of computerized adaptive testing with a cognitively designed item bank

Detecting DIF in Multidimensional Forced Choice Measures Using the Thurstonian Item Response Theory Model

Methods for online calibration of Q-matrix and item parameters for polytomous responses in cognitive diagnostic computerized adaptive testing

Applying Unidimensional and Multidimensional Item Response Theory Models in Testlet-Based Reading Assessment

Using Interpretable Machine Learning for Differential Item Functioning Detection in Psychometric Tests

Enhancing Precision in Predicting Magnitude of Differential Item Functioning: An M-DIF Pretrained Model Approach

A Monte Carlo Study of an Iterative Wald Test Procedure for DIF Analysis

Statistical inference for the penalized EM algorithm to test differential item functioning

DIF Statistical Inference Without Knowing Anchoring Items

Improving the Assessment of Differential Item Functioning in Large-Scale Programs With Dual-Scale Purification of Rasch Models: The PISA Example

DIF Analysis with Unknown Groups and Anchor Items

A Robust Computerized Adaptive Testing Approach in Educational Question Retrieval

Differential Item Functioning via Robust Scaling