Abstract: Differential item functioning (DIF) is an important issue in large-scale standardized testing. DIF refers to an unexpected difference in item performance among groups of equally proficient examinees, usually classified by ethnicity or gender. Its presence could seriously affect the validity of inferences drawn from a test. Various statistical methods have been proposed to detect and estimate DIF. This dissertation addresses DIF analysis in the context of computerized adaptive testing (CAT), whose item selection algorithm adapts to the ability level of each individual examinee. In a CAT, a DIF item may be more consequential and more detrimental because fewer items are administered than in a traditional paper-and-pencil test and because the remaining sequence of items presented to examinees depends in part on their responses to the DIF item. Consequently, an efficient, stable, and flexible method to detect and estimate DIF in CAT becomes necessary and increasingly important. We propose simultaneous implementations of online calibration and DIF testing. The idea is to perform online calibration of an item of interest separately in the focal and reference groups. Under any specific parametric IRT model, we can use the (online) estimated latent traits as covariates and fit a nonlinear regression model to each of the two groups. Because the estimated rather than the true latent traits are used, the regression fit has to adjust for the covariate "measurement errors". This situation fits naturally into the framework of nonlinear errors-in-variables modelling, which has been extensively studied in the statistical literature. We develop two bias-correction methods using asymptotic expansion and conditional score theory. After correcting the bias caused by measurement error, one can perform a significance test to detect DIF by comparing the parameter estimates across groups.
This dissertation also discusses some general techniques to handle measurement error modelling with different IRT models, including the three-parameter normal ogive model and polytomous response models. Several methods of estimating DIF are studied as well. Large sample properties are established to justify the proposed methods. Extensive simulation studies show that the resulting methods perform well in terms of Type-I error rate control, accuracy in estimating DIF and power against both unidirectional and crossing DIF.
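
As a rough illustration of the final testing step described above, the following sketch fits a two-parameter logistic item response curve separately to a reference and a focal group and compares the two sets of estimates with a Wald statistic. It is a minimal sketch under stated assumptions, not the dissertation's procedure: the latent traits are treated as known, so neither of the bias corrections for measurement error is applied, and the function names (`fit_logistic`, `wald_dif_test`, `simulate`), the sample sizes, and the simulated item parameters are all hypothetical.

```python
import numpy as np


def fit_logistic(theta, y, n_iter=25):
    """Newton-Raphson fit of P(y = 1 | theta) = sigmoid(b0 + b1 * theta).

    Returns the estimates (b0, b1) and their estimated covariance matrix
    (the inverse of the observed information).
    """
    X = np.column_stack([np.ones_like(theta), theta])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    # Recompute the information at the final estimate.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    info = X.T @ (X * w[:, None])
    return beta, np.linalg.inv(info)


def wald_dif_test(theta_ref, y_ref, theta_foc, y_foc):
    """Wald test of equal item parameters in the two groups (2 df)."""
    b_ref, cov_ref = fit_logistic(theta_ref, y_ref)
    b_foc, cov_foc = fit_logistic(theta_foc, y_foc)
    d = b_ref - b_foc
    stat = float(d @ np.linalg.solve(cov_ref + cov_foc, d))
    # For a chi-square variable with 2 df, P(X > x) = exp(-x / 2).
    p_value = float(np.exp(-stat / 2.0))
    return stat, p_value


rng = np.random.default_rng(0)


def simulate(n, intercept, slope):
    """Simulate responses to one item; theta is assumed known here."""
    theta = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-(intercept + slope * theta)))
    y = (rng.random(n) < p).astype(float)
    return theta, y


# Hypothetical item with uniform DIF: same slope, shifted intercept.
theta_r, y_r = simulate(2000, 0.0, 1.2)
theta_f, y_f = simulate(2000, -0.8, 1.2)
stat_dif, p_dif = wald_dif_test(theta_r, y_r, theta_f, y_f)
print(f"Wald statistic = {stat_dif:.2f}, p-value = {p_dif:.3g}")
```

In a CAT setting the covariate would be an estimated latent trait, so a naive fit like this one is biased; the abstract's point is that the group-wise estimates should be corrected for that measurement error before a test of this kind is applied.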

A Review of Some of the History of Factorial Invariance and Differential Item Functioning

Studying Factorial Invariance With Nominal Items: A Note on a Latent Variable Modeling Procedure

Factorial Invariance of the Questionnaire about Interpersonal Difficulties for Adolescents Across Spanish and Chinese Adolescent Samples

Exploring the Evidence to Interpret Differential Item Functioning via Response Process Data

Differential Item Functioning via Robust Scaling

Investigating Differential Item Functioning Across Interaction Variables in Listening Comprehension Assessment

DIF Analysis with Unknown Groups and Anchor Items

DIF Statistical Inference Without Knowing Anchoring Items

A Monte Carlo Study of an Iterative Wald Test Procedure for DIF Analysis

Detecting DIF in Multidimensional Forced Choice Measures Using the Thurstonian Item Response Theory Model

Statistical Detection and Estimation of Differential Item Functioning in Computerized Adaptive Testing

A Comparison of Differential Item Functioning Detection Methods in Cognitive Diagnostic Models

Effect of Differential Item Functioning on Computer Adaptive Testing Under Different Conditions

Comparing Attitudes Across Groups: An IRT-Based Item-Fit Statistic for the Analysis of Measurement Invariance

Improving measurement-invariance assessments: correcting entrenched testing deficiencies

Detecting Differential Item Functioning among Multiple Groups Using IRT Residual DIF Framework

Measurement Invariance and Differential Item Functioning in Latent Class Analysis With Stepwise Multiple Indicator Multiple Cause Modeling

Fairness and Comparability in Achievement Motivation Items: A Differential Item Functioning Analysis

A Comparison of Confirmatory Factor Analysis and Network Models for Measurement Invariance Assessment When Indicator Residuals are Correlated

A Perspective on the Mathematical and Psychometric Aspects of Factor Indeterminacy

Detecting Differential Item Functioning Using Response Time