Abstract: Differential item functioning (DIF) is an important issue in large-scale standardized testing. DIF refers to an unexpected difference in item performance among groups of equally proficient examinees, usually classified by ethnicity or gender. Its presence can seriously affect the validity of inferences drawn from a test. Various statistical methods have been proposed to detect and estimate DIF. This dissertation addresses DIF analysis in the context of computerized adaptive testing (CAT), whose item selection algorithm adapts to the ability level of each individual examinee. In a CAT, a DIF item may be more consequential and more detrimental because fewer items are administered than in a traditional paper-and-pencil test and because the remaining sequence of items presented to examinees depends in part on their responses to the DIF item. Consequently, an efficient, stable, and flexible method to detect and estimate CAT DIF becomes necessary and increasingly important. We propose simultaneous implementation of online calibration and DIF testing. The idea is to perform online calibration of an item of interest separately in the focal and reference groups. Under any specific parametric IRT model, we can use the (online) estimated latent traits as covariates and fit a nonlinear regression model to each of the two groups. Because the estimated rather than the true latent traits are used, the regression fit has to adjust for the covariate “measurement errors”. It turns out that this situation fits nicely into the framework of nonlinear errors-in-variables modelling, which has been extensively studied in the statistical literature. We develop two bias-correction methods using asymptotic expansion and conditional score theory. After correcting the bias caused by measurement error, one can perform a significance test to detect DIF using the parameter estimates for the different groups.
This dissertation also discusses some general techniques for handling measurement error modelling with different IRT models, including the three-parameter normal ogive model and polytomous response models. Several methods of estimating DIF are studied as well. Large-sample properties are established to justify the proposed methods. Extensive simulation studies show that the resulting methods perform well in terms of Type I error rate control, accuracy in estimating DIF, and power against both unidirectional and crossing DIF.
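
The group-wise calibration and significance test described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes a two-parameter logistic model written as logistic(b0 + b1 * theta), treats the latent traits as if they were observed without error, and therefore omits the bias correction for measurement error that the proposed methods add. All function names, sample sizes, and parameter values are hypothetical choices for illustration only.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_and_information(b0, b1, thetas, responses):
    """Gradient and Fisher information of the logistic log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for t, y in zip(thetas, responses):
        p = logistic(b0 + b1 * t)
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * t
        h00 += w
        h01 += w * t
        h11 += w * t * t
    return (g0, g1), (h00, h01, h11)


def fit_item(thetas, responses, iters=25):
    """Newton-Raphson fit of P(correct) = logistic(b0 + b1 * theta).

    Returns ((b0, b1), (v00, v01, v11)), where v is the inverse Fisher
    information, used here as the estimated covariance matrix.
    """
    b0, b1 = 0.0, 1.0
    for _ in range(iters):
        (g0, g1), (h00, h01, h11) = score_and_information(b0, b1, thetas, responses)
        det = h00 * h11 - h01 * h01
        # Newton step: add H^{-1} g, with the 2x2 inverse written out.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
    _, (h00, h01, h11) = score_and_information(b0, b1, thetas, responses)
    det = h00 * h11 - h01 * h01
    return (b0, b1), (h11 / det, -h01 / det, h00 / det)


def wald_dif_test(fit_ref, fit_foc):
    """Wald test that both item parameters are equal across the two groups."""
    (r0, r1), (rv00, rv01, rv11) = fit_ref
    (f0, f1), (fv00, fv01, fv11) = fit_foc
    d0, d1 = f0 - r0, f1 - r1
    # Independent groups: the covariance of the difference is the sum.
    v00, v01, v11 = rv00 + fv00, rv01 + fv01, rv11 + fv11
    det = v00 * v11 - v01 * v01
    stat = (v11 * d0 * d0 - 2.0 * v01 * d0 * d1 + v00 * d1 * d1) / det
    # Survival function of a chi-square with 2 degrees of freedom.
    return stat, math.exp(-stat / 2.0)


def simulate_group(n, intercept, slope, rng):
    """Hypothetical data generator: standard normal traits, binary responses."""
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n)]
    responses = [1 if rng.random() < logistic(intercept + slope * t) else 0
                 for t in thetas]
    return thetas, responses


rng = random.Random(0)
ref = fit_item(*simulate_group(2000, 0.0, 1.2, rng))    # reference group
foc = fit_item(*simulate_group(2000, -0.72, 1.2, rng))  # focal group, shifted intercept
stat, p_value = wald_dif_test(ref, foc)
print(f"Wald statistic = {stat:.2f}, p-value = {p_value:.3g}")
```

In a CAT, only estimated traits are available, so plugging them in as above would generally bias the item parameter estimates; that is the gap the bias-corrected estimators in this dissertation are designed to close.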

The Impact of Item Model Parameter Variations on Person Parameter Estimation in Computerized Adaptive Testing With Automatically Generated Items

Efficiency of computerized adaptive testing with a cognitively designed item bank

Applying Unidimensional and Multidimensional Item Response Theory Models in Testlet-Based Reading Assessment

Ability Assessment Based on CAT in Adaptive Learning System

Methods for online calibration of Q-matrix and item parameters for polytomous responses in cognitive diagnostic computerized adaptive testing

Detecting uniform differential item functioning for continuous response computerized adaptive testing

Measurement precision and user experience with adaptive versus non-adaptive psychometric tests

Modeling Rapid Guessing Behaviors in Computer-Based Testlet Items

Effects of the Quantity and Magnitude of Cross-Loading and Model Specification on MIRT Item Parameter Recovery

To Weight or Not to Weight? Balancing Influence of Initial Items in Adaptive Testing

Addressing Selection Bias in Computerized Adaptive Testing: A User-Wise Aggregate Influence Function Approach

A Robust Computerized Adaptive Testing Approach in Educational Question Retrieval

Statistical Detection and Estimation of Differential Item Functioning in Computerized Adaptive Testing

Benefits of the Curious Behavior of Bayesian Hierarchical Item Response Theory Models—An in-Depth Investigation and Bias Correction

A-Stratified Multistage Computerized Adaptive Testing

Item Selection Methods for Computer Adaptive Testing With Passages

The effects of computer self-efficacy, training satisfaction and test anxiety on attitude and performance in computerized adaptive testing

Enhancing Effort-Moderated Item Response Theory Models by Evaluating a Two-Step Estimation Method and Multidimensional Variations on the Model

The Influence of Ability Level and Big Five Personality Traits on Examinees' Test-Taking Behaviour in Computerised Adaptive Testing

Improving the 3PLM-Based Computerized Adaptive Testing System with Multi-Agent Item Bank

Location-Matching Adaptive Testing for Polytomous Technology-Enhanced Items