Abstract:

Background: A common feature of performance assessments is the use of human assessors to render judgements on student performance. From a measurement perspective, variability among assessors may be viewed as a concern because it negatively impacts score reliability and validity. From a contextual perspective, however, variability among assessors is considered both meaningful and expected. A qualitative examination of assessor cognition during the assessment of student performance can help identify which components are amenable to improvement through enhanced rater training, and the extent of variability that arises when assessors are viewed as contributing their individual expertise. The purpose of this study was therefore to explore assessor cognition as a source of score variability in a performance assessment of practice-based competencies.

Method: A mixed-method sequential explanatory study design was used, in which findings from the qualitative strand assisted in the interpretation of results from the quantitative strand. Scores from one objective structured clinical examination (OSCE) were obtained for 95 occupational therapy students. Two generalizability studies were conducted to examine the relative contribution of assessors as a source of score variability and to estimate the reliability of domain and holistic scores. Think-aloud interviews were conducted with eight participants, who assessed a subset of student performances from the OSCE in which they participated. Findings from the analysis of think-aloud data, together with assessors' background characteristics, were used to assist in the interpretation of variance component estimates involving assessors and of score reliability.

Results: Results from the two generalizability analyses indicated that the highest-order interaction-error term involving assessors accounted for the second-highest proportion of variance, after student variation. Score reliability was higher in the holistic than in the analytic scoring framework. Verbal analysis of assessors' think-aloud interviews provided evidential support for the quantitative results.

Conclusions: This study provides insight into the nature and extent of assessor variability during a performance assessment of practice-based competencies. Study findings are interpretable from both the measurement and contextual perspectives on assessor cognition. An integrated understanding is important to elucidate the meaning underlying the numerical score, because the defensibility of inferences made about students' proficiencies relies on score quality, which in turn relies on expert judgements.

A Multi-Faceted Approach to Scrutinizing the Reliability of a Measure of STEM Teacher Strategic Knowledge

Investigating the reliability of CET-SET using Multi-Facet Rasch Model

A Thematic Review on the Combination of Statistical Tools and Measuring Instruments for Analyzing Knowledge and Students’ Achievement in Science

Proceed with Caution: Interactive Rules and Teacher Work Sample Scoring Strategies, an Ethnomethodological Study

Exploring assessor cognition as a source of score variability in a performance assessment of practice-based competencies

STUDY OF SOURCES OF SCORE VARIABILITY IN PERFORMANCE ASSESSMENT USING MFRM: A CASE OF SPEAKING TEST IN PETS BAND 3?

A reliability generalization study of the STEM-CIS scale: Exploring moderator effects

Investigating the Reliability of CET6 Essay Scoring: An Application of Generalizability Theory and Many-facet Rasch Model

Adapting a self-efficacy scale to the task of teaching scientific reasoning: collecting evidence for its psychometric quality using Rasch measurement

A Criterion-Referenced Approach to Student Ratings of Instruction

The stability of kindergarten teachers’ effectiveness: A generalizability study comparing the Framework For Teaching and the Classroom Assessment Scoring System

Evaluating Special Education Instructional Practices Using Observation Rubrics: Investigating the Reliability of School Administrator Ratings

Investigating Score Dependability in English/Chinese Interpreter Certification Performance Testing: A Generalizability Theory Approach

Moving beyond Alpha: A Primer on Alternative Sources of Single-Administration Reliability Evidence for Quantitative Chemistry Education Research

Reliability Generalization of the Motivated Strategies for Learning Questionnaire: A Meta-Analytic View of Reliability Estimates

A note on the score reliability for the Satisfaction With Life Scale: an RG study

Revalidating a measurement instrument of spatial thinking ability for junior and high school students

An important component to investigating STEM persistence: the development and validation of the science identity (SciID) scale

Technical adequacy of measuring teachers' knowledge of dyslexia

Estimating Critical Values for Strength of Alignment Among Curriculum, Assessments, and Instruction

Validation of an instrument for assessing teacher knowledge of basic language constructs of literacy