Abstract: Situational Judgment Tests (SJTs) have gained popularity for their distinctive test content and high face validity. However, traditional SJT formats, particularly those using multiple-choice (MC) options, have come under scrutiny because they are susceptible to test-taking strategies. Open-ended and constructed-response (CR) formats offer a promising way to address this issue, but their wider adoption is hindered mainly by the cost of manual scoring. In response to this challenge, we propose an open-ended SJT that uses a written constructed-response format to assess teacher competency. This study established a scoring framework that uses natural language processing (NLP) technology to automate the scoring of response texts, and then evaluated the validity of the resulting system. The study constructed a teacher competency model with four dimensions: student-oriented, problem-solving, emotional intelligence, and achievement motivation. An open-ended situational judgment test was then developed to gauge teachers' ability to handle typical teaching dilemmas. Responses were collected from 627 primary and secondary school teachers, and 6,000 response texts from 300 participants were scored manually against predefined criteria. To speed up scoring, supervised learning was used to classify responses at both the document level and the sentence level. Several deep learning models were implemented and compared, including the convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM), C-LSTM, RNN+attention, and LSTM+attention, in order to assess the agreement between human and machine scoring. The validity of the automatic scoring was also verified.
The open-ended situational judgment test had a Cronbach's alpha coefficient of 0.91 and showed good fit in a confirmatory factor analysis conducted in Mplus. Criterion-related validity was assessed, revealing significant correlations between test results and several aspects of teaching, including instructional design, classroom evaluation, homework design, job satisfaction, and teaching philosophy. Among the machine scoring models evaluated, the CNN performed best, with a scoring accuracy ranging from 70% to 88% and a high degree of consistency with expert scores (r = 0.95, quadratic weighted kappa QWK = 0.82). The correlation coefficients between human and computer ratings on the four dimensions (student-oriented, problem-solving, emotional intelligence, and achievement motivation) were approximately 0.9. Furthermore, the model retained a high level of predictive accuracy when applied to new text datasets, which is evidence that it generalizes well.

This study explored automated scoring for open-ended situational judgment tests using psychometric methods. To establish validity, it focused on one specific application: the evaluation of teacher competency traits. Fine-grained scoring guidelines were formulated, and NLP techniques were used for text feature recognition and classification. The primary findings can be summarized as follows: (1) open-ended SJTs can support precise scoring criteria grounded in key behavioral response elements; (2) sentence-level text classification outperforms document-level classification, with CNNs classifying responses most accurately; and (3) the scoring model performs consistently and agrees closely with human scoring, which suggests that it could partially replace manual scoring.
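The agreement and reliability statistics reported above (Pearson's r, quadratic weighted kappa, and Cronbach's alpha) can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson_r`, `quadratic_weighted_kappa`, and `cronbach_alpha` are hypothetical, the toy scores are invented for illustration, and the formulas are the standard textbook definitions, assuming integer scores on a known scale.

```python
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """QWK for integer scores in [min_score, max_score].

    Disagreements are penalized by the squared distance between the two
    scores. Undefined (division by zero) if every score is identical.
    """
    k = max_score - min_score + 1
    n = len(human)
    observed = Counter((h - min_score, m - min_score)
                       for h, m in zip(human, machine))
    h_hist = Counter(h - min_score for h in human)
    m_hist = Counter(m - min_score for m in machine)
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[(i, j)]            # observed disagreement
            den += w * h_hist[i] * m_hist[j] / n   # chance-expected disagreement
    return 1.0 - num / den


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of item scores."""
    def sample_var(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    k = len(rows[0])
    item_var_sum = sum(sample_var([row[i] for row in rows]) for i in range(k))
    total_var = sample_var([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data (invented): four responses scored 0-2 by a human and a model.
human_scores = [0, 1, 2, 2]
machine_scores = [0, 1, 1, 2]
qwk = quadratic_weighted_kappa(human_scores, machine_scores, 0, 2)
r = pearson_r(human_scores, machine_scores)
```

QWK is often reported alongside r for automated scoring because r only measures linear association, whereas QWK also penalizes systematic shifts between the two raters' scores.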
