Correcting for Rater Effects in Operating Room Surgical Skills Assessment

Ryan Chou, Hajira Naz, Kofi D.O. Boahene, Jessica H. Maxwell, John R. Wanamaker, Patrick J. Byrne, Ira D. Papel, Theda C. Kontis, Gregory D. Hager, Lisa E. Ishii, Sonya Malekzadeh, S. Swaroop Vedula, Masaru Ishii
DOI: https://doi.org/10.1002/lary.31391
Journal impact factor (IF): 2.97
2024-03-13
The Laryngoscope
Abstract: Rater effects in surgical skill assessment make scores unfair and unreliable. This study estimates rater effects in septoplasty skill assessment, which can influence assessment scores as much as residents' actual skill does, and computes reliable rater-adjusted surgical skill scores.

Objective: To estimate and adjust for rater effects in operating room surgical skills assessment performed using a structured rating scale for nasal septoplasty.

Methods: We analyzed survey responses from attending surgeons (raters) who supervised residents and fellows (trainees) performing nasal septoplasty in a prospective cohort study. We fit a structural equation model with the rubric item scores regressed on a latent component of skill, and then fit a second model that included the rating surgeon as a random effect to model a rater-effects-adjusted latent surgical skill. We validated this model against conventional measures, including the level of expertise and post-graduation year (PGY) commensurate with the trainee's performance, the trainee's actual PGY, and whether the surgical goals were achieved.

Results: Our dataset included 188 assessments of 41 trainees by 7 raters. The model with one latent construct for surgical skill and the rater as a random effect performed best. Rubric scores depended on how severe or lenient the rater was, sometimes almost as much as they depended on trainee skill. Rater-adjusted latent skill scores increased with attending-estimated skill levels and PGY of trainees, increased with the actual PGY, and appeared constant across different levels of achievement of surgical goals.

Conclusion: Our work provides a method to obtain rater-effect-adjusted surgical skill assessments in the operating room using structured rating scales. Our method allows for the creation of standardized (i.e., rater-effects-adjusted) quantitative surgical skill benchmarks using national-level databases of trainee assessments.

Level of Evidence: N/A

Laryngoscope, 2024
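The core idea in the Methods, separating how lenient or severe a rater is from how skilled a trainee is, can be illustrated with a much simpler model than the paper's structural equation model. The sketch below is not the authors' method: it assumes a single total score per assessment and fits an additive model, score = mu + skill[trainee] + leniency[rater] + noise, by backfitting (block coordinate descent), with a ridge penalty on rater effects that loosely mimics random-effect shrinkage. The data format and the names `fit_rater_adjusted`, `adjusted_scores`, and `shrink` are hypothetical.

```python
from collections import defaultdict


def fit_rater_adjusted(ratings, shrink=1.0, iters=500):
    """Fit score = mu + skill[trainee] + leniency[rater] + noise by backfitting.

    ratings: list of (trainee_id, rater_id, score) tuples (hypothetical format).
    shrink:  ridge penalty on rater effects; it plays the role of the
             noise-to-rater variance ratio in a random-effect (BLUP) estimate,
             pulling raters with few assessments toward zero.
    Returns (mu, skill, leniency); leniency > 0 means the rater scores high.
    """
    mu = sum(s for _, _, s in ratings) / len(ratings)
    leniency = defaultdict(float)  # every rater starts at 0 (no adjustment)
    skill = {}
    for _ in range(iters):
        # Trainee effects: mean residual after removing mu and rater leniency.
        sums, counts = defaultdict(float), defaultdict(int)
        for t, r, s in ratings:
            sums[t] += s - mu - leniency[r]
            counts[t] += 1
        skill = {t: sums[t] / counts[t] for t in sums}
        # Rater effects: penalized mean residual after removing trainee skill.
        sums, counts = defaultdict(float), defaultdict(int)
        for t, r, s in ratings:
            sums[r] += s - mu - skill[t]
            counts[r] += 1
        leniency = defaultdict(
            float, {r: sums[r] / (counts[r] + shrink) for r in sums}
        )
    return mu, skill, dict(leniency)


def adjusted_scores(ratings, leniency):
    """Subtract each rater's estimated leniency from that rater's raw scores."""
    return [(t, r, s - leniency.get(r, 0.0)) for t, r, s in ratings]


if __name__ == "__main__":
    # Toy data (invented): T1 is scored only by lenient rater "L", T2 only by
    # severe rater "S", and T3 by both, which links the raters on one scale.
    ratings = [("T1", "L", 4.0), ("T1", "L", 4.0),
               ("T2", "S", 3.0), ("T2", "S", 3.0),
               ("T3", "L", 4.5), ("T3", "S", 2.5)]
    mu, skill, leniency = fit_rater_adjusted(ratings, shrink=0.1)
    print("leniency:", leniency)
    print("skill:", skill)
```

In this invented example, raw means rank T1 (4.0) above T2 (3.0), but once rater L's leniency is estimated through the jointly rated trainee T3, the adjusted skill estimates rank T2 above T1. The penalty matters: on a dataset this small, `shrink=1.0` pulls the rater effects far enough toward zero that the three trainees become nearly indistinguishable, which is why a proper random-effects model estimates the variance ratio from the data rather than fixing it by hand.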
Medicine, Research & Experimental; Otorhinolaryngology