Gender bias in resident evaluations: Natural language processing and competency evaluation

Jane Andrews, David Chartash, Seonaid Hay
DOI: https://doi.org/10.1111/medu.14593
2021-07-30
Medical Education
Abstract:<section class="article-section__content"><h3 class="article-section__sub-title section1"> Background</h3><p>Research shows that female trainees experience evaluation penalties for gender non-conforming behavior during medical training. Studies of medical education evaluations and performance scores do reflect gender bias, though their methodologies vary and their results have not been consistent.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Objective</h3><p>We sought to examine, at scale, differences in word use, competency themes, and length within written evaluations of internal medicine residents, considering the impact of both faculty and resident gender.</p><p>We hypothesized that female internal medicine residents receive more negative feedback, and thematically different feedback, than male residents.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Methods</h3><p>This study used a corpus of 3864 individual responses to positive and negative questions collected over six years (2012-2018) within Yale University School of Medicine's internal medicine residency. Researchers developed a sentiment model to assess the valence of evaluation responses. We then used natural language processing (NLP) to evaluate whether female versus male residents received more positive or negative feedback and whether that feedback focused on different Accreditation Council for Graduate Medical Education (ACGME) core competencies depending on resident gender. The evaluator-evaluatee gender dyad was analyzed to assess how it affected the quantity and quality of feedback.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Results</h3><p>We found that female and male residents did not have substantively different numbers of positive or negative comments. 
While certain competencies were discussed more than others, gender did not seem to influence which competencies were discussed. Trainees of neither gender received more written feedback, though female evaluators tended to write longer evaluations.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Conclusions</h3><p>We conclude that, when examined at scale, quantitative gender differences are not as prevalent as qualitative work has suggested. We suggest that further investigation of linguistic phenomena (such as context) is warranted to reconcile this finding with prior work.</p></section>
education, scientific disciplines, health care sciences & services