Abstract: Medical school faculty and administrators regularly assess sentiment in student-generated textual data, such as instructor and course evaluations. Machine learning models (MLMs) that automate and systematize sentiment analysis are commercially available. However, the congruency of MLMs has not been extensively tested. We compare sentiment polarity derived from human analysis and five MLMs to test the hypothesis that they yield significantly correlated output. Student evaluations (n=116) of the neuroscience module at the UCF College of Medicine were collected and anonymized. Students were asked to evaluate the strengths of the module (n=108) and provide suggestions for improvement (n=102). Responses were subjected to sentiment analysis by five commercially available MLMs and two module faculty reviewers. Sentiment was classified as positive (1), neutral (0), or negative (-1). Sentiment polarity as assessed by the two reviewers was significantly correlated (r=0.66, p<0.05). Reviewer assessments were congruent for 73.8% (n=155) of responses. Congruence was greater for responses on strengths of the module (92.6%, n=100) than for suggestions for improvement (54.4%, n=103). Congruency among the MLMs was 38.1% (n=80) for all responses, 60.1% (n=65) for module strengths, and 14.7% (n=15) for suggestions for improvement. A correlation matrix showed moderate correlations among the reviewers and MLMs (range of r=0.41-0.62, p<0.05). Congruence among all reviewers and MLMs occurred for 34.3% (n=72) of responses, with maximal incongruence occurring for only 2.4% (n=5) of responses. Again, congruence was greater for responses on module strengths (58.3%, n=63) than for suggestions for improvement (8.8%, n=9). All methods assessed the responses on strengths of the module as more positive than the suggestions for improvement.
With all methods combined, responses on strengths of the module scored an average of 0.79, compared with -0.34 for suggestions for improvement. Sentiment polarity derived from human analysis and MLMs is significantly correlated, although the coefficients reflect only modest linear relationships. MLMs are less congruent than the human observers. All methods demonstrate greater congruence when assessing responses that are more positive (i.e., module strengths) than negative (i.e., suggestions for improvement). Additional refinement of MLMs may be necessary before they can be applied with consistency in medical education settings.
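
The two summary measures reported above, percent congruence and pairwise correlation of polarity scores, can be illustrated with a minimal Python sketch. It assumes that "congruent" means every rater assigned the same polarity to a response and that the correlations are Pearson coefficients; the abstract states neither explicitly. The function names and the toy ratings are hypothetical and are not the study's data.

```python
from itertools import combinations
from statistics import mean


def percent_congruent(ratings_by_rater):
    """Percentage of responses on which every rater gave the same polarity.

    ratings_by_rater maps a rater name to a list of scores in {-1, 0, 1},
    one score per response, in the same response order for every rater.
    """
    per_response = list(zip(*ratings_by_rater.values()))
    agree = sum(1 for scores in per_response if len(set(scores)) == 1)
    return 100.0 * agree / len(per_response)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length score lists.

    Assumes both lists have non-zero variance; a rater who gives every
    response the same score would cause a division by zero here.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def correlation_matrix(ratings_by_rater):
    """Pearson r for every unordered pair of raters."""
    return {
        (a, b): pearson_r(ratings_by_rater[a], ratings_by_rater[b])
        for a, b in combinations(ratings_by_rater, 2)
    }


# Made-up ratings for six responses (illustration only, not study data).
toy = {
    "reviewer_1": [1, 1, 0, -1, 1, -1],
    "reviewer_2": [1, 1, 0, -1, 0, -1],
    "model_a": [1, 0, 0, -1, 1, 1],
}

if __name__ == "__main__":
    print(f"congruent: {percent_congruent(toy):.1f}%")
    for pair, r in correlation_matrix(toy).items():
        print(pair, round(r, 2))
```

Passing only the reviewer entries, or only the model entries, to `percent_congruent` gives the within-group congruence; passing all raters gives the combined figure.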

Sentiment Analysis Machine Learning Model Congruence: A Case Study Using Neuroscience Module Evaluations

Validating GAN-BioBERT: A Methodology For Assessing Reporting Trends In Clinical Trials

Machine Learning Based Sentiment Text Classification for Evaluating Treatment Quality of Discharge Summary

"Not by Our Feeling, But by Other's Seeing": Sentiment Analysis Technique in Cardiology-An Exploratory Review

Estimating the severity of dental and oral problems via sentiment classification over clinical reports

Automatic Annotation of PubMed Articles with MeSH Qualifiers

Distinguishing Clinical Sentiment: The Importance of Domain Adaptation in Psychiatric Patient Health Records

"Hey..! This medicine made me sick": Sentiment Analysis of User-Generated Drug Reviews using Machine Learning Techniques

The Value of Applying Machine Learning in Predicting the Time of Symptom Onset in Stroke Patients: Systematic Review and Meta-Analysis

Comprehension of polarity of articles by citation sentiment analysis using TF-IDF and ML classifiers

The Graves lecture, 1966

Rethinking Difference: A Feminist Reframing of Gender/Race/Class for the Improvement of Women's Health Research

Using Machine Learning to Predict the Sentiment of Online Reviews: A New Framework for Comparative Analysis

Automated medical literature screening using artificial intelligence: a systematic review and meta-analysis

[Active immunotherapy in patients with malignant melanoma (preliminary report)].