Self-Logical Consistent GPT-4 Enables Human-Level Classification of Patient Feedback

Zeno Loi, David Morquin, Xavier Derzko, Xavier Corbier, Sylvie Gauthier, Patrice Taourel, Emilie Prin-Lombardo, Grégoire Mercier, Kévin Yauy
DOI: https://doi.org/10.1101/2024.07.11.24310210
2024-10-26
Abstract: Patient satisfaction feedback is crucial for hospital service quality, but human-led reviews are time-consuming and traditional natural language processing remains ineffective. Large Language Models (LLMs) offer potential, but their tendency to generate illogical thoughts limits their use in healthcare. Here we describe Self-Logical Consistency Assessment (SLCA), a method that ensures reproducible LLM classification explained by a logically structured chain of thought. In an analysis targeting extrinsic faithfulness hallucinations, SLCA mitigated GPT-4's 16% hallucination rate, leaving only three residual cases across 12,600 classifications from 100 diverse patient feedback entries. In a benchmark designed to evaluate classification accuracy, SLCA applied to GPT-4 outperformed the best algorithms, with an 88% precision rate and a 71% recall rate across 49,140 classifications from 1,170 sampled patient feedback entries. This method provides a reliable, scalable solution for improving hospital services and shows potential for accurate, explainable text classification without fine-tuning.
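For readers less familiar with the reported metrics, the sketch below shows how precision and recall are conventionally computed from binary classification decisions, and what the abstract's totals imply per feedback entry. It is an illustration only: the abstract does not describe the evaluation protocol, so the function `precision_recall`, the treatment of each classification as a yes/no decision, and the toy data are all assumptions rather than the authors' method.

```python
from typing import Iterable, Tuple


def precision_recall(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (precision, recall) from (predicted, reference) binary decisions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp = fp = fn = 0
    for predicted, reference in pairs:
        if predicted and reference:
            tp += 1  # label assigned and present in the reference
        elif predicted and not reference:
            fp += 1  # label assigned but absent from the reference
        elif reference and not predicted:
            fn += 1  # label missed by the classifier
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Classifications per feedback entry implied by the totals in the abstract.
decisions_hallucination_set = 12_600 // 100  # 126 per entry
decisions_benchmark_set = 49_140 // 1_170    # 42 per entry

# Toy decisions, made up for illustration (not data from the paper).
toy = [(True, True), (True, False), (False, True), (True, True), (False, False)]
p, r = precision_recall(toy)
print(f"precision={p:.2f} recall={r:.2f}")
```

Read this way, 88% precision means that about 88 of every 100 labels the model assigned were correct, and 71% recall means that about 71 of every 100 labels in the reference were found.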
Health Systems and Quality Improvement
What problem does this paper attempt to address?