P598 Validating and Enhancing Real-time Disease Severity Classification in Ulcerative Colitis: Artificial Intelligence as a Second Opinion Trigger

B Z S Lo, B Møller, C Igel, S Wildt, I Vind, F Bendtsen, J Burisch, I Bulat
DOI: https://doi.org/10.1093/ecco-jcc/jjad212.0728
2024-01-01
Journal of Crohn's and Colitis
Abstract

Background: Endoscopic severity classification in ulcerative colitis (UC) shows high interobserver variance. Our prior study showed that AI matches central reading when scoring still images. To be clinically useful, however, a model must be able to assess longer endoscopic segments. Our aim was to develop a new model for real-time or video-based severity evaluation and to demonstrate the supporting value it might offer.

Methods: A convolutional neural network was trained on 2561 images and 53 videos from 645 patients, scored with the Mayo Endoscopic Subscore (MES). Using open-set recognition, the model differentiated scoreable from unscoreable sections of the endoscopy. Validation used 140 video clips from 44 UC patients. Six IBD experts and 16 non-IBD experts independently rated these clips, and the majority IBD-expert score served as ground truth. We assessed the model's value as a second opinion for non-IBD experts and conducted an alpha test with real-time endoscopic support on a real-world patient.

Results: The model achieved an overall accuracy of 0.82, with accuracies of 0.84 for MES 0, 0.81 for MES 1, 0.72 for MES 2, and 0.96 for MES 3. No significant difference was observed between individual experts, or between the ground truth and the AI model (Figure 1). When the model was used as a trigger for second opinions, the performance of non-IBD experts improved by 10% (Table 1). Depending on the seniority of the evaluated physician, the framework disagreed with the primary physician in 26-32% of cases on average (individual range: 17-39%). In those cases, the model was correct in 57-59% of cases on average (individual range: 33-76%), and the second physician's opinion was correct in 64-70% on average (individual range: 60-77%), as judged against the ground truth. The alpha test successfully integrated the model into the endoscopic column for real-time classification, where it accurately discerned MES 0 and MES 1 frames in agreement with the endoscopist's assessment.
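The second-opinion analysis in the Results can be illustrated with a short sketch. It assumes that each validation clip carries four MES scores (primary physician, model, a second physician, and the majority IBD-expert ground truth) and that a second opinion is requested only when the model disagrees with the primary physician. The `Clip` type, its field names, the rule that a triggered second opinion replaces the primary score, and the toy data are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Clip:
    """One validation clip; all scores are MES values 0-3 (hypothetical fields)."""
    primary: int  # score from the primary (non-IBD-expert) physician
    model: int    # score predicted by the AI model
    second: int   # score from a second physician, used only if triggered
    truth: int    # majority IBD-expert score (ground truth)


def second_opinion_stats(clips: Sequence[Clip]) -> tuple[float, float, float]:
    """Return (disagreement rate, model correct rate, second-opinion correct rate).

    The last two rates are computed only over clips where the model
    disagreed with the primary physician, i.e. where a second opinion
    would have been triggered.
    """
    flagged = [c for c in clips if c.model != c.primary]
    if not flagged:
        return 0.0, float("nan"), float("nan")
    disagreement = len(flagged) / len(clips)
    model_correct = sum(c.model == c.truth for c in flagged) / len(flagged)
    second_correct = sum(c.second == c.truth for c in flagged) / len(flagged)
    return disagreement, model_correct, second_correct


def accuracy(clips: Sequence[Clip], use_trigger: bool) -> float:
    """Accuracy of the primary physician, optionally with the trigger policy.

    Assumption: when triggered, the second physician's score replaces
    the primary score; otherwise the primary score stands.
    """
    def final(c: Clip) -> int:
        return c.second if use_trigger and c.model != c.primary else c.primary

    return sum(final(c) == c.truth for c in clips) / len(clips)


# Toy data for illustration only, not study results.
clips = [
    Clip(primary=1, model=1, second=1, truth=1),  # agreement: no trigger
    Clip(primary=2, model=1, second=1, truth=1),  # trigger: model and second correct
    Clip(primary=0, model=1, second=0, truth=0),  # trigger: only second correct
    Clip(primary=3, model=3, second=3, truth=3),  # agreement: no trigger
]
print(second_opinion_stats(clips))  # (0.5, 0.5, 1.0)
print(accuracy(clips, use_trigger=False), accuracy(clips, use_trigger=True))  # 0.75 1.0
```

The third toy clip shows why the model's correctness among disagreements can be modest while the trigger still helps: the model is wrong, but the disagreement prompts a second reader who confirms the correct score.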
Conclusion: Our AI model shows significant potential for improving the accuracy of UC severity classification, rivalling IBD experts and notably improving the proficiency of non-specialists. It is designed for clinical implementation and has demonstrated clinical feasibility in an alpha test.

Figure 1:
Table 1:
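The abstract does not specify which open-set-recognition technique separates scoreable from unscoreable sections, nor how frame-level predictions are combined into a clip-level score. The sketch below assumes one simple possibility: maximum-softmax-probability rejection, where a frame is treated as unscoreable when the network's top class probability falls below a threshold, followed by a majority vote over the remaining frames. The function `score_clip`, its `threshold` and `min_scoreable` parameters, their default values, and the toy probabilities are hypothetical.

```python
import numpy as np

N_CLASSES = 4  # MES 0, 1, 2, 3


def score_clip(frame_probs, threshold=0.7, min_scoreable=0.2):
    """Combine per-frame MES probabilities into one clip-level score.

    frame_probs: array-like of shape (n_frames, 4); row i holds the softmax
        probabilities over MES 0-3 for frame i.
    threshold: hypothetical cut-off; a frame whose top probability is below
        it is treated as unscoreable (maximum-softmax-probability rejection).
    min_scoreable: hypothetical minimum fraction of scoreable frames required
        before a clip-level score is reported.
    Returns the clip MES as an int, or None if too few frames are scoreable.
    """
    probs = np.asarray(frame_probs, dtype=float).reshape(-1, N_CLASSES)
    scoreable = probs.max(axis=1) >= threshold
    if not scoreable.any() or scoreable.mean() < min_scoreable:
        return None
    frame_scores = probs[scoreable].argmax(axis=1)
    # Majority vote over scoreable frames; ties go to the lower MES
    # because argmax returns the first maximum.
    votes = np.bincount(frame_scores, minlength=N_CLASSES)
    return int(votes.argmax())


# Toy probabilities for illustration only.
clip = [
    [0.90, 0.05, 0.03, 0.02],  # confident MES 0
    [0.10, 0.80, 0.05, 0.05],  # confident MES 1
    [0.05, 0.85, 0.05, 0.05],  # confident MES 1
    [0.30, 0.30, 0.20, 0.20],  # low confidence: rejected as unscoreable
]
print(score_clip(clip))  # 1
```

Raising `threshold` rejects more ambiguous frames (for example, blurred or bubble-covered views) at the cost of scoring fewer frames. In a real-time setting, the same function could be applied to a sliding window of recent frames.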