Multi-instance learning based artificial intelligence model to assist vocal fold leukoplakia diagnosis: A multicentre diagnostic study

Mei-Ling Wang, Cheng-Wei Tie, Jian-Hui Wang, Ji-Qing Zhu, Bing-Hong Chen, Ying Li, Sen Zhang, Lin Liu, Li Guo, Long Yang, Li-Qun Yang, Jiao Wei, Feng Jiang, Zhi-Qiang Zhao, Gui-Qi Wang, Wei Zhang, Quan-Mao Zhang, Xiao-Guang Ni
DOI: https://doi.org/10.1016/j.amjoto.2024.104342
Abstract: Objective: To develop multi-instance learning (MIL)-based, artificial intelligence (AI)-assisted diagnosis models that use laryngoscopic images to differentiate benign from malignant vocal fold leukoplakia (VFL). Methods: The AI system was developed, trained and validated on 5362 images from 551 patients at three hospitals. An automated region-of-interest (ROI) segmentation algorithm was used to construct image-level features. MIL was used to fuse image-level results into patient-level features, which were then modeled by seven machine learning algorithms. Results were evaluated at both the image level and the patient level. Additionally, 50 videos of VFL were prospectively gathered to assess the system's real-time diagnostic capabilities. A human-machine comparison database was also constructed to compare the diagnostic performance of otolaryngologists with and without AI assistance. Results: In the internal and external validation sets, the maximum area under the curve (AUC) for image-level segmentation models was 0.775 (95% CI 0.740-0.811) and 0.720 (95% CI 0.684-0.756), respectively. With the MIL-based fusion strategy, the patient-level AUC increased to 0.869 (95% CI 0.798-0.940) and 0.851 (95% CI 0.756-0.945), respectively. For real-time video diagnosis, the maximum patient-level AUC reached 0.850 (95% CI 0.743-0.957). With AI assistance, the AUC improved from 0.720 (95% CI 0.682-0.755) to 0.808 (95% CI 0.775-0.839) for senior otolaryngologists and from 0.647 (95% CI 0.608-0.686) to 0.807 (95% CI 0.773-0.837) for junior otolaryngologists. Conclusions: The MIL-based AI-assisted diagnosis system can significantly improve the diagnostic performance of otolaryngologists for VFL and help them make appropriate clinical decisions.
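The abstract does not specify how image-level results are fused into patient-level features, so the following is only an illustrative sketch of one common MIL-style pooling approach, not the authors' method. It treats each patient as a "bag" of per-image malignancy probabilities and summarizes the bag with simple statistics plus a probability histogram. The function name `patient_level_features`, the `(patient_id, probability)` input format, and the choice of pooled statistics are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def patient_level_features(image_probs, n_bins=4):
    """Pool per-image probabilities into one fixed-length vector per patient.

    image_probs: iterable of (patient_id, probability) pairs, where each
    probability is an image-level malignancy score in [0, 1].
    This input format is hypothetical, not taken from the paper.
    """
    # Group image-level scores into one "bag" per patient.
    bags = defaultdict(list)
    for patient_id, prob in image_probs:
        bags[patient_id].append(prob)

    features = {}
    for patient_id, probs in bags.items():
        # Histogram: fraction of the patient's images falling in each bin.
        hist = [0.0] * n_bins
        for p in probs:
            idx = min(int(p * n_bins), n_bins - 1)
            hist[idx] += 1.0 / len(probs)
        # Assumed pooled statistics: mean, max, min, population std dev.
        stats = [mean(probs), max(probs), min(probs), pstdev(probs)]
        features[patient_id] = stats + hist
    return features
```

Under these assumptions, each patient ends up with a vector of the same length regardless of how many images were captured, which is what allows a conventional patient-level classifier (the abstract mentions seven machine learning algorithms without naming them) to be trained on top of the image-level outputs.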