Real‐Time Laryngeal Cancer Boundaries Delineation on White Light and Narrow‐Band Imaging Laryngoscopy with Deep Learning

Claudio Sampieri, Muhammad Adeel Azam, Alessandro Ioppi, Chiara Baldini, Sara Moccia, Dahee Kim, Alessandro Tirrito, Alberto Paderno, Cesare Piazza, Leonardo S. Mattos, Giorgio Peretti
DOI: https://doi.org/10.1002/lary.31255
IF: 2.97
2024-01-06
The Laryngoscope
Abstract: A custom‐made algorithm called SegMENT‐Plus was trained on 3933 laryngeal carcinoma images obtained from 557 patients. The model achieved a Dice similarity coefficient of 0.827, an Intersection over Union of 0.828, an accuracy of 0.972, and an inference speed of 25.6 fps, thus reaching real‐time performance. SegMENT‐Plus performed similarly on two external validation datasets. The model's performance showed no significant differences from that of two residents. The implementation of artificial intelligence during laryngoscopy can support clinicians in delineating the superficial extent of laryngeal cancer. SegMENT‐Plus showed reliable results, with performance equal to that of two otolaryngology residents and a computation speed compatible with real‐time use.
Objective: To investigate the potential of deep learning for automatically delineating (segmenting) the superficial extent of laryngeal cancer on endoscopic images and videos.
Methods: A retrospective study was conducted, extracting and annotating white light (WL) and Narrow‐Band Imaging (NBI) frames to train a segmentation model (SegMENT‐Plus). Two external datasets were used for validation. The model's performance was compared with that of two otolaryngology residents. In addition, the model was tested on real intraoperative laryngoscopy videos.
Results: A total of 3933 images of laryngeal cancer from 557 patients were used. The model achieved the following median values (interquartile range): Dice Similarity Coefficient (DSC) = 0.83 (0.70–0.90), Intersection over Union (IoU) = 0.83 (0.73–0.90), Accuracy = 0.97 (0.95–0.99), Inference Speed = 25.6 (25.1–26.1) frames per second. The external testing cohorts comprised 156 and 200 images. SegMENT‐Plus performed similarly on all three datasets for DSC (p = 0.05) and IoU (p = 0.07). No significant differences were observed when WL and NBI test images were analyzed separately for DSC (p = 0.06) and IoU (p = 0.78), or when the model was compared with the two residents for DSC (p = 0.06) and IoU (Senior vs. SegMENT‐Plus, p = 0.13; Junior vs. SegMENT‐Plus, p = 1.00). The model was then tested on real intraoperative laryngoscopy videos.
Conclusion: SegMENT‐Plus can accurately delineate laryngeal cancer boundaries in endoscopic images, with performance equal to that of two otolaryngology residents. The results on the two external datasets demonstrate excellent generalization capabilities. The computation speed of the model allowed its application to videolaryngoscopies, simulating real‐time use. Clinical trials are needed to evaluate the role of this technology in surgical practice and resection margin improvement.
Level of Evidence: III. Laryngoscope, 2024
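The abstract reports DSC, IoU, and accuracy without spelling out how they are computed. The following is a minimal sketch of the standard per-image definitions for binary segmentation masks, not the authors' evaluation code. The function names, the nested-list mask format, and the convention of scoring two empty masks as perfect agreement are assumptions made for illustration.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two equally sized binary masks.

    Masks are nested lists of 0/1 values (an assumed format for this sketch).
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # tumor pixel correctly predicted
            elif p:
                fp += 1      # predicted tumor, annotated background
            elif t:
                fn += 1      # missed tumor pixel
            else:
                tn += 1      # background correctly predicted
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Return Dice (DSC), IoU, and pixel accuracy for one image."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("masks must contain at least one pixel")
    union = tp + fp + fn
    # Assumed convention: if both masks are empty, count it as perfect overlap.
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    iou = tp / union if union else 1.0
    accuracy = (tp + tn) / total
    return {"dice": dice, "iou": iou, "accuracy": accuracy}


# Toy 3x4 example: 3 overlapping pixels, 1 false positive, 1 false negative.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]

metrics = segmentation_metrics(pred, truth)
print(metrics)
```

Per-image values computed this way would then be summarized across a test set, for example as a median with interquartile range, which is the form the Results use.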
medicine, research & experimental; otorhinolaryngology