Deep learning models for bolus segmentation in videofluoroscopic swallow studies

Wuqi Li, Shitong Mao, Amanda S. Mahoney, Sandra Petkovic, James L. Coyle, Ervin Sejdić
DOI: https://doi.org/10.1007/s11554-023-01398-1
IF: 2.293
2024-01-08
Journal of Real-Time Image Processing
Abstract: One of the benefits of the videofluoroscopic swallow study (VFSS) is the visualization of bolus transit during the swallowing process. This X-ray imaging technique allows clinicians to observe penetration and aspiration of a bolus into the airway and to characterize possible post-swallow residue. This study aims to develop and analyze deep learning models for bolus segmentation in VFSS, using various encoder–decoder-based models to segment a bolus automatically. The models were developed with 6424 VFSS images from 270 swallow studies obtained from 28 patients (15 males, mean age: 59.87 ± 14.88 years; 13 females, mean age: 57.08 ± 17.21 years) suspected of dysphagia (swallowing difficulties). The data were split at the patient level in proportions of 80%, 10%, and 10% for training, validation, and testing, respectively. Model performance was mainly evaluated by the Dice score coefficient (DSC) and intersection-over-union (IoU). The InceptionResNetV2 encoder in the UNet++ architecture achieved the best performance, with a DSC of 81.16% and an IoU of 68.29%, at an inference speed of 49.34 ms per image on a designated device. The UNet++ with a MobileNetV2 encoder achieved a considerably faster inference speed of 10.08 ms per image with slightly lower performance: 80.98% DSC and 68.04% IoU. Our study demonstrated effective and accurate methods of segmenting and tracking a bolus on all frames of VFSS exams in real time, indicating the potential to reduce human error and contribute objective analysis to early dysphagia diagnosis and management.
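As an illustration of the two evaluation metrics named in the abstract, the following is a minimal sketch of how DSC and IoU might be computed for a single pair of binary masks. It is not the authors' evaluation code: the function name `dice_and_iou`, the nested-list mask representation, and the `eps` smoothing term are assumptions made here to keep the example self-contained. For a single mask pair the two metrics are related by DSC = 2·IoU / (1 + IoU), and the reported pairs (81.16% / 68.29% and 80.98% / 68.04%) are consistent with that relation, although it is not guaranteed to hold for averages over many images.

```python
def dice_and_iou(pred, target, eps=1e-7):
    """Return (DSC, IoU) for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration only. `eps` avoids division by
    zero when both masks are empty (both metrics then evaluate to 1.0).
    """
    # Flatten the 2-D masks into lists of booleans.
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(1 for a, b in zip(p, t) if a and b)
    union = sum(1 for a, b in zip(p, t) if a or b)
    total = sum(p) + sum(t)  # |pred| + |target|

    dice = (2.0 * intersection + eps) / (total + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou


# Toy 3x3 example: predicted bolus mask vs. annotated mask.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 0, 0],
          [0, 1, 0]]

dice, iou = dice_and_iou(pred, target)
print(f"DSC = {dice:.4f}, IoU = {iou:.4f}")
```

In this toy case the masks share 2 pixels, each contains 3, and their union covers 4, so the DSC is 4/6 and the IoU is 2/4.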
computer science, artificial intelligence; engineering, electrical & electronic; imaging science & photographic technology