Automatic Annotation of Cervical Vertebrae in Videofluoroscopy Images Via Deep Learning

Zhenwei Zhang, Shitong Mao, James Coyle, Ervin Sejdic
DOI: https://doi.org/10.1016/j.media.2021.102218
2021-01-01
Abstract: Judging swallowing kinematic impairments via videofluoroscopy represents the gold standard for the detection and evaluation of swallowing disorders. However, the efficiency and accuracy of such a biomechanical kinematic analysis vary significantly among human judges affected mainly by their training and experience. Here, we showed that a novel machine learning algorithm can with high accuracy automatically detect key anatomical points needed for a routine swallowing assessment in real-time. We trained a novel two-stage convolutional neural network to localize and measure the vertebral bodies using 1518 swallowing videofluoroscopies from 265 patients. Our network model yielded high accuracy as the mean distance between predicted points and annotations was 4.20 +/- 5.54 pixels. In comparison, human inter-rater error was 4.35 +/- 3.12 pixels. Furthermore, 93% of predicted points were less than five pixels from annotated pixels when tested on an independent dataset from 70 subjects. Our model offers more choices for speech language pathologists in their routine clinical swallowing assessments as it provides an efficient and accurate method for anatomic landmark localization in real-time, a task previously accomplished using an off-line time-sinking procedure. (c) 2021 Elsevier B.V. All rights reserved.
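The accuracy figures in the abstract (mean distance in pixels between predicted and annotated points, and the share of points closer than five pixels) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `landmark_errors` and `summarize`, the use of a population standard deviation, and the toy coordinates are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def landmark_errors(predicted, annotated):
    """Return the Euclidean distance in pixels for each predicted/annotated pair."""
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    return [math.dist(p, a) for p, a in zip(predicted, annotated)]


def summarize(errors, threshold=5.0):
    """Summarize errors as mean, spread, and fraction strictly below a threshold."""
    return {
        "mean": mean(errors),
        # Population standard deviation is an assumption; the abstract does
        # not say which estimator produced the reported spread.
        "std": pstdev(errors),
        "fraction_within": sum(e < threshold for e in errors) / len(errors),
    }


# Hypothetical (x, y) pixel coordinates, not data from the paper.
predicted = [(10, 10), (20, 20), (30, 30), (40, 40)]
annotated = [(13, 14), (20, 20), (30, 36), (40, 41)]

errors = landmark_errors(predicted, annotated)
summary = summarize(errors)
print(errors)
print(summary)
```

With real data, `predicted` would hold the network outputs and `annotated` the human labels for the same landmarks. The same two functions applied to two human raters' labels would give an inter-rater error comparable to the one quoted in the abstract.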