Automatic Landmark Localization in 3D Medical CT Images: Few-Shot Learning Through Optimized Data Pre-Processing and Network Design.

Yifan Wang, Thomas Lenarz, Andrej Kral, Samuel John
DOI: https://doi.org/10.1145/3632047.3632048
2024-01-01
Abstract: As surgeons increasingly rely on computed tomography (CT) scans, including low-dose cone beam CT, it has become critical to understand and annotate these medical images effectively, not only for improved treatment planning but also for robotic and implant surgery. In clinical practice, however, accurate localization and annotation of specific landmarks still rely heavily on manual effort. We therefore present an automatic landmark localization pipeline with few-shot learning, designed specifically for 3D CT scans. In this paper, we focus on the cochlea and its three landmarks (apex point, basal point, and round window center). The pipeline combines optimized data pre-processing with a newly designed 2D convolutional neural network (CNN) model. To evaluate the pipeline in a few-shot learning setting, we used 31 volumetric CT scans with their corresponding annotated landmarks: 20 volumes were reserved exclusively for testing, and the number of training volumes was varied from 1 to 11. We then compared the models trained on different numbers of CT volumes. First, we used the weighted F1 score to evaluate landmark classification performance on the extracted 2D sub-images. A model trained on only 5 CT volumes achieves peak performance, with median weighted F1 scores of approximately 0.99 (apex), 0.985 (basal), 0.98 (round window center), and 0.98 (background). Next, given an initial point placed manually near the cochlea, all three landmarks were localized automatically within the 3D CT volume using a sliding window approach. Compared with the manually defined ground truth, the 5-volume-trained model achieved average Euclidean distance errors of 0.70 mm (apex), 1.15 mm (basal), and 0.84 mm (round window center) on the 3D CT volumes of the test set. These results demonstrate the efficiency and accuracy of the pipeline.
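The abstract reports two evaluation metrics: the weighted F1 score for classifying 2D sub-images into landmark or background classes, and the Euclidean distance error in millimetres for 3D localization. The sketch below shows how these metrics are conventionally computed; it is not the authors' code. The function names `weighted_f1` and `landmark_error_mm`, the class label strings, and the assumption that landmarks are given as voxel indices with a known per-axis voxel spacing in mm are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1 scores averaged with weights equal to
    each class's support (number of occurrences) in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    # Iterate in sorted order so the floating-point sum is deterministic.
    for label in sorted(support):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * support[label] / total
    return score


def landmark_error_mm(pred_voxel, true_voxel, spacing_mm):
    """Euclidean distance in mm between two voxel-index coordinates,
    scaling each axis difference by that axis's voxel spacing (assumed known)."""
    return math.sqrt(
        sum(((p - t) * s) ** 2 for p, t, s in zip(pred_voxel, true_voxel, spacing_mm))
    )


if __name__ == "__main__":
    # Hypothetical sub-image labels for illustration only.
    y_true = ["apex", "apex", "basal", "background"]
    y_pred = ["apex", "basal", "basal", "background"]
    print(round(weighted_f1(y_true, y_pred), 4))
    # Hypothetical predicted vs. ground-truth voxel indices, 0.5 mm isotropic spacing.
    print(landmark_error_mm((10, 20, 30), (13, 24, 30), (0.5, 0.5, 0.5)))
```

Weighting each class by its support means that a frequent class (such as background sub-images) contributes more to the overall score than a rare one, which is why per-landmark scores are also worth reporting separately, as the abstract does. Scaling by voxel spacing matters because CT volumes, and cone beam CT in particular, can have anisotropic voxels, so an error measured in voxels would not be comparable across scans.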