Abstract: Purpose: Semantic segmentation and landmark detection are fundamental tasks of medical image processing, facilitating further analysis of anatomical objects. Although deep learning-based pixel-wise classification has set a new state of the art for segmentation, it falls short in landmark detection, a strength of shape-based approaches. Methods: In this work, we propose a dense image-to-shape representation that enables the joint learning of landmarks and semantic segmentation by employing a fully convolutional architecture. Because it represents anatomical correspondences, our method allows the extraction of arbitrary landmarks. We benchmark our method against the state of the art for semantic segmentation (nnUNet), a shape-based approach employing geometric deep learning, and a CNN-based method for landmark detection. Results: We evaluate our method on two medical datasets: one common benchmark featuring the lungs, heart, and clavicles from thorax X-rays, and another with 17 different bones in the paediatric wrist. While our method is on par with the landmark detection baseline in the thorax setting (error in mm of $2.6\pm0.9$ vs $2.7\pm0.9$), it substantially surpasses it in the more complex wrist setting ($1.1\pm0.6$ vs $1.9\pm0.5$). Conclusion: We demonstrate that dense geometric shape representation is beneficial for challenging landmark detection tasks and outperforms the previous state of the art using heatmap regression. Moreover, it does not require explicit training on the landmarks themselves, so new landmarks can be added without retraining.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper addresses two fundamental tasks in medical image processing: semantic segmentation and landmark detection. Deep learning-based pixel-level classification has advanced semantic segmentation considerably but falls short in landmark detection, which is a strength of shape-based methods. The authors therefore propose a dense image-to-shape representation that learns semantic segmentation and landmark detection jointly with a fully convolutional architecture.

### Specific Problem Description

1. **Semantic Segmentation**: Identifying anatomical structures in images, such as the lungs, heart, and clavicles.
2. **Landmark Detection**: Determining specific points on anatomical structures, such as key points on bones.

### Shortcomings of Existing Methods

- **Deep Learning Methods**: Perform well in semantic segmentation but are less effective in landmark detection.
- **Shape Methods**: Perform well in landmark detection but lack pixel-level classification capability for anatomical structures.

### Proposed Method

The authors propose a dense image-to-shape representation, implemented through the following steps:

1. **Generating Dense UV Maps**: Calculate the average shape from all landmarks in the training set and use it as a non-deformable template. Generate identity UV maps for each anatomical structure by indexing the bounding box. Align the landmarks to the template through an affine transformation, then convert the resulting sparse displacement field to a dense displacement field through bilinear interpolation, which yields the dense UV maps.
2. **Extracting Landmarks from UV Maps**: For each landmark, search the predicted UV map for the pixel whose UV value is closest to the landmark's known UV position on the template; that pixel gives the landmark location.
3. **Loss Function**: Supervise model training with a combination of binary cross-entropy loss, an L1 norm loss, and a total variation regularization term.

### Experimental Results

- **JSRT Dataset**: On chest X-rays, this method is on par with the CNN-based landmark detection baseline (error of $2.6\pm0.9$ mm vs $2.7\pm0.9$ mm), with its advantage reported mainly in low-contrast areas.
- **GRAZPEDWRI-DX Dataset**: On 17 bones and 720 landmarks of the paediatric wrist, this method significantly outperforms the heatmap regression method in landmark detection accuracy (error of $1.1\pm0.6$ mm vs $1.9\pm0.5$ mm).

### Conclusion

With the dense image-to-shape representation, the authors achieve joint learning of semantic segmentation and landmark detection. The method performs particularly well on the complex paediatric wrist dataset. Because it is not trained explicitly on the landmarks themselves, new landmarks can be added without retraining.
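To make steps 2 and 3 of the proposed method more concrete, here is a minimal NumPy sketch of nearest-UV landmark extraction and of a combined loss. It is an illustration under stated assumptions, not the authors' implementation: the function names, array layouts, the restriction of the L1 term to foreground pixels, and the `tv_weight` value are all hypothetical choices that the summary does not specify.

```python
import numpy as np


def extract_landmarks(uv_map, mask, template_uv):
    """Locate landmarks by nearest-UV lookup (illustrative sketch).

    uv_map:      (H, W, 2) predicted UV values per pixel.
    mask:        (H, W) boolean foreground mask of the structure.
    template_uv: (K, 2) known UV positions of the landmarks on the template.
    Returns a (K, 2) integer array of (row, col) pixel coordinates.
    """
    rows, cols = np.nonzero(mask)
    candidates = uv_map[rows, cols]  # (N, 2) UV values of foreground pixels
    # Squared UV distance between every landmark and every candidate pixel.
    diff = template_uv[:, None, :] - candidates[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)  # (K, N)
    nearest = dist2.argmin(axis=1)
    return np.stack([rows[nearest], cols[nearest]], axis=1)


def combined_loss(seg_prob, seg_target, uv_pred, uv_target,
                  tv_weight=0.1, eps=1e-7):
    """BCE + L1 + total variation; the weighting is an assumption."""
    p = np.clip(seg_prob, eps, 1.0 - eps)
    bce = -(seg_target * np.log(p)
            + (1.0 - seg_target) * np.log(1.0 - p)).mean()
    # Assumption: the UV regression is only supervised inside the structure.
    fg = seg_target.astype(bool)
    l1 = np.abs(uv_pred - uv_target)[fg].mean() if fg.any() else 0.0
    # Anisotropic total variation: mean absolute difference between
    # neighbouring pixels along both image axes.
    tv = (np.abs(np.diff(uv_pred, axis=0)).mean()
          + np.abs(np.diff(uv_pred, axis=1)).mean())
    return bce + l1 + tv_weight * tv


def identity_uv(height, width):
    """Identity UV map over a bounding box: u follows columns, v follows rows."""
    v, u = np.meshgrid(np.linspace(0.0, 1.0, height),
                       np.linspace(0.0, 1.0, width), indexing="ij")
    return np.stack([u, v], axis=-1)
```

Because the lookup only needs UV coordinates on the template, a new landmark can be queried by appending one more row to `template_uv`, which mirrors the claim that new landmarks do not require retraining. The brute-force distance matrix is fine for a sketch; a k-d tree would be a natural replacement for large images.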

DenseSeg: Joint Learning for Semantic Segmentation and Landmark Detection Using Dense Image-to-Shape Representation