HYATT-Net is Grand: A Hybrid Attention Network for Performant Anatomical Landmark Detection

Xiaoqian Zhou,Zhen Huang,Heqin Zhu,Qingsong Yao,S.Kevin Zhou
2024-12-16
Abstract:Anatomical landmark detection (ALD) from a medical image is crucial for a wide array of clinical applications. While existing methods achieve quite some success in ALD, they often struggle to balance global context with computational efficiency, particularly with high-resolution images, thereby leading to the rise of a natural question: where is the performance limit of ALD? In this paper, we aim to forge performant ALD by proposing a {\bf HY}brid {\bf ATT}ention {\bf Net}work (HYATT-Net) with the following designs: (i) A novel hybrid architecture that integrates CNNs and Transformers. Its core is the BiFormer module, utilizing Bi-Level Routing Attention for efficient attention to relevant image regions. This, combined with Attention Residual Module(ARM), enables precise local feature refinement guided by the global context. (ii) A Feature Fusion Correction Module that aggregates multi-scale features and thus mitigates a resolution loss. Deep supervision with a mean-square error loss on multi-resolution heatmaps optimizes the model. Experiments on five diverse datasets demonstrate state-of-the-art performance, surpassing existing methods in accuracy, robustness, and efficiency. The HYATT-Net provides a promising solution for accurate and efficient ALD in complex medical images. Our codes and data are already released at: \url{<a class="link-external link-https" href="https://github.com/ECNUACRush/HYATT-Net" rel="external noopener nofollow">this https URL</a>}.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?