Abstract: Objectives: Accurate extraction of regions of interest (ROIs) with variable shapes and scales is one of the primary challenges in medical image segmentation. Most current U-based networks aggregate the outputs of multiple encoding stages into an improved multi-scale skip connection. Although this design has been shown to provide scale diversity and contextual integrity, it still has several evident limitations: (i) the encoding outputs are simply resampled to the same size, which destroys fine-grained information, so the benefit of using multiple scales is not fully realized; (ii) redundant information, proportional to the feature dimension size, is introduced and causes interference between stages; and (iii) the precision of information delivery depends on the up-sampling and down-sampling layers, yet there is no guidance for keeping feature locations and trends consistent between them.
Methods: To address these issues, this paper proposes a U-based CNN named HAD-Net. It assembles a new hyper-scale shifted aggregating module (HSAM) paradigm and progressive reusing attention (PRA) for the skip connections, and employs a novel pair of dual-branch, parameter-free sampling layers: max-diagonal pooling (MDP) and max-diagonal un-pooling (MDUP). Specifically, the aggregating scheme additionally combines five subregions with certain offsets in the shallower stage; because these subregions have lower scale-down ratios, they enrich both the available scales and the fine-grained context. The attention scheme contains a partial-to-global channel attention (PGCA) and a multi-scale reusing spatial attention (MRSA); it builds internal reusing connections and shifts the focus toward the more useful dimensions. Finally, MDP and MDUP are used in pairs to improve texture delivery and feature consistency, which enhances information retention and avoids positional confusion.
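The abstract names MDP and MDUP but does not define how the diagonals are pooled. The sketch below is a hypothetical reading for illustration only: each 2×2 window is split into its main diagonal and its anti-diagonal, one branch keeps the maximum of each, and the un-pooling layer writes both values back to their recorded positions, in the spirit of index-based max un-pooling. The function names `max_diagonal_pool` and `max_diagonal_unpool`, the 2×2 window, and the single-channel list-of-lists layout are all assumptions, not HAD-Net's published implementation.

```python
from typing import List, Tuple

Grid = List[List[float]]
Pos = Tuple[int, int]


def max_diagonal_pool(x: Grid) -> Tuple[Grid, Grid, List[List[Pos]], List[List[Pos]]]:
    """Hypothetical MDP (assumed design): split the map into non-overlapping
    2x2 windows. Branch 1 keeps the max of the main diagonal (top-left,
    bottom-right); branch 2 keeps the max of the anti-diagonal (top-right,
    bottom-left). The source position of each kept value is recorded."""
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("this sketch assumes even height and width")
    main, anti, main_pos, anti_pos = [], [], [], []
    for i in range(0, h, 2):
        m_row, a_row, mp_row, ap_row = [], [], [], []
        for j in range(0, w, 2):
            diag = [(i, j), (i + 1, j + 1)]
            anti_diag = [(i, j + 1), (i + 1, j)]
            # max() returns the first position on ties
            best_d = max(diag, key=lambda p: x[p[0]][p[1]])
            best_a = max(anti_diag, key=lambda p: x[p[0]][p[1]])
            m_row.append(x[best_d[0]][best_d[1]])
            a_row.append(x[best_a[0]][best_a[1]])
            mp_row.append(best_d)
            ap_row.append(best_a)
        main.append(m_row)
        anti.append(a_row)
        main_pos.append(mp_row)
        anti_pos.append(ap_row)
    return main, anti, main_pos, anti_pos


def max_diagonal_unpool(main: Grid, anti: Grid, main_pos: List[List[Pos]],
                        anti_pos: List[List[Pos]], h: int, w: int) -> Grid:
    """Hypothetical MDUP: write each pooled value back to the exact position
    it came from and leave every other position at zero. No parameters."""
    out = [[0] * w for _ in range(h)]
    for r in range(len(main)):
        for c in range(len(main[0])):
            i, j = main_pos[r][c]
            out[i][j] = main[r][c]
            i, j = anti_pos[r][c]
            out[i][j] = anti[r][c]
    return out


x = [[1, 5, 2, 0],
     [3, 4, 8, 6],
     [9, 0, 1, 1],
     [2, 7, 3, 5]]
main, anti, main_pos, anti_pos = max_diagonal_pool(x)
print(main)  # [[4, 6], [9, 5]]
print(anti)  # [[5, 8], [2, 3]]
print(max_diagonal_unpool(main, anti, main_pos, anti_pos, 4, 4))
# [[0, 5, 0, 0], [0, 4, 8, 6], [9, 0, 0, 0], [2, 0, 3, 5]]
```

Under this assumed design, the largest value of every window lies on one of its two diagonals, so the two branches together always retain it, and every retained value returns to its original location during un-pooling. That is one way a parameter-free pair could keep down-sampling and up-sampling positionally consistent.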
Results: Compared with state-of-the-art networks, HAD-Net achieves comparable or better performance, with Dice scores of 90.13%, 81.51%, and 75.43% for the three classes on BraTS20, 89.59% Dice and 98.56% AUC on Kvasir-SEG, and 82.17% Dice and 98.05% AUC on DRIVE. Conclusions: The HSAM+PRA+MDP+MDUP scheme has been shown to provide a remarkable improvement and leaves room for further research.
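The Dice values reported above follow, by the usual definition, the Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a ground-truth mask B. The minimal sketch below computes it for one pair of flattened binary masks. The abstract does not state how scores are averaged across cases or classes, or how empty masks are handled, so returning 1.0 when both masks are empty is an assumed convention, and `dice_coefficient` is a hypothetical helper name.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size_sum


pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(round(dice_coefficient(pred, target), 4))  # 0.6667
```

For a multi-class dataset, one would typically compute this once per class on that class's binary mask; which regions the three BraTS20 scores correspond to is not specified here.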

HYATT-Net is Grand: A Hybrid Attention Network for Performant Anatomical Landmark Detection

Real-Time Facial Landmark Detection by Attention-driven Lightweight Network

CephaNN: A Multi-Head Attention Network for Cephalometric Landmark Detection

Dual-attention transformer-based hybrid network for multi-modal medical image segmentation

Learn Fine-Grained Adaptive Loss for Multiple Anatomical Landmark Detection in Medical Images

You Only Learn Once: Universal Anatomical Landmark Detection

A universal medically assisted model for anatomical landmark detection in radioactive images

ALA-Net: Adaptive Lesion-Aware Attention Network for 3D Colorectal Tumor Segmentation

HD2A-Net: A novel dual gated attention network using comprehensive hybrid dilated convolutions for medical image segmentation

DCA: Densely Cross-scale Attention Network for Anatomically-plausible Medical Image Segmentation

Feature Aggregation and Refinement Network for 2D Anatomical Landmark Detection

Automatic Localization of Landmarks in Craniomaxillofacial CBCT Images Using a Local Attention-Based Graph Convolution Network

Anatomical Landmarks Annotation on 2D Lateral Cephalograms with Channel Attention

HAD-Net: an Attention U-based Network with Hyper-Scale Shifted Aggregating and Max-Diagonal Sampling for Medical Image Segmentation

MTANet: Multi-Task Attention Network for Automatic Medical Image Segmentation and Classification

FDGR-Net: Feature Decouple and Gated Recalibration Network for Medical Image Landmark Detection

U-Net with Hierarchical Bottleneck Attention for Landmark Detection in Fundus Images of the Degenerated Retina

HCT-net: hybrid CNN-transformer model based on a neural architecture search network for medical image segmentation

High-Order Attention Networks for Medical Image Segmentation

Anatomical Landmark Detection Using a Feature-Sharing Knowledge Distillation-Based Neural Network

DATR: Domain-adaptive transformer for multi-domain landmark detection