Abstract: Modeling long-range pixel–pixel dependencies plays a vital role in image segmentation. Combining a CNN with an attention mechanism still leaves room for improvement, because existing transformer-based architectures require many thousands of annotated training samples to model long-range spatial dependencies. This paper presents the smooth attention branch (SAB), a novel architecture that simplifies the modeling of long-range pixel–pixel dependencies for biomedical image segmentation on small datasets. The SAB is essentially a modified attention operation that implements a subnetwork via reshaped feature maps instead of directly calculating a softmax value over the attention score for each input. The SAB fuses multilayer attentive feature maps to learn visual attention across multilevel features. We also introduce position blurring and inner cropping, designed specifically for small-scale datasets, to prevent overfitting. Furthermore, we redesign the skip pathway to reduce the semantic gap between the features captured by the contracting and expansive paths. We evaluate U-Net with the SAB (SAB-Net) against the original U-Net and widely used transformer-based models on multiple biomedical image segmentation tasks covering the Brain MRI, Heart MRI, Liver CT, Spleen CT, and Colonoscopy datasets. Our training set consisted of 100 images randomly selected from the original training set, since our goal was to adopt attention mechanisms for biomedical image segmentation tasks with small-scale labeled data. An ablation study on the Brain MRI test set demonstrated that each proposed method improved segmentation performance. Integrating the proposed methods helped the resulting models consistently achieve outstanding performance on all five biomedical segmentation tasks.
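For orientation, the operation that the SAB is described as modifying can be sketched as follows. This is the standard softmax dot-product self-attention over a flattened feature map, not the SAB itself. The function names, the use of raw features as queries, keys, and values (no learned projections), and the toy 2 x 2 input are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_self_attention(features):
    """Standard dot-product self-attention over flattened pixels.

    features: list of N pixel vectors (each a list of C floats), i.e. an
    H x W x C feature map reshaped to (H*W) x C. Queries, keys, and values
    are the raw features here (assumption: no learned projections).
    """
    channels = len(features[0])
    scale = math.sqrt(channels)
    output = []
    for query in features:
        # Score this pixel against every pixel, regardless of distance.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        weights = softmax(scores)
        # Each output pixel is a weighted average of all pixel vectors.
        mixed = [
            sum(w * value[c] for w, value in zip(weights, features))
            for c in range(channels)
        ]
        output.append(mixed)
    return output


# Toy 2 x 2 feature map with 2 channels, flattened to 4 pixels.
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
attended = pixel_self_attention(feature_map)
```

Because every pixel attends to every other pixel, the score matrix grows quadratically with the number of pixels. This sketch makes no claim about how the SAB's subnetwork over reshaped feature maps is implemented.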
In particular, U-Net with the proposed methods outperformed the original U-Net by 13.76% on the Brain MRI dataset. We proposed several novel methods to address the need for modeling long-range pixel–pixel dependencies in small-scale biomedical image segmentation. The experimental results showed that each method improved medical image segmentation accuracy to varying degrees. Moreover, SAB-Net, which integrates all proposed methods, consistently achieved outstanding performance on the five biomedical segmentation tasks.
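The small-data protocol in the abstract (100 images drawn at random from the original training set) could be reproduced along the following lines. This is a minimal sketch: the function name, the seed, and the `case_XXXX` identifiers are hypothetical, and the abstract does not say how the authors performed the sampling.

```python
import random


def sample_training_subset(image_ids, subset_size=100, seed=0):
    """Return a reproducible random subset of training image identifiers."""
    if subset_size > len(image_ids):
        raise ValueError("subset_size exceeds the number of available images")
    rng = random.Random(seed)  # local generator; global state is untouched
    # Sample without replacement, then sort for a stable listing.
    return sorted(rng.sample(image_ids, subset_size))


# Hypothetical identifiers standing in for the original training set.
all_ids = [f"case_{i:04d}" for i in range(1000)]
subset = sample_training_subset(all_ids, subset_size=100, seed=42)
```

Fixing the seed keeps the 100-image subset identical across runs, so that comparisons between models are made on the same training data.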

J-CaPA: Joint Channel and Pyramid Attention Improves Medical Image Segmentation

MixFormer: a Mixed CNN-Transformer Backbone for Medical Image Segmentation

BMCS-Net: A Bi-directional multi-scale cascaded segmentation network based on transformer-guided feature Aggregation for medical images

TSCA-Net: Transformer based spatial-channel attention segmentation network for medical images

Integrating prior knowledge into a bibranch pyramid network for medical image segmentation

CascadeMedSeg: integrating pyramid vision transformer with multi-scale fusion for precise medical image segmentation

Hybrid Attention Mechanism of Feature Fusion for Medical Image Segmentation

A Multi-Scale Cross-Fusion Medical Image Segmentation Network Based on Dual-Attention Mechanism Transformer

PCCA-Model: an attention module for medical image segmentation

TPAFNet: Transformer-Driven Pyramid Attention Fusion Network for 3D Medical Image Segmentation

2-D general network based on channel-space attention for medical image segmentation

Cross Pyramid Transformer makes U-net stronger in medical image segmentation

CSCA U-Net: A channel and space compound attention CNN for medical image segmentation

Pyramid Medical Transformer for Medical Image Segmentation

A Hybrid Enhanced Attention Transformer Network for Medical Ultrasound Image Segmentation

Multi-scale Feature Pyramid Fusion Network for Medical Image Segmentation

Attention Mechanism Trained with Small Datasets for Biomedical Image Segmentation

A Multi-Scale Context Aware Attention Model for Medical Image Segmentation

Rethinking Attention Gated with Hybrid Dual Pyramid Transformer-CNN for Generalized Segmentation in Medical Imaging

MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation

CASF-Net: Cross-attention and Cross-scale Fusion Network for Medical Image Segmentation