HTC-Net: A hybrid CNN-transformer framework for medical image segmentation

Hui Tang, Yuanbin Chen, Tao Wang, Yuanbo Zhou, Longxuan Zhao, Qinquan Gao, Min Du, Tao Tan, Xinlin Zhang, Tong Tong
DOI: https://doi.org/10.1016/j.bspc.2023.105605
IF: 5.1
2023-10-23
Biomedical Signal Processing and Control
Abstract: Automated medical image segmentation is a crucial step in clinical analysis and diagnosis, as it can improve diagnostic efficiency and accuracy. Deep convolutional neural networks (DCNNs) have been widely used in the medical field and have achieved excellent results. However, the high complexity of medical images makes it difficult for many networks to balance local and global information, which leads to unstable segmentation outcomes. To address this challenge, we designed a hybrid CNN-Transformer network that captures both local and global information. More specifically, deep convolutional neural networks are introduced to exploit local information. At the same time, we designed a trident multi-layer fusion (TMF) block for the Transformer to dynamically fuse contextual information from higher-level (global) features. Moreover, considering the inherent characteristics of medical image segmentation (e.g., irregular shapes and discontinuous boundaries), we developed united attention (UA) blocks to focus on learning important features. To evaluate the effectiveness of our proposed approach, we performed experiments on two publicly available datasets, ISIC-2017 and Kvasir-SEG, and compared our results with state-of-the-art approaches. The experimental results demonstrate the superior performance of our approach. The code is available at https://github.com/Tanghui2000/HTC-Net .
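The abstract does not describe the internals of the TMF block. The NumPy sketch below illustrates one way to "dynamically fuse" three feature levels, and it rests on several assumptions. It uses a selective-kernel-style gate. Features from three levels, assumed to be already projected to a common channel count C, are upsampled to the largest resolution. A global summary of these features yields per-branch, per-channel logits, and a softmax across the three branches turns those logits into fusion weights. The names `trident_fuse` and `w_gate` and the nearest-neighbour upsampling are hypothetical choices, not the authors' design. The official repository linked above holds the real block.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def upsample_nearest(x, out_h, out_w):
    # x: (C, H, W); nearest-neighbour upsampling by integer factors
    _, h, w = x.shape
    assert out_h % h == 0 and out_w % w == 0, "sizes must divide evenly"
    return x.repeat(out_h // h, axis=1).repeat(out_w // w, axis=2)

def trident_fuse(feats, w_gate):
    """Hypothetical three-branch dynamic fusion (NOT the paper's TMF block).

    feats:  list of 3 arrays shaped (C, H_i, W_i) from different levels
    w_gate: (3, C, C) hypothetical gating weights, one matrix per branch
    """
    out_h = max(f.shape[1] for f in feats)
    out_w = max(f.shape[2] for f in feats)
    aligned = np.stack([upsample_nearest(f, out_h, out_w) for f in feats])  # (3, C, H, W)
    context = aligned.sum(axis=0).mean(axis=(1, 2))   # (C,) global average of the summed branches
    logits = np.einsum("bij,j->bi", w_gate, context)  # (3, C) per-branch, per-channel scores
    weights = softmax(logits, axis=0)                 # branches compete; weights sum to 1 per channel
    fused = (weights[:, :, None, None] * aligned).sum(axis=0)  # (C, H, W)
    return fused, weights
```

The fusion weights depend on the input itself through `context`. This input dependence is what separates dynamic fusion from a fixed sum or concatenation of levels.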
engineering, biomedical
What problem does this paper attempt to address?
### The Problem Addressed by This Paper

This paper addresses a key challenge in medical image segmentation: how to capture global information while preserving local information, so that segmentation becomes both more accurate and more stable. Specifically:

1. **Combining Local and Global Information**:
   - The paper proposes a hybrid CNN-Transformer framework (HTC-Net). It uses a CNN to extract local information and a Trident Multi-layer Fusion (TMF) block to dynamically fuse contextual information from higher-level features.
2. **Improving Feature Representation Capability**:
   - A United Attention (UA) module combines spatial attention and channel attention to learn richer feature representations. This helps the network focus on important information, such as irregular shapes and discontinuous boundaries, and improves its performance.
3. **Validating the Effectiveness of the Method**:
   - Experiments on two public datasets, ISIC-2017 and Kvasir-SEG, show that HTC-Net outperforms the compared state-of-the-art approaches across multiple evaluation metrics.

Through these improvements, HTC-Net achieves higher accuracy and robustness in medical image segmentation tasks.
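The UA module is described as combining channel attention and spatial attention. The minimal NumPy sketch below assumes a CBAM-style sequential arrangement, with channel attention applied first and spatial attention second. Several details are illustrative assumptions rather than HTC-Net's actual implementation: the function names, the reduction ratio of the shared MLP, and the pointwise spatial gate, which stands in for CBAM's 7×7 convolution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: (C, H, W); w1: (C//r, C), w2: (C, C//r) form a shared two-layer MLP
    avg = x.mean(axis=(1, 2))                      # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))                        # (C,) max-pooled descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    return sigmoid(mlp(avg) + mlp(mx))             # (C,) per-channel gate in (0, 1)

def spatial_attention(x, a, b, bias=0.0):
    # Per-pixel gate from channel-wise mean and max maps
    # (a pointwise stand-in for CBAM's 7x7 convolution; a, b are hypothetical scalars)
    return sigmoid(a * x.mean(axis=0) + b * x.max(axis=0) + bias)  # (H, W)

def united_attention(x, w1, w2, a, b):
    """Hypothetical UA-style block: channel gate, then spatial gate."""
    x = x * channel_attention(x, w1, w2)[:, None, None]  # reweight channels
    return x * spatial_attention(x, a, b)[None, :, :]    # reweight locations
```

Both gates output values between 0 and 1, so the block can only rescale features. It suppresses uninformative channels and locations rather than creating new activations.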