HT-Net: hierarchical context-attention transformer network for medical CT image segmentation

Mingjun Ma, Haiying Xia, Yumei Tan, Haisheng Li, Shuxiang Song
DOI: https://doi.org/10.1007/s10489-021-03010-0
IF: 5.3
2022-01-15
Applied Intelligence
Abstract: Convolutional neural networks (CNNs) have been a prevailing technique in medical CT image processing. Although encoder-decoder CNNs exploit locality for efficiency, they cannot adequately model relationships between distant pixels. Recent work shows that stacked self-attention or transformer layers can effectively learn long-range dependencies. Transformers have been extended to computer vision tasks by splitting images into patches and treating the patches as embeddings. However, transformer-based architectures lack global semantic information interaction and require large-scale datasets for training, which makes them difficult to train effectively with limited data samples. To address these issues, we propose a hierarchical context-attention transformer network (HT-Net), which integrates multi-scale, transformer, and hierarchical context extraction modules into the skip connections. The multi-scale module captures richer CT semantic information, enabling the transformers to better encode feature maps of tokenized image patches from different CNN stages as input attention sequences. The hierarchical context attention module complements global information and re-weights the pixels to capture semantic context. Extensive experiments on three datasets demonstrate that the proposed HT-Net outperforms state-of-the-art approaches.
computer science, artificial intelligence