H2Former: An Efficient Hierarchical Hybrid Transformer for Medical Image Segmentation
Along He, Kai Wang, Tao Li, Chengkun Du, Shuang Xia, Huazhu Fu
DOI: https://doi.org/10.1109/tmi.2023.3264513
IF: 10.6
2023-01-01
IEEE Transactions on Medical Imaging
Abstract: Accurate medical image segmentation is of great significance for computer-aided diagnosis. Although methods based on convolutional neural networks (CNNs) have achieved good results, they are weak at modeling long-range dependencies, which segmentation needs in order to build global context. Transformers can establish long-range dependencies among pixels through self-attention, complementing local convolution. In addition, multi-scale feature fusion and feature selection are crucial for medical image segmentation but are ignored by Transformers. However, it is challenging to apply self-attention directly to CNNs because its computational complexity is quadratic for high-resolution feature maps. Therefore, to integrate the merits of CNNs, multi-scale channel attention, and Transformers, we propose an efficient hierarchical hybrid vision Transformer (H2Former) for medical image segmentation. With these merits, the model can be data-efficient in the limited medical data regime. Experimental results show that our approach exceeds previous Transformer, CNN, and hybrid methods on three 2D and two 3D medical image segmentation tasks. Moreover, it remains computationally efficient in model parameters, FLOPs, and inference time. For example, H2Former outperforms TransUNet by 2.29% in IoU score on the KVASIR-SEG dataset with 30.77% of the parameters and 59.23% of the FLOPs.
engineering, biomedical; imaging science & photographic technology; electrical & electronic; computer science, interdisciplinary applications; radiology, nuclear medicine & medical imaging
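
The abstract's point about quadratic complexity can be illustrated with a small sketch. It assumes plain full self-attention with one token per spatial position of the feature map; the function name and the feature-map sizes are hypothetical and are not taken from the paper, and H2Former's own attention design is not reproduced here.

```python
def attention_matrix_entries(height: int, width: int) -> int:
    """Count the pairwise scores in full self-attention over an H x W feature map.

    Assumption: one token per spatial position, and every token attends to
    every other token (no windowing or downsampling).
    """
    tokens = height * width   # sequence length N = H * W
    return tokens * tokens    # the attention matrix is N x N


if __name__ == "__main__":
    # Hypothetical feature-map sizes, chosen only to show the growth rate.
    for side in (14, 28, 56, 112):
        tokens = side * side
        scores = attention_matrix_entries(side, side)
        print(f"{side}x{side}: {tokens} tokens, {scores} attention scores")
```

Doubling the side length of the feature map multiplies the token count by 4 and the number of attention scores by 16, which is why applying self-attention directly to high-resolution maps becomes expensive.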