A Dynamic Cross-Scale Transformer with Dual-Compound Representation for 3D Medical Image Segmentation

Ruixia Zhang, Zhiqiong Wang, Zhongyang Wang, Junchang Xin
DOI: https://doi.org/10.1109/ICASSP49357.2023.10095987
2023-01-01
Abstract: Transformer models exploit multi-head self-attention to capture long-range information. Window-based self-attention further addresses the quadratic computational complexity of full self-attention and provides a new solution for dense prediction on 3D images. However, Transformers miss structural information because of their naive tokenization scheme. Furthermore, single-scale attention fails to balance feature representation against semantic information. To address these problems, we propose a window-based dynamic cross-scale cross-attention transformer (DCS-Former) for precise representation of diverse features. DCS-Former first constructs dual-compound feature representations through an Ante-hoc Structure-aware Module and a Post-hoc Class-aware Module. Then, a bidirectional attention structure is designed to interactively fuse structural features with class representations. Experimental results show that our method outperforms various competing segmentation models on three different public datasets.
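The complexity claim in the abstract can be made concrete with a small counting sketch. It is not the DCS-Former implementation: it assumes non-overlapping cubic windows, and the 32×32×32 token grid and window size 8 are illustrative values chosen here, not taken from the paper. The function names are hypothetical.

```python
def global_attention_pairs(d, h, w):
    """Token pairs scored by full self-attention over a d*h*w token grid."""
    n = d * h * w
    return n * n  # quadratic in the number of tokens


def window_attention_pairs(d, h, w, win):
    """Token pairs scored when attention is restricted to
    non-overlapping win*win*win windows (an assumption of this sketch)."""
    if d % win or h % win or w % win:
        raise ValueError("grid dimensions must be divisible by the window size")
    num_windows = (d // win) * (h // win) * (w // win)
    tokens_per_window = win ** 3
    # Each window attends only within itself, so the cost is
    # linear in the number of windows.
    return num_windows * tokens_per_window ** 2


# Illustrative values (assumed, not from the paper).
full = global_attention_pairs(32, 32, 32)
windowed = window_attention_pairs(32, 32, 32, win=8)
print(full, windowed, full // windowed)
```

For a fixed window size, the windowed count grows linearly with the number of tokens, while the global count grows quadratically; the ratio between them equals the number of windows.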