MoAFormer: Aggregating Adjacent Window Features into Local Vision Transformer Using Overlapped Attention Mechanism for Volumetric Medical Segmentation

Yixi Luo, Huayi Yin, Xia Du
DOI: https://doi.org/10.1145/3581807.3581825
2022-01-01
Abstract: Window-based attention alleviates the sharp increase in computation that occurs as input image resolution grows, and it shows excellent performance. However, aggregating global features across different windows remains an open problem. Swin-Transformer addresses this by building a hierarchical encoding with a shifted-window mechanism, so that information is exchanged between different windows. In this work, we investigate the effect of applying an overlapped attention block (MoA) after the local attention layer, and we apply the resulting model to volumetric medical image segmentation tasks. The overlapped attention module uses slightly larger, overlapping patches for the keys and values, which lets information pass between neighbouring pixels and leads to a significant performance gain. Experimental results on the ACDC and Synapse datasets show that the proposed method outperforms previous Transformer models.
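The overlapped attention idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single head, identity query/key/value projections, a 2D grid of feature vectors instead of a 3D volume, and key/value regions clipped at the image border rather than padded. The names `overlapped_window_attention`, `window`, `overlap`, and `make_grid` are hypothetical. Queries come from non-overlapping `window` x `window` blocks, while keys and values come from the same block enlarged by `overlap` tokens on each side, so neighbouring windows share their border tokens.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    # Scaled dot-product attention. Identity projections are an assumption
    # made for brevity; a real block would learn W_q, W_k, W_v.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def overlapped_window_attention(feat, window, overlap):
    """Hypothetical single-head, 2D sketch of overlapped window attention.

    feat: H x W grid (list of rows) of C-dimensional vectors.
    Queries: non-overlapping window x window blocks.
    Keys/values: the same block grown by `overlap` tokens on every side,
    clipped at the border (assumption: no padding), so adjacent windows
    see each other's border tokens. overlap=0 gives plain window attention.
    """
    H, W = len(feat), len(feat[0])
    assert H % window == 0 and W % window == 0, "grid must tile into windows"
    out = [[None] * W for _ in range(H)]
    for top in range(0, H, window):
        for left in range(0, W, window):
            q_pos = [(i, j) for i in range(top, top + window)
                     for j in range(left, left + window)]
            kv_pos = [(i, j)
                      for i in range(max(0, top - overlap), min(H, top + window + overlap))
                      for j in range(max(0, left - overlap), min(W, left + window + overlap))]
            queries = [feat[i][j] for i, j in q_pos]
            kv = [feat[i][j] for i, j in kv_pos]
            for (i, j), vec in zip(q_pos, attend(queries, kv, kv)):
                out[i][j] = vec
    return out


def make_grid(n):
    # Toy 2-channel feature map whose vector at (i, j) is [i, j].
    return [[[float(i), float(j)] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    base = make_grid(4)
    edited = make_grid(4)
    edited[0][1] = [5.0, -5.0]  # change one token in the top-left 2x2 window
    for ov in (0, 1):
        changed = (overlapped_window_attention(base, 2, ov)[0][2]
                   != overlapped_window_attention(edited, 2, ov)[0][2])
        print(f"overlap={ov}: top-right window affected -> {changed}")
```

Running the script prints `False` for `overlap=0` and `True` for `overlap=1`: editing a token in one window only reaches the adjacent window once keys and values overlap, which is the neighbouring-pixel information transmission the abstract describes. Because `overlap=0` reduces to plain window attention, calling the function first with `overlap=0` and then with `overlap > 0` loosely mirrors the "local attention, then MoA" ordering, though this sketch omits the residual connections, normalization, and MLP layers a full Transformer block would have.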