ScaleNet: Rethinking Feature Interaction from a Scale-Wise Perspective for Medical Image Segmentation.

Yu Feng, Tai Ma, Hao Zeng, Zhengke Xu, Suwei Zhang, Ying Wen
DOI: https://doi.org/10.1007/978-3-031-50078-7_18
2024-01-01
Abstract: Recently, vision transformers have become outstanding segmentation architectures thanks to their remarkable global modeling capability. In current transformer-based models for medical image segmentation, convolutional layers are often replaced by transformers, or transformers are added to the deepest layer of the encoder to learn the global context. However, most existing methods ignore the dependencies among the extracted multi-scale features, which leads to inadequate feature learning and prevents the model from producing rich feature representations. In this paper, we propose ScaleNet, which approaches the task from the perspective of feature interaction at different scales and alleviates these problems. Specifically, our approach consists of two multi-scale feature interaction modules: the spatial scale interaction (SSI) and the channel scale interaction (CSI). SSI uses a transformer to aggregate patches from features at different scales, enhancing the feature representations at the spatial scale. CSI uses a 1D convolutional layer and a fully connected layer to perform a global fusion of multi-level features at the channel scale. The combination of CSI and SSI enables ScaleNet to emphasize multi-scale dependencies and effectively handle complex scale variations.
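The abstract gives only a one-sentence description of CSI, so the following is a minimal illustrative sketch of one way a 1D convolution and a fully connected layer could fuse multi-level features along the channel axis. It is not the authors' implementation. Everything beyond "1D convolution plus fully connected layer" is an assumption: the global average pooling step, the sigmoid gating, the odd kernel length, and all function names (`global_avg_pool`, `conv1d_same`, `fully_connected`, `channel_scale_interaction`) are hypothetical.

```python
import math

def global_avg_pool(feature):
    """Reduce a C x H x W nested list to a length-C channel descriptor."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]

def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding (odd kernel length assumed),
    so the output has the same length as the input."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]

def fully_connected(x, weights):
    """Dense layer without bias: each output mixes every input channel."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_scale_interaction(features, kernel, fc_weights):
    """Assumed CSI-style fusion over a list of C_i x H_i x W_i feature maps."""
    # 1. One descriptor per channel, concatenated across all scales (assumption).
    descriptors = [global_avg_pool(f) for f in features]
    joint = [v for d in descriptors for v in d]
    # 2. The 1D convolution mixes neighbouring channels, including across scale boundaries.
    mixed = conv1d_same(joint, kernel)
    # 3. The fully connected layer mixes all channels of all scales globally.
    gates = [sigmoid(v) for v in fully_connected(mixed, fc_weights)]
    # 4. Split the gates back per scale and reweight each channel (assumed gating).
    out, start = [], 0
    for f in features:
        g = gates[start:start + len(f)]
        out.append([[[gc * px for px in row] for row in ch]
                    for ch, gc in zip(f, g)])
        start += len(f)
    return out

# Two scales: 2 channels at 4x4 and 3 channels at 2x2 (toy values).
features = [
    [[[1.0] * 4 for _ in range(4)] for _ in range(2)],
    [[[2.0] * 2 for _ in range(2)] for _ in range(3)],
]
kernel = [0.25, 0.5, 0.25]
identity_fc = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
fused = channel_scale_interaction(features, kernel, identity_fc)
```

Each output keeps the shape of its input feature map; only the per-channel scaling changes, and that scaling depends on the channels of every scale. In a trained network the kernel and the fully connected weights would be learned rather than fixed as in this toy example.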