TransDiffSeg: Transformer-Based Conditional Diffusion Segmentation Model for Abdominal Multi-Objective

WenWen Gu, GuoDong Zhang, RongHui Ju, SuRan Wang, YanLin Li, TingYu Liang, Wei Guo, ZhaoXuan Gong
DOI: https://doi.org/10.1007/s10278-024-01206-7
2024-07-29
Abstract: In medical image segmentation, traditional diffusion probabilistic models are hindered by the local inductive biases of convolutional operations, which constrain their ability to model long-range dependencies and lead to inaccurate mask generation. The Transformer offers a remedy by avoiding these local inductive biases, thereby improving segmentation precision. Currently, Transformer and convolution operations are integrated mainly in two forms: nesting and stacking. However, both forms eliminate the bias only at a relatively coarse granularity and fail to fully leverage the advantages of both approaches. To address this, this paper proposes a conditional diffusion segmentation model named TransDiffSeg, which combines the Transformer with the convolution operations of traditional diffusion models in a parallel manner. This approach eliminates the accumulated local inductive bias of convolution operations at a finer granularity, within each layer. Additionally, an adaptive feature fusion block merges conditional semantic features and noise features, enhancing global semantic information and reducing the Transformer's sensitivity to noise features. To validate the impact of the granularity of bias elimination on performance, and the impact of the Transformer in alleviating the accumulated local inductive biases of convolutional operations in diffusion probabilistic models, experiments are conducted on the AMOS22 and BTCV datasets. Experimental results demonstrate that eliminating local inductive bias at a finer granularity significantly improves the segmentation performance of diffusion probabilistic models. Furthermore, the results confirm that the finer the granularity of bias elimination, the better the segmentation performance.
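The two ideas named in the abstract, a convolution branch and a Transformer branch running in parallel within one layer, and a fusion of conditional and noise features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the layer design, so the residual sum of the two branches, the sigmoid-gated fusion, and every name here (`conv1d_same`, `self_attention`, `parallel_block`, `adaptive_fusion`, the `params` keys) are assumptions chosen only for illustration, and features are treated as a 1D token sequence for brevity.

```python
import numpy as np

def conv1d_same(x, kernel):
    """Local branch: depthwise 1D cross-correlation with zero padding.
    x: (tokens, channels); kernel: (k, channels) with odd k."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[i:i + k] * kernel).sum(axis=0)
                     for i in range(x.shape[0])])

def self_attention(x, wq, wk, wv):
    """Global branch: single-head scaled dot-product self-attention."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def parallel_block(x, params):
    """Hypothetical parallel layer: both branches see the same input and
    their outputs are summed with a residual connection (an assumption)."""
    local = conv1d_same(x, params["kernel"])
    global_ = self_attention(x, params["wq"], params["wk"], params["wv"])
    return x + local + global_

def adaptive_fusion(cond, noise, w_gate):
    """Hypothetical fusion: a per-element sigmoid gate, computed from both
    inputs, mixes conditional and noise features (an assumption)."""
    logits = np.concatenate([cond, noise], axis=-1) @ w_gate
    gate = 1.0 / (1.0 + np.exp(-logits))
    return gate * cond + (1.0 - gate) * noise

# Usage with random features: 16 tokens, 8 channels.
rng = np.random.default_rng(0)
tokens, channels = 16, 8
cond = rng.normal(size=(tokens, channels))    # conditional (image) features
noise = rng.normal(size=(tokens, channels))   # noisy-mask features
params = {
    "kernel": rng.normal(size=(3, channels)) * 0.1,
    "wq": rng.normal(size=(channels, channels)) * 0.1,
    "wk": rng.normal(size=(channels, channels)) * 0.1,
    "wv": rng.normal(size=(channels, channels)) * 0.1,
}
w_gate = rng.normal(size=(2 * channels, channels)) * 0.1
fused = adaptive_fusion(cond, noise, w_gate)
out = parallel_block(fused, params)
print(out.shape)
```

In this sketch the attention branch can relate any two tokens in a single layer, while the convolution branch only sees a three-token window, which is the contrast between global and local modelling that the abstract draws.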