DFormer: Diffusion-guided Transformer for Universal Image Segmentation

Hefeng Wang, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang

2023-06-08

Abstract: This paper introduces an approach, named DFormer, for universal image segmentation. The proposed DFormer views the universal image segmentation task as a denoising process using a diffusion model. DFormer first adds various levels of Gaussian noise to ground-truth masks, and then learns a model to predict denoised masks from the corrupted masks. Specifically, we take deep pixel-level features along with the noisy masks as inputs to generate mask features and attention masks, employing a diffusion-based decoder to perform mask prediction gradually. At inference, our DFormer directly predicts the masks and corresponding categories from a set of randomly generated masks. Extensive experiments reveal the merits of our proposed contributions on different image segmentation tasks: panoptic segmentation, instance segmentation, and semantic segmentation. Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on the MS COCO val2017 set. Further, DFormer achieves promising semantic segmentation performance, outperforming the recent diffusion-based method by 2.2% on the ADE20K val set. Our source code and models will be publicly available at https://github.com/cp3wan/DFormer

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper aims to design an effective diffusion-based Transformer that achieves competitive universal image segmentation performance. Existing image segmentation methods are usually optimized for a specific task and generalize poorly to other segmentation tasks. The paper therefore proposes DFormer, a diffusion-guided Transformer framework for universal image segmentation, which treats segmentation as generating masks from noise. During training, noisy masks are produced by adding different levels of Gaussian noise to the ground-truth masks, and a Transformer decoder learns to predict the ground-truth masks from these noisy masks. At inference, DFormer directly predicts masks and their corresponding classes from a set of randomly generated noisy masks. In this way, DFormer aims to overcome the limited generalization of existing methods across segmentation tasks and to deliver consistent gains in panoptic, instance, and semantic segmentation. Experiments show that DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D by 3.6% on the MS COCO val2017 set, and outperforms the recent diffusion-based method by 2.2% in semantic segmentation on the ADE20K val set. In addition to these performance gains, DFormer is described as having advantages in parameter efficiency and training efficiency.
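The mask corruption used in training, and the random masks used to start inference, can be illustrated with a minimal NumPy sketch. It assumes the standard DDPM-style forward process with a cosine noise schedule and binary masks rescaled to {-scale, +scale}; the schedule, the `scale` parameter, and the function names `cosine_alpha_bar`, `corrupt_masks`, and `random_masks` are illustrative assumptions, not details taken from the paper or its code release.

```python
import numpy as np


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal fraction at step t under a cosine schedule (assumed)."""
    def f(u):
        return np.cos((u / num_steps + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0)


def corrupt_masks(gt_masks, t, num_steps=1000, scale=1.0, rng=None):
    """Add Gaussian noise to binary ground-truth masks at noise level t.

    gt_masks: array of shape (N, H, W) with values in {0, 1}.
    Returns noisy masks of the same shape (float64).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Map {0, 1} to {-scale, +scale} so the signal is centred on zero.
    x0 = (gt_masks.astype(np.float64) * 2.0 - 1.0) * scale
    a_bar = cosine_alpha_bar(t, num_steps)
    noise = rng.standard_normal(x0.shape)
    # Forward diffusion: mix the clean signal with Gaussian noise.
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise


def random_masks(num_masks, height, width, rng=None):
    """Pure-noise masks of the kind a decoder would start from at inference."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal((num_masks, height, width))


# Two toy ground-truth masks covering the top and bottom halves of an 8x8 image.
gt = np.zeros((2, 8, 8), dtype=np.uint8)
gt[0, :4] = 1
gt[1, 4:] = 1
lightly_noisy = corrupt_masks(gt, t=100, rng=np.random.default_rng(0))
heavily_noisy = corrupt_masks(gt, t=900, rng=np.random.default_rng(0))
start = random_masks(num_masks=2, height=8, width=8, rng=np.random.default_rng(0))
```

In a training loop of this kind, `t` would be sampled at random for each batch and a decoder would be trained to recover `gt` from the corrupted masks and the image features; that decoder is not sketched here.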

MaskDiffusion: Exploiting Pre-Trained Diffusion Models for Semantic Segmentation

DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation

MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer

Diff-SFCT: A Diffusion Model with Spatial-Frequency Cross Transformer for Medical Image Segmentation

Cold SegDiffusion: A Novel Diffusion Model for Medical Image Segmentation

High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity

A generic plug & play diffusion-based denoising module for medical image segmentation

TransDiffSeg: Transformer-Based Conditional Diffusion Segmentation Model for Abdominal Multi-Objective

MedSegDiff: Medical Image Segmentation with Diffusion Probabilistic Model

FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models

DifFSS: Diffusion Model for Few-Shot Semantic Segmentation

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation

DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

Diffusion Features to Bridge Domain Gap for Semantic Segmentation

Denoising Diffusions in Latent Space for Medical Image Segmentation

FDiff-Fusion: Denoising Diffusion Fusion Network Based on Fuzzy Learning for 3D Medical Image Segmentation

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery