Abstract: Recently, researchers have proposed various deep learning methods to accurately detect infrared targets, whose shape and texture are often indistinct. Because the variety of available infrared datasets is limited, training deep learning models that generalize well remains a challenge. To augment infrared datasets, researchers employ data augmentation techniques, which often generate new images by combining images from different datasets. However, these methods fall short in two respects. In terms of realism, images generated by mixup-based methods look unrealistic and fail to simulate complex real-world scenarios effectively. In terms of diversity, borrowing knowledge from another dataset inherently offers limited diversity compared with real-world scenes. Diffusion models currently stand out as an innovative generative approach: large-scale trained diffusion models carry a strong generative prior that models real-world images, enabling them to generate diverse and realistic images. In this paper, we propose Diff-Mosaic, a data augmentation method based on the diffusion model, which uses the diffusion prior to alleviate the diversity and realism limitations of existing data augmentation methods. Specifically, our method consists of two stages. In the first stage, we introduce an enhancement network called Pixel-Prior, which generates highly coordinated and realistic Mosaic images by harmonizing pixels. In the second stage, we propose an image enhancement strategy named Diff-Prior, which uses diffusion priors to model images of real-world scenes, further enhancing the diversity and realism of the images. Extensive experiments demonstrate that our approach significantly improves the performance of the detection network. The code is available at https://github.com/YupeiLin2388/Diff-Mosaic
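The abstract refers to Mosaic images built by combining several training images. As a point of reference, the sketch below shows the conventional four-tile Mosaic layout (four images stitched around a random centre, with bounding boxes shifted and clipped). It is a minimal illustration under that assumption only: it does not implement Pixel-Prior or Diff-Prior, and the function name `mosaic4` and its parameters (`out_size`, `min_side`) are hypothetical, not taken from the Diff-Mosaic code.

```python
import numpy as np


def mosaic4(images, boxes, out_size, rng=None, min_side=1.0):
    """Hypothetical sketch of a conventional four-tile Mosaic (not Diff-Mosaic itself).

    images: four HxW (or HxWxC) arrays, each at least out_size on both sides.
    boxes:  four (N_i, 4) float arrays in (x1, y1, x2, y2) pixel coordinates.
    Returns the out_size x out_size mosaic and an (M, 4) array of remapped boxes.
    """
    assert len(images) == 4 and len(boxes) == 4
    rng = np.random.default_rng() if rng is None else rng
    s = out_size
    # Random mosaic centre, kept away from the borders so every tile is visible.
    cx = int(rng.uniform(0.25 * s, 0.75 * s))
    cy = int(rng.uniform(0.25 * s, 0.75 * s))
    canvas = np.zeros((s, s) + images[0].shape[2:], dtype=images[0].dtype)
    out_boxes = []
    for k, (img, bxs) in enumerate(zip(images, boxes)):
        h, w = img.shape[:2]
        assert h >= s and w >= s, "each tile must be at least out_size on both sides"
        # k: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right
        left, top = k % 2 == 0, k < 2
        # Canvas region covered by this tile.
        x1, x2 = (0, cx) if left else (cx, s)
        y1, y2 = (0, cy) if top else (cy, s)
        # Source crop: the image corner that touches the mosaic centre.
        sx1 = w - cx if left else 0
        sy1 = h - cy if top else 0
        canvas[y1:y2, x1:x2] = img[sy1:sy1 + (y2 - y1), sx1:sx1 + (x2 - x1)]
        if len(bxs) == 0:
            continue
        # Shift boxes by the crop-to-canvas offset, then clip them to the tile region.
        b = np.asarray(bxs, dtype=np.float64).copy()
        b[:, [0, 2]] += x1 - sx1
        b[:, [1, 3]] += y1 - sy1
        b[:, [0, 2]] = b[:, [0, 2]].clip(x1, x2)
        b[:, [1, 3]] = b[:, [1, 3]].clip(y1, y2)
        # Drop boxes that were (almost) cut away; tiny infrared targets are easily lost here.
        keep = ((b[:, 2] - b[:, 0]) >= min_side) & ((b[:, 3] - b[:, 1]) >= min_side)
        out_boxes.append(b[keep])
    merged = np.concatenate(out_boxes) if out_boxes else np.zeros((0, 4))
    return canvas, merged
```

The hard seams between tiles in such a stitched image are one source of the unrealism the abstract attributes to combination-based augmentation, which is what its pixel-harmonization stage is described as addressing.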

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models

Open-vocabulary Object Segmentation with Diffusion Models

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior

Diffusion Models for Open-Vocabulary Segmentation

A Mamba-Diffusion Framework for Multimodal Remote Sensing Image Semantic Segmentation

MaskDiffusion: Exploiting Pre-Trained Diffusion Models for Semantic Segmentation

Diffusion Features to Bridge Domain Gap for Semantic Segmentation

Select-Mosaic: Data Augmentation Method for Dense Small Object Scenes

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models

Saliency information and mosaic based data augmentation method for densely occluded object recognition

A Simple Background Augmentation Method for Object Detection with Diffusion Model

MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models

MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models

P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation