FLAME Diffuser: Wildfire Image Synthesis using Mask Guided Diffusion

Hao Wang, Sayed Pedram Haeri Boroujeni, Xiwen Chen, Ashish Bastola, Huayu Li, Wenhui Zhu, Abolfazl Razi
2024-10-01
Abstract: Wildfires are a significant threat to ecosystems and human infrastructure, leading to widespread destruction and environmental degradation. Recent advancements in deep learning and generative models have enabled new methods for wildfire detection and monitoring. However, the scarcity of annotated wildfire images limits the development of robust models for these tasks. In this work, we present the FLAME Diffuser, a training-free, diffusion-based framework designed to generate realistic wildfire images with paired ground truth. Our framework uses augmented masks, sampled from real wildfire data, and applies Perlin noise to guide the generation of realistic flames. By controlling the placement of these elements within the image, we ensure precise integration while maintaining the original image's style. We evaluate the generated images using normalized Fréchet Inception Distance, CLIP Score, and a custom CLIP Confidence metric, demonstrating the high quality and realism of the synthesized wildfire images. Specifically, the fusion of Perlin noise in this work significantly improved the quality of synthesized images. The proposed method is particularly valuable for enhancing datasets used in downstream tasks such as wildfire detection and monitoring.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the scarcity of labeled wildfire images, which limits the training and performance of deep-learning models for wildfire detection and monitoring. The authors propose FLAME Diffuser, a diffusion-based framework that generates realistic wildfire images together with their ground truth. Augmented masks and Perlin noise guide flame generation, so the position and appearance of the flames can be controlled precisely while the style of the original image is preserved. This improves the quality and diversity of the synthetic images and provides a source of data augmentation for downstream tasks such as wildfire detection and monitoring.

### Main contributions of the paper:

1. **FLAME Diffuser**: A training-free diffusion framework that generates wildfire images with ground truth. It uses style images and augmented masks to give precise control over the placement and integration of wildfire elements such as flames.
2. **Mask augmentation method**: A method for generating masks of varying complexity, geometry, and texture, which increases the diversity and realism of the synthesized wildfire elements.
3. **Automated annotation method**: CLIP is used for automated annotation, and CLIP confidence is used to evaluate the success rate and accuracy of image synthesis.

### Key technologies of the solution:

- **Mask generation**: Random masks are generated algorithmically and then made more dynamic and diverse with image-processing techniques such as color transformation, noise addition, and domain distortion.
- **Mask-image diffusion**: The augmented mask is merged with the real image to form a composite input image. A pre-trained variational autoencoder (VAE) encodes the composite into a latent representation, which a pre-trained diffusion model (such as SDv1.5) then denoises. During denoising, the mask and the text prompt jointly guide flame generation, so that flames appear in the specified area and blend with the surrounding environment.
- **Evaluation metrics**: The generated images are evaluated with the normalized Fréchet Inception Distance (nFID), CLIP Score, and a custom CLIP Confidence metric.

### Experimental results:

- **Qualitative evaluation**: Compared with the baseline method, mask-guided image diffusion controls flame position more precisely, and the generated flames integrate seamlessly with the background, looking more natural and realistic.
- **Quantitative evaluation**: Across nFID, CLIP Score, and CLIP Confidence, the Perlin mask method performs best in image quality, diversity, and context preservation, and produces the highest-quality, most realistic flames.

In conclusion, FLAME Diffuser addresses the scarcity of wildfire image data with a mask-guided diffusion model, providing high-quality data augmentation for wildfire detection and monitoring tasks.
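
The mask generation and mask-image merging steps described in the summary can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`perlin_noise`, `make_fire_mask`, `composite`), the elliptical mask shape, the flame color, and all parameter values are assumptions chosen for illustration. The sketch stops at the composite input image; the VAE encoding and diffusion denoising that follow in the paper are left out.

```python
import numpy as np


def perlin_noise(height, width, cells, seed=0):
    """2D Perlin (gradient) noise with `cells` lattice cells along each axis."""
    rng = np.random.default_rng(seed)
    # One random unit gradient vector per lattice corner.
    angles = rng.uniform(0.0, 2.0 * np.pi, size=(cells + 1, cells + 1))
    grads = np.stack([np.cos(angles), np.sin(angles)], axis=-1)

    # Pixel coordinates expressed in lattice units, in [0, cells).
    ys = np.linspace(0.0, cells, height, endpoint=False)
    xs = np.linspace(0.0, cells, width, endpoint=False)
    y, x = np.meshgrid(ys, xs, indexing="ij")
    y0, x0 = y.astype(int), x.astype(int)
    fy, fx = y - y0, x - x0  # fractional position inside each cell

    def corner(dy, dx):
        # Dot product of the corner gradient with the offset from that corner.
        g = grads[y0 + dy, x0 + dx]
        return g[..., 0] * (fx - dx) + g[..., 1] * (fy - dy)

    def fade(t):
        # Quintic smoothstep used by improved Perlin noise.
        return t * t * t * (t * (t * 6 - 15) + 10)

    u, v = fade(fx), fade(fy)
    top = corner(0, 0) * (1 - u) + corner(0, 1) * u
    bottom = corner(1, 0) * (1 - u) + corner(1, 1) * u
    return top * (1 - v) + bottom * v


def make_fire_mask(height, width, center, radii, cells=8, seed=0):
    """Hypothetical mask: an elliptical region textured with Perlin noise, in [0, 1]."""
    yy, xx = np.mgrid[0:height, 0:width]
    cy, cx = center
    ry, rx = radii
    inside = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
    noise = perlin_noise(height, width, cells, seed)
    texture = (noise - noise.min()) / (noise.max() - noise.min() + 1e-8)
    return inside * texture


def composite(image, mask, color=(1.0, 0.5, 0.1)):
    """Alpha-blend a flame-colored layer (assumed color) onto an RGB image in [0, 1]."""
    alpha = mask[..., None]
    layer = np.asarray(color, dtype=image.dtype)
    return image * (1.0 - alpha) + layer * alpha


# Usage: a flat gray stand-in for a real background photo.
image = np.full((64, 64, 3), 0.3)
noise = perlin_noise(64, 64, cells=8)
mask = make_fire_mask(64, 64, center=(40, 32), radii=(12, 8))
blended = composite(image, mask)
```

In this sketch the mask doubles as a record of where flames were requested, which is the property that lets a mask-guided pipeline pair each synthesized image with a location label. The composite `blended` would then serve as the input to a pre-trained image-to-image diffusion model together with a text prompt.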