Abstract: In e-commerce platforms, visual content plays a pivotal role in capturing and retaining audience attention. A high-quality and aesthetically designed product background image can quickly grab consumers' attention and increase their confidence in taking actions such as making a purchase. Recently, diffusion models have achieved profound advancements, rendering product background generation a promising avenue for exploration. However, text-guided diffusion models require meticulously crafted prompts. The diverse range of products makes it challenging to compose prompts that result in visually appealing and semantically appropriate background scenes. Current work has put considerable effort into creating prompts through expert-crafted rules or specialized fine-tuning of large language models, but it still relies on detailed human inputs and often falls short of generating desirable results by e-commerce standards. In this paper, we propose Product2Img, a novel prompt-free diffusion model with an automatic training data refinement strategy for product background generation. Product2Img applies Contrastive Background Alignment (CBA) to the text encoder to enhance its perception of relevant backgrounds during the diffusion generation process, without the need for specific background prompts. Meanwhile, we develop Iterative Data Refinement with Self-improved Large Multimodal Model (IDR-LMM), a framework that iteratively enhances the data selection capability of the LMM for diffusion model training, thereby yielding continuous performance improvements. Furthermore, we establish an E-commerce Product Background Dataset (EPBD) for the research in this paper and for future work. Experimental results indicate that our approach significantly outperforms current prevalent methods in terms of automatic metrics and human evaluation, yielding improved background aesthetics and relevance.

Product2IMG: Prompt-Free E-commerce Product Background Generation with Diffusion Model and Self-Improved LMM

Generate E-commerce Product Background by Integrating Category Commonality and Personalized Style

Prompt-Free Diffusion: Taking "text" out of Text-to-Image Diffusion Models

A contrast-composition-distraction framework to understand product photo background's impact on consumer interest in E-commerce

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

VirtualModel: Generating Object-ID-retentive Human-object Interaction Image by Diffusion Model for E-commerce Marketing

E-Commerce Inpainting with Mask Guidance in Controlnet for Reducing Overcompletion

Dynamic Prompt Optimizing for Text-to-Image Generation

Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts

In-Context Learning Unlocked for Diffusion Models

Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation

Planning and Rendering: Towards Product Poster Generation with Diffusion Models

DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models

Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Contextualized Diffusion Models for Text-Guided Image and Video Generation

Saliency Guided Optimization of Diffusion Latents

BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

A Simple Background Augmentation Method for Object Detection with Diffusion Model

Automated Virtual Product Placement and Assessment in Images using Diffusion Models