DreamBooth++: Boosting Subject-Driven Generation Via Region-Level References Packing

Zhongyi Fan, Zixin Yin, Gang Li, Yibing Zhan, Heliang Zheng
DOI: https://doi.org/10.1145/3664647.3680734
2024-01-01
Abstract: DreamBooth has demonstrated significant potential in subject-driven text-to-image generation, especially in scenarios that require precise preservation of a subject's appearance. However, it is still inefficient: customizing a concept from a small set of reference images requires extensive iterative training. To address this, we introduce DreamBooth++, a region-level training strategy designed to improve both the efficiency and the effectiveness of learning specific subjects. In particular, our approach employs a region-level data reformulation technique that packs a set of reference images into a single training sample, significantly reducing computational cost. Moreover, we adapt the convolution and self-attention layers so that their processing is restricted to individual regions. Their operational scope (i.e., receptive field) therefore stays within a single subject, which prevents the model from generating multiple sub-images within a single image. Finally, we design a text-guided prior regularization between our model and the pretrained one to preserve the original semantic generation ability. Comprehensive experiments demonstrate that our training strategy not only accelerates subject learning but also significantly improves fidelity to both the subject and the prompt in subject-driven generation.
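The abstract gives no implementation details, so the following is only a minimal pure-Python sketch of the packing and region-restriction ideas under stated assumptions. Packing is assumed to mean tiling equally sized (here single-channel) reference images row-major into one grid; "restricted within individual regions" is assumed to mean that a convolution treats taps from another region as zero, and that self-attention uses a block-diagonal mask over the flattened tokens. The function names `pack_references`, `region_conv2d`, `region_attention_mask` and `masked_softmax` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def pack_references(images: List[Grid], cols: int, fill: float = 0.0) -> Tuple[Grid, List[List[int]]]:
    """Tile equally sized single-channel images row-major into one canvas.

    Returns the canvas and a same-sized map of region ids (index of the
    reference image covering each pixel; -1 marks unused grid cells).
    """
    th, tw = len(images[0]), len(images[0][0])
    rows = (len(images) + cols - 1) // cols
    height, width = rows * th, cols * tw
    canvas = [[fill] * width for _ in range(height)]
    ids = [[-1] * width for _ in range(height)]
    for k, img in enumerate(images):
        r0, c0 = (k // cols) * th, (k % cols) * tw
        for y in range(th):
            for x in range(tw):
                canvas[r0 + y][c0 + x] = img[y][x]
                ids[r0 + y][c0 + x] = k
    return canvas, ids


def region_conv2d(x: Grid, ids: List[List[int]], kernel: Grid) -> Grid:
    """'Same'-size 2D cross-correlation with an odd square kernel; taps that
    fall outside the canvas or in a different region contribute zero."""
    height, width = len(x), len(x[0])
    r = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for xx in range(width):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    ny, nx = y + dy, xx + dx
                    # Skip taps outside the canvas or in another reference's region.
                    if 0 <= ny < height and 0 <= nx < width and ids[ny][nx] == ids[y][xx]:
                        acc += kernel[dy + r][dx + r] * x[ny][nx]
            out[y][xx] = acc
    return out


def region_attention_mask(ids: List[List[int]]) -> List[List[bool]]:
    """Block-diagonal mask over row-major flattened tokens: token i may attend
    to token j only if both carry the same region id. Unused cells (-1) form
    their own block, so every row still allows at least itself."""
    flat = [v for row in ids for v in row]
    return [[qi == kj for kj in flat] for qi in flat]


def masked_softmax(scores: Grid, mask: List[List[bool]]) -> Grid:
    """Row-wise softmax in which masked-out entries receive probability 0."""
    probs = []
    for s_row, m_row in zip(scores, mask):
        peak = max(s for s, m in zip(s_row, m_row) if m)  # for numerical stability
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(s_row, m_row)]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


if __name__ == "__main__":
    canvas, ids = pack_references([[[1.0, 1.0]], [[5.0, 5.0]]], cols=2)
    print(canvas, ids)  # [[1.0, 1.0, 5.0, 5.0]] [[0, 0, 1, 1]]
    ones = [[1.0] * 3 for _ in range(3)]
    print(region_conv2d(canvas, ids, ones))  # [[2.0, 2.0, 10.0, 10.0]]
    attn = masked_softmax([[0.0] * 4 for _ in range(4)], region_attention_mask(ids))
    print(attn[0])  # [0.5, 0.5, 0.0, 0.0]
```

With two 1x2 references tiled side by side, the 3x3 all-ones kernel at the pixel next to the boundary sums only its own region (1 + 1 = 2) rather than mixing in the neighbouring subject (1 + 1 + 5 = 7), and each token's attention weights are zero outside its own reference. A real implementation would operate on multi-channel latent tensors inside the denoising network; zero-padding at region boundaries is one choice among several.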
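The abstract also does not say how the text-guided prior regularization is computed. One plausible reading, sketched here purely as an assumption, is an output-matching penalty: for a generic, subject-free prompt, the fine-tuned denoiser's noise prediction is pulled toward that of the frozen pretrained denoiser on the same noisy latent, alongside the usual denoising loss on the subject sample. The `Denoiser` signature, the prompt arguments and the weight `lam` are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Hypothetical model signature: (noisy latent, prompt) -> predicted noise.
Denoiser = Callable[[Vector, str], List[float]]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equally long vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def regularized_loss(
    tuned: Denoiser,
    frozen: Denoiser,
    z_t: Vector,
    noise: Vector,
    subject_prompt: str,
    prior_prompt: str,
    lam: float = 1.0,
) -> float:
    """Assumed objective: a denoising term on the subject sample plus a prior
    term that keeps the tuned model's prediction for a generic prompt close to
    the frozen pretrained model's prediction for the same prompt and latent."""
    recon = mse(tuned(z_t, subject_prompt), noise)
    # In an autograd framework, frozen's output would be treated as a constant target.
    prior = mse(tuned(z_t, prior_prompt), frozen(z_t, prior_prompt))
    return recon + lam * prior
```

Larger values of `lam` trade subject fidelity for closer agreement with the pretrained model on generic prompts; the actual weighting and prompt choice used by the authors are not given in the abstract.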