UnZipLoRA: Separating Content and Style from a Single Image

Chang Liu, Viraj Shah, Aiyu Cui, Svetlana Lazebnik
2024-12-06
Abstract: This paper introduces UnZipLoRA, a method for decomposing an image into its constituent subject and style, represented as two distinct LoRAs (Low-Rank Adaptations). Unlike existing personalization techniques that focus on either subject or style in isolation, or require separate training sets for each, UnZipLoRA disentangles these elements from a single image by training both LoRAs simultaneously. UnZipLoRA ensures that the resulting LoRAs are compatible, i.e., they can be seamlessly combined using direct addition. UnZipLoRA enables independent manipulation and recontextualization of subject and style, including generating variations of each, applying the extracted style to new subjects, and recombining them to reconstruct the original image or create novel variations. To address the challenge of subject and style entanglement, UnZipLoRA employs a novel prompt separation technique, as well as column and block separation strategies to accurately preserve the characteristics of subject and style, and ensure compatibility between the learned LoRAs. Evaluation with human studies and quantitative metrics demonstrates UnZipLoRA's effectiveness compared to other state-of-the-art methods, including DreamBooth-LoRA, Inspiration Tree, and B-LoRA.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of separating content (or subject) and style from a single image. Specifically, the authors propose a method named **UnZipLoRA** that decomposes an image into two low-rank adaptation (LoRA) models: one representing the content (or subject) of the image and the other representing its style. The two LoRAs are learned simultaneously during training. Each can be used on its own to generate new images, or the two can be recombined to create new variants of the original image.

#### Main challenges:

1. **Multi-task learning under single-image supervision**: Traditional personalization techniques usually focus on either content or style, or require separate training sets for each. UnZipLoRA instead aims to learn both content and style simultaneously from a single image.
2. **Disentanglement problem**: The two LoRAs must not interfere with each other during training, so that each accurately captures its own concept (content or style).
3. **Compatibility**: The learned content LoRA and style LoRA must combine seamlessly, so that high-quality images can be generated by direct addition at inference time.

#### Solutions:

To address these challenges, UnZipLoRA introduces the following key techniques:

1. **Prompt Separation**: Train the content LoRA and the style LoRA with different prompts to avoid cross-contamination.
2. **Column Separation**: Dynamically allocate columns of the weight matrix to each LoRA so that the content and style LoRAs remain orthogonal, reducing interference.
3. **Block Separation**: Adjust the training strategies of the content LoRA and the style LoRA according to how sensitive different U-Net blocks are to content and style, further improving the accuracy of details.

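
As a rough illustration of the compatibility and column-separation ideas above, the following NumPy sketch builds two low-rank updates for a single weight matrix, gives each one its own set of columns, and combines them by direct addition. It is a toy example, not the authors' implementation: the matrix sizes, the helper `lora_delta`, and the fixed alternating column masks are all assumptions made here for illustration, whereas the method described above allocates columns dynamically during training.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 8, 2  # toy sizes, chosen only for illustration


def lora_delta(rng, d_out, d_in, rank):
    """Hypothetical helper: return a low-rank update B @ A of shape (d_out, d_in)."""
    B = rng.standard_normal((d_out, rank))
    A = rng.standard_normal((rank, d_in))
    return B @ A


W0 = rng.standard_normal((d_out, d_in))  # stands in for a frozen base weight
delta_subject = lora_delta(rng, d_out, d_in, rank)
delta_style = lora_delta(rng, d_out, d_in, rank)

# Assumption: fixed, disjoint column masks (alternating columns).
subject_mask = np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
style_mask = 1.0 - subject_mask

# Broadcasting multiplies column j by mask[j], zeroing the columns a LoRA does not own.
masked_subject = delta_subject * subject_mask
masked_style = delta_style * style_mask

# Direct addition: no merge coefficients or extra optimization step.
W_combined = W0 + masked_subject + masked_style

# With disjoint columns, the Frobenius inner product of the two updates is zero.
overlap = float(np.sum(masked_subject * masked_style))
print("Frobenius inner product of masked updates:", overlap)
```

Because the two masked updates never touch the same column in this sketch, adding them cannot make one overwrite the other. That is the intuition behind keeping the content and style LoRAs orthogonal.
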
Through these methods, UnZipLoRA can disentangle content and style from a single image and generate high-quality image variants. Experimental results show that UnZipLoRA outperforms other state-of-the-art methods, such as DreamBooth-LoRA, Inspiration Tree, and B-LoRA, in both human studies and quantitative evaluations.

### Summary

The core contribution of UnZipLoRA is a novel and effective method that separates content and style from a single image and allows these elements to be flexibly manipulated and recombined, opening new possibilities for artistic creation and personalized image generation.