Abstract: Different users find different images generated for the same prompt desirable. This gives rise to personalized image generation, which involves creating images aligned with an individual's visual preference. Current generative models are, however, unpersonalized, as they are tuned to produce outputs that appeal to a broad audience. Using them to generate images aligned with individual users relies on iterative manual prompt engineering by the user, which is inefficient and undesirable. We propose to personalize the image generation process by first capturing the generic preferences of the user in a one-time process: we invite them to comment on a small selection of images, explaining why they like or dislike each. Based on these comments, we infer a user's structured liked and disliked visual attributes, i.e., their visual preference, using a large language model. These attributes are used to guide a text-to-image model toward producing images that are tuned to the individual user's visual preference. Through a series of user studies and large language model guided evaluations, we demonstrate that the proposed method results in generations that are well aligned with individual users' visual preferences.


### What problem does this paper attempt to solve?

This paper aims to solve the problem that current generative models are unable to generate personalized images according to individual users' visual preferences. Specifically:

1. **Personalization requirements**:
   - Different users have different preferences for the images generated from the same prompt.
   - Current generative models are usually optimized for a wide audience, so the images they generate do not always match the specific preferences of individual users.
2. **Limitations of existing methods**:
   - Existing methods rely on users to achieve personalized generation by repeatedly adjusting prompts or providing simple like/dislike feedback, which is inefficient and provides a poor user experience.
   - Some methods only rely on binary choices (such as like or dislike) or ranking feedback, and these signals may be too simple to fully capture users' complex visual preferences.
3. **Proposed new method**:
   - The paper proposes ViPer (Visual Personalization of Generative Models via Individual Preference Learning), a method to personalize generative models by learning individual users' visual preferences.
   - ViPer extracts users' structured visual preferences by allowing users to make free-form comments on a diverse set of images and uses these preferences to guide the generative model to generate images that are more in line with users' preferences.

### Specific solutions

- **Capturing user preferences**:
  - Users comment on a set of images, explaining why they like or dislike these images.
  - Use a large language model (such as IDEFICS2-8b) to convert these free-form comments into structured visual preference attributes.
- **Personalizing the generative model**:
  - Encode and embed users' visual preferences into the prompts of the generative model, thereby guiding the generative model to generate images that are in line with users' preferences.
  - By adjusting the parameter β, users can control the degree of personalization.
- **Evaluation methods**:
  - Evaluate the alignment between the generated images and users' preferences through user studies and proxy metrics.
  - The experimental results show that the images generated by ViPer can better meet users' personalized needs than other baseline methods.

In conclusion, this paper solves the deficiencies of existing generative models in personalized image generation by introducing the ViPer method, improving user satisfaction and the quality of generated images.
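
To make the role of β more concrete, here is a minimal Python sketch of one way a preference could steer a prompt embedding. It is an illustration under stated assumptions, not the paper's actual formulation: the `VisualPreference` container, the `personalize_embedding` helper, and the linear rule `prompt + β · (liked − disliked)` are all hypothetical, and the three-dimensional vectors are toy stand-ins for real text-encoder outputs.

```python
from dataclasses import dataclass, field


@dataclass
class VisualPreference:
    """Hypothetical container for attributes inferred from a user's comments."""
    liked: list[str] = field(default_factory=list)
    disliked: list[str] = field(default_factory=list)


def personalize_embedding(prompt_emb, liked_emb, disliked_emb, beta):
    """Shift a prompt embedding toward liked and away from disliked attributes.

    beta = 0 leaves the prompt embedding unchanged; larger beta pushes harder.
    The linear rule is an illustrative assumption, not the paper's formula.
    """
    if not (len(prompt_emb) == len(liked_emb) == len(disliked_emb)):
        raise ValueError("all embeddings must have the same dimension")
    return [p + beta * (l - d)
            for p, l, d in zip(prompt_emb, liked_emb, disliked_emb)]


# Example preference, as it might look after parsing free-form comments.
preference = VisualPreference(liked=["muted colors"], disliked=["photorealism"])

# Toy 3-dimensional vectors standing in for encoded text (assumed values).
prompt = [1.0, 0.0, 2.0]
liked = [0.5, 1.0, 0.0]
disliked = [0.0, 1.0, 1.0]

unpersonalized = personalize_embedding(prompt, liked, disliked, beta=0.0)
personalized = personalize_embedding(prompt, liked, disliked, beta=2.0)
```

With β set to 0 the output equals the original prompt embedding, which matches the idea that the user can dial personalization down to nothing. In a real pipeline the three vectors would come from the text encoder of the text-to-image model rather than being written by hand.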

ViPer: Visual Personalization of Generative Models via Individual Preference Learning
