ViPer: Visual Personalization of Generative Models via Individual Preference Learning

Sogand Salehi, Mahdi Shafiei, Teresa Yeo, Roman Bachmann, Amir Zamir
2024-07-24
Abstract: Different users find different images generated for the same prompt desirable. This gives rise to personalized image generation which involves creating images aligned with an individual's visual preference. Current generative models are, however, unpersonalized, as they are tuned to produce outputs that appeal to a broad audience. Using them to generate images aligned with individual users relies on iterative manual prompt engineering by the user which is inefficient and undesirable. We propose to personalize the image generation process by first capturing the generic preferences of the user in a one-time process by inviting them to comment on a small selection of images, explaining why they like or dislike each. Based on these comments, we infer a user's structured liked and disliked visual attributes, i.e., their visual preference, using a large language model. These attributes are used to guide a text-to-image model toward producing images that are tuned towards the individual user's visual preference. Through a series of user studies and large language model guided evaluations, we demonstrate that the proposed method results in generations that are well aligned with individual users' visual preferences.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses the inability of current generative models to produce images tailored to an individual user's visual preferences. Specifically:

1. **Personalization requirements**:
   - Different users prefer different images generated from the same prompt.
   - Current generative models are usually optimized for a broad audience, so their outputs do not always match the preferences of an individual user.
2. **Limitations of existing methods**:
   - Existing methods rely on users repeatedly adjusting prompts or giving simple like/dislike feedback, which is inefficient and makes for a poor user experience.
   - Some methods rely only on binary choices (like or dislike) or ranking feedback; these signals may be too coarse to capture users' complex visual preferences.
3. **Proposed new method**:
   - The paper proposes ViPer (Visual Personalization of Generative Models via Individual Preference Learning), a method that personalizes generative models by learning individual users' visual preferences.
   - ViPer asks users to write free-form comments on a diverse set of images, extracts structured visual preferences from those comments, and uses them to guide the generative model toward images the user is more likely to prefer.

### Specific solutions

- **Capturing user preferences**:
  - Users comment on a set of images, explaining why they like or dislike each one.
  - A large language model (such as IDEFICS2-8b) converts these free-form comments into structured visual preference attributes.
- **Personalizing the generative model**:
  - The user's visual preferences are encoded and embedded into the prompts of the generative model, guiding it to generate images in line with those preferences.
  - By adjusting the parameter β, users can control the degree of personalization.
- **Evaluation methods**:
  - The alignment between generated images and users' preferences is evaluated through user studies and proxy metrics.
  - The experimental results show that images generated by ViPer match users' preferences better than those of the baseline methods.

In conclusion, this paper addresses the shortcomings of existing generative models in personalized image generation by introducing ViPer, which the authors report produces images that are well aligned with individual users' visual preferences.
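The role of β described under "Specific solutions" can be illustrated with a minimal sketch. It assumes the prompt and the liked/disliked attributes have already been encoded as equal-length vectors by some text encoder. The `VisualPreference` class, the `personalize_embedding` function, and the linear blending rule are hypothetical names and choices made for illustration; the summary above does not state the paper's actual formula.

```python
from dataclasses import dataclass, field


@dataclass
class VisualPreference:
    """Structured preference: attributes a user likes and dislikes (hypothetical container)."""
    liked: list[str] = field(default_factory=list)
    disliked: list[str] = field(default_factory=list)


def personalize_embedding(prompt_emb, liked_emb, disliked_emb, beta):
    """Shift a prompt embedding toward liked and away from disliked attributes.

    Assumed blending rule (not taken from the paper):
        out = prompt + beta * (liked - disliked)
    beta = 0 leaves the prompt unchanged; a larger beta personalizes more strongly.
    """
    if not (len(prompt_emb) == len(liked_emb) == len(disliked_emb)):
        raise ValueError("all embeddings must have the same length")
    return [p + beta * (l - d) for p, l, d in zip(prompt_emb, liked_emb, disliked_emb)]


# Toy usage with 2-dimensional stand-in embeddings.
pref = VisualPreference(liked=["muted colors"], disliked=["photorealism"])
prompt_vec = [1.0, 0.0]
liked_vec = [0.5, 0.5]      # stand-in for an encoding of pref.liked
disliked_vec = [0.0, 1.0]   # stand-in for an encoding of pref.disliked

unpersonalized = personalize_embedding(prompt_vec, liked_vec, disliked_vec, beta=0.0)
personalized = personalize_embedding(prompt_vec, liked_vec, disliked_vec, beta=1.0)
```

In this toy setup, `beta=0.0` returns the original prompt vector, and increasing `beta` moves the result further along the liked-minus-disliked direction, which mirrors the user-controlled degree of personalization described above.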