PrefGen: Preference Guided Image Generation with Relative Attributes

Alec Helbling, Christopher J. Rozell, Matthew O'Shaughnessy, Kion Fallah
2023-04-01
Abstract: Deep generative models have the capacity to render high fidelity images of content like human faces. Recently, there has been substantial progress in conditionally generating images with specific quantitative attributes, like the emotion conveyed by one's face. These methods typically require a user to explicitly quantify the desired intensity of a visual attribute. A limitation of this method is that many attributes, like how "angry" a human face looks, are difficult for a user to precisely quantify. However, a user would be able to reliably say which of two faces seems "angrier". Following this premise, we develop the $\textit{PrefGen}$ system, which allows users to control the relative attributes of generated images by presenting them with simple paired comparison queries of the form "do you prefer image $a$ or image $b$?" Using information from a sequence of query responses, we can estimate user preferences over a set of image attributes and perform preference-guided image editing and generation. Furthermore, to make preference localization feasible and efficient, we apply an active query selection strategy. We demonstrate the success of this approach using a StyleGAN2 generator on the task of human face editing. Additionally, we demonstrate how our approach can be combined with CLIP, allowing a user to edit the relative intensity of attributes specified by text prompts. Code at https://github.com/helblazer811/PrefGen.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

The paper addresses how to make it easier for users to control the relative attributes of images (such as the intensity of an emotion on a human face) when generating images with deep generative models. Existing methods typically require users to explicitly quantify the desired intensity of a visual attribute, such as how "angry" a face appears. However, many attributes (like the degree of "anger") are difficult for users to quantify precisely. In contrast, users find it much easier to judge which of two faces looks "angrier."

To overcome this challenge, the authors developed the PrefGen system, which infers user preferences from paired comparison queries (e.g., "Do you prefer image a or image b?") and generates or edits images based on those preferences. Instead of asking users for explicit attribute values, the system estimates their preferences over a set of relative attributes from a sequence of query responses.

Specifically, PrefGen uses a Bayesian framework to infer user preferences from the query responses and employs an active query selection strategy to search the attribute space efficiently. The authors demonstrate the approach with a StyleGAN2 generator on human face editing, and show how it can be combined with CLIP to edit the relative intensity of attributes specified by text prompts.
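
To make the idea of Bayesian preference localization with active queries more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the logistic "ideal point" response model, the grid-based posterior over a 2D attribute space, the information-gain query criterion, the simulated noiseless user, and every function and parameter name (`response_prob`, `localize`, `k`, and so on) are assumptions chosen for illustration.

```python
import numpy as np


def response_prob(a, b, w, k=20.0):
    """Assumed response model: P(user prefers a over b | ideal point w).

    Logistic in the difference of squared distances to the ideal point.
    """
    d_a = np.sum((a - w) ** 2, axis=-1)
    d_b = np.sum((b - w) ** 2, axis=-1)
    return 1.0 / (1.0 + np.exp(-k * (d_b - d_a)))


def binary_entropy(p):
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def information_gain(posterior, grid, a, b, k=20.0):
    """Mutual information between the response and the ideal point."""
    p = response_prob(a, b, grid, k)      # one probability per grid point
    marginal = np.sum(posterior * p)      # predictive P(prefers a)
    return binary_entropy(marginal) - np.sum(posterior * binary_entropy(p))


def select_query(posterior, grid, rng, n_candidates=200, k=20.0):
    """Active selection: keep the random candidate pair with the highest gain."""
    best_pair, best_gain = None, -np.inf
    for _ in range(n_candidates):
        a, b = rng.uniform(0.0, 1.0, size=(2, grid.shape[1]))
        gain = information_gain(posterior, grid, a, b, k)
        if gain > best_gain:
            best_pair, best_gain = (a, b), gain
    return best_pair


def update_posterior(posterior, grid, a, b, prefers_a, k=20.0):
    """Bayes rule on the grid: multiply by the likelihood and renormalize."""
    p_a = response_prob(a, b, grid, k)
    likelihood = p_a if prefers_a else 1.0 - p_a
    posterior = posterior * likelihood
    return posterior / posterior.sum()


def localize(true_w, n_queries=40, grid_size=41, seed=0):
    """Estimate a simulated user's ideal point in [0, 1]^2 from paired queries."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(0.0, 1.0, grid_size)
    grid = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1).reshape(-1, 2)
    posterior = np.full(len(grid), 1.0 / len(grid))  # uniform prior
    for _ in range(n_queries):
        a, b = select_query(posterior, grid, rng)
        # Simulated noiseless user: picks whichever point is closer to true_w.
        prefers_a = np.sum((a - true_w) ** 2) < np.sum((b - true_w) ** 2)
        posterior = update_posterior(posterior, grid, a, b, prefers_a)
    return posterior @ grid  # posterior mean as the preference estimate
```

Calling `localize(np.array([0.8, 0.2]))` should return a posterior-mean estimate close to the simulated ideal point. In a system like the one described, each point in the attribute space would correspond to a generated image shown to the user, and the final estimate would condition the generator; that mapping is omitted here.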