EmoEdit: Evoking Emotions through Image Manipulation

Jingyuan Yang, Jiawei Feng, Weibin Luo, Dani Lischinski, Daniel Cohen-Or, Hui Huang
2024-05-21
Abstract: Affective Image Manipulation (AIM) seeks to modify user-provided images to evoke specific emotional responses. This task is inherently complex due to its twofold objective: significantly evoking the intended emotion, while preserving the original image composition. Existing AIM methods primarily adjust color and style, often failing to elicit precise and profound emotional shifts. Drawing on psychological insights, we extend AIM by incorporating content modifications to enhance emotional impact. We introduce EmoEdit, a novel two-stage framework comprising emotion attribution and image editing. In the emotion attribution stage, we leverage a Vision-Language Model (VLM) to create hierarchies of semantic factors that represent abstract emotions. In the image editing stage, the VLM identifies the most relevant factors for the provided image, and guides a generative editing model to perform affective modifications. A ranking technique that we developed selects the best edit, balancing between emotion fidelity and structure integrity. To validate EmoEdit, we assembled a dataset of 416 images, categorized into positive, negative, and neutral classes. Our method is evaluated both qualitatively and quantitatively, demonstrating superior performance compared to existing state-of-the-art techniques. Additionally, we showcase EmoEdit's potential in various manipulation tasks, including emotion-oriented and semantics-oriented editing.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses Affective Image Manipulation (AIM): modifying a user-provided image so that it evokes a specific emotional response while preserving the image's original structure. Existing AIM methods mainly adjust color and style, which often fails to convey the target emotion strongly or precisely, and they struggle to balance emotional impact against structural fidelity. The paper proposes EmoEdit, a framework that tackles this by combining content modifications with color adjustments. EmoEdit works in two stages. In the emotion attribution stage, a Vision-Language Model (VLM) builds hierarchies of semantic factors that represent abstract emotions. In the image editing stage, the VLM identifies the factors most relevant to the given image and guides a generative editing model to make affective modifications. A ranking technique then selects the best edit, balancing emotion fidelity against structural integrity. In this way, EmoEdit aims to evoke the intended emotion in viewers without compromising the image's original structure.
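The ranking step, which picks one edit from several candidates by trading emotion fidelity against structural integrity, can be illustrated with a minimal sketch. The details of EmoEdit's actual ranking technique are not given here, so everything below is an assumption: the `Candidate` container, the per-candidate scores (imagined as, e.g., an emotion classifier's probability and a structural-similarity value), the min-max normalization, and the weight `alpha` are all hypothetical stand-ins, not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One edited image from the generative editor (hypothetical container)."""
    name: str
    emotion_score: float    # assumed: e.g. probability of the target emotion
    structure_score: float  # assumed: e.g. structural similarity to the source


def min_max(values):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_edits(candidates, alpha=0.5):
    """Sort candidates by a weighted sum of normalized scores, best first.

    alpha is a hypothetical trade-off weight: 1.0 ranks purely by
    emotion fidelity, 0.0 purely by structural integrity.
    """
    emo = min_max([c.emotion_score for c in candidates])
    struct = min_max([c.structure_score for c in candidates])
    scored = [
        (alpha * e + (1 - alpha) * s, c)
        for e, s, c in zip(emo, struct, candidates)
    ]
    # Stable sort on the combined score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


# Illustrative candidates with made-up scores.
edits = [
    Candidate("add_flowers", emotion_score=0.82, structure_score=0.71),
    Candidate("warm_tint", emotion_score=0.55, structure_score=0.93),
    Candidate("replace_scene", emotion_score=0.95, structure_score=0.30),
]
print(rank_edits(edits, alpha=0.5)[0].name)  # prints: add_flowers
```

With an even weighting, the moderate content edit wins over both the color-only tint (high structure, weak emotion) and the full scene replacement (strong emotion, structure lost), which mirrors the balance the ranking is meant to strike. Normalizing within each candidate set keeps the two scores on a comparable scale, at the cost of making them relative to that set only.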