EmoEdit: Evoking Emotions through Image Manipulation

Jingyuan Yang, Jiawei Feng, Weibin Luo, Dani Lischinski, Daniel Cohen-Or, Hui Huang

2024-05-21

Abstract: Affective Image Manipulation (AIM) seeks to modify user-provided images to evoke specific emotional responses. This task is inherently complex due to its twofold objective: significantly evoking the intended emotion, while preserving the original image composition. Existing AIM methods primarily adjust color and style, often failing to elicit precise and profound emotional shifts. Drawing on psychological insights, we extend AIM by incorporating content modifications to enhance emotional impact. We introduce EmoEdit, a novel two-stage framework comprising emotion attribution and image editing. In the emotion attribution stage, we leverage a Vision-Language Model (VLM) to create hierarchies of semantic factors that represent abstract emotions. In the image editing stage, the VLM identifies the most relevant factors for the provided image, and guides a generative editing model to perform affective modifications. A ranking technique that we developed selects the best edit, balancing between emotion fidelity and structure integrity. To validate EmoEdit, we assembled a dataset of 416 images, categorized into positive, negative, and neutral classes. Our method is evaluated both qualitatively and quantitatively, demonstrating superior performance compared to existing state-of-the-art techniques. Additionally, we showcase EmoEdit's potential in various manipulation tasks, including emotion-oriented and semantics-oriented editing.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses Affective Image Manipulation (AIM): modifying a user-provided image so that it evokes a specific emotional response while preserving the image's original structure. Existing methods mainly adjust color and style, which often fails to convey the target emotion precisely or strongly, and they struggle to balance emotional expression against structural fidelity. The paper proposes EmoEdit, a framework that extends color and style adjustment with content modifications. EmoEdit works in two stages. In the emotion attribution stage, a Vision-Language Model (VLM) builds hierarchies of semantic factors that represent abstract emotions. In the image editing stage, the VLM identifies the factors most relevant to the given image and guides a generative editing model to make the affective modifications. A ranking technique then selects the best edit, balancing emotion fidelity against structural integrity. According to the authors, this lets EmoEdit evoke the target emotion in viewers without compromising the original structure of the image.
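The idea of ranking candidate edits by a trade-off between emotion fidelity and structural integrity can be illustrated with a small sketch. This is not the authors' ranking technique, which the summary does not specify. It assumes each candidate edit already carries two scores in [0, 1] and combines them with a weighted sum. The names `Candidate`, `combined_score`, `rank_candidates`, the `weight` parameter, and all score values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate edit with two hypothetical scores in [0, 1]."""
    name: str
    emotion_score: float    # assumed: how strongly the edit evokes the target emotion
    structure_score: float  # assumed: how well the edit preserves the source structure


def combined_score(c: Candidate, weight: float = 0.5) -> float:
    """Weighted sum of both scores; the weighting scheme is an assumption."""
    return weight * c.emotion_score + (1.0 - weight) * c.structure_score


def rank_candidates(candidates, weight: float = 0.5):
    """Return candidates sorted from best to worst combined score."""
    return sorted(candidates, key=lambda c: combined_score(c, weight), reverse=True)


# Made-up toy values, not results from the paper.
candidates = [
    Candidate("color_only", 0.30, 0.95),    # keeps structure, weak emotion
    Candidate("add_balloons", 0.80, 0.75),  # content edit with a good balance
    Candidate("full_repaint", 0.90, 0.20),  # strong emotion, structure lost
]

best = rank_candidates(candidates)[0]
print(best.name)
```

With equal weighting, the balanced content edit ranks first in this toy data. Setting `weight` closer to 1.0 favors emotional impact, and closer to 0.0 favors structure preservation.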
