Abstract: Although recent years have witnessed significant advancements in image editing thanks to the remarkable progress of text-to-image diffusion models, the problem of non-rigid image editing still presents its complexities and challenges. Existing methods often fail to achieve consistent results due to the absence of unique identity characteristics. Thus, learning a personalized identity prior might help with consistency in the edited results. In this paper, we explore a novel task: learning the personalized identity prior for text-based non-rigid image editing. To address the problems in jointly learning prior and editing the image, we present LIPE, a two-stage framework designed to customize the generative model utilizing a limited set of images of the same subject, and subsequently employ the model with learned prior for non-rigid image editing. Experimental results demonstrate the advantages of our approach in various editing scenarios over past related leading methods in qualitative and quantitative ways.

What problem does this paper attempt to address?

This paper addresses the challenge of maintaining identity consistency in non-rigid image editing. Existing methods often fail to preserve the subject's identity characteristics when editing images, especially under non-rigid transformations (such as changes in posture, expression, or viewpoint), which leads to inconsistent results. The authors therefore propose a new task: improving the consistency of non-rigid image editing by learning a personalized identity prior.

### Main problem description in the paper

1. **Limitations of existing methods**:
   - Existing methods usually rely on the general-domain prior of large-scale text-to-image (T2I) models. These models have strong generation capabilities but preserve personalized identity characteristics poorly.
   - These methods depend mainly on text prompts, which offer limited control and tend to modify image regions that should stay unchanged.
   - Some recent works customize personalized face priors, but they require a large number of reference images (about 100) and are limited to portrait editing, so they cannot handle broader non-rigid editing.
2. **Research objectives**:
   - Given a small number (3-5) of reference images of the same identity, can a personalized identity prior be learned that supports non-rigid editing of test images while preserving the identity's unique properties?
   - Propose a new framework that achieves high-quality non-rigid image editing while maintaining identity characteristics.

### Overview of the solution

To solve these problems, the authors propose a two-stage framework named LIPE (Learning Personalized Identity Prior for Non-rigid Image Editing):

1. **Learning of personalized identity prior**:
   - Fine-tune the pre-trained T2I model on a limited number of reference images to learn the personalized identity prior.
   - Generate detailed text-image pairs through data augmentation to improve the model's understanding and generation of non-rigid attributes.
2. **Non-rigid image editing**:
   - Use the identity-aware mask blend technique (NIMA) to control the target object precisely during editing and to avoid changes to the background and other irrelevant attributes.

### Main contributions

- **Introduce a new task**: Non-rigid image editing with a personalized identity prior.
- **Propose a new method**: The LIPE framework, which addresses the technical problems of learning a personalized identity prior and performing non-rigid editing.
- **Establish a new dataset**: A dataset designed specifically for this task, covering multiple object categories, for evaluating model performance.

According to the reported experiments, LIPE outperforms existing methods in identity consistency, background consistency, and editing satisfaction.
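
The mask blend idea mentioned in the solution overview can be illustrated with a generic per-pixel blend. This is only a minimal sketch of mask blending in general, not the paper's NIMA procedure: the summary does not say how the identity-aware mask is computed or at which stage of the diffusion process the blend is applied. The function name `mask_blend`, the `Grid` alias, and the toy 2x2 inputs are hypothetical and chosen purely for illustration.

```python
from typing import List

# Hypothetical type alias: a 2D grid of floats standing in for an image or latent.
Grid = List[List[float]]


def mask_blend(original: Grid, edited: Grid, mask: Grid) -> Grid:
    """Blend two grids per pixel: mask * edited + (1 - mask) * original.

    Where the mask is 1 the edited value is kept, where it is 0 the
    original value is kept, and fractional values mix the two.
    """
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("inputs must have the same number of rows")
    blended: Grid = []
    for o_row, e_row, m_row in zip(original, edited, mask):
        if not (len(o_row) == len(e_row) == len(m_row)):
            raise ValueError("rows must have the same length")
        blended.append(
            [m * e + (1.0 - m) * o for o, e, m in zip(o_row, e_row, m_row)]
        )
    return blended


# Toy example (assumed values): the mask marks the left column as the subject.
original = [[0.0, 0.0], [0.0, 0.0]]
edited = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1.0, 0.0], [0.5, 0.0]]
result = mask_blend(original, edited, mask)
```

In this toy case the right column, where the mask is 0, keeps its original values. That is the property the summary attributes to the mask blend: edits stay confined to the target object while the background is left untouched.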

LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing

"My Face, My Rules": Enabling Personalized Protection Against Unacceptable Face Editing.

FaceChain: A Playground for Identity-Preserving Portrait Generation

DiffusionRig: Learning Personalized Priors for Facial Appearance Editing

StableIdentity: Inserting Anybody into Anywhere at First Sight

DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation

Prior Preserved Text-to-Image Personalization Without Image Regularization

Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention

Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance

Disentangled face editing via individual walk in personalized facial semantic field

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models

LCM-Lookahead for Encoder-based Text-to-Image Personalization

Learning Feature-Preserving Portrait Editing from Generated Pairs

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Identity-Aware and Shape-Aware Propagation of Face Editing in Videos

Face editing based on facial recognition features

PIE: Portrait Image Embedding for Semantic Control

Zero-shot Text-driven Physically Interpretable Face Editing

DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding