FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing

Gwanhyeong Koo, Sunjae Yoon, Ji Woo Hong, Chang D. Yoo

2024-07-25

Abstract: Current image editing methods primarily utilize DDIM Inversion, employing a two-branch diffusion approach to preserve the attributes and layout of the original image. However, these methods encounter challenges with non-rigid edits, which involve altering the image's layout or structure. Our comprehensive analysis reveals that the high-frequency components of the DDIM latent, crucial for retaining the original image's key features and layout, significantly contribute to these limitations. Addressing this, we introduce FlexiEdit, which enhances fidelity to input text prompts by refining the DDIM latent, reducing its high-frequency components in targeted editing areas. FlexiEdit comprises two key components: (1) Latent Refinement, which modifies the DDIM latent to better accommodate layout adjustments, and (2) Edit Fidelity Enhancement via Re-inversion, aimed at ensuring the edits more accurately reflect the input text prompts. Our approach represents notable progress in image editing, particularly in performing complex non-rigid edits, showcasing its enhanced capability through comparative experiments.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the difficulty of non-rigid editing in image editing, that is, edits such as pose or viewpoint changes that alter an image's layout or structure. Existing image editing methods rely mainly on DDIM inversion, which preserves the attributes and layout of the original image but makes that structure hard to modify, so these methods perform poorly on non-rigid edits. Specifically, the paper finds that the high-frequency components of the DDIM latent retain the key features and layout of the original image, which limits how flexibly the image can be edited.

To overcome these limitations, the paper proposes FlexiEdit, a new image editing method. It refines the DDIM latent by reducing the high-frequency components in the target editing area and adding Gaussian noise, which makes non-rigid edits more flexible while preserving the key attributes of the image. The main contributions of FlexiEdit are:

1. **Latent Refinement**: Reducing the high-frequency components in the target editing area lets that area adapt more easily to layout adjustments.
2. **Edit Fidelity Enhancement via Re-inversion**: A re-inversion step ensures that the edited result reflects the input text prompt more accurately while retaining the attributes of the original objects.

Together, these techniques allow FlexiEdit to change the image layout more naturally in non-rigid editing tasks while staying highly consistent with the input text prompt.
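The idea of reducing high-frequency components inside an editing region and then adding Gaussian noise can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the ideal circular low-pass filter, the function names `low_pass_mask` and `refine_latent`, and the parameters `cutoff` and `noise_weight` are all assumptions chosen for illustration, since the summary above does not specify the filter or how the noise is blended in.

```python
import numpy as np


def low_pass_mask(height, width, cutoff):
    """Boolean mask in FFT layout that keeps frequencies with radius <= cutoff.

    `cutoff` is in cycles per sample (0 to 0.5); the value is a hypothetical choice.
    """
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fy ** 2 + fx ** 2) <= cutoff


def refine_latent(latent, edit_mask, cutoff=0.1, noise_weight=0.5, seed=0):
    """Illustrative latent refinement (assumed form, not the paper's exact rule).

    latent:    array of shape (C, H, W), standing in for a DDIM-inverted latent.
    edit_mask: array of shape (H, W) with 1 inside the region to edit, 0 outside.
    """
    rng = np.random.default_rng(seed)
    _, height, width = latent.shape
    keep = low_pass_mask(height, width, cutoff)

    # Remove high frequencies channel by channel via a 2D FFT over (H, W).
    spectrum = np.fft.fft2(latent, axes=(-2, -1))
    low_freq = np.fft.ifft2(spectrum * keep, axes=(-2, -1)).real

    # Add Gaussian noise so the edited region is free to take a new layout.
    noise = rng.standard_normal(latent.shape)
    refined_region = low_freq + noise_weight * noise

    # Only the masked region is replaced; the rest of the latent is untouched.
    return edit_mask * refined_region + (1.0 - edit_mask) * latent


if __name__ == "__main__":
    latent = np.random.default_rng(1).standard_normal((4, 64, 64))
    mask = np.zeros((64, 64))
    mask[16:48, 16:48] = 1.0
    refined = refine_latent(latent, mask)
    print(refined.shape)
```

In this sketch the unmasked part of the latent is returned unchanged, which mirrors the stated goal of keeping the original image's attributes outside the editing area; a real pipeline would then run the refined latent through the diffusion sampler.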

FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing

Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing

DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing

Inversion-Free Image Editing with Natural Language

FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models

PFB-Diff: Progressive Feature Blending diffusion for text-driven image editing

Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing

High-Fidelity Diffusion-based Image Editing

ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models

E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing

FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning

An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control

FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction

MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance

Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models