Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner

Pengxiang Cai, Zhiwei Liu, Guibo Zhu, Yunfang Niu, Jinqiao Wang

2024-07-26

Abstract: Pixel-level fine-grained image editing remains an open challenge. Previous works fail to achieve an ideal trade-off between control granularity and inference speed: they either lack pixel-level fine-grained control or are slow at inference. To address this, this paper for the first time employs a regression-based network to learn the variation patterns of StyleGAN latent codes during the image dragging process. This method enables pixel-level precision in drag editing at little time cost. Users can specify handle points and their corresponding target points on any GAN-generated image, and our method will move each handle point to its corresponding target point. Through experimental analysis, we discover that a short movement distance from handle points to target points yields a high-fidelity edited image, as the model only needs to predict the movement of a small portion of pixels. To exploit this, we decompose the entire movement process into multiple sub-processes. Specifically, we develop a transformer encoder-decoder-based network named 'Latent Predictor' to predict the latent code motion trajectories from handle points to target points in an autoregressive manner. Moreover, to enhance prediction stability, we introduce a component named 'Latent Regularizer', aimed at constraining the latent code motion within the distribution of natural images. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) inference speed and image editing performance at pixel-level granularity.
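The decomposition into sub-processes mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the intermediate points are chosen, so the evenly spaced linear interpolation, the function name `split_drag`, and the `max_step` parameter below are all assumptions made for illustration, not the authors' procedure.

```python
import math

def split_drag(handle, target, max_step=2.0):
    """Split one long drag into short hops (hypothetical helper).

    Returns the waypoints after the handle point, ending exactly at the
    target, with every hop at most `max_step` pixels long.
    """
    hx, hy = handle
    tx, ty = target
    dist = math.hypot(tx - hx, ty - hy)
    # Number of hops needed so that no hop exceeds max_step (at least one).
    n = max(1, math.ceil(dist / max_step))
    # Evenly spaced points on the straight line from handle to target.
    return [(hx + (tx - hx) * i / n, hy + (ty - hy) * i / n)
            for i in range(1, n + 1)]

# A 10-pixel drag split into hops of at most 2 pixels each.
waypoints = split_drag((10.0, 10.0), (16.0, 18.0), max_step=2.0)
```

Each short hop corresponds to one sub-process; the abstract's observation is that short movements are easier to predict with high fidelity than a single long one.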

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the trade-off between control precision and inference speed in image editing, particularly for fine-grained control at the pixel level. The researchers propose a method called "Auto DragGAN," a drag-based editing approach built on Generative Adversarial Networks (GANs) that aims to keep inference fast while offering pixel-level control. The key contributions of the paper are:

1. A regression-based network that learns the variation patterns of StyleGAN latent codes on the generative image manifold, enabling pixel-level drag editing at low computational cost.
2. A reformulation of image dragging as regression over a sequence of latent code motions, built from two components: the "Latent Predictor" and the "Latent Regularizer." The former predicts the latent code motion trajectory from the handle point to the target point; the latter constrains the latent code motion to stay within the distribution of natural images.
3. Experimental results that, according to the authors, show state-of-the-art inference speed and editing performance at pixel-level granularity.

In summary, the paper introduces a method for balancing control precision against processing speed in drag-based image editing, so that edits can be both precise and fast.
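The autoregressive prediction described above can be pictured as a simple loop in which each new latent code is predicted from the codes produced so far and then constrained. The sketch below is only a schematic under stated assumptions: `toy_predictor` is a hypothetical stand-in for the learned Latent Predictor, and `regularize` uses plain clamping around a mean latent code, which is a swapped-in simplification and not the paper's Latent Regularizer.

```python
def regularize(w, w_mean, max_dev):
    """Stand-in regularizer (assumption): clamp each coordinate of the
    latent code to within `max_dev` of a reference mean latent code."""
    return [min(max(x, m - max_dev), m + max_dev) for x, m in zip(w, w_mean)]

def rollout(w0, waypoints, predictor, w_mean, max_dev=1.0):
    """Autoregressive rollout: one latent code per waypoint, each
    predicted from the full history of earlier latent codes."""
    trajectory = [list(w0)]
    for point in waypoints:
        w_next = predictor(trajectory, point)
        trajectory.append(regularize(w_next, w_mean, max_dev))
    return trajectory

def toy_predictor(history, point):
    """Hypothetical stand-in for the learned network: shifts the most
    recent latent code by a fixed amount and ignores the waypoint."""
    return [x + 0.5 for x in history[-1]]

trajectory = rollout(
    w0=[0.0, 0.0],
    waypoints=[(11.0, 11.0), (12.0, 12.0), (13.0, 13.0)],
    predictor=toy_predictor,
    w_mean=[0.0, 0.0],
    max_dev=1.0,
)
```

In this toy run the third step would move the latent code to 1.5 in each coordinate, and the clamp pulls it back to 1.0. That is the role the summary attributes to the regularizer: keeping predicted motion inside a plausible region so that errors do not compound over the sequence.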

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Style Fader Generative Adversarial Networks for Style Degree Controllable Artistic Style Transfer

Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation

StableDrag: Stable Dragging for Point-based Image Editing

Spatial Steerability of GANs via Self-Supervision from Discriminator

EditGAN: High-Precision Semantic Image Editing

FastDrag: Manipulate Anything in One Step

AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing

ReGANIE: Rectifying GAN Inversion Errors for Accurate Real Image Editing

LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos

Delta-GAN-Encoder: Encoding Semantic Changes for Explicit Image Editing, using Few Synthetic Samples

Self-Conditioned Generative Adversarial Networks for Image Editing

Self-Conditioned GANs for Image Editing

User‐Controllable Latent Transformer for StyleGAN Image Layout Editing

DragVideo: Interactive Drag-style Video Editing

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

Predicting the Invariance Behind Residuals: A Novel GAN Inversion Method for Image Editing and Detail Retaining

Lightweight Facial Attribute Editing with Separable Latent Vector

GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models

Designing an encoder for StyleGAN image manipulation