Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner

Pengxiang Cai, Zhiwei Liu, Guibo Zhu, Yunfang Niu, Jinqiao Wang
2024-07-26
Abstract: Pixel-level fine-grained image editing remains an open challenge. Previous works fail to achieve an ideal trade-off between control granularity and inference speed. They either fail to achieve pixel-level fine-grained control, or their inference speed requires optimization. To address this, this paper for the first time employs a regression-based network to learn the variation patterns of StyleGAN latent codes during the image dragging process. This method enables pixel-level precision in dragging editing with little time cost. Users can specify handle points and their corresponding target points on any GAN-generated image, and our method will move each handle point to its corresponding target point. Through experimental analysis, we discover that a short movement distance from handle points to target points yields a high-fidelity edited image, as the model only needs to predict the movement of a small portion of pixels. To achieve this, we decompose the entire movement process into multiple sub-processes. Specifically, we develop a transformer encoder-decoder based network named 'Latent Predictor' to predict the latent code motion trajectories from handle points to target points in an autoregressive manner. Moreover, to enhance the prediction stability, we introduce a component named 'Latent Regularizer', aimed at constraining the latent code motion within the distribution of natural images. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) inference speed and image editing performance at pixel-level granularity.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the trade-off between control precision and inference speed in image editing, particularly for fine-grained control at the pixel level. The researchers propose a method called "Auto DragGAN," a GAN-based drag-editing approach that aims to keep inference fast while achieving pixel-level control. The key contributions of the paper are:
1. A regression-based network that learns how StyleGAN latent codes vary on the generative image manifold during dragging, enabling pixel-level editing at low computational cost.
2. A reformulation of image dragging as regression over a latent code motion sequence, with two components: the "Latent Predictor" and the "Latent Regularizer." The former predicts the latent code motion trajectory from the handle point to the target point; the latter constrains the latent code motion to stay within the distribution of natural images.
3. Experiments showing that the method achieves pixel-level control together with state-of-the-art inference speed.
In summary, the work offers a new way to balance control precision against processing speed in image editing.
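The decomposition described in the abstract (one long drag split into short sub-movements, with latent codes predicted one step at a time from the steps before) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `split_drag`, `autoregressive_drag`, the `max_step` threshold, and especially `toy_predictor`, which is a trivial stand-in for the transformer-based Latent Predictor and does not model StyleGAN or the Latent Regularizer.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Latent = List[float]
# Hypothetical predictor signature: (latent history, current point, next point) -> next latent
Predictor = Callable[[List[Latent], Point, Point], Latent]


def split_drag(handle: Point, target: Point, max_step: float) -> List[Point]:
    """Split the handle-to-target movement into equal sub-movements,
    each no longer than max_step (an assumed hyperparameter)."""
    dx, dy = target[0] - handle[0], target[1] - handle[1]
    dist = math.hypot(dx, dy)
    n = max(1, math.ceil(dist / max_step))
    return [(handle[0] + dx * k / n, handle[1] + dy * k / n) for k in range(1, n + 1)]


def autoregressive_drag(
    latent: Latent,
    handle: Point,
    target: Point,
    predictor: Predictor,
    max_step: float = 4.0,
) -> List[Latent]:
    """Predict one latent code per sub-movement; each prediction sees
    the full history of previously predicted latent codes."""
    trajectory: List[Latent] = [list(latent)]
    current = handle
    for waypoint in split_drag(handle, target, max_step):
        trajectory.append(predictor(trajectory, current, waypoint))
        current = waypoint
    return trajectory


def toy_predictor(history: List[Latent], src: Point, dst: Point) -> Latent:
    """Stand-in only: shifts the first two latent entries by the sub-movement.
    A real predictor would be a learned network."""
    prev = history[-1]
    return [prev[0] + (dst[0] - src[0]), prev[1] + (dst[1] - src[1])] + prev[2:]
```

Calling `autoregressive_drag([0.0, 0.0, 1.0], (0.0, 0.0), (10.0, 0.0), toy_predictor)` splits a 10-pixel drag into three sub-movements and returns the initial latent code plus one predicted code per sub-movement. Keeping each sub-movement short mirrors the abstract's observation that short movement distances yield higher-fidelity edits.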