AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

Qifan Yu, Wei Chow, Zhongqi Yue, Kaihang Pan, Yang Wu, Xiaoyang Wan, Juncheng Li, Siliang Tang, Hanwang Zhang, Yueting Zhuang
2024-11-24
Abstract: Instruction-based image editing aims to modify specific image elements with natural language instructions. However, current models in this domain often struggle to accurately execute complex user instructions, as they are trained on low-quality data with limited editing types. We present AnyEdit, a comprehensive multi-modal instruction editing dataset, comprising 2.5 million high-quality editing pairs spanning over 20 editing types and five domains. We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results. Using the dataset, we further train a novel AnyEdit Stable Diffusion with task-aware routing and learnable task embedding for unified image editing. Comprehensive experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models. This presents prospects for developing instruction-driven image editing models that support human creativity.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two related problems: current instruction-based image editing models often fail to execute complex user instructions accurately, and there is no high-quality dataset that supports diverse editing tasks. Specifically, existing models struggle because they are trained on low-quality data covering a limited set of editing types. To this end, the paper proposes AnyEdit, a multi-modal instruction editing dataset containing 2.5 million high-quality editing pairs that span more than 20 editing types and five domains. Using this dataset, the authors train a new model, AnyEdit Stable Diffusion, which uses task-aware routing and a learnable task embedding to handle varied editing tasks within a single unified model. Comprehensive experiments on three benchmark datasets show that AnyEdit consistently improves the performance of diffusion-based editing models, pointing to the prospect of instruction-driven image editing models that support human creativity.