HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Shuicheng Yan

2024-12-06

Abstract: We present HumanEdit, a high-quality, human-rewarded dataset specifically designed for instruction-guided image editing, enabling precise and diverse image manipulations through open-form language instructions. Previous large-scale editing datasets often incorporate minimal human feedback, which makes it difficult to align them with human preferences. HumanEdit bridges this gap by employing human annotators to construct data pairs and administrators to provide feedback. Through meticulous curation, HumanEdit comprises 5,751 images and required more than 2,500 hours of human effort across four stages, ensuring both accuracy and reliability for a wide range of image editing tasks. The dataset includes six distinct types of editing instructions: Action, Add, Counting, Relation, Remove, and Replace, encompassing a broad spectrum of real-world scenarios. All images in the dataset are accompanied by masks, and for a subset of the data, we ensure that the instructions are sufficiently detailed to support mask-free editing. Furthermore, HumanEdit offers comprehensive diversity and high-resolution $1024 \times 1024$ content sourced from various domains, setting a new versatile benchmark for instructional image editing datasets. With the aim of advancing future research and establishing evaluation benchmarks in the field of image editing, we release HumanEdit at \url{https://huggingface.co/datasets/BryanW/HumanEdit}.

Computer Vision and Pattern Recognition, Graphics

What problem does this paper attempt to address?

This paper addresses a gap in existing large-scale image-editing datasets: they incorporate little human feedback, so they are hard to align with human preferences. Specifically, the paper argues that the editing instructions and mask regions in existing datasets often fail to reflect what real users actually ask for, so models trained on them produce edits that are flawed or inconsistent with human expectations (for example, distorted bodies). The root cause is that the training data distribution is typically noisy and does not match the editing instructions of real-world users.

To address this, the paper introduces **HumanEdit**, a high-quality, human-rewarded dataset for instruction-guided image editing. Human annotators construct the data pairs and administrators provide feedback over multiple rounds of quality control, so that the dataset is better aligned with human preferences. The paper positions HumanEdit as improving on existing datasets in data accuracy, diversity, high-resolution image sources, and support for both mask-based and mask-free editing. In summary, the paper's main goal is to improve the precision and diversity of instruction-based image editing by providing a high-quality, high-resolution, diverse dataset built with detailed human feedback, and to support future research and evaluation benchmarks.
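As a rough illustration of the dataset structure described in the abstract (six instruction types, a mask for every image, and a subset whose instructions are detailed enough for mask-free editing), the sketch below models one editing record in Python and tallies records by type. The class and field names (`EditRecord`, `mask_path`, `mask_free_ok`, and so on) and the sample rows are assumptions made for illustration; they are not the released dataset's actual schema or contents.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class EditType(Enum):
    """The six instruction types named in the abstract."""
    ACTION = "Action"
    ADD = "Add"
    COUNTING = "Counting"
    RELATION = "Relation"
    REMOVE = "Remove"
    REPLACE = "Replace"


@dataclass(frozen=True)
class EditRecord:
    # All field names here are hypothetical, not the dataset's real columns.
    image_id: str
    instruction: str
    edit_type: EditType
    mask_path: str       # every image is accompanied by a mask
    mask_free_ok: bool   # True if the instruction alone is detailed enough


def count_by_type(records):
    """Return a count for every edit type, including types with zero records."""
    counts = Counter(r.edit_type for r in records)
    return {t: counts.get(t, 0) for t in EditType}


def mask_free_subset(records):
    """Keep only records usable for mask-free editing."""
    return [r for r in records if r.mask_free_ok]


# Made-up sample rows, for illustration only.
SAMPLE = [
    EditRecord("0001", "Add a red kite above the beach", EditType.ADD,
               "masks/0001.png", True),
    EditRecord("0002", "Remove it", EditType.REMOVE,
               "masks/0002.png", False),
    EditRecord("0003", "Replace the wooden chair with a blue sofa",
               EditType.REPLACE, "masks/0003.png", True),
]

if __name__ == "__main__":
    for edit_type, n in count_by_type(SAMPLE).items():
        print(f"{edit_type.value}: {n}")
    print("mask-free records:", len(mask_free_subset(SAMPLE)))
```

Keeping the mask on every record and marking the mask-free subset with a flag mirrors the abstract's description: mask-based evaluation can use all records, while mask-free evaluation filters on the flag.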

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

InsightEdit: Towards Better Instruction Following for Image Editing

UniHuman: A Unified Model for Editing Human Images in the Wild

EditWorld: Simulating World Dynamics for Instruction-Following Image Editing

I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing

Multi-Reward as Condition for Instruction-based Image Editing

ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing

InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions

InstructGIE: Towards Generalizable Image Editing

StyleBooth: Image Style Editing with Multimodal Instruction

Comprehensive Dataset of Face Manipulations for Development and Evaluation of Forensic Tools

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations