HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Shuicheng Yan
2024-12-06
Abstract: We present HumanEdit, a high-quality, human-rewarded dataset specifically designed for instruction-guided image editing, enabling precise and diverse image manipulations through open-form language instructions. Previous large-scale editing datasets often incorporate minimal human feedback, leading to challenges in aligning datasets with human preferences. HumanEdit bridges this gap by employing human annotators to construct data pairs and administrators to provide feedback. With meticulous curation, HumanEdit comprises 5,751 images and requires more than 2,500 hours of human effort across four stages, ensuring both accuracy and reliability for a wide range of image editing tasks. The dataset includes six distinct types of editing instructions: Action, Add, Counting, Relation, Remove, and Replace, encompassing a broad spectrum of real-world scenarios. All images in the dataset are accompanied by masks, and for a subset of the data, we ensure that the instructions are sufficiently detailed to support mask-free editing. Furthermore, HumanEdit offers comprehensive diversity and high-resolution $1024 \times 1024$ content sourced from various domains, setting a new versatile benchmark for instructional image editing datasets. With the aim of advancing future research and establishing evaluation benchmarks in the field of image editing, we release HumanEdit at https://huggingface.co/datasets/BryanW/HumanEdit.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
This paper addresses the problem that existing large-scale image-editing datasets incorporate little human feedback, which makes them difficult to align with human preferences. Specifically, the paper points out that existing datasets often fail to reflect real user needs in how editing instructions are phrased and how mask regions are selected, so edited outputs frequently contain flaws or fall short of what a human would produce (for example, body distortion). The root cause is that the training data distribution is usually noisy and does not match the editing instructions that real-world users write. To address this, the paper introduces **HumanEdit**, a high-quality, human-rewarded dataset for instruction-guided image editing. According to the paper, multiple rounds of quality control make HumanEdit stronger than existing datasets in data accuracy, diversity, high-resolution image sources, and support for both mask-based and mask-free editing. Human annotators construct the data pairs and administrators provide feedback, so that the dataset aligns better with human preferences and can support future research and evaluation benchmarks. In summary, the paper's main goal is to improve precision and diversity in instruction-based image editing by introducing a high-quality, high-resolution, diverse dataset built with detailed human feedback.