TreeReward: Improve Diffusion Model Via Tree-Structured Feedback Learning

Jiacheng Zhang, Jie Wu, Huafeng Kuang, Haiming Zhang, Yuxi Ren, Weifeng Chen, Manlin Zhang, Xuefeng Xiao, Guanbin Li
DOI: https://doi.org/10.1145/3664647.3680610
2024-01-01
Abstract: Recently, there has been significant progress in leveraging human feedback to enhance diffusion-based image generation, attracting considerable interest. However, existing methods fail to achieve a fine-grained performance boost because of two challenges: i) an insufficient amount of fine-grained feedback data; and ii) the lack of an effective fine-grained feedback learning framework. To tackle these challenges, we present TreeReward to facilitate fine-grained feedback optimization for diffusion models. Specifically, to address the limited fine-grained feedback data, we first design a novel "AI + Expert" feedback data construction pipeline, yielding a high-quality feedback dataset of about 2.2M entries that encompasses six fine-grained dimensions at a relatively low cost. Built upon this dataset, we introduce a tree-structured reward model to exploit the fine-grained feedback data efficiently and provide tailored optimization during feedback learning. We validate the feedback learning performance of our method across different fine-grained dimensions and various downstream tasks. Extensive experiments on both Stable Diffusion v1.5 (SD1.5) and Stable Diffusion XL (SDXL) demonstrate the effectiveness of our method in enhancing general and fine-grained generation as well as generalization to downstream tasks.