Abstract: Social reward, as a form of community recognition, provides a strong source of motivation for users of online platforms to engage and contribute content. The recent progress of text-conditioned image synthesis has ushered in a collaborative era where AI empowers users to craft original visual artworks seeking community validation. Nevertheless, assessing these models in the context of collective community preference introduces distinct challenges. Existing evaluation methods predominantly center on limited-size user studies guided by image quality and prompt alignment. This work pioneers a paradigm shift, unveiling Social Reward - an innovative reward modeling framework that leverages implicit feedback from social network users engaged in creative editing of generated images. We embark on an extensive journey of dataset curation and refinement, drawing from Picsart, an online visual creation and editing platform, yielding a first million-user-scale dataset of implicit human preferences for user-generated visual art named Picsart Image-Social. Our analysis exposes the shortcomings of current metrics in modeling community creative preference of text-to-image models' outputs, compelling us to introduce a novel predictive model explicitly tailored to address these limitations. Rigorous quantitative experiments and a user study show that our Social Reward model aligns better with social popularity than existing metrics. Furthermore, we utilize Social Reward to fine-tune text-to-image models, yielding images that are more favored not only by Social Reward but also by other established metrics. These findings highlight the relevance and effectiveness of Social Reward in assessing community appreciation for AI-generated artworks, establishing a closer alignment with users' creative goals: creating popular visual art. Code can be accessed at

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community

Class-Conditional self-reward mechanism for improved Text-to-Image models

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Stable Preference: Redefining Training Paradigm of Human Preference Model for Text-to-Image Synthesis

Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Human Preference Score: Better Aligning Text-to-Image Models with Human Preference

Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

DreamReward: Text-to-3D Generation with Human Preference

Reward Incremental Learning in Text-to-Image Generation

Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards

Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation

Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models

Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis

Rich Human Feedback for Text-to-Image Generation

Optimizing Prompts for Text-to-Image Generation

Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences

EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation