Abstract: Generative methods now produce outputs nearly indistinguishable from real data but often fail to fully capture the data distribution. Unlike quality issues, diversity limitations in generative models are hard to detect visually and require dedicated metrics for assessment. In this paper, we draw attention to the current lack of diversity in generative models and the inability of common metrics to measure it. We do so by framing diversity as an image retrieval problem, in which we measure how many real images can be retrieved using synthetic data as queries. This yields the Image Retrieval Score (IRS), an interpretable, hyperparameter-free metric that quantifies the diversity of a generative model's output. IRS requires only a subset of synthetic samples and provides a statistical measure of confidence. Our experiments indicate that the feature extractors commonly used in generative model assessment are inadequate for evaluating diversity effectively. Consequently, we perform an extensive search for the best feature extractors to assess diversity. Evaluation reveals that current diffusion models converge to limited subsets of the real distribution, with no current state-of-the-art model surpassing 77% of the diversity of the training data. To address this limitation, we introduce Diversity-Aware Diffusion Models (DiADM), a novel approach that improves the diversity of unconditional diffusion models without loss of image quality. DiADM disentangles diversity from image quality by means of a diversity-aware module that takes pseudo-unconditional features as input. To further facilitate the evaluation of generative models, we provide a Python package offering unified feature extraction and metric computation at https://github.com/MischaD/beyondfid.
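
The retrieval framing can be illustrated with a toy sketch. This is not the IRS as defined in the paper and not the API of the beyondfid package: it only counts which fraction of real samples is the nearest neighbour of at least one synthetic query, and it omits the statistical confidence estimate the abstract mentions. The function name `retrieved_fraction`, the Euclidean distance, and the 2-D feature vectors are all assumptions made for illustration.

```python
from math import dist


def retrieved_fraction(real_feats, synth_feats):
    """Fraction of real samples retrieved by at least one synthetic query.

    Hypothetical helper for illustration only: each synthetic feature vector
    retrieves its single nearest real feature vector (Euclidean distance).
    """
    retrieved = set()
    for query in synth_feats:
        # Index of the real sample closest to this synthetic query.
        nearest = min(range(len(real_feats)),
                      key=lambda i: dist(query, real_feats[i]))
        retrieved.add(nearest)
    return len(retrieved) / len(real_feats)


# Toy 2-D "features": four real samples at the corners of a square.
real = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]

# A collapsed generator: every synthetic sample sits near one real sample.
collapsed = [(0.1, 0.0), (0.0, 0.2), (0.3, 0.1)]

# A diverse generator: synthetic samples spread across all real samples.
diverse = [(0.1, 0.0), (9.8, 0.1), (0.2, 9.9), (10.1, 9.7)]

collapsed_score = retrieved_fraction(real, collapsed)
diverse_score = retrieved_fraction(real, diverse)
```

In this toy setting the collapsed generator retrieves one of four real samples and the diverse generator retrieves all four, even though every synthetic sample lies close to a real one. This mirrors the abstract's point that per-sample quality alone does not reveal a lack of diversity.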

On the Relation Between Quality-Diversity Evaluation and Distribution-Fitting Goal in Text Generation

Exploring the Pareto-Optimality between Quality and Diversity in Text Generation

Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores

Distribution Aware Metrics for Conditional Natural Language Generation

Unifying Human and Statistical Evaluation for Natural Language Generation

Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation

The Detection of Distributional Discrepancy for Text Generation

Differentiated Distribution Recovery for Neural Text Generation

Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization

Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization

Open-Domain Text Evaluation via Contrastive Distribution Methods

Diversifying Question Generation over Knowledge Base via External Natural Questions

Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References

Beyond Scale: The Diversity Coefficient as a Data Quality Metric for Variability in Natural Language Data

DP-GAN: Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text

Image Generation Diversity Issues and How to Tame Them

Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework

Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization

Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

GRUEN for Evaluating Linguistic Quality of Generated Text

Measuring and Improving Semantic Diversity of Dialogue Generation