Abstract: We present two new metrics for evaluating generative models in the class-conditional image generation setting. These metrics are obtained by generalizing the two most popular unconditional metrics: the Inception Score (IS) and the Fréchet Inception Distance (FID). A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts. The link takes the form of a product in the case of IS or an upper bound in the FID case. We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models, thus providing additional insights about their performance, from unlearned classes to mode collapse.

What problem does this paper attempt to address?

This paper addresses the evaluation of generative models in conditional image generation. Existing metrics such as the Inception Score (IS) and the Fréchet Inception Distance (FID) were designed for unconditional generation, so when they are applied to conditional models they cannot account for whether a generated image actually meets its specified condition. Evaluation based on classification accuracy does measure whether generated images fall into the intended category, but it ignores image quality and diversity. To overcome these limitations, the authors propose two new evaluation metrics, **Conditional Inception Score (CIS)** and **Conditional Fréchet Inception Distance (CFID)**, which extend IS and FID, respectively, to the conditional image generation setting.

### Specific problem description:

1. **Limitations of existing evaluation metrics**:
   - **Unconditional evaluation metrics**: IS and FID cannot tell whether the generated images meet the specified conditions when they are used to evaluate conditional generative models.
   - **Evaluation method based on classification accuracy**: It captures the category accuracy of the generated images but ignores their quality and diversity.
2. **Need for new evaluation metrics**:
   - **Conditional Inception Score (CIS)**: Evaluates the quality and diversity of the generated images while also considering whether they meet the specified conditions.
   - **Conditional Fréchet Inception Distance (CFID)**: Evaluates the difference between the distributions of generated and real images while also considering whether the generated images meet the specified conditions.

### Solution:

1. **Conditional Inception Score (CIS)**:
   - **Between-Class Component (BCIS)**: Measures how distinguishable the generated images of different classes are from one another.
   - **Within-Class Component (WCIS)**: Measures the quality and diversity of the generated images within the same class.
2. **Conditional Fréchet Inception Distance (CFID)**:
   - **Between-Class Component (BCFID)**: Measures the distance between the average per-class features of the generated images and those of the real images.
   - **Within-Class Component (WCFID)**: Measures the difference between the distributions of generated and real images within each class.

By introducing these metrics, the authors aim to provide a more comprehensive and accurate way to understand and compare the performance of conditional image generation models.
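The abstract states that the conditional and unconditional Inception Scores are linked by a product. The following is a minimal sketch of how such a decomposition can be computed, not the authors' implementation. It assumes the standard definition of IS, the exponential of the mean KL divergence between p(y|x) and p(y). It further assumes that BCIS compares each per-class average prediction p(y|c) with the marginal p(y), and that WCIS compares each prediction p(y|x) with its class average p(y|c). The function names and the NumPy-only setup are illustrative assumptions, and the paper's exact definitions may differ.

```python
import numpy as np


def kl(p, q, eps=1e-12):
    """KL(p || q) along the last axis, clipped for numerical safety."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)


def inception_scores(probs, labels):
    """Return (IS, BCIS, WCIS) under the assumed definitions.

    probs:  (N, K) softmax outputs of a classifier on generated images.
    labels: (N,)   conditioning class used to generate each image.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)

    # Marginal label distribution p(y) over all generated images.
    p_y = probs.mean(axis=0)

    # Unconditional IS: exp of the mean KL(p(y|x) || p(y)).
    log_is = kl(probs, p_y).mean()

    log_between = 0.0
    log_within = 0.0
    for c in np.unique(labels):
        in_class = probs[labels == c]
        weight = len(in_class) / len(probs)  # empirical class frequency
        p_y_c = in_class.mean(axis=0)        # per-class average prediction

        # Between-class term: how far class c's average is from the marginal.
        log_between += weight * kl(p_y_c, p_y)
        # Within-class term: how far each sample is from its class average.
        log_within += weight * kl(in_class, p_y_c).mean()

    return np.exp(log_is), np.exp(log_between), np.exp(log_within)


# Illustrative usage on synthetic predictions (not real model outputs).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=200)
labels = rng.integers(0, 5, size=200)
is_score, bcis, wcis = inception_scores(probs, labels)
print(is_score, bcis * wcis)
```

Under these assumed definitions the two printed values agree up to floating-point error, because the KL divergence to the marginal splits exactly into a within-class term and a between-class term when classes are weighted by their empirical frequency.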

Evaluation Metrics for Conditional Image Generation

A Study on the Evaluation of Generative Models

ImagenHub: Standardizing the evaluation of conditional image generation models

Statistics Enhancement Generative Adversarial Networks for Diverse Conditional Image Synthesis

Rethinking FID: Towards a Better Evaluation Metric for Image Generation

Establishing an Evaluation Metric to Quantify Climate Change Image Realism

A study of the evaluation metrics for generative images containing combinational creativity

Normalizing Flow-Based Metric for Image Generation

Evaluating Text-to-Image GANs Performance: A Comparative Analysis of Evaluation Metrics

Improving the Evaluation of Generative Models with Fuzzy Logic

Attribute Based Interpretable Evaluation Metrics for Generative Models

An empirical study on evaluation metrics of generative adversarial networks

On Aliased Resizing and Surprising Subtleties in GAN Evaluation

An Optimism-based Approach to Online Evaluation of Generative Models

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

On the Evaluation of Generative Adversarial Networks By Discriminative Models

Distribution Aware Metrics for Conditional Natural Language Generation

Evaluating generative networks using Gaussian mixtures of image features

Compound Frechet Inception Distance for Quality Assessment of GAN Created Images

Revisiting the Evaluation of Image Synthesis with GANs

FLD+: Data-efficient Evaluation Metric for Generative Models