Evaluation Metrics for Conditional Image Generation

Yaniv Benny, Tomer Galanti, Sagie Benaim, Lior Wolf
DOI: https://doi.org/10.1007/s11263-020-01424-w
IF: 13.369
2021-03-02
International Journal of Computer Vision
Abstract: We present two new metrics for evaluating generative models in the class-conditional image generation setting. These metrics are obtained by generalizing the two most popular unconditional metrics: the Inception Score (IS) and the Fréchet Inception Distance (FID). A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts. The link takes the form of a product in the case of IS or an upper bound in the FID case. We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models, thus providing additional insights about their performance, from unlearned classes to mode collapse.
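The product relation for IS can be illustrated numerically. The Python/NumPy sketch below is not the authors' implementation. It assumes that, given classifier outputs p(y|x) for images generated with known conditioning classes, BCIS is the exponentiated class-weighted KL divergence between each class's mean prediction and the overall mean prediction, and WCIS is the exponentiated average KL divergence between each image's prediction and its own class mean. The function names are hypothetical. Under these assumed definitions, BCIS × WCIS equals the ordinary IS computed on the same images.

```python
import numpy as np


def _kl_rows(p, q, eps=1e-12):
    """Row-wise KL(p || q) along the last axis.

    The floor only guards log(0); entries with p == 0 contribute 0, as usual.
    """
    return np.sum(p * (np.log(np.maximum(p, eps)) - np.log(np.maximum(q, eps))), axis=-1)


def inception_score(probs):
    """Standard IS: exp(mean_x KL(p(y|x) || p(y))), with p(y) the mean prediction."""
    p_y = probs.mean(axis=0, keepdims=True)
    return float(np.exp(_kl_rows(probs, p_y).mean()))


def conditional_inception_scores(probs, labels):
    """Hypothetical (BCIS, WCIS) under the assumed definitions stated above.

    probs:  (N, K) classifier outputs p(y|x) for N generated images.
    labels: (N,) conditioning class used to generate each image.
    """
    p_y = probs.mean(axis=0)
    classes, counts = np.unique(labels, return_counts=True)
    weights = counts / counts.sum()  # empirical class frequencies
    between = within = 0.0
    for c, w in zip(classes, weights):
        class_probs = probs[labels == c]
        p_y_c = class_probs.mean(axis=0)  # mean prediction for class c
        # Between-class term: distance of the class-mean prediction from the overall mean.
        between += w * _kl_rows(p_y_c[None, :], p_y[None, :])[0]
        # Within-class term: distance of each image's prediction from its class mean.
        within += w * _kl_rows(class_probs, p_y_c[None, :]).mean()
    return float(np.exp(between)), float(np.exp(within))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(200, 10))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    labels = rng.integers(0, 5, size=200)
    bcis, wcis = conditional_inception_scores(probs, labels)
    # The product of the two factors matches the unconditional IS.
    print(bcis * wcis, inception_score(probs))
```

With the class weights taken as empirical class frequencies, the identity holds exactly on a finite sample (up to floating-point error), mirroring the chain rule for mutual information.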
computer science, artificial intelligence
What problem does this paper attempt to address?
This paper addresses how to evaluate generative models for conditional image generation. The most widely used metrics, the Inception Score (IS) and the Fréchet Inception Distance (FID), were designed for unconditional generation: when applied to a conditional model, they do not check whether each generated image actually matches the condition it was generated for. Metrics based on classification accuracy do check whether generated images belong to the intended class, but they ignore image quality and diversity. To overcome these limitations, the authors propose two new metrics, the **Conditional Inception Score (CIS)** and the **Conditional Fréchet Inception Distance (CFID)**, which extend IS and FID, respectively, to the conditional setting.

### Specific problem description

1. **Limitations of existing evaluation metrics**:
   - **Unconditional metrics**: IS and FID cannot tell whether the generated images satisfy the specified conditions when used to evaluate conditional generative models.
   - **Classification-accuracy-based evaluation**: It measures whether generated images belong to the intended class, but ignores their quality and diversity.
2. **Need for new evaluation metrics**:
   - **Conditional Inception Score (CIS)**: evaluates the quality and diversity of the generated images while also taking into account whether they match the specified conditions.
   - **Conditional Fréchet Inception Distance (CFID)**: evaluates the difference between the distributions of generated and real images while also taking into account whether the generated images match the specified conditions.

### Solution

1. **Conditional Inception Score (CIS)**:
   - **Between-class component (BCIS)**: measures how distinguishable the generated images of different classes are from one another.
   - **Within-class component (WCIS)**: measures the quality and diversity of the generated images within a single class.
2. **Conditional Fréchet Inception Distance (CFID)**:
   - **Between-class component (BCFID)**: measures the distance between the class-mean features of the generated images and the class-mean features of the real images.
   - **Within-class component (WCFID)**: measures the difference between the distributions of generated and real images within each class.

With these metrics, the authors aim to provide a more comprehensive and accurate evaluation, making it easier to understand and compare the performance of conditional image generation models.
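The CFID components can be sketched in the same way from feature vectors (for example, Inception activations) of generated and real images with known class labels. This is a hedged illustration, not the paper's definition. It assumes BCFID is the Fréchet distance between the set of per-class mean features of the generated images and the set of per-class mean features of the real images, and WCFID is the unweighted average over classes of the per-class Fréchet distance. The paper's exact formulas (class weighting, which terms are included) may differ. The names `frechet_distance` and `conditional_frechet_components` are hypothetical. The matrix square root uses a symmetric eigendecomposition rather than a general `sqrtm`, which is valid because the matrices involved are symmetric positive semidefinite.

```python
import numpy as np


def _sqrtm_psd(a):
    """Square root of a symmetric PSD matrix; tiny negative eigenvalues are clipped."""
    vals, vecs = np.linalg.eigh((a + a.T) / 2.0)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(x, y):
    """Fréchet distance between Gaussians fitted to the rows of x and y (N x D)."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    root_x = _sqrtm_psd(cov_x)
    # tr((S_x S_y)^{1/2}) via the symmetric form (S_x^{1/2} S_y S_x^{1/2})^{1/2}.
    cross = _sqrtm_psd(root_x @ cov_y @ root_x)
    return float(np.sum((mu_x - mu_y) ** 2) + np.trace(cov_x + cov_y - 2.0 * cross))


def conditional_frechet_components(gen_feats, gen_labels, real_feats, real_labels):
    """Hypothetical (BCFID, WCFID) under the assumed definitions stated above.

    Every class in real_labels is assumed to also appear in gen_labels.
    """
    classes = np.unique(real_labels)
    # One mean feature vector per class, for generated and for real images.
    gen_means = np.stack([gen_feats[gen_labels == c].mean(axis=0) for c in classes])
    real_means = np.stack([real_feats[real_labels == c].mean(axis=0) for c in classes])
    bcfid = frechet_distance(gen_means, real_means)
    # Average of the per-class Fréchet distances.
    wcfid = float(np.mean([
        frechet_distance(gen_feats[gen_labels == c], real_feats[real_labels == c])
        for c in classes
    ]))
    return bcfid, wcfid
```

For instance, if the class labels attached to a set of generated features are cyclically shifted, the pooled features, and hence the unconditional Fréchet distance, are unchanged, while the within-class term under these assumed definitions grows with the distance between the mismatched class means.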