Revisiting Precision and Recall Definition for Generative Model Evaluation

Loïc Simon, Ryan Webster, Julien Rabin
DOI: https://doi.org/10.48550/arXiv.1905.05441
2019-05-14
Abstract: In this article, we revisit the definition of Precision-Recall (PR) curves for generative models proposed by Sajjadi et al. (arXiv:1806.00035). Rather than providing a scalar for generative quality, PR curves distinguish mode collapse (poor recall) from bad quality (poor precision). We first generalize their formulation to arbitrary measures, hence removing any restriction to finite support. We also expose a bridge between PR curves and the type I and type II error rates of likelihood ratio classifiers on the task of discriminating between samples of the two distributions. Building upon this new perspective, we propose a novel algorithm to approximate precision-recall curves that shares some interesting methodological properties with the hypothesis testing technique from Lopez-Paz et al. (arXiv:1610.06545). We demonstrate the benefits of the proposed formulation over the original approach on controlled multi-modal datasets.
Machine Learning, Computer Vision and Pattern Recognition
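
To make the abstract's two central objects concrete, here is a minimal sketch for the finite-support case only, which is the setting of Sajjadi et al. and not the general-measure formulation this paper proposes. It assumes the standard closed form for discrete distributions, precision α(λ) = Σᵢ min(λ pᵢ, qᵢ) and recall β(λ) = α(λ)/λ, with p the reference distribution and q the generated one. The function names `prd_curve` and `precision_via_classifier` are hypothetical and chosen for illustration. The second function checks numerically that α(λ) equals a weighted sum of the two error rates of the likelihood ratio classifier that thresholds q/p at λ; which of the two errors is called type I or type II depends on the convention used in the paper.

```python
import numpy as np


def prd_curve(p, q, num_angles=201):
    """Sketch of a PR curve between two discrete distributions.

    p: reference (real) distribution, q: generated distribution,
    both given as probability vectors over the same finite support.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Sweep slopes lambda = tan(theta) for theta strictly inside (0, pi/2).
    angles = np.linspace(0.0, np.pi / 2, num_angles + 2)[1:-1]
    lambdas = np.tan(angles)
    # alpha(lambda) = sum_i min(lambda * p_i, q_i), computed for all lambdas at once.
    precision = np.minimum(lambdas[:, None] * p[None, :], q[None, :]).sum(axis=1)
    # beta(lambda) = alpha(lambda) / lambda.
    recall = precision / lambdas
    return lambdas, precision, recall


def precision_via_classifier(p, q, lam):
    """Same alpha(lambda), written as a weighted sum of classifier error rates.

    The classifier labels outcome i as "generated" when lam * p_i <= q_i,
    i.e. when the likelihood ratio q_i / p_i is at least lam.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    says_generated = lam * p <= q
    real_misclassified = p[says_generated].sum()        # real samples labeled "generated"
    generated_misclassified = q[~says_generated].sum()  # generated samples labeled "real"
    return lam * real_misclassified + generated_misclassified
```

As an illustration of the mode-collapse reading, with `p = [0.5, 0.5]` and `q = [1.0, 0.0]` the generated distribution covers only one of two equally likely modes: the curve returned by `prd_curve` reaches a precision of 1 but its recall never exceeds 0.5.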