Abstract: Precision and recall are foundational metrics in machine learning applications where both accurate predictions and comprehensive coverage are essential, such as recommender systems and multi-label learning. In these tasks, balancing precision (the proportion of relevant items among those predicted) and recall (the proportion of relevant items successfully predicted) is crucial. A key challenge is that one-sided feedback, where only positive examples are observed during training, is inherent in many practical problems. For instance, in recommender systems like YouTube, training data consists only of videos that a user has actively selected, while unselected items remain unseen. Despite this lack of negative feedback in training, avoiding undesirable recommendations at test time is essential. We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions, such as between users and items. This framework subsumes the classical binary and multi-class PAC learning models as well as multi-label learning with partial feedback, where only a single random correct label per example is observed, rather than all correct labels. Our work uncovers a rich statistical and algorithmic landscape, with nuanced boundaries on what can and cannot be learned. Notably, classical methods like Empirical Risk Minimization fail in this setting, even for simple hypothesis classes with only two hypotheses. To address these challenges, we develop novel algorithms that learn exclusively from positive data, effectively minimizing both precision and recall losses. Specifically, in the realizable setting, we design algorithms that achieve optimal sample complexity guarantees. In the agnostic case, we show that it is impossible to achieve additive error guarantees of the kind standard in PAC learning, and we instead obtain meaningful multiplicative approximations.
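To make the two quantities defined above concrete, here is a minimal Python sketch, not the paper's algorithm. It assumes a hypothesis is stored as a dictionary from each user to the set of items it has an edge to. The names `Graph` and `precision_recall`, the toy data, and the convention of returning 1.0 when a set is empty are all illustrative assumptions. The losses are taken here to be one minus the corresponding metric, which is likewise an assumption made for illustration.

```python
from typing import Dict, Hashable, Set, Tuple

# Hypothetical representation: a graph hypothesis maps each user to the
# set of items it is connected to (its predicted or true positives).
Graph = Dict[Hashable, Set[Hashable]]


def precision_recall(predicted: Set[Hashable],
                     relevant: Set[Hashable]) -> Tuple[float, float]:
    """Return (precision, recall) of a predicted item set for one user.

    precision = |predicted & relevant| / |predicted|
    recall    = |predicted & relevant| / |relevant|
    An empty denominator is scored as 1.0 (an assumed convention).
    """
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Toy target graph (ground truth) and hypothesis graph (prediction).
target: Graph = {"u1": {"a", "b", "c"}, "u2": {"d"}}
hypothesis: Graph = {"u1": {"a", "b", "x"}, "u2": {"d", "e"}}

for user in target:
    p, r = precision_recall(hypothesis[user], target[user])
    # Assumed loss definitions: one minus the metric.
    print(user, "precision loss:", 1 - p, "recall loss:", 1 - r)
```

In the one-sided feedback setting described in the abstract, a learner would not see `target` itself, only positive examples drawn from it, so this computation describes evaluation at test time rather than anything available during training.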
