Abstract: Output thresholding is the technique of searching for the best threshold to use at inference time for any classifier that can produce probability estimates on training and test datasets. It is particularly useful in highly imbalanced classification problems, where the default threshold does not account for the skewed class distribution and fails to give the best performance. This paper proposes OTLP, a thresholding framework based on mixed integer linear programming that is model agnostic and supports different objective functions and different sets of constraints for a diverse set of problems, including both balanced and imbalanced classification. It is particularly useful in real-world applications, where theoretical thresholding techniques cannot address the product-related requirements and complexity of applications that use machine learning models. We evaluate the usefulness of the framework on the Credit Card Fraud Detection Dataset.
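
One way to cast threshold selection as a mixed integer linear program is sketched below. This is an illustrative formulation under our own assumptions, not necessarily the one used in the paper: p_i are the model's probabilities on a validation set of n samples, y_i the true labels, n_0 the number of negatives, t the threshold, and z_i a binary indicator for predicting class 1. The parameters λ (false-positive penalty), α (false-positive-rate cap), ε (small margin) and M (big-M constant, with M ≥ 1 + ε) are hypothetical.

```latex
\begin{aligned}
\max_{t,\,z}\quad & \sum_{i:\,y_i=1} z_i \;-\; \lambda \sum_{i:\,y_i=0} z_i
  && \text{(TP} - \lambda\,\text{FP)} \\
\text{s.t.}\quad & p_i - t \ge -M\,(1 - z_i), && i = 1,\dots,n \\
& p_i - t \le M\,z_i - \varepsilon, && i = 1,\dots,n \\
& \sum_{i:\,y_i=0} z_i \le \alpha\, n_0 && \text{(false-positive-rate cap)} \\
& t \in [0,1], \quad z_i \in \{0,1\}, && i = 1,\dots,n
\end{aligned}
```

The two big-M constraints enforce z_i = 1 ⟹ p_i ≥ t and z_i = 0 ⟹ p_i ≤ t − ε, so every feasible z is a labeling obtained by thresholding at t, and both the objective and the cap are linear in z. Ratio metrics such as F1 are not linear in z and would need a reformulation, whereas a precision floor π stays linear, because TP ≥ π(TP + FP) is equivalent to Σ_i (y_i − π) z_i ≥ 0.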

What problem does this paper attempt to address?

This paper proposes a method named OTLP (Output Thresholding using Mixed Integer Linear Programming) to select the optimal decision threshold for a classifier at prediction time, especially for highly imbalanced classification tasks. On imbalanced datasets, the default threshold may not account for the skewed class distribution, resulting in poor performance. OTLP is model independent: it uses a mixed integer linear programming (MILP) framework that can accommodate different objective functions and constraints, so it applies to both balanced and imbalanced classification tasks. OTLP tunes the decision threshold on the training and validation sets and then uses the selected threshold to assign class labels to the test data from the model's probability estimates. The effectiveness of the framework is demonstrated on a credit card fraud detection dataset.

The paper also discusses related work, such as other threshold optimization methods, and points out their limitations, for example lack of support for complex constraints or dependence on a specific model type. The advantages of OTLP are its flexibility in handling different types of constraints, its support for custom objective functions, and the absence of any restriction on model type.

The experiments evaluate OTLP across different classifiers, dataset class ratios, objective functions, and constraint settings, showing that it finds thresholds that outperform the default value and that it applies to various types of classification problems. In summary, the paper addresses the optimization of classifier output thresholds, proposes a generic approach applicable to many scenarios, and validates its practicality and effectiveness experimentally.
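
The threshold-selection step described above can be illustrated with a small, dependency-free sketch. It is not OTLP and uses no MILP solver: it simply scores every distinct validation probability as a candidate threshold, keeps those that satisfy an optional constraint, and returns the one that maximizes the objective. The helper names (confusion_counts, best_threshold), the toy scores and labels, and the precision ≥ 0.9 constraint are hypothetical illustrations, not taken from the paper.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical helpers for illustration only; not the paper's API.
Counts = Tuple[int, int, int, int]  # (tp, fp, fn, tn)


def confusion_counts(probs: Sequence[float], labels: Sequence[int], threshold: float) -> Counts:
    """Count TP, FP, FN, TN when predicting class 1 for every score >= threshold."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1(tp: int, fp: int, fn: int, tn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def recall(tp: int, fp: int, fn: int, tn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp: int, fp: int, fn: int, tn: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def best_threshold(
    probs: Sequence[float],
    labels: Sequence[int],
    objective: Callable[[int, int, int, int], float],
    constraint: Optional[Callable[..., bool]] = None,
) -> Tuple[Optional[float], float]:
    """Return (threshold, objective value) of the best feasible candidate,
    or (None, -inf) if no candidate satisfies the constraint."""
    best_t: Optional[float] = None
    best_val = float("-inf")
    # Predictions only change at observed scores, so these are the candidates.
    for t in sorted(set(probs)):
        counts = confusion_counts(probs, labels, t)
        if constraint is not None and not constraint(*counts):
            continue  # skip thresholds that violate the (illustrative) constraint
        val = objective(*counts)
        if val > best_val:
            best_t, best_val = t, val
    return best_t, best_val


# Hypothetical imbalanced validation set: 2 positives out of 10.
probs = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]

print(f1(*confusion_counts(probs, labels, 0.5)))  # 0.0 at the default threshold
print(best_threshold(probs, labels, f1))  # (0.35, 0.8)
# Maximize recall subject to precision >= 0.9.
print(best_threshold(probs, labels, recall, lambda *c: precision(*c) >= 0.9))  # (0.45, 0.5)
```

On this toy data the default threshold of 0.5 predicts no positives at all, while a lower threshold recovers both. Enumerating observed scores is enough for a single threshold with metrics computed from confusion counts; a general MILP solver becomes more attractive when the objective or constraints go beyond what such a one-dimensional sweep handles easily.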

OTLP: Output Thresholding Using Mixed Integer Linear Programming
