Abstract: In this work, we present a novel, machine-learning approach for constructing Multiclass Interpretable Scoring Systems (MISS) - a fully data-driven methodology for generating single, sparse, and user-friendly scoring systems for multiclass classification problems. Scoring systems are commonly utilized as decision support models in healthcare, criminal justice, and other domains where interpretability of predictions and ease of use are crucial. Prior methods for data-driven scoring, such as SLIM (Supersparse Linear Integer Model), were limited to binary classification tasks and extensions to multiclass domains were primarily accomplished via one-versus-all-type techniques. The scores produced by our method can be easily transformed into class probabilities via the softmax function. We demonstrate techniques for dimensionality reduction and heuristics that enhance the training efficiency and decrease the optimality gap, a measure that can certify the optimality of the model. Our approach has been extensively evaluated on datasets from various domains, and the results indicate that it is competitive with other machine learning models in terms of classification performance metrics and provides well-calibrated class probabilities.

What problem does this paper attempt to address?

The paper addresses the construction of interpretable scoring systems for multiclass classification problems. Specifically, the authors propose MISS (Multiclass Interpretable Scoring Systems), a fully data-driven method for generating a single, sparse, and user-friendly scoring system for a multiclass problem. Such scoring systems are particularly useful in healthcare, criminal justice, and other domains where predictions must be easy to understand and interpret.

### Main Contributions

1. **MISS Method**: A data-driven approach for learning multiclass scoring systems. Because it is formulated with Integer Programming (IP), MISS can honor domain-specific constraints and comes with an optimality gap, a measure that can certify the model's optimality. The method also provides class probabilities.
2. **Training Performance Improvement**: A new dimensionality reduction method, Recursive Feature Aggregation (RFA), and algorithmic improvements adapted from RiskSLIM to the multiclass setting make training MISS more efficient.
3. **Experimental Validation**: Experiments on multiple binary and multiclass datasets show that MISS performs comparably to other machine learning methods on classification performance metrics and provides well-calibrated class probabilities.
4. **Open Code**: The authors provide publicly available code for training MISS with the CPLEX optimizer, compatible with the scikit-learn interface.

### Problems Addressed

- **Limitations of Existing Techniques**: Existing scoring system methods are often limited to binary classification. When extended to multiclass tasks, they typically rely on decomposition strategies such as One-vs-One (OvO) or One-vs-Rest (OvR), which reduces the interpretability and practicality of the resulting systems.
- **Need**: Multiclass problems call for a scoring system that maintains high predictive performance while remaining highly interpretable.
### Method Overview

MISS is formulated as a Mixed-Integer Nonlinear Program (MINLP). It fits the scoring system by minimizing cross-entropy loss, which targets good Area Under the Curve (AUC) and calibration, while penalizing the number of features used (via the l0-norm) and constraining coefficients to small integers. A sample's score for each class is obtained by summing the points of its positive binary features, and the predicted class is the one with the highest score.

### Conclusion

MISS addresses the construction of interpretable scoring systems for multiclass problems, offering a practical solution that keeps predictions both interpretable and accurate.
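
To make the scoring rule from the Method Overview concrete, the sketch below sums per-class points over a sample's positive binary features, converts the scores to probabilities with softmax, and picks the highest-scoring class. It also evaluates a cross-entropy plus l0-style penalty for a fixed points table. This is a minimal illustration, not the authors' implementation: the feature names, class names, point values, and function names are all hypothetical, and counting a feature as "used" when any class gives it nonzero points is an assumption about how the sparsity penalty is applied.

```python
import math

# Hypothetical points table (invented for illustration, not from the paper):
# POINTS[feature][k] is the integer number of points that a positive binary
# feature contributes to class k.
CLASSES = ["low", "medium", "high"]
POINTS = {
    "age_over_60": [0, 1, 2],
    "prior_event": [-1, 0, 2],
    "smoker": [0, 2, 1],
}


def class_scores(active_features):
    """Per-class score: sum of the points of every positive binary feature."""
    scores = [0] * len(CLASSES)
    for name in active_features:
        for k, pts in enumerate(POINTS[name]):
            scores[k] += pts
    return scores


def softmax(scores):
    """Turn scores into probabilities; subtracting the max avoids overflow."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(active_features):
    """Return (predicted class, per-class scores, per-class probabilities)."""
    scores = class_scores(active_features)
    best = max(range(len(scores)), key=scores.__getitem__)
    return CLASSES[best], scores, softmax(scores)


def objective(samples, l0_penalty):
    """Mean cross-entropy plus a penalty per used feature.

    Assumption: a feature counts as used if any class gives it nonzero points.
    """
    cross_entropy = 0.0
    for active_features, label in samples:
        probs = softmax(class_scores(active_features))
        cross_entropy -= math.log(probs[CLASSES.index(label)])
    used = sum(1 for pts in POINTS.values() if any(pts))
    return cross_entropy / len(samples) + l0_penalty * used


if __name__ == "__main__":
    label, scores, probs = predict(["age_over_60", "prior_event"])
    print(label, scores, [round(p, 3) for p in probs])
```

In an actual training run the points would be decision variables chosen by the solver. Here they are fixed so that only the prediction rule and the shape of the objective are shown.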

MISS: Multiclass Interpretable Scoring Systems
