MISS: Multiclass Interpretable Scoring Systems

Michal K. Grzeszczyk, Tomasz Trzciński, Arkadiusz Sitek
2024-01-10
Abstract: In this work, we present a novel, machine-learning approach for constructing Multiclass Interpretable Scoring Systems (MISS) - a fully data-driven methodology for generating single, sparse, and user-friendly scoring systems for multiclass classification problems. Scoring systems are commonly utilized as decision support models in healthcare, criminal justice, and other domains where interpretability of predictions and ease of use are crucial. Prior methods for data-driven scoring, such as SLIM (Supersparse Linear Integer Model), were limited to binary classification tasks and extensions to multiclass domains were primarily accomplished via one-versus-all-type techniques. The scores produced by our method can be easily transformed into class probabilities via the softmax function. We demonstrate techniques for dimensionality reduction and heuristics that enhance the training efficiency and decrease the optimality gap, a measure that can certify the optimality of the model. Our approach has been extensively evaluated on datasets from various domains, and the results indicate that it is competitive with other machine learning models in terms of classification performance metrics and provides well-calibrated class probabilities.
Machine Learning
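The optimality gap mentioned in the abstract compares the objective value of the best feasible solution found so far (the incumbent) with the best lower bound the solver has proven; a gap of zero certifies that the incumbent is optimal. A minimal sketch, assuming a minimization problem and the common relative-gap convention used by MIP solvers; the function name and the numbers are hypothetical and not taken from the MISS code.

```python
def relative_optimality_gap(incumbent: float, best_bound: float, eps: float = 1e-10) -> float:
    """Relative optimality gap for a minimization problem (hypothetical helper).

    incumbent:  objective value of the best feasible (integer) solution found
    best_bound: lower bound on the optimal objective proven by the solver
    A gap of 0 certifies that the incumbent is optimal.
    """
    if best_bound > incumbent + eps:
        raise ValueError("for minimization, the proven bound cannot exceed the incumbent")
    # eps guards against division by zero when the incumbent objective is 0
    return max(0.0, incumbent - best_bound) / (eps + abs(incumbent))


# Hypothetical values: best loss found 0.50, proven lower bound 0.45.
print(f"{relative_optimality_gap(0.50, 0.45):.1%}")  # prints 10.0%
print(relative_optimality_gap(0.50, 0.50))           # prints 0.0 (optimality certified)
```

Heuristics that find better incumbents lower the numerator from above, while tighter bounds raise `best_bound`; both shrink the gap.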
What problem does this paper attempt to address?
The paper addresses the construction of interpretable scoring systems for multiclass problems. The authors propose MISS (Multiclass Interpretable Scoring Systems), a fully data-driven approach for generating a single, sparse, user-friendly scoring system for multiclass classification. Such scoring systems are particularly useful in healthcare, criminal justice, and other domains where predictions must be easy to understand and interpret.

### Main Contributions

1. **MISS Method**: A data-driven approach for learning multiclass scoring systems. Because it is formulated as an Integer Programming (IP) problem, MISS can incorporate domain-specific constraints and reports an optimality gap, which can certify the optimality of the learned model. The method also provides class probabilities.
2. **Training Efficiency Improvements**: Dimensionality reduction methods, including the newly proposed Recursive Feature Aggregation (RFA), together with algorithmic improvements adapted from RiskSLIM to the multiclass setting, make MISS training more efficient.
3. **Experimental Validation**: Experiments on multiple binary and multiclass datasets show that MISS performs comparably to other machine learning methods on classification performance metrics and provides well-calibrated class probabilities.
4. **Open Code**: Publicly available code for training MISS with the CPLEX optimizer, compatible with the scikit-learn interface.

### Problems Addressed

- **Limitations of Existing Techniques**: Existing methods for learning scoring systems are mostly limited to binary classification. Extending them to multiclass tasks relies on decomposition strategies such as One-vs-One or One-vs-Rest (OvR), which produce several binary models instead of a single scoring system and thereby reduce interpretability and practicality.
- **Need**: Multiclass problems call for a scoring system that is both highly interpretable and competitive in predictive performance.
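To make the integer-programming formulation in contribution 1 concrete, here is a schematic objective in the style of RiskSLIM. The notation is introduced here for illustration and is an assumption, not the paper's exact formulation: $n$ samples with binary feature vectors $\mathbf{x}_i \in \{0,1\}^d$ and labels $y_i \in \{1,\dots,K\}$, an integer points vector $\boldsymbol{\lambda}_k \in \mathbb{Z}^d$ for each class $k$, a sparsity weight $C_0$, and a bound $\Lambda$ on the magnitude of the points.

```latex
\min_{\boldsymbol{\lambda}_1,\dots,\boldsymbol{\lambda}_K}\;
\frac{1}{n}\sum_{i=1}^{n}\Big[\log\sum_{k=1}^{K}\exp\big(\boldsymbol{\lambda}_k^\top\mathbf{x}_i\big)
  - \boldsymbol{\lambda}_{y_i}^\top\mathbf{x}_i\Big]
  + C_0\sum_{k=1}^{K}\lVert\boldsymbol{\lambda}_k\rVert_0
\qquad\text{s.t.}\quad \lambda_{kj}\in\{-\Lambda,\dots,\Lambda\},\; k=1,\dots,K,\; j=1,\dots,d
```

The bracketed term is the cross-entropy $-\log\operatorname{softmax}_{y_i}$ of the class scores $\boldsymbol{\lambda}_k^\top\mathbf{x}_i$. In this sketch the ℓ0 penalty counts nonzero points; the paper's penalty may instead count the features used by any class.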
### Method Overview

MISS is formulated as a Mixed-Integer Nonlinear Programming (MINLP) problem. It learns the scoring system by minimizing the multiclass cross-entropy loss, a proper scoring rule that favors both good discrimination (as measured by AUC) and good calibration, while penalizing the number of nonzero coefficients via the ℓ0-norm to encourage sparsity and restricting the coefficients (points) to small integers. For each sample, the score of a class is the sum of that class's points over the binary features that are present (equal to 1). The predicted class is the one with the highest score, and the scores can be converted into class probabilities with the softmax function.

### Conclusion

MISS addresses the construction of interpretable scoring systems for multiclass problems, providing an effective and practical solution that keeps predictions both interpretable and accurate.
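The scoring rule described in the Method Overview can be sketched as follows: sum each class's points over the present binary features, take the highest-scoring class, and apply softmax to obtain probabilities. This is a minimal illustration assuming one row of integer points per class and no separate intercept; the point values, feature layout, and function names are hypothetical, not taken from the paper or its code.

```python
import math
from typing import List, Sequence, Tuple


def class_scores(points: Sequence[Sequence[int]], x: Sequence[int]) -> List[int]:
    """Score of each class: sum of that class's points over features equal to 1."""
    return [sum(p for p, xj in zip(row, x) if xj == 1) for row in points]


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert scores to probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(points: Sequence[Sequence[int]], x: Sequence[int]) -> Tuple[int, List[float]]:
    """Return the highest-scoring class (first one on ties) and class probabilities."""
    scores = class_scores(points, x)
    label = max(range(len(scores)), key=lambda k: scores[k])
    return label, softmax(scores)


# Hypothetical table: 3 classes (rows) x 4 binary features (columns).
POINTS = [
    [2, 0, -1, 1],
    [0, 3, 1, 0],
    [-1, 1, 2, 2],
]
x = [1, 0, 1, 1]  # features 0, 2 and 3 are present

print(class_scores(POINTS, x))        # [2, 1, 3]
label, probs = predict(POINTS, x)
print(label)                          # 2
print([round(p, 3) for p in probs])   # [0.245, 0.09, 0.665]
```

Because every class shares the same table of features, a clinician or analyst can compute all class scores by hand from one card, which is the practical advantage over maintaining several separate one-vs-rest tables.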