Abstract: Machine learning (ML) is increasingly being used to support high-stakes decisions. However, there is frequently a construct gap: a gap between the construct of interest to the decision-making task and what is captured in proxies used as labels to train ML models. As a result, ML models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. Thus, an essential step in the design of ML systems for decision support is selecting a target label among available proxies. In this work, we explore the use of historical expert decisions as a rich -- yet also imperfect -- source of information that can be combined with observed outcomes to narrow the construct gap. We argue that managers and system designers may be interested in learning from experts in instances where they exhibit consistency with each other, while learning from observed outcomes otherwise. We develop a methodology to enable this goal using information that is commonly available in organizational information systems. This involves two core steps. First, we propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert. Second, we introduce a label amalgamation approach that allows ML models to simultaneously learn from expert decisions and observed outcomes. Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap, yielding better predictive performance than learning from either observed outcomes or expert decisions alone.

What problem does this paper attempt to address?

This paper addresses a critical issue faced by machine learning (ML) models that support high-stakes decisions: the **construct gap**. The construct gap is the difference between the construct of interest in the decision task (\(Y_c\)) and the proxy labels (\(Y\) and \(D\)) used to train the ML model. Specifically, \(Y_c\) is the true target that decision-makers are trying to predict, while \(Y\) and \(D\) are the observed outcomes and the historical expert decisions, respectively. Both are imperfect proxies for \(Y_c\).

### Problems the Paper Attempts to Solve:

1. **Construct Gap Issue**: How should proxy labels be selected so that the trained ML model better approximates \(Y_c\) when \(Y_c\) cannot be measured directly?
2. **Importance of Expert Consistency**: How can the consistency of historical expert decisions be leveraged to improve the predictive performance of ML models, especially in cases where these decisions are likely to be closer to \(Y_c\)?

### Main Contributions:

- **Managerial Contribution**: Proposes a method to improve algorithmic decision support systems by learning from both expert decisions and observed outcomes when only \(Y\) and \(D\) are available as imperfect proxies. In cases where experts are estimated to be consistent with each other, the model learns from the expert decisions; otherwise, it learns from observed outcomes.
- **Methodological Contribution**: Develops an influence function-based method to estimate expert consistency on individual instances, and integrates this consistency information into model training through a label amalgamation strategy to approximate \(Y_c\) more accurately.
- **Empirical Contribution**: Validates the proposed method through simulations and experiments on real data from the child welfare domain, showing better predictive performance than learning from either observed outcomes or expert decisions alone.

### Method Overview:

1. **Estimating Expert Consistency**:
   - **Step 1**: Train a predictive model to predict historical expert decisions \(D\) from observed features \(X\).
   - **Step 2**: Use influence functions to assess whether high-confidence predictions are driven by the historical decisions of multiple experts, thereby estimating expert consistency even though each case is assessed by a single expert.
2. **Label Amalgamation**:
   - **Step 1**: Estimate expert consistency for each historical data instance.
   - **Step 2**: Combine expert decisions \(D\) and observed outcomes \(Y\) into new training labels, drawing on \(D\) for high-consistency cases and on \(Y\) otherwise.

### Experimental Validation:

- **Semi-Synthetic Data**: Simulations in a clinical setting examine the method's performance, robustness, and failure modes.
- **Real Data**: An empirical study in the child welfare domain shows that the proposed method improves performance on other relevant measures while maintaining performance in predicting out-of-home placements.

Through these contributions, the paper offers managers and system designers an approach for using historical expert decisions together with observed outcomes to improve ML models for high-stakes decision support in the presence of a construct gap.
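The two method steps summarized above (estimating expert consistency with influence functions, then amalgamating labels) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The logistic regression model for \(D\), the consistency score (one minus the share of absolute influence held by the single most influential expert), the thresholds `delta` and `min_consistency`, and all function names are assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def sigmoid(z):
    # Clip the logit to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def hessian(X, theta, l2):
    """Hessian of the mean logistic loss plus an L2 penalty."""
    prob = sigmoid(X @ theta)
    w = prob * (1.0 - prob)
    return (X.T * w) @ X / X.shape[0] + l2 * np.eye(X.shape[1])

def fit_expert_model(X, d, l2=1e-2, iters=25):
    """Fit P(D=1 | X) by Newton's method (assumed model choice)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - d) / X.shape[0] + l2 * theta
        theta = theta - np.linalg.solve(hessian(X, theta, l2), grad)
    return theta

def influence_on_logit(X, d, theta, l2):
    """influence[i, j]: effect of upweighting training case j on the logit of case i."""
    resid = sigmoid(X @ theta) - d           # per-case loss gradient is resid * x
    grads = X * resid[:, None]               # shape (n, p)
    h_inv_grads = np.linalg.solve(hessian(X, theta, l2), grads.T)  # shape (p, n)
    return -X @ h_inv_grads                  # shape (n, n)

def expert_consistency(influence, expert_ids):
    """Hypothetical score: 1 - share of absolute influence held by the top expert."""
    abs_inf = np.abs(influence)
    per_expert = np.stack(
        [abs_inf[:, expert_ids == e].sum(axis=1) for e in np.unique(expert_ids)],
        axis=1,
    )
    totals = np.maximum(per_expert.sum(axis=1, keepdims=True), 1e-12)
    return 1.0 - (per_expert / totals).max(axis=1)

def amalgamate_labels(y, d, prob_d, consistency, delta=0.9, min_consistency=0.5):
    """Use D where the expert model is confident and consistency is high, else Y."""
    confident = (prob_d >= delta) | (prob_d <= 1.0 - delta)
    use_expert = confident & (consistency >= min_consistency)
    return np.where(use_expert, d, y), use_expert

# Synthetic example: 200 cases, each assessed by one of 4 experts.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
expert_ids = rng.integers(0, 4, size=n)
d = (X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(float)            # expert decisions D
y = (X[:, 1] + X[:, 2] + 0.3 * rng.normal(size=n) > 0).astype(float)  # observed outcomes Y

theta = fit_expert_model(X, d)
prob_d = sigmoid(X @ theta)
consistency = expert_consistency(influence_on_logit(X, d, theta, l2=1e-2), expert_ids)
labels, use_expert = amalgamate_labels(y, d, prob_d, consistency)
```

The resulting `labels` array could then be used as the training target for a downstream model in place of either \(Y\) or \(D\) alone. The concentration-based score is only one simple way to ask whether a prediction is driven by several experts rather than one; the paper's own consistency estimates may be defined differently.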

Leveraging Expert Consistency to Improve Algorithmic Decision Support

A Machine Learning Framework Towards Transparency in Experts' Decision Quality

Collaborative Human-ML Decision Making Using Experts' Privileged Information Under Uncertainty

Incorporating Experts' Judgment into Machine Learning Models

(De)Noise: Moderating the Inconsistency Between Human Decision-Makers

The impact of inconsistent human annotations on AI driven clinical decision making

Bridging the gap: Towards an Expanded Toolkit for AI-driven Decision-Making in the Public Sector

Expert–Machine Collaborative Decision Making: We Need Healthy Competition

Heuristic-Based Weak Learning for Automated Decision-Making

Improving Expert Predictions with Conformal Prediction

Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization

Designing Decision Support Systems Using Counterfactual Prediction Sets

Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes

Human-AI collaboration to mitigate decision noise in financial underwriting: A study on FinTech innovation in a lending firm

Integrating Explainable Machine Learning in Clinical Decision Support Systems: Study Involving a Modified Design Thinking Approach

Preservation of Feature Stability in Machine Learning Under Data Uncertainty for Decision Support in Critical Domains

Multi-Model Assessing and Visualizing Consistency and Compatibility of Experts in Group Decision-Making

Probabilistic Consensus through Ensemble Validation: A Framework for LLM Reliability

Collaborative Learning via Prediction Consensus

Consistent Joint Decision-Making with Heterogeneous Learning Models

Algorithm, Expert, or Both? Evaluating the Role of Feature Selection Methods on User Preferences and Reliance