Leveraging Expert Consistency to Improve Algorithmic Decision Support

Maria De-Arteaga, Vincent Jeanselme, Artur Dubrawski, Alexandra Chouldechova
2024-06-03
Abstract: Machine learning (ML) is increasingly being used to support high-stakes decisions. However, there is frequently a construct gap: a gap between the construct of interest to the decision-making task and what is captured in proxies used as labels to train ML models. As a result, ML models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. Thus, an essential step in the design of ML systems for decision support is selecting a target label among available proxies. In this work, we explore the use of historical expert decisions as a rich -- yet also imperfect -- source of information that can be combined with observed outcomes to narrow the construct gap. We argue that managers and system designers may be interested in learning from experts in instances where they exhibit consistency with each other, while learning from observed outcomes otherwise. We develop a methodology to enable this goal using information that is commonly available in organizational information systems. This involves two core steps. First, we propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert. Second, we introduce a label amalgamation approach that allows ML models to simultaneously learn from expert decisions and observed outcomes. Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap, yielding better predictive performance than learning from either observed outcomes or expert decisions alone.
Machine Learning, Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses a critical issue faced by machine learning (ML) models that support high-stakes decisions: the **construct gap**. The construct gap is the difference between the construct of interest in the decision task (\(Y_c\)) and the proxy labels (\(Y\) and \(D\)) used to train the ML model. Specifically, \(Y_c\) is the true target that decision-makers are trying to predict, while \(Y\) and \(D\) denote the observed outcomes and the historical expert decisions, respectively; both are imperfect proxies for \(Y_c\).

### Problems the Paper Attempts to Solve

1. **Construct gap**: How should proxy labels be selected so that ML models better approximate \(Y_c\) when \(Y_c\) cannot be measured directly?
2. **Expert consistency**: How can the consistency of historical expert decisions be leveraged to improve the predictive performance of ML models, particularly in cases where those decisions are more likely to be close to \(Y_c\)?

### Main Contributions

- **Managerial contribution**: Proposes a way to improve algorithmic decision support systems when only the imperfect proxies \(Y\) and \(D\) are available, by learning from both expert decisions and observed outcomes. Specifically, the model learns from expert decisions in cases where multiple experts are consistent with one another, and from observed outcomes otherwise.
- **Methodological contribution**: Develops an influence function-based method to estimate expert consistency on individual instances, even when each case is assessed by a single expert, and integrates this consistency information into model training through a label amalgamation strategy to better approximate \(Y_c\).
- **Empirical contribution**: Validates the method through simulations in a clinical setting and experiments on real-world child welfare data, showing better predictive performance than learning from either observed outcomes or expert decisions alone.

### Method Overview

1. **Estimating expert consistency**
   - **Step 1**: Train a predictive model of the historical expert decisions \(D\) from the observed features \(X\).
   - **Step 2**: Use influence functions to assess whether each high-confidence prediction is driven by the historical decisions of many different experts; if it is, the case is treated as one on which experts are consistent.
2. **Label amalgamation**
   - **Step 1**: Estimate expert consistency for each historical data instance using the procedure above.
   - **Step 2**: Combine \(D\) and \(Y\) into a new training label: the expert decision \(D\) for high-consistency cases and the observed outcome \(Y\) otherwise.

### Experimental Validation

- **Semi-synthetic data**: Simulations in a clinical setting evaluate the method's performance, robustness, and failure modes.
- **Real data**: An empirical study in the child welfare domain shows that the proposed method improves accuracy on other relevant metrics while maintaining performance in predicting out-of-home placements.

Through these contributions, the paper gives managers and system designers a way to make better use of historical expert decisions and observed outcomes, improving the performance of ML models for high-stakes decision support in the presence of a construct gap.
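One way to write the method overview in symbols is sketched below. It assumes a decision model with parameters \(\hat\theta\) fitted to \((X, D)\) by minimizing an average loss \(L\) with Hessian \(H_{\hat\theta}\), and writes \(e_i\) for the expert who assessed case \(i\). The first line is the standard influence-function approximation of how up-weighting training case \(i\) changes the loss at case \(j\), evaluated at the model's own predicted decision \(\hat d_j\). The per-expert aggregation \(S_{j,e}\), the entropy-based effective number of experts \(N_j\), and the thresholds \(\tau\) and \(m\) are illustrative assumptions, not the paper's exact definitions.

\[
\mathcal{I}(i \to j) = -\nabla_\theta L(x_j, \hat{d}_j; \hat\theta)^\top H_{\hat\theta}^{-1} \nabla_\theta L(x_i, d_i; \hat\theta)
\]

\[
S_{j,e} = \sum_{i:\, e_i = e} \max\{0,\, -\mathcal{I}(i \to j)\}, \qquad
\pi_{j,e} = \frac{S_{j,e}}{\sum_{e'} S_{j,e'}}, \qquad
N_j = \exp\Big(-\sum_{e} \pi_{j,e} \log \pi_{j,e}\Big)
\]

\[
\tilde{Y}_j =
\begin{cases}
D_j & \text{if } |2\hat{p}_j - 1| \ge \tau \text{ and } N_j \ge m,\\
Y_j & \text{otherwise,}
\end{cases}
\]

where \(\hat p_j\) is the model's predicted probability that \(D_j = 1\) and \(\hat d_j = \mathbb{1}[\hat p_j \ge 1/2]\). Under this reading, a case whose confident prediction draws support from many experts keeps the expert decision as its label, and every other case falls back on the observed outcome.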
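The two steps of the method overview can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes an L2-regularized logistic regression as the model of expert decisions (so the Hessian is small and cheap to invert), computes standard influence-function scores between every pair of cases, sums each expert's supporting (positive) influence on each prediction, and calls a case consistent when the prediction is confident and that support is spread over at least `min_experts` effective experts. The function names, the thresholds `conf_threshold=0.8` and `min_experts=2.0`, and the toy data are all hypothetical.

```python
# Minimal sketch under assumptions (NOT the paper's code): logistic decision
# model, hypothetical function names, thresholds, and toy data.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hessian(X, p, lam):
    """Hessian of the L2-regularised mean logistic loss at predictions p."""
    n, k = X.shape
    return (X * (p * (1.0 - p))[:, None]).T @ X / n + lam * np.eye(k)


def fit_decision_model(X, d, lam=1e-2, iters=25):
    """Step 1: logistic regression of expert decisions D on features X
    (X must already contain a bias column), fitted with Newton's method."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - d) / len(d) + lam * theta
        theta -= np.linalg.solve(hessian(X, p, lam), grad)
    return theta


def expert_support(X, d, experts, theta, lam=1e-2):
    """Influence-based support that each expert's cases give to every prediction.

    support[j, i] = g_j^T H^{-1} g_i, i.e. minus the standard influence of
    up-weighting case i on the loss at case j's predicted label; positive
    values mean case i pushes the model towards its own prediction at j.
    Returns (predicted probabilities, array of shape (n_cases, n_experts)).
    """
    p = sigmoid(X @ theta)
    d_hat = (p >= 0.5).astype(float)           # model's predicted decision
    g_train = (p - d)[:, None] * X             # per-case training-loss gradients
    g_pred = (p - d_hat)[:, None] * X          # loss gradients at the predicted label
    support = g_pred @ np.linalg.solve(hessian(X, p, lam), g_train.T)
    expert_ids = np.unique(experts)
    return p, np.stack(
        [np.clip(support[:, experts == e], 0.0, None).sum(axis=1) for e in expert_ids],
        axis=1,
    )


def effective_num_experts(per_expert):
    """exp(entropy) of each case's support shares: 1 = one expert, K = K equal experts."""
    totals = per_expert.sum(axis=1, keepdims=True)
    shares = per_expert / np.maximum(totals, 1e-12)
    safe = np.where(shares > 0, shares, 1.0)   # log(1) = 0, so empty shares add nothing
    return np.exp(-(shares * np.log(safe)).sum(axis=1))


def consistency_flags(p, per_expert, conf_threshold=0.8, min_experts=2.0):
    """Hypothetical rule: confident prediction AND support spread over several experts."""
    confidence = np.abs(2.0 * p - 1.0)         # 0 = coin flip, 1 = certain
    return (confidence >= conf_threshold) & (effective_num_experts(per_expert) >= min_experts)


def amalgamate(d, y, consistent):
    """Label amalgamation: expert decision where consistent, observed outcome otherwise."""
    return np.where(consistent, d, y)


# Toy usage: three experts see identical cases and all decide d = 1 iff t > 0.
t = np.tile([-2.0, -1.0, 1.0, 2.0], 3)
X = np.column_stack([t, np.ones_like(t)])      # one feature plus a bias column
d = (t > 0).astype(float)
y = np.tile([0.0, 1.0, 0.0, 1.0], 3)           # made-up observed outcomes
experts = np.repeat([0, 1, 2], 4)

theta = fit_decision_model(X, d)
p, per_expert = expert_support(X, d, experts, theta)
flags = consistency_flags(p, per_expert)
labels = amalgamate(d, y, flags)
```

In this toy setup every confident prediction draws equal support from all three experts, so each case is flagged and keeps its expert label; if the same cases were all attributed to a single expert, the effective number of experts would drop to 1 and the observed outcomes would be used instead. The dense \(n \times n\) support matrix is only practical for small samples; larger datasets would need approximations such as inverse-Hessian-vector products.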