Abstract: This paper investigates the relationships between hyperparameters of machine learning and fairness. Data-driven solutions are increasingly used in critical socio-technical applications where ensuring fairness is important. Rather than explicitly encoding decision logic via control and data structures, ML developers provide input data, perform some pre-processing, choose ML algorithms, and tune hyperparameters (HPs) to infer a program that encodes the decision logic. Prior works report that the selection of HPs can significantly influence fairness. However, tuning HPs to find an ideal trade-off between accuracy, precision, and fairness has remained an expensive and tedious task. Can we predict the fairness of an HP configuration for a given dataset? Are the predictions robust to distribution shifts? We focus on group fairness notions and investigate the HP space of 5 training algorithms. We first find that tree regressors and XGBoost significantly outperformed deep neural networks and support vector machines in accurately predicting the fairness of HPs. When predicting the fairness of ML hyperparameters under temporal distribution shift, the tree regressors outperform the other algorithms with reasonable accuracy. However, the precision depends on the ML training algorithm, dataset, and protected attributes. For example, the tree regressor model was robust for training data shift from 2014 to 2018 on logistic regression and discriminant analysis HPs with sex as the protected attribute, but not for race and other training algorithms. Our method provides a sound framework to efficiently perform fine-tuning of ML training algorithms and understand the relationships between HPs and fairness.

What problem does this paper attempt to address?

The core problem this paper addresses is: **Can the fairness of machine learning (ML) hyperparameter configurations be predicted, and does the prediction remain robust when the data distribution changes?** Specifically, the paper asks:

1. **Can the group fairness of ML hyperparameters be accurately predicted given the training dataset and protected attributes?**
2. **How do different types of prediction methods (such as neural networks, support vector regression, tree regressors, and XGBoost) perform in predicting the fairness of ML hyperparameters?**
3. **Can the fairness of ML hyperparameters be predicted under temporal distribution shift? Which types of ML algorithms are more robust under such shifts?**

### Detailed Interpretation

#### Problem Background

With the widespread use of data-driven solutions in critical socio-technical applications, ensuring the fairness of these systems has become crucial. Machine learning developers build models by providing input data, selecting algorithms, and tuning hyperparameters. However, the choice of hyperparameters has a significant impact on the fairness of the model, and finding a combination of hyperparameters that balances accuracy, precision, and fairness is expensive and tedious.

#### Research Objectives

The goals of the paper are:

- To explore the relationship between machine learning hyperparameters and fairness.
- To learn this relationship with regression methods, so that the fairness of a specific hyperparameter configuration can be predicted without running a complete training cycle.
- To analyze the robustness of these prediction models under changes in data distribution (especially temporal distribution shift).
#### Experimental Design

To answer the above questions, the authors conducted the following experiments:

- **Datasets**: Four socially critical datasets (Adult Census, Compas Recidivism, Default Credit, Bank Marketing) were used, covering different protected attributes (such as sex and race).
- **ML Algorithms**: Five popular machine learning algorithms (decision tree classifier, support vector machine, logistic regression classifier, random forest, and discriminant analysis) were selected.
- **Prediction Methods**: Four regression methods (deep neural network, support vector regression, tree regressor, and XGBoost) were used to learn the mapping from hyperparameters to fairness.

#### Main Findings

- **Performance on Fixed Datasets**: Tree regressors and XGBoost performed well in predicting the AOD fairness of all five algorithms, achieving \( R^2\geq0.95 \) in 40% of cases and \( R^2\leq0.5 \) in only 6.7% of cases.
- **Performance under Temporal Distribution Shift**: Tree regressors and XGBoost performed relatively well under a one-year temporal distribution shift, but accuracy decreased significantly for other training algorithms and protected attributes (such as race).

#### Conclusions

The paper provides a systematic framework for efficiently tuning ML training algorithms and understanding the relationship between hyperparameters and fairness. Although prediction remains challenging in some cases, the study offers useful insights for reducing biased configurations in data-driven software development and points out directions for future research.
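The idea behind the experimental design above (fit a regressor that maps a hyperparameter configuration to a fairness score, then query it instead of retraining) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two hyperparameters (`log_c`, `use_intercept`), the `synthetic_aod` function standing in for "train a model and measure AOD", and the small hand-rolled regression tree, which is not the authors' implementation or any library's.

```python
# Hypothetical sketch: learn a map from hyperparameter (HP) configurations to a
# fairness score (AOD) with a tiny hand-rolled regression tree.

def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(X, y, depth=0, max_depth=3, min_leaf=2):
    """Return a nested dict (internal node) or a float (leaf prediction)."""
    leaf = sum(y) / len(y)
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return leaf
    best = None  # (split_sse, feature_index, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return leaf
    _, j, t = best
    left_idx = [i for i, row in enumerate(X) if row[j] <= t]
    right_idx = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([X[i] for i in left_idx], [y[i] for i in left_idx],
                         depth + 1, max_depth, min_leaf),
        "right": fit_tree([X[i] for i in right_idx], [y[i] for i in right_idx],
                          depth + 1, max_depth, min_leaf),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean AOD."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


def r2(actual, predicted):
    """Coefficient of determination between measured and predicted AOD."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def synthetic_aod(log_c, use_intercept):
    """Made-up stand-in for 'train a model with these HPs and measure AOD'."""
    return 0.05 + (0.10 if log_c > 0 else 0.0) + (0.02 if use_intercept else 0.0)


# Training configurations: a grid over the two hypothetical hyperparameters.
train_X = [[k * 0.5, flag] for k in range(-6, 7) for flag in (0, 1)]
train_y = [synthetic_aod(lc, f) for lc, f in train_X]

# Held-out configurations that were never "trained".
test_X = [[(k + 0.5) * 0.5, flag] for k in range(-6, 6) for flag in (0, 1)]
test_y = [synthetic_aod(lc, f) for lc, f in test_X]

tree = fit_tree(train_X, train_y)
test_pred = [predict(tree, row) for row in test_X]
score = r2(test_y, test_pred)
print(f"held-out R^2 = {score:.3f}")
```

Because the synthetic AOD is piecewise constant in the hyperparameters, a shallow tree recovers it almost exactly. Real hyperparameter-to-fairness relationships are noisier, and the summary above reports that prediction quality varies by algorithm, dataset, and protected attribute.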
### Formula Summary

The formulas involved in the paper include:

- **Average Odds Difference (AOD)**, where \( TPR_M(a) \) and \( FPR_M(a) \) are the true and false positive rates of model \( M \) on the group with protected attribute value \( a \):

\[ AOD_M=\frac{|TPR_M(0)-TPR_M(1)|+|FPR_M(0)-FPR_M(1)|}{2} \]

- **Mean Squared Error (MSE)**:

\[ MSE = \frac{1}{n}\sum_{i = 1}^{n}(AOD_i-\widehat{AOD}_i)^2 \]

- **Coefficient of Determination (\( R^2 \))**:

\[ R^2=1-\frac{\sum_{i = 1}^{n}(AOD_i-\widehat{AOD}_i)^2}{\sum_{i = 1}^{n}(AOD_i-\overline{AOD})^2} \]

AOD is the fairness metric being predicted; MSE and \( R^2 \) measure how closely the predicted values \( \widehat{AOD}_i \) match the measured values \( AOD_i \).
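As a rough illustration, the three formulas above can be computed directly in plain Python. The function names and the toy labels below are assumptions made for this sketch, not taken from the paper; the protected attribute is assumed to be binary (0 or 1), and a rate with an empty denominator is treated as 0.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels; a rate with no samples is 0.0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def average_odds_difference(y_true, y_pred, group):
    """AOD as defined above: mean of the absolute TPR and FPR gaps."""
    per_group = {}
    for g in (0, 1):
        yt = [t for t, a in zip(y_true, group) if a == g]
        yp = [p for p, a in zip(y_pred, group) if a == g]
        per_group[g] = rates(yt, yp)
    (tpr0, fpr0), (tpr1, fpr1) = per_group[0], per_group[1]
    return (abs(tpr0 - tpr1) + abs(fpr0 - fpr1)) / 2


def mse(actual, predicted):
    """Mean squared error between measured and predicted AOD values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination; assumes the measured values are not all equal."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy data (made up): four samples per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

# Group 0: TPR = 0.5, FPR = 0.0; group 1: TPR = 1.0, FPR = 0.5.
aod = average_odds_difference(y_true, y_pred, group)
print(aod)  # (0.5 + 0.5) / 2 = 0.5
```

Note that some fairness toolkits define average odds difference without the absolute values, as a signed average of the two gaps; the sketch follows the absolute-value form given in the formula above.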

Predicting Fairness of ML Software Configurations
