Abstract: The emergence and growth of research on ethics in AI, and in particular algorithmic fairness, is rooted in an essential observation: structural inequalities in society are reflected in the data used to train predictive models and in the design of objective functions. While research aiming to mitigate these issues is inherently interdisciplinary, the design of unbiased algorithms and fair socio-technical systems is a key desired outcome that depends on practitioners from the fields of data science and computing. However, these computing fields broadly suffer from the same under-representation issues found in the datasets we analyze. This disconnect affects the design of both the desired outcomes and the metrics by which we measure success. If the ethical AI research community accepts this, we tacitly endorse the status quo and contradict the goals of non-discrimination and equity that work on algorithmic fairness, accountability, and transparency seeks to address. Therefore, we advocate in this work for diversifying computing as a core priority of the field and of our efforts to achieve ethical AI practices. We draw connections between the lack of diversity within academic and professional computing fields and the type and breadth of the biases encountered in datasets, machine learning models, problem formulations, and the interpretation of results. Examining the current literature on fairness and ethics in AI, we highlight cases where this lack of diverse perspectives has been foundational to the inequitable treatment of data from underrepresented and protected groups. We also look to other professional communities, such as law and health, where disparities have been reduced both in the educational diversity of trainees and in professional practice. We use these lessons to develop recommendations that provide concrete steps for the computing community to increase diversity.

Data Representativity for Machine Learning and AI Systems

Dataset Representativeness and Downstream Task Fairness

Data quality dimensions for fair AI

Position: Measure Dataset Diversity, Don't Just Claim It

Understanding the Representation and Representativeness of Age in AI Data Sets

A Survey on Bias and Fairness in Machine Learning

Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection

The Pursuit of Fairness in Artificial Intelligence Models: A Survey

Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data

On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms

Awareness in Practice: Tensions in Access to Sensitive Attribute Data for Antidiscrimination

Does the dataset meet your expectations? Explaining sample representation in image data

Bias in data‐driven artificial intelligence systems—An introductory survey

Ethical Considerations in AI Addressing Bias and Fairness in Machine Learning Models

Big Data, Data Science, and Civil Rights

No computation without representation: Avoiding data and algorithm biases through diversity

Adaptive Sampling Strategies to Construct Equitable Training Datasets

Representation Debiasing of Generated Data Involving Domain Experts

Fairness in AI-Driven Recruitment: Challenges, Metrics, Methods, and Future Directions

Understanding Bias in Machine Learning