Building Legal Datasets

Jerrold Soh
DOI: https://doi.org/10.48550/arXiv.2111.02034
2021-11-03
Machine Learning
Abstract: Data-centric AI calls for better, not just bigger, datasets. As data protection laws with extra-territorial reach proliferate worldwide, ensuring datasets are legal is an increasingly crucial yet overlooked component of "better". To help dataset builders become more willing and able to navigate this complex legal space, this paper reviews key legal obligations surrounding ML datasets, examines the practical impact of data laws on ML pipelines, and offers a framework for building legal datasets.