Data-centric AI: Techniques and Future Perspectives.

Daochen Zha,Kwei-Herng Lai,Fan Yang,Na Zou,Huiji Gao,Xia Hu
DOI: https://doi.org/10.1145/3580305.3599553
2023-01-01
Abstract:The role of data in AI has been significantly magnified by the emerging concept of data-centric AI. In contrast to the traditional model-centric paradigm, which focuses on developing more effective models given fixed datasets, data-centric AI emphasizes the systematic engineering of data in building AI systems. However, as a new concept, many critical aspects of data-centric AI remain ambiguous, such as its definitions, associated tasks, algorithms, challenges, and benchmarks. This tutorial aims to review and discuss this emerging field, with a particular focus on the three general data-centric AI goals: training data development, inference data development, and data maintenance. The objective of this tutorial is threefold: (1) to formally categorize the field of data-centric AI using a goal-driven taxonomy and discuss the needs and challenges of each goal, (2) to comprehensively review the state-of-the-art techniques, and (3) to discuss the future perspectives and open research directions to inspire further innovations in this field.
What problem does this paper attempt to address?