Data Scraping for the Training of Generative AI — Lessons from Chinese Case Law and Regulation

Qian Li, Konrad Kollnig
DOI: https://doi.org/10.9785/cri-2024-250201
2024-02-01
Computer Law Review International
Abstract: The collection of data from websites at great scale - so-called data scraping - is the foundation for ChatGPT and most other Generative AI (GenAI) tools. Much of the previous discussion on the regulation of GenAI has focused on the US and EU and not so much on more technical aspects like data scraping. In response, this article focuses on the regulation of data scraping to build and deploy GenAI in China, and reviews applicable regulation and case law. We find that the sectoral approach to AI regulation in China provides important insights into balancing technological progress and societal values, diverging from the laissez-faire attitude in the US and the horizontal approach with the AI Act in the EU.