Abstract: The deployment of AI in consumer products is currently focused on the use of so-called foundation models, large neural networks pre-trained on massive corpora of digital records. This emphasis on scaling up datasets and pre-training computation raises the risk of further consolidating the industry, and enabling monopolistic (or oligopolistic) behavior. Judges and regulators seeking to improve market competition may employ various remedies. This paper explores dissolution -- the breaking up of a monopolistic entity into smaller firms -- as one such remedy, focusing in particular on the technical challenges and opportunities involved in the breaking up of large models and datasets. We show how the framework of Conscious Data Contribution can enable user autonomy during dissolution. Through a simulation study, we explore how fine-tuning and the phenomenon of "catastrophic forgetting" could actually prove beneficial as a type of machine unlearning that allows users to specify which data they want used for what purposes.

What problem does this paper attempt to address?

The paper asks how users' data autonomy can be preserved when a monopolistic firm is dissolved, and how the firm's large models and datasets can be split into smaller parts in a way that respects users' wishes. It covers three aspects:

1. **Dissolution as an antitrust remedy**:
   - The paper considers what should happen to the large datasets and pre-trained models of a monopolistic firm (such as Google) when a court orders its dissolution for violating antitrust law. Dissolution here means splitting one large company into several smaller ones.
2. **Users' data autonomy**:
   - Users' control over their data is a central concern during dissolution. The paper builds on a framework called "Conscious Data Contribution (CDC)", which lets users decide how their data are used by the new companies. For example, a user can choose to provide their music listening history to the company that inherits the music streaming business while withholding their location data.
3. **Technical challenges and solutions**:
   - Through a simulation study, the paper explores how fine-tuning and the "catastrophic forgetting" phenomenon can support users' data autonomy. Fine-tuning on the data a user has authorized lets the model perform well on the new task while "forgetting" the data that were not authorized, so that forgetting acts as a naturally occurring form of machine unlearning.
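As a loose illustration of the forgetting effect described in item 3, the sketch below fits a tiny linear model on two kinds of data and then fine-tunes it on only one of them. It is not the paper's simulation study: the two-parameter model, the synthetic `music` and `location` datasets, and the learning-rate settings are all assumptions chosen so the effect is visible in a few lines of plain Python. The model is deliberately too small to fit both datasets at once, so continued training on one of them necessarily degrades the fit on the other.

```python
# Toy sketch (hypothetical data and model, not the paper's experiments).

def predict(w, x):
    """Linear model with slope w[0] and bias w[1]."""
    return w[0] * x + w[1]

def mse(w, data):
    """Mean squared error of the model on a list of (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.1, steps=2000):
    """Full-batch gradient descent on the MSE, starting from weights w."""
    w = list(w)  # copy so the caller's weights are left untouched
    n = len(data)
    for _ in range(steps):
        grad_slope = sum(2 * (predict(w, x) - y) * x for x, y in data) / n
        grad_bias = sum(2 * (predict(w, x) - y) for x, y in data) / n
        w[0] -= lr * grad_slope
        w[1] -= lr * grad_bias
    return w

# Two synthetic "tasks" with conflicting targets on the same inputs.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
music = [(x, 2.0 * x + 1.0) for x in xs]      # data the user authorizes
location = [(x, -1.0 * x + 3.0) for x in xs]  # data the user withholds

# "Pre-training" by the original firm uses everything.
pretrained = train([0.0, 0.0], music + location)

# The inheriting company fine-tunes on the authorized data only.
finetuned = train(pretrained, music)

loss_before = {"music": mse(pretrained, music),
               "location": mse(pretrained, location)}
loss_after = {"music": mse(finetuned, music),
              "location": mse(finetuned, location)}
```

Comparing `loss_before` with `loss_after` shows the error on the authorized `music` data falling while the error on the withheld `location` data rises. Real foundation models have far more capacity, so how much they forget under fine-tuning is an empirical question that this sketch does not answer.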
### Main methods and technical details

- **Data representation**:
  - Suppose the monopolistic firm \( F \) holds a dataset \( U \in \mathbb{R}^{N \times d} \) covering \( N \) users. Each user's data \( u_i \) is a \( d \)-dimensional vector formed by stacking \( J \) task-specific data vectors \( u_i^{(j)} \):
    \[ u_i = \begin{pmatrix} u_i^{(1)} \\ u_i^{(2)} \\ \vdots \\ u_i^{(J)} \end{pmatrix} \]
  - Users can choose to contribute some of their task-specific data to specific inheriting companies. With \( K \) inheriting companies, this choice is represented by a binary matrix \( C(u_i) \in \{0, 1\}^{K \times J} \):
    \[ [C(u_i)]_{k,j} = \begin{cases} 1 & \text{if user } i \text{ contributes the data of task } j \text{ to inheriting company } k \\ 0 & \text{otherwise} \end{cases} \]
- **Experimental verification**:
  - The paper runs simulation experiments on image generation and text classification to test the effect of fine-tuning. The results indicate that fine-tuning can effectively "forget" unauthorized data while maintaining good performance on authorized data.

### Conclusion

By combining the CDC framework with fine-tuning, the paper offers a way to protect users' data autonomy when a monopolistic firm is dissolved. Practical deployment would involve considerable complexity, but the work provides useful ideas for future AI regulation.
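The per-user contribution matrix described under "Data representation" can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: rows index inheriting companies and columns index tasks, and the task names, company names, user names, and the helper `firm_dataset` are all hypothetical, introduced only for illustration.

```python
# Hypothetical setup: J = 2 task-specific data blocks, K = 2 inheriting companies.
TASKS = ["music", "location"]
FIRMS = ["stream_co", "maps_co"]

def firm_dataset(users, contributions, firm):
    """Collect the task-specific data each user has agreed to give `firm`.

    `contributions[user_id]` is that user's K x J binary matrix, stored as a
    list of rows: entry [k][j] is 1 if the user contributes the data of task j
    to inheriting company k, and 0 otherwise.
    """
    k = FIRMS.index(firm)
    dataset = {}
    for user_id, blocks in users.items():
        row = contributions[user_id][k]
        shared = {task: blocks[task]
                  for j, task in enumerate(TASKS) if row[j] == 1}
        if shared:  # skip users who contribute nothing to this company
            dataset[user_id] = shared
    return dataset

# Made-up user data: one small vector per task.
users = {
    "alice": {"music": [0.9, 0.1], "location": [48.1, 11.6]},
    "bob": {"music": [0.2, 0.8], "location": [40.7, -74.0]},
}

contributions = {
    # alice: music history to stream_co, nothing to maps_co
    "alice": [[1, 0],
              [0, 0]],
    # bob: music history to stream_co, location data to maps_co
    "bob": [[1, 0],
            [0, 1]],
}

stream_data = firm_dataset(users, contributions, "stream_co")
maps_data = firm_dataset(users, contributions, "maps_co")
```

Here `stream_data` holds only the music vectors of both users, and `maps_data` holds only bob's location vector, mirroring the example of a user who shares listening history but withholds location data.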

Promoting User Data Autonomy During the Dissolution of a Monopolistic Firm
