Promoting User Data Autonomy During the Dissolution of a Monopolistic Firm

Rushabh Solanki, Elliot Creager
2024-11-21
Abstract: The deployment of AI in consumer products is currently focused on the use of so-called foundation models, large neural networks pre-trained on massive corpora of digital records. This emphasis on scaling up datasets and pre-training computation raises the risk of further consolidating the industry, and enabling monopolistic (or oligopolistic) behavior. Judges and regulators seeking to improve market competition may employ various remedies. This paper explores dissolution -- the breaking up of a monopolistic entity into smaller firms -- as one such remedy, focusing in particular on the technical challenges and opportunities involved in the breaking up of large models and datasets. We show how the framework of Conscious Data Contribution can enable user autonomy during dissolution. Through a simulation study, we explore how fine-tuning and the phenomenon of "catastrophic forgetting" could actually prove beneficial as a type of machine unlearning that allows users to specify which data they want used for what purposes.
Machine Learning
What problem does this paper attempt to address?
This paper asks how to preserve users' data autonomy when a monopolistic firm is dissolved, and how the firm's large models and datasets can be divided among smaller successor firms in a way that respects users' wishes. Specifically, it examines three aspects:

1. **Dissolution as an antitrust remedy**: The paper discusses how to handle the large datasets and pre-trained models of a monopolistic firm (such as Google) when a court orders it to be dissolved for violating antitrust law. Dissolution here means splitting one large company into several smaller ones.
2. **Users' data autonomy**: Users' control over their data is a key issue during dissolution. The paper proposes using a framework called "Conscious Data Contribution (CDC)", which lets users decide how their data are used by the new companies. For example, a user could provide their music listening history to the new company that inherits the music streaming business while withholding their location data.
3. **Technical challenges and solutions**: Through a simulation study, the paper explores how fine-tuning and the "catastrophic forgetting" phenomenon can support users' data autonomy. Fine-tuning on authorized data lets the model perform well on the new firm's task while "forgetting" the data that users did not authorize, so it acts as a form of machine unlearning.
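The fine-tuning idea in item 3 can be illustrated with a deliberately tiny sketch. It is not the paper's experiment (which uses image generation and text classification): it assumes a one-parameter linear model, two made-up tasks with conflicting targets, and hypothetical names such as `authorized` and `unauthorized`. Because the single weight is shared, fine-tuning on the authorized data overwrites what was learned from the unauthorized data; forgetting in large networks is far less clean than this.

```python
# Toy illustration only: a one-parameter linear model y = w * x.
# The tasks, targets, and hyperparameters are invented for this sketch.

def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.05, steps=200):
    """Plain full-batch gradient descent on the mean squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

xs = [0.5, 1.0, 1.5, 2.0]
authorized = [(x, 2.0 * x) for x in xs]      # data the user agreed to share
unauthorized = [(x, -1.0 * x) for x in xs]   # data the user withheld

# "Pre-training" by the original firm uses all of the data.
w_pre = train(0.0, authorized + unauthorized)
# The successor firm fine-tunes on the authorized data only.
w_ft = train(w_pre, authorized)

print(f"authorized loss:   {mse(w_pre, authorized):.3f} -> {mse(w_ft, authorized):.3f}")
print(f"unauthorized loss: {mse(w_pre, unauthorized):.3f} -> {mse(w_ft, unauthorized):.3f}")
```

In this toy setting the loss on the authorized data falls after fine-tuning while the loss on the withheld data rises, which is the qualitative behavior the summary describes as beneficial forgetting.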
### Main methods and technical details

- **Data representation**:
  - Suppose the monopolistic firm \( F \) holds a dataset \( U\in\mathbb{R}^{N\times d} \) covering \( N \) users. Each user's data \( u_i \) is a \( d \)-dimensional vector composed of \( J \) task-specific data vectors \( u_i^{(j)} \):
    \[ u_i=\begin{pmatrix} u_i^{(1)} \\ u_i^{(2)} \\ \vdots \\ u_i^{(J)} \end{pmatrix} \]
  - Users can choose to contribute some of their task-specific data to specific inheriting companies. This choice is represented by a binary matrix \( C(u_i)\in\{0, 1\}^{J\times K} \), where \( K \) is the number of inheriting companies:
    \[ [C(u_i)]_{j,k}=\begin{cases} 1 & \text{if user } i \text{ contributes the data of task } j \text{ to inheriting company } k \\ 0 & \text{otherwise} \end{cases} \]
- **Experimental verification**:
  - The paper runs simulation experiments on image generation and text classification to test the effect of fine-tuning. In these experiments, fine-tuning effectively "forgets" unauthorized data while maintaining good performance on authorized data.

### Conclusion

By combining the CDC framework with fine-tuning, the paper offers a way to protect users' data autonomy during the dissolution of a monopolistic firm. Real-world dissolution would involve considerable additional complexity, but the study provides useful ideas for future AI regulation.
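The data representation described under "Main methods" can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the task names, firm names, user names, and values are all hypothetical, and the contribution matrix is assumed to be indexed with tasks as rows and inheriting companies as columns (entry `[j][k]`).

```python
# Hypothetical names and values throughout; only the structure follows the text.
TASKS = ["music", "location", "search"]   # J = 3 task-specific blocks
FIRMS = ["music_co", "maps_co"]           # K = 2 inheriting companies

# u_i: one user's record, split into task-specific blocks u_i^(j).
user_data = {
    "alice": {"music": [0.9, 0.1], "location": [43.6, -79.4], "search": [1.0]},
    "bob":   {"music": [0.2, 0.8], "location": [51.5, -0.1],  "search": [0.0]},
}

# C(u_i): binary matrix, assumed J x K here.
# Entry [j][k] == 1 means the user contributes task j's data to firm k.
contributions = {
    "alice": [[1, 0],    # music    -> music_co only
              [0, 0],    # location -> withheld from both firms
              [0, 0]],   # search   -> withheld from both firms
    "bob":   [[1, 0],    # music    -> music_co
              [0, 1],    # location -> maps_co
              [1, 0]],   # search   -> music_co
}

def firm_dataset(firm):
    """Collect only the task blocks that each user contributed to `firm`."""
    k = FIRMS.index(firm)
    dataset = {}
    for user, blocks in user_data.items():
        shared = {task: blocks[task]
                  for j, task in enumerate(TASKS)
                  if contributions[user][j][k] == 1}
        if shared:  # skip users who contributed nothing to this firm
            dataset[user] = shared
    return dataset

for firm in FIRMS:
    print(firm, firm_dataset(firm))
```

Each inheriting company would then fine-tune only on its own `firm_dataset(...)` output, so data a user withheld never reaches that company's training set.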