Algorithmic Collective Action in Machine Learning

Moritz Hardt, Eric Mazumdar, Celestine Mendler-Dünner, Tijana Zrnic
2024-08-08
Abstract: We initiate a principled study of algorithmic collective action on digital platforms that deploy machine learning algorithms. We propose a simple theoretical model of a collective interacting with a firm's learning algorithm. The collective pools the data of participating individuals and executes an algorithmic strategy by instructing participants how to modify their own data to achieve a collective goal. We investigate the consequences of this model in three fundamental learning-theoretic settings: the case of a nonparametric optimal learning algorithm, a parametric risk minimizer, and gradient-based optimization. In each setting, we come up with coordinated algorithmic strategies and characterize natural success criteria as a function of the collective's size. Complementing our theory, we conduct systematic experiments on a skill classification task involving tens of thousands of resumes from a gig platform for freelancers. Through more than two thousand model training runs of a BERT-like language model, we see a striking correspondence emerge between our empirical observations and the predictions made by our theory. Taken together, our theory and experiments broadly support the conclusion that algorithmic collectives of exceedingly small fractional size can exert significant control over a platform's learning algorithm.
Machine Learning, Computer Science and Game Theory
What problem does this paper attempt to address?
The core problem that this paper attempts to solve is: **How can collective action through algorithms influence the learning results of machine learning algorithms on digital platforms?** Specifically, the authors study how a collective of individuals can steer the optimization of a platform's machine learning algorithm through coordinated actions (such as modifying data) to achieve the collective's goals.

### Main Contributions

1. **Establishment of a theoretical model**:
   - Propose a simple theoretical model to describe the interaction between the collective and the company's learning algorithm.
   - The collective aggregates the personal data of participants and executes an algorithmic strategy that instructs participants how to modify their own data to achieve the collective goal.
   - When the company trains on the modified data, its machine learning model changes accordingly.
2. **Strategy analysis under three learning-theoretic settings**:
   - **Nonparametric optimal learning**: Study how, in the optimal case, the collective can make the classifier associate specific signals with target labels by modifying data points.
   - **Parametric risk minimization**: Explore how the collective can influence the parametric risk minimization problem so that the finally selected model is close to the target model set by the collective.
   - **Gradient-based optimization**: Analyze how, in a non-convex optimization environment, the collective can influence the learning process by controlling the gradient.
3. **Empirical evaluation**:
   - Extensive experiments on data from a freelancer platform support the theoretical predictions.
   - Experiments show that even a collective making up a very small proportion of the population (such as less than 1%) can significantly influence the results of the machine learning model.
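
The data-modification step described under nonparametric optimal learning can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dataset is a toy list of `(text, label)` pairs, and the names `plant_signal`, `apply_collective_strategy`, the token `<<sig>>`, and the label `skill_c` are all hypothetical. A fraction `alpha` of individuals joins the collective; each member plants a rare signal in their feature and relabels the point with the target label, while everyone else leaves their data untouched.

```python
import random

SIGNAL = "<<sig>>"  # hypothetical rare token used as the collective's signal


def plant_signal(text: str, signal: str = SIGNAL) -> str:
    """Hypothetical signal function: append a rare token to the feature."""
    return f"{text} {signal}"


def apply_collective_strategy(data, alpha, target_label, seed=0):
    """Return a new dataset in which a fraction `alpha` of points is modified.

    Each collective member replaces (x, y) with (plant_signal(x), target_label).
    Non-members keep their original (x, y).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)
    n_members = round(alpha * len(data))
    members = set(rng.sample(range(len(data)), n_members))
    mixed = []
    for i, (x, y) in enumerate(data):
        if i in members:
            mixed.append((plant_signal(x), target_label))
        else:
            mixed.append((x, y))
    return mixed


if __name__ == "__main__":
    # Toy stand-in for platform data; real features would be resumes.
    base = [(f"resume {i}", "skill_a" if i % 2 else "skill_b") for i in range(1000)]
    mixed = apply_collective_strategy(base, alpha=0.01, target_label="skill_c")
    n_modified = sum(1 for x, _ in mixed if x.endswith(SIGNAL))
    print(f"{n_modified} of {len(mixed)} points carry the signal")
```

The firm would then train on `mixed` rather than `base`; the collective succeeds to the extent that the trained classifier maps signal-carrying inputs to the target label.
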

### Key Formulas

- **Mixture distribution**:
  \[ P=\alpha P^{*}+(1 - \alpha)P_{0} \]
  where \(P^{*}\) is the data distribution generated under the collective strategy, \(P_{0}\) is the original data distribution, and \(\alpha\) is the fraction of the population that belongs to the collective.
- **Lower bound on the success (feature-label signal strategy)**:
  \[ S(\alpha)\geq 1-\frac{1 - \alpha}{\alpha}\cdot\Delta\cdot\xi-\frac{\epsilon}{(1 - \epsilon)\,\alpha} \]
  where \(\xi\) represents the uniqueness of the signal, \(\Delta\) represents the suboptimality gap of the target label, and \(\epsilon\) represents the suboptimality of the classifier. The bound improves as the collective grows, as the signal becomes rarer in the original data (smaller \(\xi\)), and as the classifier gets closer to optimal (smaller \(\epsilon\)).
- **Critical mass**:
  \[ \alpha^{*}=\frac{\Delta\xi+\frac{\epsilon}{1 - \epsilon}}{1 - S^{*}+\Delta\xi} \]
  This follows from setting the lower bound equal to \(S^{*}\) and solving for \(\alpha\). It gives a collective size \(\alpha^{*}\) that suffices to guarantee the target success rate \(S^{*}\).

### Conclusion

The research in this paper shows that even a collective comprising a very small fraction of a platform's users can have a significant impact on the platform's machine learning algorithm. This impact is achieved through coordinated algorithmic strategies, which gives workers or consumers on the platform a new means of changing the behavior of algorithms.
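
To make the success bound and the critical mass concrete, here is a small numerical sketch. It assumes the bound has the form S(α) ≥ 1 − ((1 − α)/α)·Δ·ξ − ε/((1 − ε)·α), which is our reading of the feature-label signal result; the function names and the example values of Δ, ξ, and S* are hypothetical and chosen only for illustration.

```python
def success_lower_bound(alpha, delta, xi, eps=0.0):
    """Lower bound on success for a collective of fractional size `alpha`.

    Assumed form: 1 - ((1 - alpha) / alpha) * delta * xi - eps / ((1 - eps) * alpha).
    delta: suboptimality gap, xi: signal uniqueness, eps: classifier suboptimality.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    bound = 1.0 - (1.0 - alpha) / alpha * delta * xi - eps / ((1.0 - eps) * alpha)
    # A negative lower bound carries no information, so clip it at zero.
    return max(0.0, bound)


def critical_mass(target_success, delta, xi, eps=0.0):
    """Collective size at which the assumed lower bound equals `target_success`.

    Obtained by solving the bound for alpha. A value above 1 means the bound
    cannot guarantee the target for any collective size.
    """
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    denominator = 1.0 - target_success + delta * xi
    if denominator <= 0.0:
        raise ValueError("need target_success < 1 or delta * xi > 0")
    return (delta * xi + eps / (1.0 - eps)) / denominator


if __name__ == "__main__":
    # Hypothetical values: gap 0.5, signal present in 1% of the original data.
    a = critical_mass(target_success=0.9, delta=0.5, xi=0.01)
    print(f"critical mass: {a:.4f}")  # about 0.0476
    print(f"bound at critical mass: {success_lower_bound(a, 0.5, 0.01):.4f}")
```

With these illustrative numbers, a collective of roughly 5% suffices for a guaranteed 90% success rate; a rarer signal (smaller ξ) lowers the required size further.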