Abstract: Collaborative learning offers a promising avenue for leveraging decentralized data. However, collaboration in groups of strategic learners is not a given. In this work, we consider strategic agents who wish to train a model together but have sampling distributions of different quality. The collaboration is organized by a benevolent aggregator who gathers samples so as to maximize total welfare, but is unaware of data quality. This setting allows us to shed light on the deleterious effect of adverse selection in collaborative learning. More precisely, we demonstrate that when data quality indices are private, the coalition may undergo a phenomenon known as unravelling, wherein it shrinks up to the point that it becomes empty or solely comprised of the worst agent. We show how this issue can be addressed without making use of external transfers, by proposing a novel method inspired by probabilistic verification. This approach makes the grand coalition a Nash equilibrium with high probability despite information asymmetry, thereby breaking unravelling.

What problem does this paper attempt to address?

This paper addresses how to prevent the "unravelling" caused by adverse selection in collaborative learning when information about data quality is asymmetric. Specifically, the authors study how multiple strategic agents whose data distributions differ in quality can collaborate effectively, avoiding the outcome in which low-quality data providers dominate the coalition and high-quality data providers withdraw.

### Problem Background

In collaborative learning, multiple agents share data and computing resources to complete a common learning task. However, when these agents are strategic and data quality is private information, they may conceal or misreport their data quality to gain a competitive advantage. This information asymmetry leads to adverse selection: low-quality data providers may dominate the coalition while high-quality data providers choose to withdraw, which eventually causes the collaboration to disintegrate.

### Main Contributions

1. **Rigorous Analytical Framework**:
   - Proposes a rigorous framework for analyzing collaborative learning among strategic agents whose data distributions differ in quality.
   - Uses tools from domain adaptation to formally define data quality, and models collaboration as a principal-agent problem in which the principal collects samples so as to maximize social welfare.
2. **Revealing the Unravelling Phenomenon**:
   - Proves that when data quality is private information, a naive aggregation strategy leads to complete unravelling. Specifically, in any pure-strategy Nash equilibrium, the set of agents willing to cooperate is either empty or contains only the providers of the lowest-quality data.
3. **Proposed Solutions**:
   - When external transfers are allowed, the VCG (Vickrey-Clarke-Groves) mechanism restores optimality.
   - When external transfers are not allowed, the authors design a mechanism based on probabilistic verification that makes the grand coalition a Nash equilibrium with high probability, thus breaking the unravelling phenomenon.

### Specific Methods of the Solution

- **Probabilistic Verification Mechanism**:
  - Assumes that the aggregator can estimate the data quality of each agent from a small number of samples.
  - Designs a new mechanism \(\hat{\Gamma}\) that uses probabilistic verification to ensure the grand coalition remains a Nash equilibrium even when data quality is unknown.

### Practical Applications

- For the classification task, the paper shows how to implement this mechanism in practice, including how to define an estimator \(\hat{\theta}_j\) that satisfies assumption \(H5\).
- It provides specific examples of how to compute the type estimate \(\hat{\theta}_{ERM,0,j}\) by flipping labels and performing empirical risk minimization in the classification setting.

### Conclusion

The paper shows that information asymmetry can cause serious adverse selection problems in collaborative learning. A transfer-free mechanism based on probabilistic verification can prevent the collaboration from disintegrating, keep high-quality data providers in the coalition, and thus maintain the long-term stability and effectiveness of collaborative learning.
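The unravelling dynamic described above can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions and is not the paper's model: each agent's data quality is collapsed to a single hypothetical score in [0, 1] (higher is better), an agent's stand-alone payoff equals its own score, and its payoff inside the coalition equals the average score of the current members plus a hypothetical `pooling_bonus` standing in for the benefit of having more samples. The function name `unravel` and both parameters are illustrative choices. Agents who do better alone leave, the average drops, and the process repeats.

```python
def unravel(qualities, pooling_bonus):
    """Iterate departures in a toy adverse-selection model (illustrative only).

    qualities: hypothetical scalar quality scores, higher means better data.
    pooling_bonus: hypothetical gain from training on the pooled samples.
    Returns the sorted indices of the agents that remain in the coalition.
    """
    members = set(range(len(qualities)))
    while members:
        # Payoff from staying: average quality of current members plus the bonus.
        pooled = sum(qualities[i] for i in members) / len(members) + pooling_bonus
        # An agent leaves when its stand-alone payoff beats the pooled payoff.
        leavers = {i for i in members if qualities[i] > pooled}
        if not leavers:
            break  # no one wants to leave: the coalition is stable
        members -= leavers
    return sorted(members)


if __name__ == "__main__":
    scores = [0.1, 0.3, 0.5, 0.7, 0.9]
    # Small pooling benefit: the best agents leave first, then the next best.
    print(unravel(scores, pooling_bonus=0.05))  # [0]
    # Large pooling benefit: nobody gains by leaving.
    print(unravel(scores, pooling_bonus=0.5))   # [0, 1, 2, 3, 4]
```

With a small bonus the coalition shrinks round by round until only the lowest-quality agent remains, which mirrors the qualitative outcome stated in the summary. With a large enough bonus the grand coalition is stable even in this naive setting. The toy model says nothing about how the paper's verification mechanism restores the grand coalition.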

Unravelling in Collaborative Learning

Privacy-Preserving Collaborative Deep Learning with Unreliable Participants.

One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning

Incentives in Private Collaborative Machine Learning

Fully Decentralized Joint Learning of Personalized Models and Collaboration Graphs

Incentivizing Honesty among Competitors in Collaborative Learning and Optimization

A Kernel Perspective on Distillation-based Collaborative Learning

Collaborative Learning via Prediction Consensus

Defection-Free Collaboration between Competitors in a Learning System

How to Incentivize Data-Driven Collaboration Among Competing Parties

Collaboratively Learning Linear Models with Structured Missing Data

On the Necessity of Collaboration for Online Model Selection with Decentralized Data

Collaboration Equilibrium in Federated Learning

Generalizing Differentially Private Decentralized Deep Learning with Multi-Agent Consensus

On the Conflict of Robustness and Learning in Collaborative Machine Learning

Collaborative Active Learning in Conditional Trust Environment

Meta Clustering for Collaborative Learning

Joint Training of Deep Ensembles Fails Due to Learner Collusion

Together or Alone: The Price of Privacy in Collaborative Learning

Secure Aggregation Meets Sparsification in Decentralized Learning

Unsupervised collaborative learning using privileged information