Abstract: We study a collaborative learning problem where $m$ agents estimate a vector $\mu\in\mathbb{R}^d$ by collecting samples from normal distributions, with each agent $i$ incurring a cost $c_{i,k} \in (0, \infty]$ to sample from the $k^{\text{th}}$ distribution $\mathcal{N}(\mu_k, \sigma^2)$. Instead of working on their own, agents can collect data that is cheap to them, and share it with others in exchange for data that is expensive or even inaccessible to them, thereby simultaneously reducing data collection costs and estimation error. However, when agents have different collection costs, we need to first decide how to fairly divide the work of data collection so as to benefit all agents. Moreover, in naive sharing protocols, strategic agents may under-collect and/or fabricate data, leading to socially undesirable outcomes. Our mechanism addresses these challenges by combining ideas from cooperative and non-cooperative game theory. We use ideas from axiomatic bargaining to divide the cost of data collection. Given such a solution, we develop a Nash incentive-compatible (NIC) mechanism to enforce truthful reporting. We achieve a $\mathcal{O}(\sqrt{m})$ approximation to the minimum social penalty (sum of agent estimation errors and data collection costs) in the worst case, and a $\mathcal{O}(1)$ approximation under favorable conditions. We complement this with a hardness result, showing that $\Omega(\sqrt{m})$ is unavoidable in any NIC mechanism.

What problem does this paper attempt to address?

The main problem this paper addresses is how to fairly divide data collection work, and how to ensure that shared data is truthful, when heterogeneous strategic agents pool data to estimate a vector $\mu \in \mathbb{R}^d$. Specifically:

1. **Heterogeneity of data collection costs**: Each agent $i$ pays its own cost $c_{i,k}$ to draw a sample from the $k$-th distribution $\mathcal{N}(\mu_k, \sigma^2)$, and some agents cannot sample certain distributions at all (i.e., $c_{i,k}=\infty$). A mechanism is therefore needed that divides the data collection work fairly, so that every agent benefits.
2. **Preventing data fabrication and free-riding**: Under a naive data-sharing protocol, strategic agents may under-collect or fabricate data, leading to socially undesirable outcomes. For example, an agent that expects others to contribute a large amount of data may collect none of its own and simply rely on theirs. This harms the other agents and degrades the quality of the pooled data.

To solve these problems, the paper proposes a mechanism that combines ideas from cooperative and non-cooperative game theory:

- **Axiomatic bargaining**: used to divide the cost of data collection fairly.
- **A Nash incentive-compatible (NIC) mechanism**: built on the bargaining solution to enforce truthful reporting. In the worst case, its social penalty (the sum of the agents' estimation errors and data collection costs) is within an $\mathcal{O}(\sqrt{m})$ factor of the minimum social penalty, and within an $\mathcal{O}(1)$ factor under favorable conditions.

The paper also proves a hardness result: an approximation factor of $\Omega(\sqrt{m})$ is unavoidable for any NIC mechanism. This shows that designing a mechanism that is both efficient and fair is inherently difficult when agents are heterogeneous and strategic.

In summary, the paper addresses how to fairly divide data collection work and ensure truthful data when heterogeneous agents share data, and it provides a mechanism with matching worst-case upper and lower bounds on the approximation factor.
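The benefit of sharing described above can be illustrated with a minimal Python sketch. It is not the paper's mechanism and ignores strategic behavior entirely. It assumes that an agent's estimation error is the sum over coordinates of the sample-mean variance $\sigma^2/N_k$ (the paper's exact error measure may differ), that agents report truthfully, and that sample counts may be treated as real numbers when optimizing alone. The function names `solo_penalty` and `shared_penalties` are hypothetical.

```python
import math


def solo_penalty(costs, sigma=1.0):
    """Smallest penalty (error + cost) for one agent working alone.

    Assumption: per coordinate the agent minimizes sigma^2 / n + c * n over
    real n > 0, which gives n = sigma / sqrt(c) and value 2 * sigma * sqrt(c).
    """
    total = 0.0
    for c in costs:
        if math.isinf(c):
            return math.inf  # this coordinate cannot be estimated alone
        total += 2.0 * sigma * math.sqrt(c)
    return total


def shared_penalties(costs, counts, sigma=1.0):
    """Per-agent penalties when all collected data is pooled truthfully.

    costs[i][k]  : cost for agent i to draw one sample from distribution k
    counts[i][k] : number of samples agent i collects from distribution k
    """
    m, d = len(costs), len(costs[0])
    pooled = [sum(counts[i][k] for i in range(m)) for k in range(d)]
    if any(n == 0 for n in pooled):
        error = math.inf  # some coordinate has no data at all
    else:
        error = sum(sigma ** 2 / n for n in pooled)
    penalties = []
    for i in range(m):
        # Skip zero counts so that an infinite cost times zero is not NaN.
        cost = sum(costs[i][k] * counts[i][k]
                   for k in range(d) if counts[i][k] > 0)
        penalties.append(error + cost)
    return penalties


# Two agents, two distributions; each agent can only sample one of them.
costs = [[1.0, math.inf], [math.inf, 1.0]]
counts = [[1, 0], [0, 1]]

alone = [solo_penalty(row) for row in costs]
together = shared_penalties(costs, counts)
social_penalty = sum(together)
```

In this toy instance neither agent can estimate $\mu$ alone, so each solo penalty is infinite, while exchanging one sample each gives both agents a finite penalty. The sketch says nothing about how to choose `counts` fairly or how to deter under-collection and fabrication, which is what the bargaining step and the NIC mechanism address.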

Data Sharing for Mean Estimation Among Heterogeneous Strategic Agents

Data Sharing Markets

Statistical Estimation with Strategic Data Sources in Competitive Settings

Self-Interested Agents in Collaborative Learning: An Incentivized Adaptive Data-Centric Framework

Differentially-Private Collaborative Online Personalized Mean Estimation

Socially Privacy-Preserving Data Collection for Crowdsensing

Online Resource Sharing via Dynamic Max-Min Fairness: Efficiency, Robustness and Non-Stationarity

Dynamic Information Sharing and Punishment Strategies

Scalable Decentralized Algorithms for Online Personalized Mean Estimation

Fair Multi-party Machine Learning -- a Game Theoretic approach

Cooperative Information Sharing to Improve Distributed Learning in Multi-Agent Systems

Sharing Non-anonymous Costs of Multiple Resources Optimally

Scalable Mechanisms for Rational Secret Sharing

Crowd-Empowered Privacy-Preserving Data Aggregation for Mobile Crowdsensing

Efficient Core-selecting Incentive Mechanism for Data Sharing in Federated Learning

Sharing a Reward Based on Peer Evaluations

Quantifying Inefficiency of Fair Cost-Sharing Mechanisms for Sharing Economy

On Collaboration in Distributed Parameter Estimation with Resource Constraints

Unravelling in Collaborative Learning

Exploiting Structure for Optimal Multi-Agent Bayesian Decentralized Estimation

Incentivizing Honesty among Competitors in Collaborative Learning and Optimization