Incentive-Aware Decentralized Data Collaboration.

Yatong Wang, Yuncheng Wu, Xincheng Chen, Gang Feng, Beng Chin Ooi
DOI: https://doi.org/10.1145/3589303
2023-01-01
Abstract: Data collaboration enables multiple parties to pool data for deriving meaningful data insights. However, data misuse and unlawful data collection have led individual organizations to impose precautionary measures to guard against data leakage and abuse. In response, decentralized federated learning (DFL) has emerged as an attractive paradigm to facilitate data collaboration while being amenable to privacy-preserving data and knowledge sharing, cost reduction, and prediction accuracy improvement. Unfortunately, the participating parties in DFL tend to be heterogeneous, with skewed datasets and uneven capabilities. Inevitably, training and transmission costs and the presence of free-riders pose challenges to the adoption of, and participation in, DFL. The absence of centralized parameter servers further exacerbates the problem of evaluating the contribution of each individual party. Therefore, an effective incentive mechanism is essential to promote data collaboration. In this paper, we propose a novel Incentive-aware Decentralized fEderated leArning (IDEA) framework for facilitating data collaboration. Specifically, we first design a customizable reward scheme for heterogeneous parties to optimize their respective objectives, such as higher model accuracy, communication efficiency, and computational efficiency. To reward deserving parties fairly while offering flexibility, we propose a novel multi-agent reinforcement learning (MARL) incentive mechanism, which enables heterogeneous parties to learn their own optimal collaboration policy. We then design an efficient decentralized data collaboration algorithm that supports the customizable reward scheme based on individual objective-specific collaboration policies. We theoretically prove that the algorithm achieves a Nash equilibrium, which ensures the fairness of the corresponding rewards for parties.
We conduct extensive experiments to evaluate the performance of our proposed framework against four baselines on five real-world datasets. The results show that IDEA outperforms state-of-the-art methods in terms of effectiveness, efficiency, and accumulated reward.