Quantifying Bytes: Understanding Practical Value of Data Assets in Federated Learning
Minghao Yao, Saiyu Qi, Zhen Tian, Qian Li, Yong Han, Haihong Li, Yong Qi
DOI: https://doi.org/10.26599/tst.2024.9010034
2024-09-14
Tsinghua Science & Technology
Abstract: Data assets are emerging as a crucial component in both industrial and commercial applications. Mining valuable knowledge from data benefits decision-making and business. However, the use of data assets creates tension between protecting sensitive information and estimating value. As an emerging machine learning paradigm, Federated Learning (FL) allows multiple clients to jointly train a global model on their data without revealing it. This approach harnesses the power of multiple data assets while preserving their privacy. Despite these benefits, FL relies on a central server to manage the training process and does not quantify the quality of data assets, which raises privacy and fairness concerns. In this work, we present a novel framework that combines Federated Learning and Blockchain by Shapley value (FLBS) to achieve a good trade-off between privacy and fairness. Specifically, in each training round we use a blockchain to elect aggregation and evaluation nodes, which enables decentralization and contribution-aware incentive distribution; these nodes are functionally separated and can supervise each other. The experimental results validate the effectiveness of FLBS in estimating contributions even in the presence of heterogeneous and noisy data.
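The abstract does not say how FLBS computes or approximates Shapley values, so the following is only a minimal sketch of the textbook exact Shapley formula applied to client contribution. It is not the authors' protocol. The function `shapley_values`, the client names, and the additive `toy_utility` table are all hypothetical; in a real FL setting the utility would be something like the validation accuracy of a model aggregated from a given subset of clients.

```python
from itertools import combinations
from math import factorial


def shapley_values(clients, utility):
    """Exact Shapley value of each client under a coalition utility function.

    `utility` maps a frozenset of clients to a number (assumed here to stand
    in for the quality of a model trained on that coalition's data).
    Cost is exponential in the number of clients, so this is only
    practical for small examples.
    """
    n = len(clients)
    values = {c: 0.0 for c in clients}
    for c in clients:
        others = [o for o in clients if o != c]
        for k in range(n):
            # Weight of a coalition of size k that does not contain c.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain from adding client c to coalition s.
                values[c] += weight * (utility(s | {c}) - utility(s))
    return values


def toy_utility(coalition):
    # Hypothetical additive utility: client C holds pure noise and adds nothing.
    quality = {"A": 0.5, "B": 0.3, "C": 0.0}
    return sum(quality[c] for c in coalition)


values = shapley_values(["A", "B", "C"], toy_utility)
```

Because the toy utility is additive, each client's Shapley value equals its own quality, and the values sum to the utility of the full coalition. This is the property that makes the Shapley value a natural basis for contribution-aware incentive distribution.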
computer science, information systems, engineering, electrical & electronic, software engineering