Abstract: Decentralized data markets can provide more equitable forms of data acquisition for machine learning. However, to realize practical marketplaces, efficient techniques for seller selection need to be developed. We propose and benchmark federated data measurements to allow a data buyer to find sellers with relevant and diverse datasets. Diversity and relevance measures enable a buyer to make relative comparisons between sellers without requiring intermediate brokers and training task-dependent models.
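The abstract's central idea is that a buyer can compare sellers using relevance measures computed from data summaries instead of raw data. The minimal Python sketch below illustrates this under stated assumptions. Each seller has already embedded its data with a shared pretrained encoder (not shown) and shares only its mean embedding. The buyer scores each seller with cosine similarity and Euclidean distance, then ranks them. The mean-embedding summary and the helper names (`mean_embedding`, `rank_sellers`, `seller_a`, `seller_b`) are hypothetical illustrations, not the paper's actual protocol.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_embedding(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Seller side (assumed): summarize local data by its mean embedding.

    Only this summary leaves the seller; the raw samples stay local.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[j] for e in embeddings) / n for j in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Relevance as the cosine of the angle between two summaries (higher = more relevant)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms > 0 else 0.0


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Relevance as the distance between two summaries (lower = more relevant)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_sellers(query: Vector, summaries: Dict[str, Vector]) -> List[str]:
    """Buyer side: order seller IDs from most to least relevant by cosine similarity."""
    return sorted(summaries, key=lambda s: cosine_similarity(query, summaries[s]), reverse=True)


# Toy 2-D "embeddings" (hypothetical); in practice they would come from a pretrained encoder.
buyer = mean_embedding([[1.0, 0.1], [0.9, 0.0]])
sellers = {
    "seller_a": mean_embedding([[1.0, 0.0], [0.8, 0.2]]),
    "seller_b": mean_embedding([[0.0, 1.0], [0.1, 0.9]]),
}
print(rank_sellers(buyer, sellers))  # ['seller_a', 'seller_b']
```

The buyer never sees a seller's raw samples here, only a vector summary. Whether such a summary leaks too much, and what exactly is exchanged, depends on the real protocol.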

What problem does this paper attempt to address?

The paper addresses how a data buyer can efficiently select sellers in a decentralized data market. Its content can be summarized as follows:

1. **Background and Motivation**: As artificial intelligence relies on ever larger datasets, traditional data collection methods face numerous ethical challenges and legal risks. Decentralized data markets have been proposed as a fairer and more transparent way to acquire data.
2. **Problem Definition**: A core problem in a decentralized data market is how buyers can efficiently find sellers whose data is relevant and diverse. Traditionally, data brokers perform this search; without brokers, new methods are needed.
3. **Solution**: The paper proposes federated data measurements. Buyers compare sellers by computing relevance and diversity measures of the sellers' data, without directly accessing that data and without performing task-specific model evaluations.
4. **Experimental Validation**: The authors benchmark the proposed measurements on several computer vision datasets, evaluating how well each measure ranks sellers, how well it predicts downstream classification performance, and how robust it is to duplicate and noisy data.
5. **Key Findings**:
   - Relevance measures (such as Euclidean distance and cosine similarity) help identify the sellers most relevant to the buyer's needs.
   - Diversity measures (such as volume and the Vendi score) correlate strongly with downstream classification performance, suggesting that more diverse data improves model generalization.
   - By sending multiple queries, including some in false directions, a buyer can effectively detect dishonest sellers.
   - The proposed measurements are robust to duplicate and noisy data.

In summary, the paper proposes a framework of federated data measurements that reduces search costs in decentralized data markets, making these markets operate more efficiently.
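The two diversity measures named in the key findings can be sketched in dependency-free Python. This is a minimal sketch, not the paper's implementation. It assumes unit-normalized embeddings, so the similarity kernel is cosine similarity. `volume` uses the definition sqrt(det(X^T X)) from volume-based data valuation. `vendi_score` is the exponential of the Shannon entropy of the eigenvalues of K/n, where K is the n x n similarity matrix. Because the nonzero eigenvalues of X X^T equal those of X^T X, both measures are computed from the small d x d Gram matrix. A textbook Jacobi eigenvalue routine stands in for a linear-algebra library. The helper names and the normalization choice are assumptions, and the paper's exact definitions may differ.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def normalize_rows(x: Sequence[Sequence[float]]) -> Matrix:
    """Scale each (nonzero) embedding to unit length, so x_i . x_j is a cosine similarity."""
    out = []
    for row in x:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out


def gram_dxd(x: Matrix) -> Matrix:
    """Return X^T X (d x d); its nonzero eigenvalues equal those of K = X X^T (n x n)."""
    d = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(d)] for i in range(d)]


def sym_eigvals(a: Matrix, tol: float = 1e-18, max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def volume(embeddings: Sequence[Sequence[float]]) -> float:
    """Volume diversity, assumed here as sqrt(det(X^T X)) on unit-normalized rows."""
    det = 1.0
    for lam in sym_eigvals(gram_dxd(normalize_rows(embeddings))):
        det *= max(lam, 0.0)  # clamp tiny negative round-off
    return math.sqrt(det)


def vendi_score(embeddings: Sequence[Sequence[float]]) -> float:
    """Vendi score: exp(Shannon entropy of the eigenvalues of K / n) with a cosine kernel K."""
    x = normalize_rows(embeddings)
    n = len(x)
    eigs = [lam / n for lam in sym_eigvals(gram_dxd(x))]
    entropy = -sum(lam * math.log(lam) for lam in eigs if lam > 1e-12)
    return math.exp(entropy)


narrow = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]                # three copies of one point
broad = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # three orthogonal points
print(round(vendi_score(narrow), 6), round(vendi_score(broad), 6))  # 1.0 3.0
```

A Vendi score can be read as an "effective number of distinct points": copies of one point score 1, and n mutually orthogonal points score n. Duplicating an entire dataset leaves the Vendi score unchanged, because the eigenvalues of K/n do not move. In this form, `volume` instead grows when the whole set is copied, and it is zero whenever there are fewer points than embedding dimensions.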

Data Measurements for Decentralized Data Markets
