Abstract: Crowdsensing has been recognized as a promising data collection paradigm, in which a platform outsources sensing tasks to a large number of users. However, requesting users to report raw data raises practical concerns, such as significant communication and central-processing overhead, in addition to users' privacy concerns. In many scenarios (e.g., advertising and recommendation), the data collector benefits directly from statistical aggregation of raw data. Thus motivated, we consider the data collection problem based on users' local histograms, which is intimately related to the fundamental trade-off between the platform's accuracy and users' privacy. Because of users' social relationships, their data are often correlated, so a user's privacy may be leaked through others' data. To tackle this challenge, we first use Gaussian Markov random fields to model the correlation structure embedded in users' data. The data collection is modeled as a Stackelberg game in which the platform decides its reward policy and users decide their noise levels while taking into account the social coupling among users. For the reward policy design, we first establish the relationship between users' Nash equilibrium and the payment mechanism, and then optimize the platform's accuracy under a budget constraint. Further, since the noise levels are users' private information, users may report falsified noise levels to achieve higher payoffs, which in turn impairs the crowdsensing performance. It turns out that, with insight into the correlation structure among users' data, the information asymmetry can be overcome based on peer prediction. We revisit the payment mechanism to guarantee dominant truthfulness of each user's strategy. Theoretical analysis and numerical results demonstrate the effectiveness of the proposed mechanism.
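The accuracy-privacy trade-off described in the abstract can be illustrated with a minimal sketch in which each user perturbs a local histogram before reporting it and the platform averages the reports. Everything here is an assumption made for illustration: the abstract does not specify the noise distribution, the histogram construction, or the aggregation rule, so zero-mean Gaussian noise, equal-width bins, and a plain average are stand-ins, and the function names (`local_histogram`, `noisy_report`, `aggregate`, `l1_error`) are hypothetical. The correlation model, the Stackelberg game, and the payment mechanism are not modeled.

```python
import random


def local_histogram(samples, bins, lo, hi):
    """Normalized histogram of samples over equal-width bins on [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        if lo <= x < hi:
            # Clamp the index to guard against floating-point rounding at the edge.
            counts[min(int((x - lo) / width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def noisy_report(hist, sigma, rng):
    """Add zero-mean Gaussian noise (an assumed noise model) to each bin."""
    return [h + rng.gauss(0.0, sigma) for h in hist]


def aggregate(reports):
    """Platform-side aggregation: bin-wise average of all reports."""
    n = len(reports)
    return [sum(col) / n for col in zip(*reports)]


def l1_error(a, b):
    """L1 distance between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    # 50 hypothetical users, each holding 200 raw readings in [0, 1).
    raw = [[rng.random() for _ in range(200)] for _ in range(50)]
    hists = [local_histogram(x, 10, 0.0, 1.0) for x in raw]
    truth = aggregate(hists)
    # Larger noise levels protect users more but degrade the aggregate.
    for sigma in (0.0, 0.05, 0.5):
        estimate = aggregate([noisy_report(h, sigma, rng) for h in hists])
        print(f"sigma={sigma}: L1 error = {l1_error(estimate, truth):.4f}")
```

In this sketch every user applies the same `sigma`. In the setting the abstract describes, each user would choose their own noise level strategically, and the platform's reward policy would shape that choice.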

Noisy Data Collection Towards Diversity Maximization

Clustering Ensemble with High Diversity Based on Adding Artificial Data

Socially Privacy-Preserving Data Collection for Crowdsensing

Position: Measure Dataset Diversity, Don't Just Claim It

The Optimal Noise Distribution for Privacy Preserving in Mobile Aggregation Applications

Towards optimal noise distribution for privacy preserving in data aggregation

Not All Noises Are Created Equally: Diffusion Noise Selection and Optimization

Diversity in Machine Learning

Sample Weighting: an Inherent Approach for Outlier Suppressing Discriminant Analysis

Measuring diversity. A review and an empirical analysis

Designing Data: Proactive Data Collection and Iteration for Machine Learning

3-methyladenine-DNA glycosylase II: the crystal structure of an AlkA-hypoxanthine complex suggests the possibility of product inhibition.

Dataset Distillers Are Good Label Denoisers In the Wild

Toward Optimal Additive Noise Distribution for Privacy Protection in Mobile Statistics Aggregation

Data collection and quality challenges in deep learning: a data-centric AI perspective

A context model for collecting diversity-aware data

A boosting method to detect noisy data

Noise Matters: Diffusion Model-based Urban Mobility Generation with Collaborative Noise Priors

Diversity Measurement and Subset Selection for Instruction Tuning Datasets

How Does Data Diversity Shape the Weight Landscape of Neural Networks?

The Data Addition Dilemma