Data Poisoning Attacks and Defenses in Dynamic Crowdsourcing with Online Data Quality Learning

Yuxi Zhao, Xiaowen Gong, Fuhong Lin, Xu Chen
DOI: https://doi.org/10.1109/tmc.2021.3133365
IF: 6.075
2023-01-01
IEEE Transactions on Mobile Computing
Abstract: Crowdsourcing has found a wide variety of applications, including spectrum sensing, traffic monitoring, and data annotation for machine-learning-based data analytics. To improve data accuracy and cost-effectiveness, workers' data quality can be learned from their data in an online manner, and the learned quality can then be used for task assignment and data aggregation. However, crowdsourcing is vulnerable to data poisoning attacks, in which the attacker reports malicious data to reduce the accuracy of the aggregated data. In this paper, we study malicious data attacks on dynamic crowdsourcing, where tasks are assigned and performed sequentially, and we explore online quality learning as a defense mechanism that identifies malicious workers by their low quality. We first focus on the asymptotic setting, where workers' quality is accurately learned by the requester, and then build on it to treat the general non-asymptotic setting, where the quality is estimated online with errors. For each setting, we first characterize the conditions under which the attack strategy can effectively reduce the aggregated data accuracy. Our results show that the malicious noise variance needs to be within a certain range for the attack to be effective. We then analyze the harm of effective attack strategies. The analysis reveals that effective attacks can substantially increase the regret of the online quality learning algorithm, from $\mathcal{O}(\log^2 T)$ (upper bound) to $\Omega(T)$ (lower bound). To further mitigate the attack, we also study data aggregation based on the median and on the maximum influence of estimation as defense mechanisms. Our results provide useful insights into the impact of data poisoning attacks when online quality learning is used as a defense. We evaluate the proposed attacks and defenses via extensive simulations based on real-world data, which demonstrate the effectiveness of both the attacks and the defenses.
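The defenses named in the abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm: the report values, the variances, and the helper `weighted_mean` are all hypothetical, and inverse-variance weighting stands in as one common way to use learned quality in aggregation. It compares three estimates of a single task's true value when one of five workers reports a poisoned value: an equal-weight average (quality not yet learned), a median, and a quality-weighted average in which the requester has already learned that the malicious worker's noise variance is large.

```python
from statistics import median

def weighted_mean(reports, variances):
    """Inverse-variance weighted average (hypothetical helper):
    each report is weighted by 1 / (believed noise variance)."""
    weights = [1.0 / v for v in variances]
    return sum(w * r for w, r in zip(weights, reports)) / sum(weights)

# Illustrative numbers only; none of these come from the paper.
truth = 10.0
honest_reports = [9.8, 10.1, 10.3, 9.9]
poisoned_reports = honest_reports + [25.0]  # last worker is malicious

# Before any quality learning: all workers are believed equally reliable.
equal_variances = [1.0] * 5
naive_estimate = weighted_mean(poisoned_reports, equal_variances)

# Median aggregation ignores the magnitude of a single outlier.
median_estimate = median(poisoned_reports)

# After quality learning (assumed outcome): the malicious worker is
# assigned a large variance, so its report is heavily down-weighted.
learned_variances = [1.0, 1.0, 1.0, 1.0, 100.0]
learned_estimate = weighted_mean(poisoned_reports, learned_variances)

for name, est in [("naive mean", naive_estimate),
                  ("median", median_estimate),
                  ("quality-weighted", learned_estimate)]:
    print(f"{name}: estimate={est:.3f}, abs error={abs(est - truth):.3f}")
```

In this toy case a single extreme report pulls the equal-weight average well away from the truth, while the median and the quality-weighted average both stay close to it. The abstract's point that the malicious noise variance must lie within a certain range is consistent with this picture: a report this extreme is easy to down-weight once quality has been learned, so an effective attacker would need to be subtler.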