Outlier Detection for Streaming Task Assignment in Crowdsourcing

Yan Zhao,Xuanhao Chen,Liwei Deng,Tung Kieu,Chenjuan Guo,Bin Yang,Kai Zheng,Christian S. Jensen
DOI: https://doi.org/10.1145/3485447.3512067
2022-01-01
Abstract:Crowdsourcing aims to enable the assignment of available resources to the completion of tasks at scale. The continued digitization of societal processes translates into increased opportunities for crowdsourcing. For example, crowdsourcing enables the assignment of computational resources of humans, called workers, to tasks that are notoriously hard for computers. In settings faced with malicious actors, detection of such actors holds the potential to increase the robustness of crowdsourcing platform. We propose a framework called Outlier Detection for Streaming Task Assignment that aims to improve robustness by detecting malicious actors. In particular, we model the arrival of workers and the submission of tasks as evolving time series and provide means of detecting malicious actors by means of outlier detection. We propose a novel socially aware Generative Adversarial Network (GAN) based architecture that is capable of contending with the complex distributions found in time series. The architecture includes two GANs that are designed to adversarially train an autoencoder to learn the patterns of distributions in worker and task time series, thus enabling outlier detection based on reconstruction errors. A GAN structure encompasses a game between a generator and a discriminator, where it is desirable that the two can learn to coordinate towards socially optimal outcomes, while avoiding being exploited by selfish opponents. To this end, we propose a novel training approach that incorporates social awareness into the loss functions of the two GANs. Additionally, to improve task assignment efficiency, we propose an efficient greedy algorithm based on degree reduction that transforms task assignment into a bipartite graph matching. Extensive experiments offer insight into the effectiveness and efficiency of the proposed framework.
What problem does this paper attempt to address?