Automatic Quality Control of Crowdsourced Rainfall Data with Multiple Noises: A Machine Learning Approach

Geng Niu, Pan Yang, Yi Zheng, Ximing Cai, Huapeng Qin
DOI: https://doi.org/10.1029/2020wr029121
IF: 5.4
2021-10-22
Water Resources Research
Abstract: In geophysics, crowdsourcing is an emerging nontraditional environmental monitoring approach that supports data acquisition from individual citizens. However, because of the involvement of undertrained citizens and imprecise low-cost sensors, crowdsourced data applications suffer from different types of noise that can degrade the overall monitoring accuracy. In this study, we propose a machine learning approach for automatic crowdsourced data quality control (CSQC) that detects and removes noisy data inputs in spatially and temporally discrete crowdsourced observations coming from both fixed-point sensors (e.g., surveillance cameras) and moving sensors (e.g., moving cars/pedestrians). We design a set of features from original and interpolated rainfall data and use them to train and test the CSQC models using both supervised and unsupervised machine learning algorithms. We also test the performance of the CSQC models under various scenarios without retraining (hereafter referred to as transferability). The results, based on synthetic but realistic data, show that the CSQC models can significantly reduce the overall rainfall estimation errors. Under the stationary assumption, the CSQC models based on both supervised and unsupervised algorithms perform well in identifying noisy data and reducing the overall rainfall estimation error; however, if the model is transferred to other cities with different rainfall patterns or noise compositions (without retraining), supervised multilayer perceptrons (MLPs) show the best performance.
environmental sciences, water resources, limnology
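
The abstract mentions features built from original and interpolated rainfall data. As a rough illustration only, the sketch below computes one such feature: the gap between each crowdsourced reading and a leave-one-out estimate interpolated from the other readings. Everything here is an assumption rather than the paper's method: inverse-distance weighting is used as the interpolator, the function names (`idw_estimate`, `residual_features`, `flag_noisy`) are hypothetical, the sample readings are made up, and a fixed threshold stands in for the trained supervised or unsupervised classifier.

```python
import math


def idw_estimate(target, neighbors, power=2.0):
    """Inverse-distance-weighted rainfall estimate at target (x, y).

    neighbors is a list of (x, y, rain) tuples. IDW is an assumed
    interpolator; the abstract does not name one.
    """
    num = 0.0
    den = 0.0
    for x, y, rain in neighbors:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # A co-located reading is returned directly.
            return rain
        w = 1.0 / d ** power
        num += w * rain
        den += w
    return num / den


def residual_features(observations, power=2.0):
    """Return (observed, interpolated, absolute residual) per reading.

    Each reading is left out in turn and re-estimated from the others.
    """
    feats = []
    for i, (x, y, rain) in enumerate(observations):
        others = observations[:i] + observations[i + 1:]
        est = idw_estimate((x, y), others, power)
        feats.append((rain, est, abs(rain - est)))
    return feats


def flag_noisy(observations, threshold):
    """Flag readings whose residual exceeds a fixed threshold.

    The threshold is a stand-in for a trained CSQC classifier.
    """
    return [resid > threshold for _, _, resid in residual_features(observations)]


# Made-up readings (x, y, rainfall in mm/h); the last one is an outlier.
readings = [(0, 0, 10.0), (1, 0, 11.0), (0, 1, 9.0), (1, 1, 50.0)]
flags = flag_noisy(readings, threshold=20.0)
```

In a fuller pipeline, tuples like those returned by `residual_features` would be passed to a learned model (for example an MLP) instead of being compared against a hand-picked threshold.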