CrowdChart: Crowdsourced Data Extraction From Visualization Charts

Chengliang Chai, Guoliang Li, Ju Fan, Yuyu Luo
DOI: https://doi.org/10.1109/tkde.2020.2972543
IF: 9.235
2021-11-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Visualization charts are widely used for presenting structured data. In many cases, people want to digitize the data in charts collected from various sources (e.g., papers and websites) in order to further analyze the data or create new charts. However, existing automatic and semi-automatic approaches are not always effective because charts vary so widely. In this paper, we introduce a crowdsourcing approach that leverages human ability to extract data from visualization charts. There are several challenges. First, it is unclear how to avoid tedious human interaction with charts and design effective crowdsourcing tasks. Second, it is challenging to evaluate workers' quality for truth inference, because workers may not only provide inaccurate values but also assign values to the wrong data series. Third, to guarantee quality, one may assign a task to many workers, leading to a high crowdsourcing cost. To address these challenges, we design an effective crowdsourcing task scheme that splits a chart into simple micro-tasks. We introduce a novel worker quality model that considers both worker accuracy and task difficulty. We also devise effective task assignment and early-termination mechanisms to reduce cost. We evaluate our approach on real-world datasets using real crowdsourcing platforms, and the results demonstrate the effectiveness of our method.
computer science, information systems, artificial intelligence, engineering, electrical & electronic