COCA: Cost-Effective Collaborative Annotation System by Combining Experts and Amateurs

Jiayu Lei, Zheng Zhang, Lan Zhang, Xiang-Yang Li
DOI: https://doi.org/10.1109/icde53745.2022.00055
2022-01-01
Abstract: Data annotation has been a key driver of artificial intelligence. However, difficult tasks such as fine-grained classification need large amounts of labeled data to train a feasible model. On the one hand, having people with expert knowledge of the dataset annotate all of the data can be costly. On the other hand, amateurs are cheaper but cannot give precise labels. Related approaches such as machine labeling need labeled data to start up, and crowd-model labeling can hardly solve complex tasks like fine-grained classification. Lately, combining domain experts with a cost-effective crowd to solve complex tasks has become an area of increasing interest in research and industry. However, most works rarely investigate the cost gap between experts and amateurs or how it influences the final annotation cost. In this paper, we combine experts and amateurs to build a cost-effective data annotation system called COCA. COCA annotates the target dataset from scratch and saves cost through its annotation assignment strategy. Extensive evaluations show that, at the same precision, COCA achieves a lower cost than state-of-the-art (SOTA) automatic labeling models when the ratio of expert price to amateur price is above a certain value.