Learning from Crowds in a Principled Way: an Interpretable Framework Unifying Deep Networks and Graphical Models

Xuan Wei, Mingyue Zhang, Daniel Zeng
2021-01-01
SSRN Electronic Journal
Abstract: Microtask crowdsourcing has emerged as a cost-effective approach to collecting large-scale, high-quality labeled data across a wide range of business scenarios, particularly artificial intelligence-powered applications, which are usually data-hungry. To aggregate crowd efforts and achieve certain cumulative goals, some assumptions (e.g., worker heterogeneity in quality) are considered, and models are developed based on these assumptions. However, most current designs for learning from crowds make simple or constrained assumptions, and their conclusions suffer from low interpretability and generalizability. To provide a set of generalizable practices for the future design of learning from crowds, we first formulate several general hypotheses, including worker heterogeneity in reliability and the usefulness of task features and task clustering structure. To test these hypotheses, we propose an interpretable deep graphical framework that enables incremental design and hence allows us to conduct before-and-after evaluations of the underlying assumptions. This deep framework also allows us to make less constrained and hence more useful assumptions by modeling complex non-linear relationships with deep networks. An efficient inference algorithm combining variational message passing and amortized learning is then developed to estimate the parameters. Next, we empirically test these hypotheses using eight real-world tasks, including text and image classification. The results also demonstrate the effectiveness of our framework over state-of-the-art benchmark models. Finally, a sanity check lends support to the interpretability of the proposed framework. Our work not only serves as a cost-effective approach to aggregating crowd annotations but also provides general practices for the next-generation design of learning from crowds.