Learning Representative Examples for Data Annotation

Jingbo Zhu, Huizhen Wang, Benjamin K. Tsou
DOI: https://doi.org/10.1142/S1793840609002081
2009-01-01
Abstract: Active learning is a promising way to ease the knowledge bottleneck of supervised learning models. A popular active learning technique is uncertainty sampling, which often performs poorly when it selects outliers. To address this, this paper presents a density-based re-ranking technique in which a density measure determines whether an unlabeled example is an outlier. The aim is to select examples that are not only the most informative in terms of the uncertainty measure but also the most representative in terms of the density measure. The second contribution is sampling by clustering (SBC), a technique for building a representative initial training set for active learning. Experiments on word sense disambiguation and text classification show that the proposed techniques can improve active learning with uncertainty sampling.
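The idea of combining an uncertainty measure with a density measure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the measures, so the sketch assumes entropy as the uncertainty measure, mean cosine similarity to the k most similar pool examples as the density measure, and their product as the re-ranking score. All function names, parameters, and the toy data are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (assumed uncertainty measure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def density(i, vectors, k):
    """Assumed density measure: mean cosine similarity of example i to its k most similar pool examples."""
    sims = sorted(
        (cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0

def rerank(vectors, probs, k=2, top_n=3):
    """Take the top_n most uncertain examples, then re-rank them by entropy * density."""
    by_uncertainty = sorted(
        range(len(vectors)), key=lambda i: entropy(probs[i]), reverse=True
    )
    candidates = by_uncertainty[:top_n]
    return sorted(
        candidates,
        key=lambda i: entropy(probs[i]) * density(i, vectors, k),
        reverse=True,
    )

# Toy unlabeled pool: examples 0-2 lie close together, example 3 is an outlier.
pool_vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
# Hypothetical classifier outputs: the outlier has the highest uncertainty.
pool_probs = [[0.6, 0.4], [0.9, 0.1], [0.95, 0.05], [0.5, 0.5]]

most_uncertain = max(range(len(pool_vectors)), key=lambda i: entropy(pool_probs[i]))
ranking = rerank(pool_vectors, pool_probs, k=2, top_n=3)
print(most_uncertain)  # 3: plain uncertainty sampling picks the outlier
print(ranking)         # [0, 1, 3]: the outlier drops to the end after re-ranking
```

In this toy pool, plain uncertainty sampling would query the outlier first, while the density-weighted score prefers an uncertain example that sits in a dense region. Multiplying the two measures is only one way to combine them; a weighted sum or a density threshold would be equally plausible readings of the abstract.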