Active Learning with Sampling by Uncertainty and Density for Word Sense Disambiguation and Text Classification.

Jingbo Zhu, Huizhen Wang, Tianshun Yao, Benjamin K. Tsou
DOI: https://doi.org/10.3115/1599081.1599224
2008-01-01
Abstract: This paper addresses two issues in active learning. First, uncertainty sampling often fails because it selects outliers; to address this, the paper presents a new selective sampling technique, sampling by uncertainty and density (SUD), in which a k-nearest-neighbor-based density measure determines whether an unlabeled example is an outlier. Second, sampling by clustering (SBC) is applied to build a representative initial training set for active learning. Finally, we implement a new active learning algorithm that combines SUD and SBC. Experimental results on three real-world data sets show that our method outperforms competing methods, particularly in the early stages of active learning.
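The SUD idea can be illustrated with a minimal sketch. The abstract does not give the formulas, so the following are assumptions: uncertainty is measured by the entropy of the classifier's predicted class distribution, density is the average cosine similarity to the k most similar unlabeled examples, and the two are combined by multiplication. The function names (`entropy`, `knn_density`, `sud_select`) and the toy data are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (assumed uncertainty measure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_density(i, vectors, k):
    """Average cosine similarity between example i and its k most similar neighbours."""
    sims = sorted(
        (cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0

def sud_select(vectors, probs, k=2):
    """Index of the unlabeled example with the highest uncertainty * density score.

    The product is an assumed way of combining the two measures.
    """
    scores = [entropy(p) * knn_density(i, vectors, k) for i, p in enumerate(probs)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy pool: three similar examples and one outlier (index 3).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
# Hypothetical classifier posteriors; the outlier is the most uncertain example.
probs = [[0.6, 0.4], [0.9, 0.1], [0.7, 0.3], [0.5, 0.5]]

by_uncertainty = max(range(len(probs)), key=lambda i: entropy(probs[i]))
by_sud = sud_select(vectors, probs, k=2)
print(by_uncertainty, by_sud)
```

On this toy pool, plain uncertainty sampling picks the outlier (index 3), while the density-weighted score picks index 0, an uncertain example that lies in a dense region.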