Large Margin Distribution Learning with Cost Interval and Unlabeled Data.

Yu-Hang Zhou,Zhi-Hua Zhou
DOI: https://doi.org/10.1109/tkde.2016.2535283
IF: 9.235
2016-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract:In many real-world applications, different types of misclassification usually suffer from different costs, but the accurate cost is often hard to be determined and usually one can only get an interval-estimation like that one type of mistake is about 5 to 10 times more serious than the other type. On the other hand, there are usually abundant unlabeled data available, leading to great research effort about semi-supervised learning. It is noticeable that cost interval and unlabeled data usually appear simultaneously in practice tasks; however, there is rare study tackling them together. In this paper, we propose the cisLDM approach which is able to handle cost interval and exploit unlabeled data in a principled way. Rather than maximizing the minimum margin like traditional large margin classifiers, cisLDM tries to optimize the margin distribution on both labeled and unlabeled data when minimizing the worst-case total-cost and the mean total-cost simultaneously according to the cost interval. Experiments on a broad range of datasets and cost settings exhibit the impressive performance of cisLDM. In particular, cisLDM is able to reduce 47 percent more total-cost than standard SVM and 27 percent more total-cost than cost-sensitive semi-supervised SVM which assumes the true cost value is known in advance.
What problem does this paper attempt to address?