Margin-Based Over-Sampling Method for Learning from Imbalanced Datasets.

Xiannian Fan, Ke Tang, Thomas Weise
DOI: https://doi.org/10.1007/978-3-642-20847-8_26
2011-01-01
Abstract: Learning from imbalanced datasets has drawn increasing attention from both theoretical and practical perspectives. Over-sampling is a popular and simple method for imbalanced learning. In this paper, we show that over-sampling algorithms carry an inherent risk in terms of the large margin principle. We then propose a new synthetic over-sampling method, named Margin-guided Synthetic Over-sampling (MSYN), to reduce this risk. MSYN improves learning with respect to the data distributions, guided by the margin-based rule. An empirical study verifies the efficacy of MSYN.