Rare Event Prediction Using Similarity Majority Under-Sampling Technique

Jinyan Li,Simon Fong,Shimin Hu,Victor W. Chu,Raymond K. Wong,Sabah Mohammed,Nilanjan Dey
DOI: https://doi.org/10.1007/978-981-10-7242-0_3
IF: 3.732
2017-01-01
Soft Computing
Abstract:In data mining it is not uncommon to be confronted by imbalanced classification problem in which interesting samples are rare. Having too many ordinary but too few rare samples as training data, will mislead the classifier to become over-fitted by learning too much from majority class samples and become under-fitted lacking recognizing power for minority class samples. In this research work, a novel rebalancing technique that under-samples (reduce by sampling) the majority class size for subsiding the imbalanced class distributions without synthesizing extra training samples, is studied. This simple method is called Similarity Majority Under-Sampling Technique (SMUTE). By measuring the similarity between each majority class sample and its surrounding minority class samples, SMUTE effectively discriminates the majority and minority class samples with consideration of not changing too much of the underlying non-linear mapping between the input variables and the target classes. Two experiments are conducted and reported in this paper: one is an extensive performance comparison of SMUTE with the states-of-the-arts using generated imbalanced data; the other is the use of real data representing a case of natural disaster prevention where accident samples are rare. SMUTE is found to be working favourably well over other methods in both cases.
What problem does this paper attempt to address?