Towards Deeper Insights into Deep Learning from Imbalanced Data.

Jie Song, Yun Shen, Yongcheng Jing, Mingli Song
DOI: https://doi.org/10.1007/978-981-10-7299-4_56
2017-01-01
Abstract: Classifiers (including deep neural networks) trained on imbalanced data usually exhibit imbalanced performance: they are more likely to make mistakes on minority-class instances than on majority-class ones. Existing explanations attribute this imbalanced performance to the imbalanced training data. In this paper, using deep neural networks, we strive for deeper insights into the imbalanced performance. We find that imbalanced data is neither a sufficient nor a necessary condition for imbalanced performance in deep neural networks, and that another important factor is the distance between the majority-class instances and the decision boundary. Based on these observations, we propose a new under-sampling method, named Moderate Negative Mining, to solve the imbalanced classification problem; it is easy to implement, state-of-the-art in performance, and suitable for deep neural networks. Various experiments validate our insights and demonstrate the superiority of the proposed under-sampling method.