Imbalanced customer classification for bank direct marketing

Georgios Marinakos, Sophia Daskalaki
DOI: https://doi.org/10.1057/s41270-017-0013-7
2017-03-01
Journal of Marketing Analytics
Abstract: This paper aims to contribute insights on data analytics methodologies when applied to direct marketing. From a business perspective, the objective is to unveil those banking customers who are most likely to respond positively to a term deposit marketing campaign. Mathematically, this is a typical classification problem; however, in our case, the class of interest is relatively rare and the dataset imbalanced. The paper offers a performance comparison of statistical, distance-based, induction and machine learning classification algorithms on predicting potential depositors, when trained with imbalanced datasets. The main effort focuses on effectively rebalancing the datasets during training so as to reverse the negative effect of imbalance and to increase the correct classifications for the under-represented class. Distance-based and cluster-based resampling techniques are applied in comparison and in combination in order to understand how customer targeting could become more effective for practitioners. Using a publicly available dataset for direct marketing of bank products, we study the influence of resampling techniques on the different algorithms and conclude that our proposed cluster-based technique is overall the most effective in relation to other well-established techniques.
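To make the idea of rebalancing a training set concrete, the following is a minimal sketch of a distance-based oversampling step in the spirit of SMOTE. It is not the cluster-based technique the paper proposes, which the abstract does not specify. The function names `smote_like_oversample` and `rebalance`, the parameter `k`, and the toy data are all hypothetical and chosen only for illustration. Such a step would be applied to the training data only, never to the evaluation data.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (Euclidean distance). Illustrative sketch only."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (m - b) for b, m in zip(base, mate)))
    return synthetic


def rebalance(X, y, minority_label=1, k=3, seed=0):
    """Return copies of X and y with synthetic minority points appended
    until both classes have the same number of examples."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority_count = len(y) - len(minority)
    n_new = max(0, majority_count - len(minority))
    new_points = smote_like_oversample(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [minority_label] * n_new


# Toy training set (made up): six non-responders (0), two responders (1).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
     (0.2, 0.4), (0.4, 0.2), (2.0, 2.0), (2.2, 2.1)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = rebalance(X, y, k=1)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In this sketch every synthetic point lies on a line segment between two existing minority points, so the minority region is filled in rather than simply duplicated. A cluster-based approach would instead group examples first and resample per cluster.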