Interpretable Data Mining Approaches to Predict Term Deposits Subscriptions

Qingyang Chen
DOI: https://doi.org/10.54691/bcpbm.v44i.4841
2023-04-27
BCP Business & Management
Abstract: The COVID-19 pandemic has accelerated informatization. Customer behavior has changed dramatically in almost every industry, which requires companies and organizations to analyze the latest customer features and develop specialized marketing content. Banks play a major role in the financial markets, and term deposits are among their most important products. This study analyzes a Portuguese retail bank’s telemarketing data on selling term deposits, which includes 16 input variables. All 16 variables were converted to numeric inputs using one-hot encoding. For testing, the dataset was divided into training and test sets of 80% and 20%, respectively. To make the study more interpretable, two tree-structured data mining models, a decision tree and a random forest classifier, were trained and tested. The input variable “duration” can only be known after the outcome is known, so it was dropped when creating the predictive models. Two feature importance histograms and three confusion matrices were generated to visualize the customer features and evaluate the models. Area Under the Receiver Operating Characteristic Curve (AUC-ROC) scores were also computed to compare the accuracy of the models. The test results show that “duration” has a strong relationship with clients’ subscriptions and should not be included in predictive models. The random forest classifier outperforms the decision tree and achieves accuracy comparable to previous models that included “duration.” Balance and age are the two most influential customer features for term deposit subscriptions.
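The workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's code: it assumes pandas and scikit-learn, runs on synthetic stand-in data rather than the bank's records, and the column names, the label rule, the `evaluate` helper, and the hyperparameters (`max_depth`, `n_estimators`, `random_state`) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the bank's records); column names are assumptions.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "balance": rng.normal(1500, 3000, n).round(),
    "job": rng.choice(["admin", "technician", "retired"], n),
    "marital": rng.choice(["single", "married", "divorced"], n),
    "duration": rng.integers(0, 2000, n),  # known only after the call ends
})
# Made-up label rule so the models have some signal to learn from.
score = 0.0004 * df["balance"] + 0.02 * (df["age"] - 40) + rng.normal(0, 1, n)
df["y"] = (score > 0.5).astype(int)


def evaluate(frame):
    """Hypothetical helper: fit both tree models and collect their metrics."""
    # Drop the target and "duration", then one-hot encode the categorical columns.
    X = pd.get_dummies(frame.drop(columns=["y", "duration"]))
    y = frame["y"]
    # 80% training / 20% test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    models = {
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
        results[name] = {
            "auc": roc_auc_score(y_test, proba),
            "confusion": confusion_matrix(y_test, model.predict(X_test)),
            "importance": pd.Series(
                model.feature_importances_, index=X.columns
            ).sort_values(ascending=False),
        }
    return results


results = evaluate(df)
for name, res in results.items():
    print(name, "AUC-ROC:", round(res["auc"], 3))
    print(res["confusion"])
    print(res["importance"].head(3))
```

Because the data here are synthetic, the printed scores and importances say nothing about the paper's findings; the sketch only shows the shape of the pipeline (encode, split, fit, score, rank features).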