Data Pre-Processing Algorithm for Neural Network Binary Classification Model in Bank Tele-Marketing

Khairul Nizam Abd Halim*, Abdul Syukor Mohd Jaya, Ahmad Firdaus Ahmad Fadzil
DOI: https://doi.org/10.35940/ijitee.c8472.019320
2020-01-10
International Journal of Innovative Technology and Exploring Engineering
Abstract: Tele-marketing presents a huge challenge in identifying potential customers, and the lack of an effective marketing strategy may lead a company to succumb to problems such as prolonged marketing campaigns. Various attempts have been made to improve the performance of binary classification models for bank tele-marketing data. Previous research indicates that the neural network is the most commonly employed algorithm and that it produces commendable results, with higher accuracy percentages than other algorithms. Despite several attempts to improve the model through treatment of imbalanced datasets and feature selection, this research argues that these attempts are incomplete. Therefore, this research proposes a data pre-processing algorithm for a bank tele-marketing binary classification neural network. Three datasets (with 19, 16, and 20 features) have been employed to evaluate the effect of the algorithm on the classification model. The data pre-processing algorithm is divided into three phases: data cleaning, data imbalance treatment, and data normalization. The results indicate that a binary classification model complemented with data cleaning techniques such as Missing Common (MC) and Tomek Links (TL) performs better than one using Ignore Missing (IM). In terms of data normalization, MaxAbsScaler (MAS) and MinMaxScaler (MMS) consistently performed better than the other normalization techniques. The classification model employed in this paper uses the data pre-processing combination MC-TL-MMS. With this approach, the algorithm recorded an area under the receiver operating characteristic curve (AUC) of 0.9129 and 0.9464 using 16 features and 20 features, respectively. These are the highest figures reported when compared with previous research.
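The MC-TL-MMS pre-processing combination described in the abstract can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' implementation. "Missing Common" (MC) is assumed to mean filling each missing value with its column's most frequent observed value. Tomek Links are assumed to be handled by dropping only the majority-class member of each link. The function names (`impute_most_common`, `remove_tomek_links`, `min_max_scale`, `max_abs_scale`, `roc_auc`) and the toy data are hypothetical. The two scalers mimic the behaviour of scikit-learn's `MinMaxScaler` and `MaxAbsScaler`. The AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative. The neural network classifier itself is not shown.

```python
import math
from collections import Counter


def impute_most_common(rows):
    """Assumed 'Missing Common' (MC): replace each missing cell (None) with
    the most frequent observed value in its column."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def remove_tomek_links(X, y):
    """Tomek Links (TL): two samples with different labels that are each
    other's nearest neighbour. This variant drops only the majority-class
    member of each link (removing both members is another option)."""
    majority = Counter(y).most_common(1)[0][0]
    nearest = []
    for i, xi in enumerate(X):
        best, best_d = None, math.inf
        for k, xk in enumerate(X):
            d = math.dist(xi, xk)  # Euclidean distance
            if k != i and d < best_d:
                best, best_d = k, d
        nearest.append(best)
    drop = {i if y[i] == majority else j
            for i, j in enumerate(nearest)
            if nearest[j] == i and y[i] != y[j]}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def min_max_scale(X):
    """MinMaxScaler-style (MMS): rescale each column to [0, 1];
    constant columns map to 0."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, v in enumerate(r)] for r in X]


def max_abs_scale(X):
    """MaxAbsScaler-style (MAS): divide each column by its largest |value|."""
    peak = [max(abs(v) for v in c) for c in zip(*X)]
    return [[v / peak[j] if peak[j] > 0 else 0.0 for j, v in enumerate(r)]
            for r in X]


def roc_auc(y_true, scores):
    """AUC = probability that a random positive outscores a random
    negative (ties count as half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two numeric features, None marks a missing value.
rows = [
    [0.0, 1.0], [0.0, 2.0], [0.0, None], [None, 1.0],  # class 0
    [3.5, 4.0],                                        # class 0, close to class 1
    [4.0, 4.5], [4.0, 5.5],                            # class 1 (minority)
]
labels = [0, 0, 0, 0, 0, 1, 1]

X_filled = impute_most_common(rows)                      # MC: fill missing values
X_clean, y_clean = remove_tomek_links(X_filled, labels)  # TL: drop [3.5, 4.0]
X_scaled = min_max_scale(X_clean)                        # MMS: rescale to [0, 1]
print(y_clean)  # [0, 0, 0, 0, 1, 1]
```

In a real experiment, the scaler bounds would normally be computed on the training split only and then applied to the test split. Tested implementations of these steps exist in scikit-learn (`MinMaxScaler`, `MaxAbsScaler`, `roc_auc_score`) and imbalanced-learn (`TomekLinks`).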