Feature Selection for Loan Repayment Prediction System Using Machine Learning

Jishnu Goyal
DOI: https://doi.org/10.22214/ijraset.2023.49748
International Journal for Research in Applied Science and Engineering Technology
Abstract: Banks need to evaluate and predict borrowers’ ability to repay in order to minimise the risk of loan default. To this end, banks have built systems that process loan requests based on the borrower’s status, such as employment status and credit history. This paper attempts to determine the most significant factors (features) for predicting whether a loan applicant will be able to repay their loan. Feature selection provides an effective way to solve this problem by removing irrelevant and redundant data, which can reduce computation time, improve learning accuracy, and facilitate a better understanding of the learning model or data. In order to properly assess the repayment ability of all groups of people, several frequently used evaluation measures for feature selection are applied, and different feature sets are generated using different feature selection methods. These sets are then tested against different machine learning models to identify the most effective feature set for assessing an applicant’s repayment ability. The data used in this study was gathered from a Kaggle dataset containing the details of more than 300,000 borrowers and whether or not they were able to repay their loans. After data cleaning and feature engineering, the dataset still appeared quite imbalanced, so precision, recall, and F1 score were considered alongside accuracy. The results indicate that days employed, number of family members, number of children, and income were among the most significant factors for determining a borrower’s performance.
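The abstract's point about imbalance can be illustrated with a small worked example. The sketch below is not the paper's code: the function name `classification_metrics`, the label encoding (1 = default, 0 = repaid), and the 8-in-100 default rate are all assumptions made up for illustration. It shows why accuracy alone can look good on imbalanced data while precision, recall, and F1 expose a model that never flags a default.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when the class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical data (not from the paper): 8 defaults among 100 applicants.
y_true = [1] * 8 + [0] * 92

# Baseline that predicts "repaid" for everyone.
always_repaid = [0] * 100

# Hypothetical model: catches 5 of the 8 defaults, raises 4 false alarms.
some_model = [1] * 5 + [0] * 3 + [1] * 4 + [0] * 88

baseline = classification_metrics(y_true, always_repaid)
candidate = classification_metrics(y_true, some_model)

print("baseline :", baseline)
print("candidate:", candidate)
```

On this made-up data the two accuracies differ by only one percentage point, yet the baseline's recall and F1 on the default class are zero. That gap is the reason for reporting precision, recall, and F1 alongside accuracy.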