Abstract: Peer-to-peer (P2P) lending, a novel element of Internet finance that links lenders and borrowers via online platforms, has generated large profits for investors. However, borrowers’ missed payments have hampered the industry’s sustainable growth, so a system that can accurately predict loan defaults is needed to limit the damage caused by defaulters. This study addresses a gap in the literature by exploring whether prediction models for P2P loan defaults can be developed without relying heavily on personal data, and by identifying the key variables that influence borrowers’ repayment capacity through systematic feature selection and exploratory data analysis. Its aim is a computational model that helps lenders decide whether to approve or reject a loan application based on the financial data that applicants provide. The dataset, sourced from an open database, contains 8578 transaction records with 14 attributes related to financial information and no personal data. The dataset was first subjected to an in-depth exploratory data analysis to find behaviors connected to loan defaults. Subsequently, several widely used machine learning classification algorithms, including Random Forest, Support Vector Machine, Decision Tree, Logistic Regression, Naïve Bayes, and XGBoost, were employed to build models that distinguish borrowers who repay their loans from those who do not. Our findings indicate that borrowers who fail to comply with their lenders’ credit policies, pay elevated interest rates, and possess low FICO ratings are more likely to default. Elevated risk is also observed among clients who obtain loans for small businesses. All classification models, including XGBoost and Random Forest, were successfully developed and achieved an accuracy of over 80%. When the decision threshold is set to 0.4, logistic regression gives the best performance for predicting loan defaulters, identifying 83% of the defaulted loans (a recall of 83%) with a precision of 21% and an F1 score of 33%.
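The effect of the decision threshold on recall, precision and F1 can be illustrated with a minimal sketch. It assumes a classifier that outputs a probability of default for each loan. The function name `threshold_metrics` and the toy labels and probabilities are hypothetical and are not taken from the study's dataset. The last line only checks that the reported F1 score is consistent with the reported precision and recall.

```python
from typing import Dict, Sequence


def threshold_metrics(
    y_true: Sequence[int], p_default: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Label a loan as default (1) when its predicted probability is at or
    above `threshold`, then compute recall, precision and F1 for class 1."""
    y_pred = [1 if p >= threshold else 0 for p in p_default]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical toy data: 3 defaulted loans (1) and 7 repaid loans (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
p_default = [0.62, 0.45, 0.41, 0.48, 0.44, 0.35, 0.30, 0.22, 0.15, 0.10]

at_050 = threshold_metrics(y_true, p_default, threshold=0.5)
at_040 = threshold_metrics(y_true, p_default, threshold=0.4)

# Lowering the threshold flags more loans as defaults: recall rises,
# precision falls.
print("threshold 0.5:", at_050)
print("threshold 0.4:", at_040)

# Consistency check on the reported figures (precision 21%, recall 83%).
reported_f1 = 2 * 0.21 * 0.83 / (0.21 + 0.83)
print(f"F1 implied by reported precision and recall: {reported_f1:.3f}")
```

On this toy data, lowering the threshold from 0.5 to 0.4 catches every defaulted loan at the cost of more false alarms. The reported results show the same pattern of high recall with low precision.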
