Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending

Mario Sanz-Guerrero, Javier Arroyo
2024-08-05
Abstract: Peer-to-peer (P2P) lending has emerged as a distinctive financing mechanism, linking borrowers with lenders through online platforms. However, P2P lending faces the challenge of information asymmetry, as lenders often lack sufficient data to assess the creditworthiness of borrowers. This paper proposes a novel approach to address this issue by leveraging the textual descriptions provided by borrowers during the loan application process. Our methodology involves processing these textual descriptions using a Large Language Model (LLM), a powerful tool capable of discerning patterns and semantics within the text. Transfer learning is applied to adapt the LLM to the specific task at hand. Our results derived from the analysis of the Lending Club dataset show that the risk score generated by BERT, a widely used LLM, significantly improves the performance of credit risk classifiers. However, the inherent opacity of LLM-based systems, coupled with uncertainties about potential biases, underscores critical considerations for regulatory frameworks and engenders trust-related concerns among end-users, opening new avenues for future research in the dynamic landscape of P2P lending and artificial intelligence.
Risk Management, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

The paper addresses information asymmetry in peer-to-peer (P2P) lending. Borrowers submit loan applications through online platforms, and potential lenders must decide whether to invest based on that information, yet they often lack sufficient data to assess a borrower's creditworthiness.

To tackle this, the paper uses a large language model (LLM), specifically BERT (Bidirectional Encoder Representations from Transformers), to analyze the loan descriptions that borrowers write. Transfer learning adapts BERT to distinguish between default and non-default loans, and the adapted model outputs a risk score for each description. The study finds that combining this BERT-generated risk score with traditional variables significantly improves the performance of credit risk classifiers on the Lending Club dataset. Because the approach requires no manual labeling of the text, it also reduces subjectivity and complexity.

Nevertheless, the opacity and potential biases of LLMs highlight the need for transparent regulatory frameworks that can build trust among users and regulators. Overall, the paper shows how BERT can extract useful information from textual descriptions to improve credit risk assessment in P2P lending, and it emphasizes the importance of transparency and trust.
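
The step of turning a classifier's output into a risk score and joining it with traditional variables can be illustrated with a minimal sketch. It assumes a fine-tuned two-class text classifier has already produced a pair of logits per loan, ordered as [non-default, default]; that ordering, the softmax conversion, the feature names, and the values below are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def default_probability(logits: Sequence[float]) -> float:
    """Return P(default) from two class logits ordered [non-default, default]."""
    if len(logits) != 2:
        raise ValueError("expected exactly two logits")
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    return exps[1] / sum(exps)


def add_risk_score(
    loans: List[Dict[str, float]],
    logits_per_loan: List[Sequence[float]],
) -> List[Dict[str, float]]:
    """Return copies of the loan rows with a text-derived risk score appended."""
    if len(loans) != len(logits_per_loan):
        raise ValueError("one logit pair is required per loan")
    return [
        {**loan, "text_risk_score": default_probability(logits)}
        for loan, logits in zip(loans, logits_per_loan)
    ]


# Hypothetical traditional variables and hypothetical classifier logits.
loans = [
    {"loan_amount": 10000.0, "annual_income": 52000.0},
    {"loan_amount": 25000.0, "annual_income": 38000.0},
]
logits_per_loan = [[2.0, -1.0], [-0.5, 1.5]]

enriched = add_risk_score(loans, logits_per_loan)
for row in enriched:
    print(row)
```

Each enriched row could then be passed to any downstream credit risk classifier, with `text_risk_score` acting as one more input variable alongside the tabular ones.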