Corporate Bankruptcy Prediction with Domain-Adapted BERT

Alex Kim, Sangwon Yoon
DOI: https://doi.org/10.18653/v1/2021.econlp-1.4
2023-12-06
Abstract: This study applies BERT, a representative contextualized language model, to corporate disclosure data to predict impending bankruptcies. Prior literature on bankruptcy prediction mainly focuses on developing more sophisticated prediction methodologies with financial variables. In our study, however, we focus on improving the quality of the input dataset. Specifically, we employ a BERT model to perform sentiment analysis on MD&A disclosures. We show that BERT outperforms dictionary-based predictions and Word2Vec-based predictions in terms of adjusted R-square in logistic regression, k-nearest neighbor (kNN-5), and linear kernel support vector machine (SVM). Further, instead of pre-training the BERT model from scratch, we apply self-learning with confidence-based filtering to corporate disclosure data (10-K). We achieve an accuracy rate of 91.56% and demonstrate that the domain adaptation procedure brings a significant improvement in prediction accuracy.
Computation and Language, Machine Learning, General Economics
What problem does this paper attempt to address?
This paper addresses the quality of input data in corporate bankruptcy prediction, particularly the use of non-financial (textual) information. Specifically:

1. **Improving Text Analysis Methods**: The paper uses a BERT model to perform sentiment analysis on the Management's Discussion and Analysis (MD&A) section of corporate disclosures and feeds the resulting sentiment into bankruptcy prediction. Dictionary-based methods score words in isolation and therefore struggle with complex text, whereas context-based sentiment analysis can capture the sentiment of a passage more accurately.
2. **Domain Adaptation**: Publicly available pre-trained models (such as Fin-BERT) are trained on financial news data and are not entirely suitable for interpreting corporate disclosure documents. The authors therefore apply a self-learning approach with confidence-based filtering to adapt the model to the target domain (10-K disclosure text), which improves its performance.

In summary, the paper aims to demonstrate that context-aware language models (such as BERT), combined with domain adaptation, can significantly improve corporate bankruptcy prediction based on textual data.
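
To make the self-learning idea above more concrete, the following is a minimal sketch of self-training with confidence-based filtering. It is not the authors' implementation: the BERT sentiment classifier is replaced by a toy one-dimensional threshold classifier so that the loop runs with the standard library only, and the names `fit_threshold` and `self_train`, the 0.9 confidence cutoff, and the round limit are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]           # (feature, label), label is 0 or 1
Model = Callable[[float], float]      # maps a feature to P(label == 1)


def fit_threshold(data: Sequence[Example]) -> Model:
    """Toy stand-in for fine-tuning a classifier (hypothetical helper).

    Places the decision boundary midway between the two class means and
    returns a sigmoid probability around that boundary.
    Assumes both classes are present in `data`.
    """
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 / (1.0 + math.exp(-(x - mid)))


def self_train(
    labeled: Sequence[Example],
    unlabeled: Sequence[float],
    fit: Callable[[Sequence[Example]], Model],
    threshold: float = 0.9,   # assumed confidence cutoff
    max_rounds: int = 5,      # assumed iteration limit
) -> Tuple[Model, List[Example], List[float]]:
    """Self-training loop with confidence-based filtering.

    Each round: fit on the labeled set, pseudo-label the unlabeled pool,
    keep only predictions whose confidence reaches `threshold`, and add
    them to the labeled set. Stops when nothing passes the filter.
    """
    grown = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = fit(grown)
        confident, rest = [], []
        for x in pool:
            p = model(x)
            confidence = max(p, 1.0 - p)
            if confidence >= threshold:
                confident.append((x, int(p >= 0.5)))  # accept pseudo-label
            else:
                rest.append(x)                        # leave in the pool
        if not confident:
            break
        grown.extend(confident)
        pool = rest
    return fit(grown), grown, pool


if __name__ == "__main__":
    seed = [(-2.0, 0), (2.0, 1)]              # small labeled source set
    target = [-5.0, -0.1, 0.2, 4.0, 3.0]      # unlabeled target-domain pool
    model, grown, remaining = self_train(seed, target, fit_threshold)
    print("labeled examples after self-training:", grown)
    print("still unlabeled (low confidence):", remaining)
```

In a setting like the one the paper describes, `fit` would presumably fine-tune a pre-trained BERT sentiment model and the pool would hold unlabeled 10-K text; the filtering step is what keeps noisy pseudo-labels near the decision boundary out of the training set.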