Interpretability of deep learning models in analysis of Spanish financial text

César Vaca, Manuel Astorgano, Alfonso J. López-Rivero, Fernando Tejerina, Benjamín Sahelices
DOI: https://doi.org/10.1007/s00521-024-09474-8
2024-02-26
Neural Computing and Applications
Abstract: Artificial intelligence methods based on deep learning (DL) have recently made significant progress in many different areas including free text classification and sentiment analysis. We believe that corporate governance is one of these areas, where DL can generate very valuable and differential knowledge, for example, by analyzing the biographies of independent directors, which allows for qualitative modeling of their profile in an automatic way. For this technology to be accepted it is important to be able to explain how it generates its results. In this work we have developed a six-dimensional labeled dataset of independent director biographies, implemented three recurrent DL models based on LSTM and transformers along with four ensembles, one of which is an innovative proposal based on a multi-layer perceptron (MLP), trained them using Spanish language and economics and finance terminology and performed a comprehensive test study that demonstrates the accuracy of the results. We have also performed a complete study of explainability using the SHAP methodology by comparatively analyzing the developed models. We have achieved a mean error (MAE) of 8% in the modeling of the open text biographies, which has allowed us to perform a case study of time analysis that has detected significant variations in the composition of the Standard Expertise Profile (SEP) of the boards of directors, related to the crisis of the period 2008–2013. This work shows that DL technology can be accurately applied to free text analysis in the finance and economic domain, by automatically analyzing large volumes of data to generate knowledge that would have been unattainable by other means.
computer science, artificial intelligence
What problem does this paper attempt to address?
The paper addresses how to apply deep learning (DL) models to Spanish financial text in a way that is both accurate and interpretable, focusing on the automatic qualitative modeling of independent directors' biographies. The authors approach the problem in four steps:

1. **Dataset Creation**: A dataset of more than 1,000 labeled independent director biographies was created. These biographies describe the directors' professional backgrounds and careers.
2. **Model Development and Training**: Three deep learning models based on LSTM and transformer architectures were developed, together with four ensemble models, one of which is based on a multi-layer perceptron (MLP). The models were trained on Spanish-language text with economics and finance terminology.
3. **Interpretability Research**: A comprehensive, comparative interpretability analysis of the developed models was carried out using the SHAP method, relating each model's outputs to the elements of the input text that contributed most to them.
4. **Case Study**: Applying the best model to a large dataset over time revealed significant changes in the Standard Expertise Profile (SEP) of boards of directors associated with the 2008–2013 crisis, demonstrating the potential of DL technology in corporate governance.

### Specific Problem Description

- **Unstructured Text Analysis**: The biographies of independent directors are unstructured free text, which is difficult to analyze manually. The paper addresses this with an automated method.
- **Large-Scale Data Analysis**: A large volume of biographical information must be processed, and manual analysis consumes a great deal of resources. The paper proposes using deep learning models to process these data automatically.
- **Model Interpretability**: Deep learning models are usually regarded as "black boxes" whose decision-making is difficult to explain. The paper uses SHAP to explain how the models arrive at their outputs.

### Main Contributions

- A six-dimensional labeled dataset describing the professional backgrounds of independent directors was created.
- Three deep learning models and four ensemble models were trained and tuned. One of the ensembles is a novel MLP-based proposal.
- A test study was carried out to characterize the quality of the proposal (the abstract reports an MAE of 8%), and an interpretability study was carried out to help understand how the models generate their results.
- A case study was carried out using the best model, tracking the Standard Expertise Profile (SEP) of boards of directors over time.

Through these efforts, the paper shows that deep learning can be accurately applied to free-text analysis in the finance and economics domain, automatically analyzing large volumes of data to generate knowledge that would be unattainable by other means.
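To make the reported error metric and the idea of an ensemble concrete, the following is a minimal sketch in Python. It is not the authors' code: it assumes each biography is labeled with six scores in the range 0 to 1, uses made-up toy values, and substitutes a plain averaging ensemble for illustration (the paper's MLP-based ensemble is not reproduced here). The names `model_a`, `model_b`, `mean_absolute_error` and `average_ensemble` are hypothetical.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """MAE averaged over every biography and every label dimension."""
    errors = [
        abs(t - p)
        for row_true, row_pred in zip(y_true, y_pred)
        for t, p in zip(row_true, row_pred)
    ]
    return fmean(errors)


def average_ensemble(predictions_per_model):
    """Element-wise mean of the predictions of several models.

    predictions_per_model[m][i][d] is model m's score for biography i
    on dimension d. This is a simple averaging ensemble, assumed here
    for illustration only.
    """
    return [
        [fmean(values) for values in zip(*rows)]
        for rows in zip(*predictions_per_model)
    ]


# Toy data (assumption): two biographies, six dimensions, scores in [0, 1].
y_true = [
    [1.0, 0.0, 0.5, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.5, 0.0, 0.0],
]
model_a = [
    [0.75, 0.0, 0.5, 0.0, 0.25, 0.5],
    [0.0, 1.0, 0.0, 0.5, 0.0, 0.0],
]
model_b = [
    [1.0, 0.25, 0.5, 0.0, 0.0, 0.5],
    [0.0, 0.75, 0.0, 0.5, 0.0, 0.25],
]

ensemble = average_ensemble([model_a, model_b])

if __name__ == "__main__":
    for name, pred in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
        print(f"{name}: MAE = {mean_absolute_error(y_true, pred):.4f}")
```

Under this reading, an MAE of 8% would correspond to an average absolute deviation of 0.08 per dimension on a 0 to 1 scale. Whether the paper normalizes its scores this way is an assumption of the sketch.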