Predicting Financial Literacy via Semi-supervised Learning

David Hason Rudd, Huan Huo, Guandong Xu
DOI: https://doi.org/10.1007/978-3-030-97546-3_25
2023-12-18
Abstract: Financial literacy (FL) represents a person's ability to turn assets into income, and understanding digital currencies has been added to the modern definition. FL can be predicted by exploiting unlabelled recorded data in financial networks via semi-supervised learning (SSL). Measuring and predicting FL has not been widely studied, resulting in limited understanding of customer financial engagement consequences. Previous studies have shown that low FL increases the risk of social harm. Therefore, it is important to accurately estimate FL to allocate specific intervention programs to less financially literate groups. This will not only increase company profitability, but will also reduce government spending. Some studies considered predicting FL in classification tasks, whereas others developed FL definitions and impacts. The current paper investigated mechanisms to learn customer FL level from their financial data using sampling by synthetic minority over-sampling techniques for regression with Gaussian noise (SMOGN). We propose the SMOGN-COREG model for semi-supervised regression, applying SMOGN to deal with unbalanced datasets and a nonparametric multi-learner co-regression (COREG) algorithm for labeling. We compared the SMOGN-COREG model with six well-known regressors on five datasets to evaluate the proposed model's effectiveness on unbalanced and unlabelled financial data. Experimental results confirmed that the proposed method outperformed the comparator models for unbalanced and unlabelled financial data. Therefore, SMOGN-COREG is a step towards using unlabelled data to estimate FL level.
Machine Learning; Computational Engineering, Finance, and Science; Computers and Society; Econometrics
What problem does this paper attempt to address?
The paper addresses the problem of predicting customers' financial literacy (FL) with semi-supervised learning (SSL). Specifically, it focuses on the following points:

1. **Importance of Financial Literacy**: Financial literacy refers to an individual's ability to convert assets into income; modern definitions also include an understanding of digital currencies. Low financial literacy increases the risk of social harm, so accurate estimates are needed to allocate specific intervention programs to less financially literate groups.
2. **Limitations of Existing Research**: Measuring and predicting financial literacy has not been widely studied, which limits understanding of the consequences of customers' financial engagement. Most studies rely on questionnaires to assess financial literacy levels, a method that is time-consuming and costly.
3. **Advantages of Semi-supervised Learning**: Learning from a small amount of labeled data together with a large amount of unlabeled data can reduce the cost and time of manual labeling while improving model performance.
4. **Handling Imbalanced Datasets**: Financial network data are usually severely imbalanced: samples for certain levels of financial literacy are scarce. The paper proposes a semi-supervised regression model (SMOGN-COREG) that applies the synthetic minority over-sampling technique for regression with Gaussian noise (SMOGN) to rebalance the data and a nonparametric multi-learner co-regression algorithm (COREG) to label the unlabeled data.
5. **Experimental Validation**: The paper runs experiments on five real-world financial datasets to evaluate SMOGN-COREG on imbalanced and unlabeled data. The results show that the model outperforms six well-known regression algorithms.
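The oversampling idea in point 4 can be illustrated with a deliberately simplified sketch. It is not the paper's SMOGN implementation: full SMOGN also interpolates between neighbouring rare cases and chooses between interpolation and noise based on their distance. The function name `oversample_rare_targets`, the fixed `threshold` that marks a target as rare, and the `noise_scale` parameter are all assumptions made for illustration.

```python
import random
import statistics

def oversample_rare_targets(X, y, threshold, n_copies=2, noise_scale=0.05, seed=0):
    """Append Gaussian-jittered copies of rows whose target exceeds `threshold`.

    Hypothetical helper: a fixed threshold stands in for the relevance
    function that a full SMOGN implementation would use to find rare targets.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    # Per-feature standard deviation sets the scale of the injected noise.
    feat_sd = [statistics.pstdev([row[j] for row in X]) for j in range(n_features)]
    y_sd = statistics.pstdev(y)
    X_new, y_new = list(X), list(y)
    for row, target in zip(X, y):
        if target <= threshold:
            continue  # common case: kept once, no synthetic copies
        for _ in range(n_copies):
            X_new.append([v + rng.gauss(0.0, noise_scale * sd)
                          for v, sd in zip(row, feat_sd)])
            y_new.append(target + rng.gauss(0.0, noise_scale * y_sd))
    return X_new, y_new

# Toy data (made up): three common low scores and one rare high score.
X = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [9.0, 30.0]]
y = [0.2, 0.3, 0.25, 0.9]
X_bal, y_bal = oversample_rare_targets(X, y, threshold=0.8, n_copies=3)
print(len(X), "->", len(X_bal), "rows after oversampling")
```

Only the single rare row is duplicated here, so the regressor sees the scarce high-FL region more often without the common cases being altered.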
In summary, the paper aims to predict customers' financial literacy levels with semi-supervised learning, specifically the SMOGN-COREG model, so that financial institutions and governments can target interventions more precisely.
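The co-regression labeling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: two k-nearest-neighbour regressors with different Minkowski orders (`p1`, `p2`) each pseudo-label the unlabeled point that most reduces squared error on that point's labeled neighbours, then hand it to the other regressor. All function names, the default parameters, and the toy data are hypothetical, and the two regressors update one after the other within a round for simplicity.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(X, y, query, k, p, skip=None):
    """Mean target of the k nearest labeled points; `skip` excludes one index."""
    order = sorted((i for i in range(len(X)) if i != skip),
                   key=lambda i: minkowski(X[i], query, p))
    nearest = order[:k]
    return sum(y[i] for i in nearest) / len(nearest), nearest

def best_candidate(X, y, pool, k, p):
    """Return (gain, pool_index, pseudo_label) for the most helpful point, or None."""
    best = None
    for u, xu in enumerate(pool):
        yu, neigh = knn_predict(X, y, xu, k, p)  # pseudo-label for xu
        gain = 0.0
        for i in neigh:
            # Leave-one-out error on neighbour i, before and after adding xu.
            before, _ = knn_predict(X, y, X[i], k, p, skip=i)
            after, _ = knn_predict(X + [xu], y + [yu], X[i], k, p, skip=i)
            gain += (y[i] - before) ** 2 - (y[i] - after) ** 2
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, u, yu)
    return best

def co_train(X_lab, y_lab, X_unlab, k=3, p1=2, p2=5, rounds=10):
    """Co-train two kNN regressors and return an averaged predictor."""
    sets = [(list(X_lab), list(y_lab), p1), (list(X_lab), list(y_lab), p2)]
    pool = list(X_unlab)
    for _ in range(rounds):
        changed = False
        for j in (0, 1):
            Xj, yj, pj = sets[j]
            pick = best_candidate(Xj, yj, pool, k, pj) if pool else None
            if pick is None:
                continue
            _, u, yu = pick
            Xo, yo, _ = sets[1 - j]  # the *other* regressor receives the label
            Xo.append(pool.pop(u))
            yo.append(yu)
            changed = True
        if not changed:
            break  # neither regressor found a helpful point

    def predict(query):
        preds = [knn_predict(Xs, ys, query, k, ps)[0] for Xs, ys, ps in sets]
        return sum(preds) / len(preds)

    return predict

# Toy usage with made-up data where the target is twice the feature.
X_lab = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_lab = [0.0, 2.0, 4.0, 6.0, 8.0]
X_unlab = [[0.5], [1.5], [2.5], [3.5]]
predict = co_train(X_lab, y_lab, X_unlab)
pred = predict([2.0])
print(pred)
```

Using two different distance orders keeps the regressors diverse, which is what lets one regressor's confident pseudo-labels carry information the other does not already have.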