Analyzing Bias in Sensitive Personal Information Used to Train Financial Models

Reginald Bryant, Celia Cintas, Isaac Wambugu, Andrew Kinai, Komminist Weldemariam
DOI: https://doi.org/10.48550/arXiv.1911.03623
2019-11-09
Abstract: Bias in data can have unintended consequences that propagate to the design, development, and deployment of machine learning models. In the financial services sector, this can result in discrimination from certain financial instruments and services. At the same time, data privacy is of paramount importance, and recent data breaches have seen reputational damage for large institutions. Presented in this paper is a trusted model-lifecycle management platform that attempts to ensure consumer data protection, anonymization, and fairness. Specifically, we examine how datasets can be reproduced using deep learning techniques to effectively retain important statistical features in datasets whilst simultaneously protecting data privacy and enabling safe and secure sharing of sensitive personal information beyond the current state-of-practice.
Cryptography and Security, Databases, Machine Learning
What problem does this paper attempt to address?
The paper addresses bias and privacy protection when sensitive personal information is used to train financial models. Specifically:

1. **Data Bias**: The paper explores how to preserve important statistical features when generating datasets while reducing or eliminating bias in the data. Such bias can lead to discrimination in the design, development, and deployment of financial models, affecting the fairness of certain financial instruments and services.
2. **Data Privacy**: The paper proposes protecting sensitive personal information by generating synthetic data with deep learning techniques, so that data remains secure and private when shared. This approach is intended to avoid the re-identification risk that traditional anonymization techniques can carry.
3. **Data Sharing**: The paper also considers how sensitive personal information can be shared securely while protecting privacy, especially in the financial services industry. This helps financial institutions make better use of data for model training and optimization while complying with regulations and protecting consumer rights.

In summary, the main goal of the paper is to build a trusted model-lifecycle management platform that ensures the protection, anonymization, and fairness of consumer data, enabling safer and fairer data use in the financial services industry.
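To make the data-bias concern concrete, the following is a minimal sketch of one common way to quantify group-level bias in a decision dataset: the disparate impact ratio, i.e. the favorable-outcome rate of one group divided by that of another. This metric, the function names, the field names (`group`, `approved`), and the toy records are all assumptions chosen for illustration; the summary above does not say which fairness measures the paper uses.

```python
from collections import defaultdict


def favorable_rates(records, group_key, outcome_key):
    """Return the share of favorable outcomes per group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}


def disparate_impact(records, group_key, outcome_key, unprivileged, privileged):
    """Ratio of favorable-outcome rates: unprivileged group over privileged group.

    A value near 1.0 means both groups receive favorable outcomes at
    similar rates; values well below 1.0 indicate the unprivileged
    group is favored less often.
    """
    rates = favorable_rates(records, group_key, outcome_key)
    if rates[privileged] == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return rates[unprivileged] / rates[privileged]


# Hypothetical toy records, not data from the paper.
toy_records = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

ratio = disparate_impact(
    toy_records, "group", "approved", unprivileged="B", privileged="A"
)
print(f"disparate impact ratio: {ratio:.3f}")
```

Under the same assumptions, computing this ratio on both an original dataset and a synthetic copy would be one way to check whether the synthetic data retains, reduces, or amplifies a group-level disparity.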