Denoising ESG: quantifying data uncertainty from missing data with Machine Learning and prediction intervals

Sergio Caprioli, Jacopo Foschi, Riccardo Crupi, Alessandro Sabatino
2024-07-29
Abstract: Environmental, Social, and Governance (ESG) datasets are frequently plagued by significant data gaps, leading to inconsistencies in ESG ratings due to varying imputation methods. This paper explores the application of established machine learning techniques for imputing missing data in a real-world ESG dataset, emphasizing the quantification of uncertainty through prediction intervals. By employing multiple imputation strategies, this study assesses the robustness of imputation methods and quantifies the uncertainty associated with missing data. The findings highlight the importance of probabilistic machine learning models in providing a better understanding of ESG scores, thereby addressing the inherent risks of wrong ratings due to incomplete data. This approach improves imputation practices to enhance the reliability of ESG ratings.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the inconsistent and unreliable scores that result from the large amount of missing data in environmental, social, and governance (ESG) datasets. It applies machine-learning techniques to the missing-value problem and uses prediction uncertainty to quantify the risk that data gaps introduce, giving a tool for assessing how much that risk contributes to score variability. The paper compares several imputation methods, including K-Nearest Neighbors (KNN), Gradient Boosting, Multiple Imputation by Chained Equations (MICE), and neural networks, with the aim of improving imputation accuracy and quantifying uncertainty. The intended outcome is a more reliable basis for ESG scoring in banking and finance: smaller score differences between imputation methods, and more effective and credible ESG ratings for sustainable investment and financing decisions.
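To make the idea of an imputed value with an attached uncertainty range concrete, here is a minimal sketch of KNN imputation in plain Python. It is not the paper's implementation. The function name `knn_impute_with_interval`, the toy data, and the choice of the neighbours' minimum and maximum as the range are all assumptions made for illustration. That range is a crude spread, not a calibrated prediction interval.

```python
import math
import statistics


def knn_impute_with_interval(rows, target_idx, query, k=3):
    """Impute query[target_idx] from the k nearest complete rows.

    Returns (point, lower, upper). The point estimate is the mean of the
    neighbours' target values; lower and upper are their minimum and
    maximum (an illustrative spread, not a calibrated interval).
    """
    # Features observed in the query row, excluding the one being imputed.
    feature_idx = [
        i for i, value in enumerate(query)
        if i != target_idx and value is not None
    ]
    # Only rows with the target and all needed features can act as donors.
    candidates = [
        row for row in rows
        if row[target_idx] is not None
        and all(row[i] is not None for i in feature_idx)
    ]
    if len(candidates) < k:
        raise ValueError("not enough complete rows to find k neighbours")

    def distance(row):
        # Euclidean distance over the observed features.
        return math.sqrt(sum((row[i] - query[i]) ** 2 for i in feature_idx))

    neighbours = sorted(candidates, key=distance)[:k]
    values = [row[target_idx] for row in neighbours]
    return statistics.mean(values), min(values), max(values)


# Hypothetical toy data: two indicators plus one ESG metric per company.
rows = [
    [1.0, 0.2, 10.0],
    [1.1, 0.3, 12.0],
    [0.9, 0.1, 11.0],
    [5.0, 0.9, 40.0],
    [5.2, 0.8, 42.0],
]
query = [1.0, 0.2, None]  # the third value is missing

point, lower, upper = knn_impute_with_interval(rows, target_idx=2, query=query, k=3)
print(point, lower, upper)
```

A wide range between `lower` and `upper` signals that the neighbours disagree, so any score built on the imputed value deserves less confidence. In practice the features would usually be standardised before computing distances, which this sketch omits.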