Nano-ESG: Extracting Corporate Sustainability Information from News Articles

Fabian Billert, Stefan Conrad
2024-12-20
Abstract: Determining the sustainability impact of companies is a highly complex subject which has garnered more and more attention over the past few years. Today, investors largely rely on sustainability-ratings from established rating-providers in order to analyze how responsibly a company acts. However, those ratings have recently been criticized for being hard to understand and nearly impossible to reproduce. An independent way to find out about the sustainability practices of companies lies in the rich landscape of news article data. In this paper, we explore a different approach to identify key opportunities and challenges of companies in the sustainability domain. We present a novel dataset of more than 840,000 news articles which were gathered for major German companies between January 2023 and September 2024. By applying a mixture of Natural Language Processing techniques, we first identify relevant articles, before summarizing them and extracting their sustainability-related sentiment and aspect using Large Language Models (LLMs). Furthermore, we conduct an evaluation of the obtained data and determine that the LLM-produced answers are accurate. We release both datasets at https://github.com/Bailefan/Nano-ESG.
Information Retrieval
What problem does this paper attempt to address?
This paper aims to address the complexity and lack of transparency in assessing the sustainability impact of companies. Specifically, the authors propose an independent way to understand companies' sustainability practices through news article data. The main problems and motivations of this research are:

1. **Limitations of existing ESG ratings**:
   - Investors currently rely mainly on ESG scores from third-party rating agencies to analyze how responsibly a company acts.
   - These ratings have recently been criticized for being difficult to understand and almost impossible to reproduce.
   - Different rating agencies often assign very different scores to the same company, which confuses users and raises doubts about the validity of the scores.

2. **Utilizing news article data**:
   - News articles provide a rich, real-time, and unfiltered source of information, reflecting the public's views on corporate sustainability practices and current events.
   - Unlike formal reports or corporate disclosures, news reports can capture immediate reactions to corporate actions, as well as controversies and initiatives.
   - Extracting ESG information from these public sources with natural language processing (NLP) techniques lets different stakeholders track corporate sustainability more transparently, without relying entirely on ratings that may mask the complexity of ESG performance.

3. **Creating the Nano-ESG dataset**:
   - The authors constructed a dataset of more than 840,000 news articles covering major German companies from January 2023 to September 2024.
   - Using several NLP techniques and large language models (LLMs), they first identify relevant news articles, then summarize them and extract their sustainability-related sentiment and aspect (environmental, social, or governance).
   - Finally, the authors evaluated the quality of the generated data and determined that the answers produced by the LLMs were accurate.

### Summary

This paper addresses the opacity and non-reproducibility of existing ESG ratings by constructing and evaluating a dataset named Nano-ESG. It uses news article data and NLP techniques to provide a new, more transparent method for assessing corporate sustainability. This can help investors better understand companies' sustainability performance and can also help companies improve their own sustainability practices.
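The three-stage flow described above (identify relevant articles, summarize them, extract sentiment and aspect) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the keyword-based relevance filter, the `llm` callable, the prompt wording, the JSON reply format, and the three sentiment labels are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical keyword list standing in for the relevance step;
# the filter actually used for Nano-ESG is not described in this section.
ESG_KEYWORDS = ("emission", "climate", "labor", "diversity", "corruption", "governance")
ASPECTS = {"environmental", "social", "governance"}
SENTIMENTS = {"positive", "neutral", "negative"}  # assumed label set


@dataclass
class Annotation:
    company: str
    summary: str
    aspect: str
    sentiment: str


def is_relevant(text: str, company: str) -> bool:
    """Stage 1 (assumed): keep articles that name the company and an ESG keyword."""
    lowered = text.lower()
    return company.lower() in lowered and any(k in lowered for k in ESG_KEYWORDS)


def annotate(text: str, company: str, llm: Callable[[str], str]) -> Optional[Annotation]:
    """Stages 2 and 3: ask the model for a summary, an aspect, and a sentiment."""
    prompt = (
        f"Company: {company}\n"
        f"Article: {text}\n"
        "Summarize the sustainability-related content and reply as JSON with "
        'keys "summary", "aspect" (environmental, social, governance) and '
        '"sentiment" (positive, neutral, negative).'
    )
    try:
        data = json.loads(llm(prompt))
    except json.JSONDecodeError:
        return None  # discard replies that are not valid JSON
    if not isinstance(data, dict):
        return None
    aspect = str(data.get("aspect", "")).lower()
    sentiment = str(data.get("sentiment", "")).lower()
    if aspect not in ASPECTS or sentiment not in SENTIMENTS:
        return None  # discard replies with labels outside the allowed sets
    return Annotation(company, str(data.get("summary", "")), aspect, sentiment)


def run_pipeline(
    articles: Iterable[Tuple[str, str]], llm: Callable[[str], str]
) -> List[Annotation]:
    """Run the filter and annotation steps over (company, article_text) pairs."""
    results = []
    for company, text in articles:
        if not is_relevant(text, company):
            continue
        annotation = annotate(text, company, llm)
        if annotation is not None:
            results.append(annotation)
    return results


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return json.dumps(
        {"summary": "Company reduces emissions.", "aspect": "environmental", "sentiment": "positive"}
    )
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider; in practice `fake_llm` would be replaced by a function that sends the prompt to a real model and returns its text reply.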