GeoSEE: Regional Socio-Economic Estimation With a Large Language Model

Sungwon Han, Donghyun Ahn, Seungeon Lee, Minhyuk Song, Sungwon Park, Sangyoon Park, Jihee Kim, Meeyoung Cha
2024-06-14
Abstract: Moving beyond traditional surveys, combining heterogeneous data sources with AI-driven inference models brings new opportunities to measure socio-economic conditions, such as poverty and population, over expansive geographic areas. The current research presents GeoSEE, a method that can estimate various socio-economic indicators using a unified pipeline powered by a large language model (LLM). Presented with a diverse set of information modules, including those pre-constructed from satellite imagery, GeoSEE selects which modules to use in estimation, for each indicator and country. This selection is guided by the LLM's prior socio-geographic knowledge, which functions similarly to the insights of a domain expert. The system then computes target indicators via in-context learning after aggregating results from selected modules in the format of natural language-based texts. Comprehensive evaluation across countries at various stages of development reveals that our method outperforms other predictive models in both unsupervised and low-shot contexts. This reliable performance in data-scarce settings in under-developed or developing countries, combined with its cost-effectiveness, underscores its potential to continuously support and monitor the progress of Sustainable Development Goals, such as poverty alleviation and equitable growth, on a global scale.
Computers and Society
What problem does this paper attempt to address?
This paper addresses how to accurately estimate regional socio-economic indicators when data is scarce, especially in developing or under-developed countries. Traditional surveys are costly, logistically complex, and vulnerable to disruption by natural disasters or conflicts, which makes data hard to obtain in affected areas. Researchers have therefore explored combining heterogeneous data sources with AI-driven inference models to measure socio-economic conditions, such as poverty and population distribution, over wide geographic areas.

Specifically, the study proposes GeoSEE (Regional Socio-Economic Estimation With a Large Language Model), a unified pipeline based on a large language model (LLM) for estimating various socio-economic indicators. Its main features are:

1. **Multi-modal data fusion**: GeoSEE draws on multiple non-traditional data sources, such as satellite imagery and point-of-interest (POI) data, and uses a selection mechanism to determine which modules best suit a specific country and region.
2. **Adaptive module selection**: Guided by the LLM's prior socio-geographic knowledge, GeoSEE selects the most relevant information modules for each indicator and country, thereby improving prediction accuracy.
3. **In-context learning**: The outputs of the selected modules are aggregated into natural-language paragraphs, and the LLM computes the target indicator by comparing regions through in-context learning.
4. **Applicability to data-scarce environments**: GeoSEE performs well in developing countries that lack ground-truth labels and provides reliable predictions in both unsupervised and few-shot settings.
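The select-then-describe-then-prompt flow summarized above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the module names, the toy feature values, the prompt wording, and the `llm` callable (any function mapping a prompt string to a reply string) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy per-region features; every number here is invented for illustration.
TOY_DATA: Dict[str, Dict[str, float]] = {
    "Region A": {"nightlight": 12.5, "poi_count": 340, "building_area": 4.2},
    "Region B": {"nightlight": 3.1, "poi_count": 45, "building_area": 0.9},
    "Region C": {"nightlight": 7.8, "poi_count": 120, "building_area": 2.0},
}

# Hypothetical information modules: each renders one feature as a sentence.
MODULES: Dict[str, Callable[[str], str]] = {
    "nightlight": lambda r: f"Average night-time light intensity is {TOY_DATA[r]['nightlight']}.",
    "poi_count": lambda r: f"The number of points of interest is {TOY_DATA[r]['poi_count']:.0f}.",
    "building_area": lambda r: f"Total building area is {TOY_DATA[r]['building_area']} km^2.",
}


def select_modules(
    indicator: str,
    country: str,
    llm: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Ask the (assumed) LLM which modules to use; fall back to all modules."""
    if llm is None:
        return list(MODULES)
    question = (
        f"Which of these modules help estimate {indicator} in {country}? "
        f"Options: {', '.join(MODULES)}."
    )
    answer = llm(question)
    # Keep only module names that literally appear in the reply.
    chosen = [name for name in MODULES if name in answer]
    return chosen or list(MODULES)


def describe(region: str, modules: List[str]) -> str:
    """Aggregate the selected modules' outputs into one text paragraph."""
    return " ".join(MODULES[m](region) for m in modules)


def build_prompt(
    indicator: str,
    target: str,
    modules: List[str],
    examples: List[Tuple[str, float]],
) -> str:
    """Build an in-context prompt; an empty `examples` list gives the unsupervised case."""
    lines = [f"Estimate the {indicator} of each region from its description."]
    for region, value in examples:
        lines.append(f"{region}: {describe(region, modules)} {indicator}: {value}")
    lines.append(f"{target}: {describe(target, modules)} {indicator}:")
    return "\n".join(lines)
```

Passing a few labeled regions in `examples` corresponds to the few-shot setting, while an empty list leaves the model to rely on the descriptions alone. The resulting prompt string would then be sent to whichever LLM is in use, and its numeric completion read off as the estimate.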
In conclusion, GeoSEE offers a cost-effective tool for monitoring and evaluating progress toward the Sustainable Development Goals, such as poverty alleviation and equitable growth, on a global scale, particularly in resource-limited regions and across countries at different stages of development.