Estimating Socioeconomic Proxy Variables Using Multimodal Deep Learning Models

Yanbing Bai, Zelan Zhu, Huixue Su, Xiao Liu, Liangzhi Li
DOI: https://doi.org/10.1007/978-981-97-5618-6_35
2024-01-01
Abstract: Timely and accurate monitoring of socioeconomic indicators is vital for understanding development trends and informing policy decisions. Traditional data collection methods are costly and labor-intensive, leading to reliance on proxy variables. Despite their effectiveness, unimodal variables are limited in how well they capture the complexity of regional socioeconomic activities. This study introduces a novel framework that leverages multimodal data, integrating satellite imagery, street view images, and text data to enhance socioeconomic indicator estimation. The framework involves three stages: (i) data collection and feature extraction using deep learning models; (ii) alignment of the different data modalities to ensure spatial correlation; (iii) application of three feature fusion strategies (Concatenation Fusion, Cross-attn Fusion, and Convolution Fusion) to construct comprehensive multimodal features. These features are used to train a prediction network for socioeconomic indicator estimation. This study underscores the potential of multimodal data in improving the performance and interpretability of socioeconomic indicator estimation.
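The three fusion strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify layer sizes, learned projections, or kernel shapes, so every function name, argument, and shape below is a hypothetical choice made for illustration. The sketch assumes each region already has one feature vector per modality, and that the attention and convolution variants receive vectors of equal length d.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def concat_fusion(sat, street, text):
    # Concatenation: join per-region vectors along the feature axis.
    # Inputs (n, d_i) -> output (n, d_sat + d_street + d_text).
    return np.concatenate([sat, street, text], axis=-1)

def cross_attn_fusion(query, context):
    # Cross-attention (simplified, no learned projections): one modality
    # queries the others. query: (n, d); context: (n, m, d).
    d = query.shape[-1]
    scores = np.einsum("nd,nmd->nm", query, context) / np.sqrt(d)
    weights = softmax(scores, axis=-1)            # (n, m), rows sum to 1
    attended = np.einsum("nm,nmd->nd", weights, context)
    return query + attended                       # residual connection

def conv_fusion(sat, street, text, kernel):
    # Convolution: treat modalities as channels and slide a kernel of
    # shape (3, k) along the feature axis ("valid" cross-correlation).
    stacked = np.stack([sat, street, text], axis=1)          # (n, 3, d)
    k = kernel.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(stacked, k, axis=2)
    return np.einsum("ncwk,ck->nw", windows, kernel)         # (n, d-k+1)

# Hypothetical toy features: 4 regions, 8 dimensions per modality.
rng = np.random.default_rng(0)
sat, street, text = (rng.normal(size=(4, 8)) for _ in range(3))
fused_cat = concat_fusion(sat, street, text)
fused_attn = cross_attn_fusion(sat, np.stack([street, text], axis=1))
fused_conv = conv_fusion(sat, street, text, np.ones((3, 3)) / 9.0)
```

In a trained model, the attention projections and the convolution kernel would be learned parameters, and the fused vector would feed the prediction network; here they are fixed so the data flow stays visible.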