Mohit Agarwal, Mimi Sun, Chaitanya Kamath, Arbaaz Muslim, Prithul Sarker, Joydeep Paul, Hector Yee, Marcin Sieniek, Kim Jablonski, Yael Mayer, David Fork, Sheila de Guia, Jamie McPike, Adam Boulanger, Tomer Shekel, David Schottlander, Yao Xiao, Manjit Chakravarthy Manukonda, Yun Liu, Neslihan Bulut, Sami Abu-el-haija, Arno Eigenwillig, Parth Kothari, Bryan Perozzi, Monica Bharel, Von Nguyen, Luke Barrington, Niv Efron, Yossi Matias, Greg Corrado, Krish Eswaran, Shruthi Prabhakara, Shravya Shetty, Gautam Prasad

Abstract: Supporting the health and well-being of dynamic populations around the world requires governmental agencies, organizations and researchers to understand and reason over complex relationships between human behavior and local contexts in order to identify high-risk groups and strategically allocate limited resources. Traditional approaches to these classes of problems often entail developing manually curated, task-specific features and models to represent human behavior and the natural and built environment, which can be challenging to adapt to new, or even related, tasks. To address this, we introduce a Population Dynamics Foundation Model (PDFM) that aims to capture the relationships between diverse data modalities and is applicable to a broad range of geospatial tasks. We first construct a geo-indexed dataset for postal codes and counties across the United States, capturing rich aggregated information on human behavior from maps, busyness, and aggregated search trends, and environmental factors such as weather and air quality. We then model this data and the complex relationships between locations using a graph neural network, producing embeddings that can be adapted to a wide range of downstream tasks using relatively simple models. We evaluate the effectiveness of our approach by benchmarking it on 27 downstream tasks spanning three distinct domains: health indicators, socioeconomic factors, and environmental measurements. The approach achieves state-of-the-art performance on all 27 geospatial interpolation tasks, and on 25 out of the 27 extrapolation and super-resolution tasks. We combined the PDFM with a state-of-the-art forecasting foundation model, TimesFM, to predict unemployment and poverty, achieving performance that surpasses fully supervised forecasting. The full set of embeddings and sample code are publicly available for researchers.

What problem does this paper attempt to address?

This paper asks how to model the complex relationships between dynamic populations and their environments well enough to support population health and well-being around the world. It addresses three challenges by building a general-purpose geospatial model, the Population Dynamics Foundation Model (PDFM):

1. **Limitations of traditional methods**: Traditional geospatial modeling usually relies on manually curated, task-specific features and models, which are hard to adapt to new or even related tasks. These methods also struggle to handle heterogeneous data from multiple sources.
2. **Need for cross-domain applicability**: Understanding the relationship between human behavior and the natural and built environment requires a model that can handle multiple data modalities and apply to a wide range of geospatial tasks, including health indicators, socioeconomic factors, and environmental measurements.
3. **Resource allocation and risk assessment**: Government agencies, organizations, and researchers need to identify high-risk populations and decide how to allocate limited resources. This requires a model that supports not only accurate spatial interpolation and extrapolation but also super-resolution and time-series forecasting.

To address these challenges, the paper introduces PDFM. The model uses a graph neural network (GNN) to represent rich aggregated information at the postal code and county levels, including maps, busyness, aggregated search trends, and environmental factors such as weather and air quality, together with the relationships between locations. The embeddings PDFM produces can be adapted to downstream tasks such as interpolation, extrapolation, super-resolution, and forecasting using relatively simple models, which makes the framework general and flexible.

In summary, the main objective of this paper is to develop a foundation model that can efficiently process multi-source geospatial data and support a wide range of geospatial reasoning tasks, especially in the health, socioeconomic, and environmental domains.
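The idea of adapting fixed embeddings to a downstream task with a relatively simple model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the embeddings and labels are synthetic random data rather than the released PDFM embeddings, the dimensions are arbitrary, and closed-form ridge regression stands in for whichever simple model a practitioner would pick. Interpolation is mimicked by hiding the labels of a random subset of regions and predicting them from the rest.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression; the bias is the last weight and is not penalized."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = lam * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unregularized
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


def predict_ridge(w, X):
    """Apply the fitted weights (including bias) to new embeddings."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins (hypothetical): one embedding vector and one label per region.
rng = np.random.default_rng(0)
n_regions, dim = 500, 32
embeddings = rng.normal(size=(n_regions, dim))
true_w = rng.normal(size=dim)
labels = embeddings @ true_w + 0.1 * rng.normal(size=n_regions)

# Interpolation-style split: hide labels for a random 20% of regions.
perm = rng.permutation(n_regions)
test_idx, train_idx = perm[:100], perm[100:]

w = fit_ridge(embeddings[train_idx], labels[train_idx], lam=1.0)
pred = predict_ridge(w, embeddings[test_idx])
r2 = r2_score(labels[test_idx], pred)
print(f"held-out R^2 on synthetic data: {r2:.3f}")
```

Because the synthetic labels are linear in the embeddings by construction, the score here says nothing about real-world performance; the sketch only shows the workflow of freezing the embeddings and fitting a small supervised model on top. An extrapolation-style evaluation would differ mainly in the split, holding out spatially grouped regions instead of a random sample.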

General Geospatial Inference with a Population Dynamics Foundation Model

A Multi-Scale Unified Model of Human Mobility in Urban Agglomerations

Unraveling near real-time spatial dynamics of population using geographical ensemble learning

Community search signatures as foundation features for human-centered geospatial modeling

Population Mapping with Multisensor Remote Sensing Images and Point-Of-Interest Data

Improved Population Mapping for China Using Remotely Sensed and Points-of-interest Data Within a Random Forests Model

Diagnosing the performance of human mobility models at small spatial scales using volunteered geographic information

Spatiotemporal factor models for functional data with application to population map forecast

Spatial-Attention and Demographic-Augmented Generative Adversarial Imputation Network for Population Health Data Reconstruction

Curating Transient Population in Urban Dynamics System

Interpretable Deep Learning for Consistent Large-Scale Urban Population Estimation Using Earth Observation Data

DeepDPM: Dynamic Population Mapping via Deep Neural Network

Spatiotemporal Modeling and Forecasting at Scale with Dynamic Generalized Linear Models

Revealing Urban Dynamics by Learning Online and Offline Behaviours Together

Representing Urban Forms: A Collective Learning Model with Heterogeneous Human Mobility Data

Modeling and Monitoring of Indoor Populations using Sparse Positioning Data (Extension)

Understanding of the predictability and uncertainty in population distributions empowered by visual analytics

Deep Learning for Spatiotemporal Modeling of Urbanization

A Hybrid Approach Integrating a Gravity Model and Machine Learning

EpiGeoPop: A Tool for Developing Spatially Accurate Country-level Epidemiological Models

Foundation Models for Generalist Geospatial Artificial Intelligence