Xishun Liao, Yifan Liu, Chenchen Kuai, Haoxuan Ma, Yueshuai He, Shangqing Cao, Chris Stanford, Jiaqi Ma
Abstract: Understanding human mobility patterns is crucial for urban planning, transportation management, and public health. This study tackles two primary challenges in the field: the reliance on trajectory data, which often fails to capture the semantic interdependencies of activities, and the inherent incompleteness of real-world trajectory data. We have developed a model that reconstructs and learns human mobility patterns by focusing on semantic activity chains. We introduce a semi-supervised iterative transfer learning algorithm to adapt models to diverse geographical contexts and address data scarcity. Our model is validated using comprehensive datasets from the United States, where it effectively reconstructs activity chains and generates high-quality synthetic mobility data, achieving a low Jensen-Shannon Divergence (JSD) value of 0.001, indicating a close similarity between synthetic and real data. Additionally, sparse GPS data from Egypt is used to evaluate the transfer learning algorithm, demonstrating successful adaptation of US mobility patterns to Egyptian contexts and achieving a 64% increase in similarity, i.e., a JSD reduction from 0.09 to 0.03. This mobility reconstruction model and the associated transfer learning algorithm show significant potential for global human mobility modeling studies, enabling policymakers and researchers to design more effective and culturally tailored transportation solutions.
What problem does this paper attempt to address?
This paper attempts to solve two main problems:
1. **Limitations of relying on trajectory data**: Most studies rely on trajectory data to analyze spatio-temporal patterns, but this approach often fails to capture the semantic interdependencies between activities. For example, it cannot answer key questions about human behavior, such as how people arrange their daily activities, which activities usually occur in sequence, and how activities are distributed throughout the day (for instance, working and school hours). Understanding these semantic relationships is crucial for building a comprehensive human mobility model.
2. **Incompleteness of real-world trajectory data**: Because data collection is intermittent and constrained by privacy concerns, real-world trajectory data usually provide an incomplete or fragmented view of an individual's daily movement patterns. This incompleteness makes it difficult to model the full picture of human activities and their interdependencies, especially across different contexts.
To address these problems, the authors propose a new approach with two main components:
- **Reconstruction of semantic activity chains**: By focusing on semantic activity chains rather than raw trajectory data, the model can infer missing activities, capture the dependencies between activities, and learn the temporal patterns of human behavior.
- **Semi-supervised iterative transfer learning algorithm**: To adapt to different geographical contexts and address data scarcity, the authors introduce a semi-supervised iterative transfer learning algorithm. This algorithm transfers knowledge across datasets and regions without requiring large amounts of real data.
### Specific problem description
In the paper, the \( j \)-th trajectory of agent \( i \) is defined as a sequence of \( N \) stop-over points:
\[ \text{Tra}_{i,j} = \{P_{i,j}^1, P_{i,j}^2, \ldots, P_{i,j}^{N}\} \]
Each stop-over point \( P_{i,j}^n \) contains an activity type \( T_{i,j}^n \), a GPS location \( (x,y)_{i,j}^n \), a start time \( S_{i,j}^n \), and an end time \( E_{i,j}^n \).
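The definition above maps naturally onto a small record type. The following is a minimal Python sketch, assuming `datetime` timestamps and planar `(x, y)` coordinates; the names `StopOverPoint`, `Trajectory`, and `make_trajectory` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class StopOverPoint:
    """One stop-over point P_{i,j}^n (hypothetical representation)."""
    activity_type: str             # T_{i,j}^n, e.g. "Home", "Work"
    location: Tuple[float, float]  # (x, y)_{i,j}^n
    start: datetime                # S_{i,j}^n
    end: datetime                  # E_{i,j}^n

    def __post_init__(self) -> None:
        # A stop-over point cannot end before it starts.
        if self.end < self.start:
            raise ValueError("end time precedes start time")


# Tra_{i,j}: the j-th trajectory of agent i, an ordered list of N stop-over points.
Trajectory = List[StopOverPoint]


def make_trajectory(points: List[StopOverPoint]) -> Trajectory:
    """Return the stop-over points ordered by start time (P^1, ..., P^N)."""
    return sorted(points, key=lambda p: p.start)
```

Sorting by start time keeps the points in the order \( P^1, \ldots, P^N \) regardless of how they were recorded.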
Because GPS-based data collection is fragmented, stop-over points usually capture only certain moments of the day and do not cover all of an agent's daily activities. As a result, the activity chains in real datasets are usually incomplete. For example, a recorded activity chain may include a "Home" activity from "02-01 00:00 to 02-01 07:00" and a "Work" activity from "02-01 08:00 to 02-01 10:00", but leave a significant gap between "02-01 10:00" and "02-02 00:00".
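Locating such gaps is a simple interval computation. The sketch below assumes that each recorded activity is a `(type, start, end)` tuple and that the day runs from midnight to midnight; `find_gaps` is a hypothetical helper, and the year 2024 in the usage example is an assumption because the example gives only month and day.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def find_gaps(activities: List[Tuple[str, datetime, datetime]],
              day_start: datetime, day_end: datetime) -> List[Interval]:
    """Return the sub-intervals of [day_start, day_end) covered by no activity."""
    gaps: List[Interval] = []
    cursor = day_start  # end of the time already covered
    for _, start, end in sorted(activities, key=lambda a: a[1]):
        if start > cursor:
            # Uncovered time before this activity begins.
            gaps.append((cursor, min(start, day_end)))
        cursor = max(cursor, end)
        if cursor >= day_end:
            break
    if cursor < day_end:
        # Nothing recorded from the last activity until the end of the day.
        gaps.append((cursor, day_end))
    return gaps


day = datetime(2024, 2, 1)  # assumed year; the source gives only "02-01"
chain = [
    ("Home", day, day.replace(hour=7)),
    ("Work", day.replace(hour=8), day.replace(hour=10)),
]
for start, end in find_gaps(chain, day, day + timedelta(days=1)):
    print(start.strftime("%m-%d %H:%M"), "->", end.strftime("%m-%d %H:%M"))
# 02-01 07:00 -> 02-01 08:00
# 02-01 10:00 -> 02-02 00:00
```

Besides the large gap after 10:00, the sketch also surfaces the one-hour gap between "Home" and "Work".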
### Solution
Given an incomplete activity chain, the model \( M_{R1} \) reconstructs the likely missing daily activities based on the common activity patterns in the area. In the example above, the model can fill in the missing time periods by extending the "Work" period, adding an "EatOut" (buying a meal) activity, and proposing a final "Home" activity to close the daily cycle, so that the result is a plausible, complete activity chain.
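To make "filling gaps from common activity patterns in the area" concrete, here is a deliberately naive stand-in, not the paper's \( M_{R1} \). It assumes the day is discretized into 24 hourly slots and fills each unknown slot with the activity most frequently observed at that hour in a reference set of complete chains from the same area. `fill_missing_hours` and the reference chains are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def fill_missing_hours(observed: List[Optional[str]],
                       reference: List[List[str]]) -> List[str]:
    """Naive baseline (not M_R1): fill None slots with the per-hour mode.

    observed  -- 24 hourly slots, None where the activity is unknown
    reference -- complete 24-slot activity chains from the same area
    """
    filled: List[str] = []
    for hour, activity in enumerate(observed):
        if activity is not None:
            filled.append(activity)  # keep what was actually recorded
        else:
            counts = Counter(chain[hour] for chain in reference)
            filled.append(counts.most_common(1)[0][0])
    return filled


# Hypothetical reference chains for the area.
reference = [
    ["Home"] * 8 + ["Work"] * 4 + ["EatOut"] + ["Work"] * 4 + ["Home"] * 7,
    ["Home"] * 8 + ["Work"] * 9 + ["Home"] * 7,
    ["Home"] * 8 + ["Work"] * 4 + ["EatOut"] + ["Work"] * 4 + ["Home"] * 7,
]
# Home 00:00-07:00, unknown 07:00-08:00, Work 08:00-10:00, unknown afterwards.
observed = ["Home"] * 7 + [None] + ["Work"] * 2 + [None] * 14
print(fill_missing_hours(observed, reference))
```

Because each hour is filled independently, this baseline ignores the dependencies between successive activities, which is exactly what a semantic activity-chain model is meant to capture; it is useful only as a point of comparison.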
### Performance evaluation
The model's performance is quantified by the similarity between the distribution of generated activity patterns and that of real-world (reference) activity patterns. The paper uses the Jensen-Shannon Divergence (JSD) as the similarity metric, where lower values indicate closer distributions:
\[ \text{JSD}(P \| Q) = \frac{1}{2} \sum_{x \in X} \left[ P(x) \log \left( \frac{P(x)}{M(x)} \right) \right] + \frac{1}{2} \sum_{x \in X} \left[ Q(x) \log \left( \frac{Q(x)}{M(x)} \right) \right] \]
where \( M = \frac{P + Q}{2} \) is the mixture of the two distributions, \( P \) is the activity pattern distribution extracted from the generated activity chains, \( Q \) is the activity pattern distribution computed from the real activity chains, and \( x \) ranges over the set \( X \) of activity patterns.
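The formula translates directly into code. The sketch below assumes that an activity pattern can be represented as any hashable key (here, a tuple of activity types) and uses base-2 logarithms, which bound the JSD to \( [0, 1] \); the source does not state the log base. `pattern_distribution` and `jsd` are hypothetical helper names.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable


def pattern_distribution(patterns: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Empirical probability of each activity pattern."""
    counts = Counter(patterns)
    total = sum(counts.values())
    return {pattern: c / total for pattern, c in counts.items()}


def jsd(p: Dict[Hashable, float], q: Dict[Hashable, float], base: float = 2.0) -> float:
    """Jensen-Shannon divergence: 0.5*KL(P||M) + 0.5*KL(Q||M), M = (P + Q) / 2."""
    total = 0.0
    for x in set(p) | set(q):
        px, qx = p.get(x, 0.0), q.get(x, 0.0)
        mx = 0.5 * (px + qx)
        # Zero-probability terms contribute 0 (the 0 * log 0 = 0 convention).
        if px > 0:
            total += 0.5 * px * math.log(px / mx, base)
        if qx > 0:
            total += 0.5 * qx * math.log(qx / mx, base)
    return total


# Hypothetical generated and real activity chains.
generated = [("Home", "Work", "Home")] * 3 + [("Home", "EatOut", "Home")]
real = [("Home", "Work", "Home")] * 2 + [("Home", "EatOut", "Home")] * 2
P = pattern_distribution(generated)
Q = pattern_distribution(real)
print(round(jsd(P, Q), 4))
```

Comparing each distribution against the mixture \( M \) keeps the divergence finite even when a pattern appears in only one of the two datasets, which plain KL divergence does not.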