Towards the Inference of Travel Purpose with Heterogeneous Urban Data

Chuishi Meng, Yu Cui, Qing He, Lu Su, Jing Gao
DOI: https://doi.org/10.1109/tbdata.2019.2921823
2022-01-01
IEEE Transactions on Big Data
Abstract: Travel takes up an important part of people's daily lives, and many trips are generated every day, such as going to school or shopping. With the wide adoption of GPS-integrated devices, a large number of trips can be recorded as GPS trajectories. These trajectories are represented by sequences of geo-coordinates and can help us answer simple questions such as "where did you go". However, another important question remains to be answered, namely "what did/will you do", i.e., trip purpose inference. In practice, people's trip purposes are very important for understanding travel behaviors and estimating travel demands. It is very challenging to infer trip purposes based solely on trajectories, because GPS devices are not accurate enough to pinpoint the venues visited. In this paper, we infer individuals' trip purposes by combining knowledge from heterogeneous data sources, including trajectories, POIs, and social media data. The proposed Dynamic Bayesian Network (DBN) model captures three important factors: the sequential properties of trip activities, and the functionality and POI popularity of trip end areas. In addition, we propose an efficient method with local candidate pools to identify POIs from geo-tagged social media messages and to learn POI popularities from nearby social media data. Moreover, trip data is usually imbalanced across different activities. This data imbalance can cause serious problems because the DBN model could be biased toward the "popular" class labels. To address this challenge, we propose an ensemble DBN method with a sampling technique (eDBN), which results in more accurate inference. Furthermore, real-world trip data are collected continuously on a daily basis. A batch model would result in unnecessary computation because historical data need to be revisited. We handle this problem by proposing an incremental DBN method (iDBN) that is both effective and efficient.
Extensive experiments are conducted on real-world data sets with trajectories of 8,361 residents and 6.9 million geo-tagged tweets in the Bay Area. Experimental results demonstrate the advantages of the proposed method in correctly inferring trip purposes.