From Where Do Tweets Originate?: a GIS Approach for User Location Inference

Qunying Huang,G. Cao,Caixia Wang
DOI: https://doi.org/10.1145/2755492.2755494
2014-01-01
Abstract:A number of natural language processing and text-mining algorithms have been developed to extract the geospatial cues (e.g., place names) to infer locations of content creators from publicly available information, such as text content, online social profiles, and the behaviors or interactions of users from social networks. These studies, however, can only successfully infer user locations at city levels with relatively decent accuracy, while much higher resolution is required for meaningful spatiotemporal analysis in geospatial fields. Additionally, geographical cues exploited by current text-based approaches are hidden in the unreliable, unstructured, informal, ungrammatical, and multilingual data, and therefore are hard to extract and make meaningful correctly. Instead of using such hidden geographic cues, this paper develops a GIS approach that can infer the true origin of tweets down to the zip code level by using and mining spatial (geo-tags) and temporal (timestamps when a message was posted) information recorded on user digital footprints. Further, individual major daily activity zones and mobility can be successfully inferred and predicted. By integrating GIS data and spatiotemporal clustering methods, this proposed approach can infer individual daily physical activity zones with spatial resolution as high as 20 m by 20 m or even higher depending on the number of digit footprints collected for social media users. The research results with detailed spatial resolution are necessary and useful for various applications such as human mobility pattern analysis, business site selection, disease control, or transportation systems improvement.
What problem does this paper attempt to address?