Abstract: This paper examines the population heterogeneity of travel behaviours from a combined perspective of individual actors and collective behaviours. We use a social media dataset of 652,945 geotagged tweets generated by 2,933 Swedish Twitter users covering an average time span of 3.6 years. No explicit geographical boundaries, such as national borders or administrative boundaries, are applied to the data. We use spatial features, such as geographical characteristics and network properties, and apply a clustering technique to reveal the heterogeneity of geotagged activity patterns. We find four distinct groups of travellers: local explorers (78.0%), local returners (14.4%), global explorers (7.3%), and global returners (0.3%). These groups exhibit distinct mobility characteristics, such as trip distance, diffusion process, percentage of domestic trips, visiting frequency of the most-visited locations, and total number of geotagged locations. Geotagged social media data are gradually being incorporated into travel behaviour studies as user-contributed data sources. While such data have many advantages, including easy access and the flexibility to capture movements across multiple scales (individual, city, country, and globe), more attention is still needed on validating the data and identifying the potential biases associated with them. We validate against data from a household travel survey and find good agreement in trip distances (one-day and long-distance trips), but some differences in home location and the frequency of international trips, possibly due to population bias and behaviour distortion in Twitter data. Future work includes identifying and removing additional biases so that results from geotagged activity patterns may be generalised to human mobility patterns. This study explores the heterogeneity of behavioural groups and their spatial mobility, including travel and day-to-day displacement.
The findings of this paper could be relevant for disease prediction, transport modelling, and the broader social sciences.
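
To make the mobility characteristics named in the abstract more concrete, the following is a minimal sketch of how a few of them (trip distance, visiting frequency of the most-visited location, and total number of geotagged locations) might be computed for one user. It is not the paper's method: the function names, the grid-rounding definition of a "location", and the example coordinates are all assumptions made for illustration, and the clustering step is not shown because the abstract does not specify it.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mobility_features(points, decimals=2):
    """Summarise one user's time-ordered list of (lat, lon) geotags.

    Assumption: a "location" is a coordinate pair rounded to `decimals`
    decimal places. The paper may define locations differently.
    """
    if not points:
        raise ValueError("at least one geotagged point is required")
    locations = [(round(lat, decimals), round(lon, decimals)) for lat, lon in points]
    visits = Counter(locations)
    # Distances between consecutive geotags, used here as a stand-in for trips.
    trips = [haversine_km(p, q) for p, q in zip(points, points[1:])]
    return {
        "n_locations": len(visits),
        "top_location_share": visits.most_common(1)[0][1] / len(points),
        "mean_trip_km": sum(trips) / len(trips) if trips else 0.0,
    }


# Hypothetical user: two geotags in Stockholm, then one in Gothenburg.
example = mobility_features([(59.33, 18.06), (59.33, 18.06), (57.71, 11.97)])
```

Per-user feature vectors of this kind could then be passed to a clustering algorithm of the reader's choice. Which features and which algorithm separate "explorers" from "returners" is something only the full paper can settle.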

Confounds and Consequences in Geotagged Twitter Data

A Large-Scale Empirical Study of Geotagging Behavior on Twitter

Location Inference for Non-Geotagged Tweets in User Timelines [Extended Abstract]

Location Inference for Non-geotagged Tweets in User Timelines

Geolocation differences of language use in urban areas

Race, Religion and the City: Twitter Word Frequency Patterns Reveal Dominant Demographic Dimensions in the United States

Identifying Data Noises, User Biases, and System Errors in Geo-tagged Twitter Messages (Tweets)

Characterizing Interconnections and Linguistic Patterns in Twitter

Understanding the spatio-temporal characteristics of Twitter data with geotagged and non-geotagged content: two case studies with the topic of flu and Ted (movie)

Geolocated Social Media Posts are Happier: Understanding the Characteristics of Check-in Posts on Twitter

Where did you tweet from? Inferring the origin locations of tweets based on contextual information

A Comparative Analysis of Content-based Geolocation in Blogs and Tweets

Enhanced geocoding precision for location inference of tweet text using spaCy, Nominatim and Google Maps. A comparative analysis of the influence of data selection

HisRect: Features from Historical Visits and Recent Tweet for Co-Location Judgement

Geo-temporal Twitter demographics

Predicting Demographics of High-Resolution Geographies with Geotagged Tweets

The Geography of Happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place

From individual to collective behaviours: exploring population heterogeneity of human mobility based on social media data

Gender identity and lexical variation in social media

Explore Spatiotemporal and Demographic Characteristics of Human Mobility via Twitter: A Case Study of Chicago

Understanding temporal and spatial patterns of urban activities across demographic groups through geotagged social media data