Abstract: The rapid growth of location-based services (LBS) has yielded massive amounts of data on human mobility. Effectively extracting meaningful representations for user-generated check-in sequences is pivotal for facilitating various downstream services. However, the user-generated check-in data are simultaneously influenced by the surrounding objective circumstances and the user's subjective intention. Specifically, the temporal uncertainty and spatial diversity exhibited in check-in data make it difficult to capture the macroscopic spatial-temporal patterns of users and to understand the semantics of user mobility activities. Furthermore, the distinct characteristics of the temporal and spatial information in check-in sequences call for an effective fusion method to incorporate these two types of information. In this paper, we propose a novel Spatial-Temporal Cross-view Contrastive Representation (STCCR) framework for check-in sequence representation learning. Specifically, STCCR addresses the above challenges by employing self-supervision from "spatial topic" and "temporal intention" views, facilitating effective fusion of spatial and temporal information at the semantic level. Besides, STCCR leverages contrastive clustering to uncover users' shared spatial topics from diverse mobility activities, while employing angular momentum contrast to mitigate the impact of temporal uncertainty and noise. We extensively evaluate STCCR on three real-world datasets and demonstrate its superior performance across three downstream tasks.

What problem does this paper attempt to address?

The paper examines user check-in sequences in location-based services (LBS) and proposes a Spatial-Temporal Cross-view Contrastive Representation (STCCR) framework to address three issues that existing methods face when handling such sequences:

1. **Temporal Uncertainty**: Because check-ins are shaped by both users' subjective intentions and objective environmental factors, the temporal information in check-in data is uncertain, which makes it difficult to capture users' intentions accurately. For example, it may be possible to predict that a user will go for a meal next, but the exact arrival time is hard to determine.
2. **Spatial Diversity**: Users' activity locations are highly diverse, and can differ completely even within similar time periods. For instance, a user's locations on weekdays and weekends usually revolve around different topics (such as work or leisure), and even on the same type of day (two weekdays or two weekends), the specific locations rarely repeat.
3. **Effective Fusion of Spatio-Temporal Information**: Spatial information is discrete and diverse, while temporal information is continuous but uncertain, which makes the two types of information difficult to fuse effectively.

To address these challenges, the paper proposes the STCCR framework, with the following main contributions:

- **A novel spatial-temporal cross-view contrastive representation learning framework**, which performs self-supervised learning from the "spatial topic" and "temporal intention" views and promotes effective fusion of spatio-temporal information through a cross-view contrastive strategy.
- **An angular momentum contrastive method** to handle the inherent uncertainty of temporal information: adding a soft margin to the contrastive training objective filters out temporal noise and thereby better captures users' temporal intentions.
- **Contrastive clustering in the spatial dimension**, which identifies shared spatial topics by exploring high-level semantic information in check-in sequences, thus overcoming the issue of location diversity.

In the experimental section, the paper evaluates STCCR on three real-world datasets and reports superior performance on three downstream tasks: next location prediction, trajectory-user linking, and temporal prediction. These results demonstrate the effectiveness and generalization ability of the proposed model.
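To make the idea of adding a margin to contrastive training more concrete, the following is a minimal sketch of an InfoNCE-style loss with an additive angular margin on the positive pair. It is a generic illustration, not the paper's exact formulation: the function names `cosine` and `angular_margin_info_nce`, the default `margin` and `temperature` values, and the toy vectors are all assumptions, and the momentum-encoder part of the method is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angular_margin_info_nce(anchor, positive, negatives,
                            margin=0.1, temperature=0.1):
    """InfoNCE loss with an additive angular margin (hypothetical helper).

    The positive pair is scored as cos(theta + margin) instead of
    cos(theta), so the pair must be closer in angle to reach the same
    logit. Negatives are scored with plain cosine similarity.
    """
    # Clamp to [-1, 1] so acos never fails on floating-point rounding.
    cos_pos = max(-1.0, min(1.0, cosine(anchor, positive)))
    theta = math.acos(cos_pos)
    # Cap the shifted angle at pi so the cosine stays monotone.
    pos_logit = math.cos(min(theta + margin, math.pi)) / temperature
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]

    # Numerically stable log-sum-exp over all logits.
    logits = [pos_logit] + neg_logits
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_denominator - pos_logit


# Toy example: two augmented views of one sequence plus two negatives.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_plain = angular_margin_info_nce(anchor, positive, negatives, margin=0.0)
loss_margin = angular_margin_info_nce(anchor, positive, negatives, margin=0.2)
```

With the margin enabled, the same positive pair yields a larger loss than without it, so training keeps pulling the two views together even after they are already fairly well aligned. Under this reading, the margin acts as a tolerance band that makes the learned representation less sensitive to small, noisy differences between views.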

Spatial-Temporal Cross-View Contrastive Pre-training for Check-in Sequence Representation Learning

Multi-view Self-Supervised Contrastive Learning for Multivariate Time Series

Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos

STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation

MMCPP: A Multi-Modal Contrastive Pre-training Model for Place Representation Based on the Spatio-Temporal Framework

Align Yourself: Self-supervised Pre-training for Fine-grained Recognition via Saliency Alignment

STS-CCL: Spatial-Temporal Synchronous Contextual Contrastive Learning for Urban Traffic Forecasting

Attentive spatial-temporal contrastive learning for self-supervised video representation

Contrastive Trajectory Learning for Tour Recommendation

Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations

Cross-view motion consistent self-supervised video inter-intra contrastive for action representation understanding

Improving Next Location Recommendation Services With Spatial-Temporal Multi-Group Contrastive Learning

Contrastive Attraction and Contrastive Repulsion for Representation Learning

Spatio-Temporal Meta Contrastive Learning

Scene Text Recognition with Self-supervised Contrastive Predictive Coding

TempCLR: Temporal Alignment Representation with Contrastive Learning

Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training

SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations

Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition

Learnable Query Contrast and Spatio-temporal Prediction on Point Cloud Video Pre-training