A Latent Hawkes Process Model for Event Clustering and Temporal Dynamics Learning with Applications in GitHub.
Shengzhong Liu, Shuochao Yao, Dongxin Liu, Huajie Shao, Yiran Zhao, Xinzhe Fu, Tarek F. Abdelzaher
DOI: https://doi.org/10.1109/icdcs.2019.00128
2019-01-01
Abstract: Large volumes of event data are becoming increasingly available on online social networks. These events are usually causally dependent on one another, reflecting the interactions and collaborations among different parties. Learning and interpreting the temporal patterns and dynamics within these event streams play an important role in many practical applications, such as trend prediction and anomaly detection. Because causal dependencies can be reflected in both event time (i.e., when) and event content (i.e., who and what), we develop a user-community-based generative model, called the latent Hawkes process (LHP), that takes both kinds of information into account to describe the generation of such inter-dependent event streams on GitHub repositories, where each attribute is assumed to be generated by the interplay between correlated latent communities. Learning the model accomplishes two tasks concurrently: event clustering (i.e., community discovery) and temporal dependency learning among these clusters (i.e., dependency profiling). To do so, we design an EM-based framework that integrates sequential Monte Carlo sampling to estimate model parameters in an end-to-end manner. Through experiments on real GitHub event data, we validate the effectiveness of LHP in extracting user community structures and learning their correlated temporal dynamics. Such knowledge further enables us to gain new insights into the development status of software, with applications such as assessing project persistence and detecting anomalies.
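For readers unfamiliar with Hawkes processes, the following is a minimal sketch of the self-exciting idea the model builds on: each past event temporarily raises the rate of future events. It uses the generic textbook univariate form with an exponential kernel, not the LHP parameterization, which the abstract does not specify. The function name `hawkes_intensity` and the values of `mu`, `alpha`, and `beta` are illustrative assumptions.

```python
import math

def hawkes_intensity(t, history, mu=0.2, alpha=0.5, beta=1.0):
    """Textbook conditional intensity of a univariate Hawkes process:

        lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))

    mu    : baseline event rate (assumed value)
    alpha : jump in intensity caused by each past event (assumed value)
    beta  : exponential decay rate of that excitation (assumed value)
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in history if t_i < t
    )
    return mu + excitation

# Hypothetical event timestamps, e.g. commits to one repository.
events = [1.0, 1.5, 4.0]

quiet = hawkes_intensity(0.5, events)    # before any event: baseline only
burst = hawkes_intensity(1.6, events)    # just after two events: elevated
later = hawkes_intensity(100.0, events)  # long after: decays back toward mu
```

In this sketch the intensity equals the baseline before any event, rises right after a cluster of events, and relaxes back toward the baseline as the excitation decays. A multi-community model such as the one described above would additionally need cross-excitation terms between latent communities, which are omitted here.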