H4M: Heterogeneous, Multi-source, Multi-modal, Multi-view and Multi-distributional Dataset for Socioeconomic Analytics in the Case of Beijing

Yaping Zhao, Shuhui Shi, Ramgopal Ravi, Zhongrui Wang, Edmund Y. Lam, Jichang Zhao
DOI: https://doi.org/10.48550/arXiv.2208.12542
2022-08-11
Computers and Society
Abstract: The study of socioeconomic status has been transformed by the availability of digital records containing data on real estate, points of interest, traffic, and social media trends such as micro-blogging. In this paper, we describe a heterogeneous, multi-source, multi-modal, multi-view and multi-distributional dataset named "H4M". The mixed dataset contains data on real estate transactions, points of interest, traffic patterns and micro-blogging trends from Beijing, China. The unique composition of H4M makes it an ideal test bed for methodologies and approaches aimed at studying and solving problems related to real estate, traffic, urban mobility planning, and social sentiment analysis, among others. The dataset is available at: https://indigopurple.github.io/H4M/index.html