Seeing the Unseen Network: Inferring Hidden Social Ties from Respondent-Driven Sampling

Lin Chen, Forrest W. Crawford, Amin Karbasi
DOI: https://doi.org/10.48550/arXiv.1511.04137
2015-12-02
Abstract: Learning about the social structure of hidden and hard-to-reach populations (such as drug users and sex workers) is a major goal of epidemiological and public health research on risk behaviors and disease prevention. Respondent-driven sampling (RDS) is a peer-referral process widely used by many health organizations, where research subjects recruit other subjects from their social network. In such surveys, researchers observe who recruited whom, along with the time of recruitment and the total number of acquaintances (network degree) of respondents. However, due to privacy concerns, the identities of acquaintances are not disclosed. In this work, we show how to reconstruct the underlying network structure through which the subjects are recruited. We formulate the dynamics of RDS as a continuous-time diffusion process over the underlying graph and derive the likelihood for the recruitment time series under an arbitrary recruitment time distribution. We develop an efficient stochastic optimization algorithm called RENDER (REspoNdent-Driven nEtwork Reconstruction) that finds the network that best explains the collected data. We support our analytical results through an exhaustive set of experiments on both synthetic and real data.
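The diffusion view in the abstract can be made concrete with a small forward simulation. The sketch below assumes exponential recruitment times with a single rate (the paper allows an arbitrary recruitment time distribution), seeds entering at time 0, and a fixed number of coupons per participant. The names `simulate_rds`, `Recruit`, `toy_graph`, and all parameter values are hypothetical and illustrative, not the authors' code. Each returned record holds only what the abstract says researchers observe (recruiter, recruitment time, degree); the remaining edges of the hidden graph are never recorded.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Recruit:
    node: int                 # vertex id in the hidden graph
    recruiter: Optional[int]  # None for seeds
    time: float               # recruitment time
    degree: int               # self-reported number of contacts


def simulate_rds(adj: Dict[int, List[int]], seeds: List[int], coupons: int,
                 rate: float, max_sample: int, rng: random.Random) -> List[Recruit]:
    """Continuous-time RDS on a hidden graph (hypothetical helper).

    Assumption: every edge from a coupon holder to a not-yet-recruited contact
    fires after an independent Exp(rate) waiting time (exponential case only).
    """
    t = 0.0
    recruited: Dict[int, float] = {}  # node -> recruitment time
    left: Dict[int, int] = {}         # node -> coupons remaining
    sample: List[Recruit] = []
    for s in seeds:
        recruited[s], left[s] = 0.0, coupons
        sample.append(Recruit(s, None, 0.0, len(adj[s])))
    while len(sample) < max_sample:
        # Susceptible edges: (coupon holder, unrecruited contact).
        susceptible = [(u, v) for u in recruited if left[u] > 0
                       for v in adj[u] if v not in recruited]
        if not susceptible:
            break  # no coupon holder has an unrecruited contact left
        # By memorylessness all clocks can be redrawn after each event: the first
        # of k i.i.d. Exp(rate) clocks rings after Exp(k * rate) time, and each
        # clock is equally likely to be the first.
        t += rng.expovariate(rate * len(susceptible))
        u, v = rng.choice(susceptible)
        recruited[v], left[v] = t, coupons
        left[u] -= 1
        sample.append(Recruit(v, u, t, len(adj[v])))
    return sample


def toy_graph(n: int, extra: int, rng: random.Random) -> Dict[int, List[int]]:
    """Hypothetical hidden network: a ring of n people plus `extra` random chords."""
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        nbrs[i].add((i + 1) % n)
        nbrs[(i + 1) % n].add(i)
    for _ in range(extra):
        a, b = rng.sample(range(n), 2)
        nbrs[a].add(b)
        nbrs[b].add(a)
    return {i: sorted(s) for i, s in nbrs.items()}


rng = random.Random(1)
toy_adj = toy_graph(40, 20, rng)
toy_sample = simulate_rds(toy_adj, seeds=[0, 20], coupons=3, rate=1.0,
                          max_sample=25, rng=rng)
for r in toy_sample[:5]:
    print(r.recruiter, "->", r.node, f"t={r.time:.2f}", f"degree={r.degree}")
```

Because exponential waiting times are memoryless, this simulation redraws all clocks after every recruitment instead of tracking per-edge timers; a non-exponential recruitment time distribution would require keeping each edge's elapsed waiting time.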
Social and Information Networks, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to infer the structure of a hidden social network from data collected by respondent-driven sampling (RDS). In an RDS survey, researchers observe who recruited whom, when each recruitment happened, how many coupons each participant held, and each participant's network degree (the number of people they know), but, for privacy reasons, not the identities of those contacts. The task is therefore to reconstruct the connections among participants, beyond the recruitment links themselves, from this incomplete record.

The authors model RDS recruitment as a flexible stochastic process on a partially observed network and derive the likelihood of the observed recruitment time series. This likelihood lets them estimate both the unknown model parameters and the underlying network from the recruitment times, degrees, coupon information, and recruiter-recruit pairs. To search for the network that best explains the data, they propose an efficient stochastic optimization algorithm, RENDER (REspoNdent-Driven nEtwork Reconstruction).

The paper evaluates RENDER's accuracy and reconstruction performance in experiments on synthetic and real data, including a reconstruction of the social network in an RDS study of injecting drug users in St. Petersburg, Russia. In short, the goal is a method that recovers the social structure of hidden populations from RDS data without requiring participants to disclose their contacts' identities, a capability relevant to epidemiological and social science research.
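To illustrate the likelihood-plus-search idea, the sketch below scores a candidate set of edges among sampled subjects by the likelihood of the recruitment times, then improves it with random edge toggles. It assumes exponential recruitment times with one rate (replaced by its maximum-likelihood value), seeds entering at time 0, no censoring term after the last recruitment, and a small hypothetical dataset `OBS`; `log_lik` and `hill_climb` are illustrative names. The search is a generic random-toggle hill climb, not RENDER itself. A candidate graph is infeasible if it omits a recruitment edge or gives someone more sampled contacts than their reported degree.

```python
import math
import random
from itertools import combinations

# Hypothetical observed RDS data (not from the paper):
# subject -> (recruiter, recruitment time, reported degree). Subjects 0 and 1 are seeds.
COUPONS = 2
OBS = {
    0: (None, 0.0, 3),
    1: (None, 0.0, 2),
    2: (0, 0.4, 3),
    3: (1, 0.9, 2),
    4: (2, 1.3, 4),
    5: (0, 1.8, 2),
    6: (4, 2.5, 3),
}


def log_lik(edges, obs=OBS, coupons=COUPONS):
    """Profile log-likelihood of the recruitment times given candidate edges among
    sampled subjects. Assumes every edge from a coupon holder to an unrecruited
    contact fires after an i.i.d. Exp(rate) time; rate is set to its MLE.
    Returns -inf for an infeasible graph."""
    nbrs = {i: set() for i in obs}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    if any(len(nbrs[i]) > obs[i][2] for i in obs):
        return -math.inf  # more sampled contacts than the reported degree
    events = sorted((t, i) for i, (rec, t, _) in obs.items() if rec is not None)
    recruited = {i for i, (rec, _, _) in obs.items() if rec is None}  # seeds
    left = {i: coupons for i in recruited}
    prev_t, exposure = 0.0, 0.0
    for t, j in events:
        rec = obs[j][0]
        if rec not in recruited or left[rec] == 0 or j not in nbrs[rec]:
            return -math.inf  # this recruitment is impossible under the graph
        # Susceptible edges during (prev_t, t]: each coupon holder's degree minus
        # its contacts already recruited (the rest may lie outside the sample).
        s = sum(obs[i][2] - len(nbrs[i] & recruited)
                for i in recruited if left[i] > 0)
        exposure += s * (t - prev_t)
        recruited.add(j)
        left[rec] -= 1
        left[j] = coupons
        prev_t = t
    m = len(events)
    rate = m / exposure  # MLE of the exponential rate for this graph
    # sum_k [log(rate) - rate * s_k * dt_k] evaluated at the MLE
    return m * math.log(rate) - m


def hill_climb(obs=OBS, steps=500, seed=0):
    """Generic random-toggle hill climb (a stand-in, not RENDER).
    Recruitment edges stay fixed; other pairs are toggled one at a time."""
    rng = random.Random(seed)
    tree = {tuple(sorted((rec, j))) for j, (rec, _, _) in obs.items() if rec is not None}
    candidates = [p for p in combinations(sorted(obs), 2) if p not in tree]
    edges, best = set(tree), log_lik(tree, obs)
    for _ in range(steps):
        trial = edges ^ {rng.choice(candidates)}  # toggle one non-recruitment pair
        score = log_lik(trial, obs)
        if score >= best:
            edges, best = trial, score
    return edges, best


tree_edges = {(0, 2), (1, 3), (2, 4), (0, 5), (4, 6)}
best_edges, best_ll = hill_climb()
print(sorted(best_edges - tree_edges), round(best_ll, 3))
```

Under this simplified likelihood, adding a feasible edge never lowers the score, because it can only reduce the number of edges still available for recruitment; the reported degrees are what keep the search from simply adding every possible pair.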