Abstract: Respondent-driven sampling (RDS) is an approach to sampling design and analysis that uses the networks of social relationships connecting members of the target population, relying on chain-referral methods to facilitate sampling. RDS typically leads to biased sampling, favoring participants with many acquaintances. Naive estimates, such as the sample average, which are uncorrected for the sampling bias, will themselves be biased. To compensate for this bias, current methodology suggests inverse-degree weighting, where the "degree" is the number of acquaintances. This stems from the fundamental RDS assumption that the probability of sampling an individual is proportional to their degree. Since this assumption is tenuous at best, we propose to incorporate the additional information contained in the time of recruitment into a model-based inference framework for RDS. This information is typically collected by researchers, but ignored. We adapt methods developed for inference in epidemic processes to estimate the population size, degree counts and frequencies. While providing valuable information in themselves, these quantities ultimately serve to debias other estimators, such as a disease's prevalence. A fundamental advantage of our approach is that, being model-based, it makes all assumptions about the data-generating process explicit. This enables verification of the assumptions, maximum likelihood estimation, extension with covariates, and model selection. We develop asymptotic theory, proving consistency and asymptotic normality of the estimators. We further compare these estimators to the standard inverse-degree weighting through simulations and on real-world data. In both cases we find our estimators to outperform current methods. The likelihood problem in the model we present is convex, and thus efficiently solvable. We implement these estimators in an R package, chords, available on CRAN.
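To make the baseline concrete, the following is a minimal sketch of the standard inverse-degree weighting that the abstract refers to, not of the model-based estimators proposed in the paper and not of the chords package interface. It assumes the usual weighted-mean form, in which each respondent's outcome is weighted by the reciprocal of their reported degree. The function name and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def inverse_degree_weighted_mean(values: Sequence[float],
                                 degrees: Sequence[int]) -> float:
    """Weighted mean of `values` with weights 1/degree.

    Illustrative helper (hypothetical name): it assumes the sampling
    probability of each respondent is proportional to their degree,
    which is the RDS assumption the abstract calls tenuous.
    """
    if len(values) != len(degrees):
        raise ValueError("values and degrees must have the same length")
    if len(values) == 0:
        raise ValueError("at least one respondent is required")
    if any(d <= 0 for d in degrees):
        raise ValueError("all degrees must be positive")
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical toy sample: 1 = has the disease, 0 = does not.
# The two high-degree respondents are both positive, so the naive
# sample average overstates prevalence relative to the weighted one.
infected = [1, 1, 0, 0]
degrees = [10, 10, 2, 2]

naive = sum(infected) / len(infected)
weighted = inverse_degree_weighted_mean(infected, degrees)
print(f"naive estimate:    {naive:.3f}")
print(f"weighted estimate: {weighted:.3f}")
```

In this toy sample the well-connected respondents are over-represented, so down-weighting them pulls the prevalence estimate below the naive average. The correction is only as good as the degree-proportional sampling assumption behind it.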

New Survey Questions and Estimators for Network Clustering with Respondent-driven Sampling Data

Respondent-driven sampling bias induced by clustering and community structure in social networks

Seeing the Unseen Network: Inferring Hidden Social Ties from Respondent-Driven Sampling

The Sensitivity of Respondent-driven Sampling Method

Identification of Homophily and Preferential Recruitment in Respondent-Driven Sampling

The graphical structure of respondent-driven sampling

Respondent-driven sampling on directed networks

Novel sampling design for respondent-driven sampling

Modeling and Analysing Respondent Driven Sampling as a Counting Process

Hidden population size estimation from respondent-driven sampling: a network approach

Linked Ego Networks: Improving estimate reliability and validity with respondent-driven sampling

Simple estimators for network sampling

Network Structure and Biased Variance Estimation in Respondent Driven Sampling

Respondent-driven sampling and an unusual epidemic

Estimating hidden population size from a single respondent-driven sampling survey

Unweighted regression models perform better than weighted regression techniques for respondent-driven sampling data: results from a simulation study

Binary regression analysis with network structure of respondent-driven sampling data

Neighbourhood Bootstrap for Respondent-Driven Sampling

Design-adherent estimators for network surveys

Assessing Reliability of Naïve Respondent-driven Sampling Samples by Using Repeated Surveys Among People Who Inject Drugs (PWID) in New Jersey

Evaluation of Logistic Regression Applied to Respondent-Driven Samples: Simulated and Real Data