Respondent-driven sampling bias induced by clustering and community structure in social networks

Luis Enrique Correa Rocha, Anna Ekeus Thorson, Renaud Lambiotte, Fredrik Liljeros

DOI: https://doi.org/10.1111/rssa.12180

2015-03-20

Abstract: Sampling hidden populations is particularly challenging using standard sampling methods mainly because of the lack of a sampling frame. Respondent-driven sampling (RDS) is an alternative methodology that exploits the social contacts between peers to reach and weight individuals in these hard-to-reach populations. It is a snowball sampling procedure where the weight of the respondents is adjusted for the likelihood of being sampled due to differences in the number of contacts. In RDS, the structure of the social contacts thus defines the sampling process and affects its coverage, for instance by constraining the sampling within a sub-region of the network. In this paper we study the bias induced by network structures such as social triangles, community structure, and heterogeneities in the number of contacts, in the recruitment trees and in the RDS estimator. We simulate different scenarios of network structures and response-rates to study the potential biases one may expect in real settings. We find that the prevalence of the estimated variable is associated with the size of the network community to which the individual belongs. Furthermore, we observe that low-degree nodes may be under-sampled in certain situations if the sample and the network are of similar size. Finally, we also show that low response-rates lead to reasonably accurate average estimates of the prevalence but generate relatively large biases.
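The degree-based weighting described in the abstract can be illustrated with a minimal sketch. It assumes an inverse-degree weighted estimator of the kind often called RDS II (Volz-Heckathorn); the abstract does not say which estimator the paper uses, so the function name `rds_ii_estimate` and the sample values below are hypothetical and for illustration only.

```python
def rds_ii_estimate(degrees, has_trait):
    """Inverse-degree weighted prevalence estimate (RDS II style).

    degrees   -- reported number of contacts of each respondent
    has_trait -- True/False flag per respondent for the studied variable
    Assumption: each respondent's inclusion probability is proportional
    to their degree, so each is weighted by 1/degree.
    """
    if len(degrees) != len(has_trait):
        raise ValueError("degrees and has_trait must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every respondent needs a positive degree")
    weights = [1.0 / d for d in degrees]
    weighted_positive = sum(w for w, t in zip(weights, has_trait) if t)
    return weighted_positive / sum(weights)


# Hypothetical sample of four respondents (not data from the paper).
degrees = [1, 2, 4, 4]
has_trait = [True, False, True, False]

estimate = rds_ii_estimate(degrees, has_trait)
naive = sum(has_trait) / len(has_trait)  # unweighted sample proportion
print(f"weighted: {estimate:.3f}, unweighted: {naive:.3f}")
```

In this toy sample the low-degree respondent with the trait receives the largest weight, so the weighted estimate is higher than the unweighted proportion.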

Applications; Social and Information Networks; Data Analysis, Statistics and Probability; Physics and Society

What problem does this paper attempt to address?

This paper addresses the bias that network structure (social triangles, community structure, and heterogeneity in the number of contacts) introduces into respondent-driven sampling (RDS) in social networks. Specifically, the research focuses on the following points:

1. **The influence of network structure on RDS**: The research explores how triangular relationships, community structure, and heterogeneity of node degrees in social networks affect the RDS sampling process. These structural characteristics may concentrate the sample in certain sub-regions of the network, reducing the coverage and representativeness of the sample.
2. **Recruitment trees and wave distribution**: By simulating the RDS process on different network structures, the research analyzes the size of the recruitment trees and the distribution of recruitment waves. This helps to understand the sampling efficiency and coverage of RDS under different response rates.
3. **Estimation bias**: The research evaluates the performance of the RDS estimator under different network structures and response rates, in particular the estimation bias and the design effect (D.E.). The design effect measures how much larger an RDS sample must be to match the precision of a simple random sample.
4. **Bias in practical applications**: The research also examines the biases that may arise when RDS is used on real-world social networks, especially when the prevalence of infection is related to node degree or community size.

Through these analyses, the paper aims to provide a deeper understanding of the application of RDS to hidden populations and to suggest improvements to the RDS method that reduce the bias caused by network structure.
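The bias and design effect mentioned above can be sketched as follows. This assumes one common definition of the design effect, the variance of the RDS estimates across repeated simulations divided by the variance expected under simple random sampling of the same size, p(1 - p)/n. The paper's exact formula is not reproduced here, and the function names and numbers are hypothetical.

```python
from statistics import mean, pvariance


def estimation_bias(rds_estimates, true_prevalence):
    """Average RDS estimate minus the true prevalence."""
    return mean(rds_estimates) - true_prevalence


def design_effect(rds_estimates, true_prevalence, sample_size):
    """Variance of RDS estimates relative to simple random sampling.

    Assumption: the SRS variance of a proportion is p * (1 - p) / n.
    """
    if not 0 < true_prevalence < 1:
        raise ValueError("true_prevalence must lie strictly between 0 and 1")
    var_rds = pvariance(rds_estimates)
    var_srs = true_prevalence * (1 - true_prevalence) / sample_size
    return var_rds / var_srs


# Hypothetical estimates from four simulated RDS runs (not from the paper).
rds_estimates = [0.2, 0.3, 0.4, 0.3]
true_prevalence = 0.3
sample_size = 100

bias = estimation_bias(rds_estimates, true_prevalence)
de = design_effect(rds_estimates, true_prevalence, sample_size)
print(f"bias: {bias:.3f}, design effect: {de:.2f}")
```

Under this definition, a design effect above 1 means the RDS estimates vary more than a simple random sample of the same size would, so a larger RDS sample is needed for equal precision.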

The Sensitivity of Respondent-driven Sampling Method

Respondent-driven sampling on directed networks

Modeling and Analysing Respondent Driven Sampling as a Counting Process

New Survey Questions and Estimators for Network Clustering with Respondent-driven Sampling Data

The graphical structure of respondent-driven sampling

Network Structure and Biased Variance Estimation in Respondent Driven Sampling

Identification of Homophily and Preferential Recruitment in Respondent-Driven Sampling

Linked Ego Networks: Improving estimate reliability and validity with respondent-driven sampling

Seeing the Unseen Network: Inferring Hidden Social Ties from Respondent-Driven Sampling

Novel sampling design for respondent-driven sampling

Hidden population size estimation from respondent-driven sampling: a network approach

Estimating hidden population size from a single respondent-driven sampling survey

Respondent-driven sampling and an unusual epidemic

An Empirical Analysis of the Impact of Recruitment Patterns on RDS Estimates among a Socially Ordered Population of Female Sex Workers in China.

Neighbourhood Bootstrap for Respondent-Driven Sampling

Binary regression analysis with network structure of respondent-driven sampling data

Unweighted regression models perform better than weighted regression techniques for respondent-driven sampling data: results from a simulation study

Simple estimators for network sampling

Evaluation of Logistic Regression Applied to Respondent-Driven Samples: Simulated and Real Data

Reduced Bias for respondent driven sampling: accounting for non-uniform edge sampling probabilities in people who inject drugs in Mauritius