Update on respondent-driven sampling: Theory and practical considerations for studies of persons who inject drugs

Abstract

In the last 5 years, more than 600 articles using respondent-driven sampling have been published. This article provides an overview of this sampling technique, with an update on the key questions that remain about its application and its estimators. Respondent-driven sampling was developed by Heckathorn in 1997 and is based on the principle of individuals recruiting other individuals, who were themselves recruited in previous waves. When there is no sampling frame, respondent-driven sampling has demonstrated its ability to capture individuals belonging to “hidden” or “hard-to-reach” populations in numerous epidemiological surveys. People who use drugs, sex workers, and men who have sex with men are notable examples of populations studied using this technique, particularly by public agencies such as the Centers for Disease Control and Prevention in the United States. Like many other sampling methods, respondent-driven sampling relies on a set of assumptions that, when met, can ensure an unbiased estimator. Based on a literature review, we discuss, among other topics, the effect of violating these assumptions. Special attention is given to surveys of persons who inject drugs. Publications show two major thrusts (methodological and applied research) for providing practical recommendations on conducting respondent-driven sampling studies. The reasons why respondent-driven sampling did not work for a given population of interest will usually provide important insights for designing health-promoting interventions for that population.

Keywords

Respondent-driven sampling, persons who inject drugs, bias, variance

Introduction

It is crucial to study populations at higher risk of contracting infectious diseases in order to implement interventions that prevent transmission of these diseases. People who use drugs, men who have sex with men, sex workers, and some immigrants are examples of populations that are more exposed, and therefore more vulnerable, to infections such as HIV, hepatitis B and C, and sexually transmitted infections (Gile et al., 2015; Le et al., 2010).

However, it is difficult to conduct a sero-epidemiological survey within these populations because of the illicit nature of some practices, such as drug use or sex work (depending on the country’s legislation). Moreover, these populations are often stigmatized, and the individuals who comprise them are hard to reach because their practices are hidden (hence the term “hidden population”) and their living conditions make it difficult for interviewers to approach them on account of location (e.g. in the case of squatting) or sometimes for safety reasons.

When no sampling frame exists and traditional survey techniques cannot be used, techniques specifically designed for hard-to-reach populations have been proposed and used (Magnani et al., 2005; Spreen, 1992). Examples of such sampling techniques are snowball (Goodman, 1961), network (Birnbaum and Sirken, 1965; Granovetter, 1976; Sudman et al., 1988), targeted (Watters and Biernacki, 1989), random walk (Klovdahl, 1989), adaptive cluster (Thompson and Seber, 1996), time-space (Mackellar et al., 1996; Muhib et al., 2001), and link-tracing (Chow and Thompson, 2003; Félix-Medina and Thompson, 2004; Thompson and Frank, 2000) sampling. Capture–recapture is another method, used to estimate population size (Ruiz et al., 2016). However, from a statistical standpoint, some of these techniques are not based on a random selection of individuals. This can bias the estimates of epidemiological indicators and of their variances, since some individuals in the population of interest may have a zero probability of being recruited. Other techniques, such as time-location sampling, use in the first stage lists of places in which members of the population of interest can be interviewed (Karon and Wejnert, 2012), so individuals who do not frequent these places cannot be sampled. As a result, these sampling methods are of limited use for surveying hard-to-reach populations.

To overcome these statistical and practical limitations, a new method, called respondent-driven sampling (RDS), was developed by the sociologist Douglas Heckathorn (1997) in the late 1990s. The objectives of this sampling method were to build a sample of socially networked individuals belonging to a hidden or hard-to-reach population and to produce unbiased estimates of the functions of interest (e.g. prevalence or strength of association) in this population.

Starting in the early 2000s, this sampling method, seen as innovative particularly for dealing with selection bias, was widely used to survey hard-to-reach populations. Between 2003 and 2007, it was already used in more than 120 studies in 20 different countries, representing more than 32,000 individuals recruited for behavioral and biological HIV studies alone (Malekinejad et al., 2008; Montealegre et al., 2013; White et al., 2015). In 2005, the US Centers for Disease Control and Prevention (CDC, 2009) alone surveyed more than 13,000 drug users in 20 US cities. RDS is also often used in low-income countries to survey hard-to-reach populations, owing to the low cost of this type of study and its ability to reach many people within a very short time. In 2010, a special issue of this journal (Vol. 5, issue 2) was devoted to methods for hard-to-reach populations; in it, RDS was discussed (Johnston and Sabin, 2010) and compared with time-location sampling (Semann, 2010). By 2013, 460 studies from 69 countries had used RDS since the mid-1990s (White et al., 2015).

Researchers’ enthusiasm for this method was the consequence of their many years of frustration with having only the previous methods at their disposal, all of which were known to have major drawbacks. However, RDS, based on strong assumptions, began to be widely used before there was enough time to determine the validity of the method, assess the conditions for applying it, and ensure that the underlying assumptions were respected. Its use outpaced its methodological development, which made the results of some studies open to question.

This article aims to briefly describe the principle of RDS, to identify the main estimators used, in particular the RDS-II estimator that is unbiased under certain assumptions, and to describe the behavior of the RDS-II estimator when certain assumptions are violated. Finally, practical considerations for RDS studies applied to persons who inject drugs (PWID) are discussed.

Principle of RDS

The principle of this sampling method is fairly simple. First, the study investigator looks for individuals (called seeds) who belong to the population of interest and know a sizable number of its members. The investigator contacts these seeds and, at a location suited to the survey, administers a questionnaire to them, possibly supplemented by medical examinations and/or biological sampling. When they leave, they are given one or more coupons that bear a unique identifier, the address of the survey premises, and the name of the study. Each seed is asked to give the coupons to people he or she knows or, in some studies, more specifically to people with whom he or she has had sexual intercourse or shared injecting equipment. Participants are often paid both to take part in the study and to recruit others. Once the seeds have distributed the coupons to their peers, the latter go to the premises to complete the questionnaire and the medical examinations and to recruit others. These persons recruited by the seeds constitute Wave 1, as illustrated in Figure 1. When they have finished, the individuals recruited in Wave 1 are given coupons that they in turn give to others, who make up Wave 2 and also go to the survey location. This process is repeated until the predetermined sample size is reached. The identifiers on the coupons record who recruited whom, allowing researchers to reconstruct the recruitment chains.

Figure 1.

Simplified representation of the first three waves of RDS recruitment. Each circle represents an individual and each arrow represents the transfer of a coupon from one individual to another. Within a given wave, individuals are not recruited at the same time, and the social network of each individual is not represented.

RDS can be viewed as a technique for populations described by small-world theory, in which any individual in a given population may be indirectly connected, through his or her social network, to any other member of the same population via approximately six intermediaries (Killworth and Bernard, 1978). According to this theory, a sampling method based on recruitment chains should give every individual a strictly positive probability of being included in the sample.

For a given individual i, the true number of his or her relationships within the population of interest, that is, the size of this individual’s social network in that population, is called the degree, noted $d_{i}$. Thus, if the focus is on drug users and a user knows n other users, he or she is considered to have a degree equal to n. The true degree probably differs from the reported degree, noted $\tilde{d}_{i}$. When individuals tend to recruit persons who resemble them, especially with regard to the variable of interest, this is classically called homophily, although several definitions coexist (Crawford et al., 2015; Tyldum and Johnston, 2014). Homophily reaches its maximum value (1.0) if infected persons (or non-infected persons, as the case may be) recruit only other infected (or non-infected) persons.
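Because several definitions of homophily coexist, the following is only one possible sketch. It assumes a commonly used index that compares S, the share of a group's recruits who belong to the same group, with P, that group's population proportion, scaled so that 0 means random mixing and 1.0 means exclusively in-group recruitment. The function names and the input format (one recruiter group/recruit group pair per redeemed coupon) are hypothetical.

```python
def share_same_group(pairs, group):
    """Share of recruits of `group` members who also belong to `group`.

    pairs: list of (recruiter_group, recruit_group) tuples, one per redeemed coupon.
    """
    recruits = [recruit for recruiter, recruit in pairs if recruiter == group]
    if not recruits:
        raise ValueError(f"no recruitments by members of group {group!r}")
    return sum(recruit == group for recruit in recruits) / len(recruits)


def homophily_index(same_group_share, population_share):
    """Assumed homophily index: 0 = random mixing, 1.0 = in-group recruitment only,
    negative values = preference for recruiting from the other group."""
    s, p = same_group_share, population_share
    if not 0 < p < 1:
        raise ValueError("population_share must be strictly between 0 and 1")
    if s >= p:
        return (s - p) / (1 - p)
    return (s - p) / p


# Hypothetical data: infected ("A") recruiters recruited 7 infected and 3 non-infected peers.
pairs = [("A", "A")] * 7 + [("A", "B")] * 3 + [("B", "B")] * 12 + [("B", "A")] * 4
s_a = share_same_group(pairs, "A")                  # 0.7
print(homophily_index(s_a, population_share=0.4))  # (0.7 - 0.4) / (1 - 0.4), about 0.5
```

With a hypothetical population share of 0.4 for group A, infected recruiters who recruit 70% infected peers sit halfway between random mixing and exclusive in-group recruitment under this definition.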

Estimators

RDS is a complex stochastic process since, in theory, it is a branching process without replacement on an arbitrary graph of social relationships, starting from a convenience sample of seeds (Gile and Handcock, 2010). Thus, if we are interested in a population composed of two groups of individuals (e.g. group A: infected persons and group B: non-infected persons), we can expect that the properties of the estimator of the infected proportion, noted $P_{A}$, will not be easy to determine. Those properties will depend on the characteristics of the social network, on preferential recruitment (not controlled by the investigator), and on sampling choices controlled by the investigator, such as the number of coupons and the number of waves.

Several estimators have been proposed to estimate a proportion. Two of the most popular are RDS-I, also called the classical estimator or the SH (Salganik–Heckathorn) estimator (Heckathorn, 2002), and RDS-II, also called the VH (Volz–Heckathorn) estimator (Volz and Heckathorn, 2008). We denote by $s$ the respondent-driven sample, by $s_{A}$ and $s_{B}$ the subsamples of individuals belonging to group A and group B, respectively, and by $n_{A}$ and $n_{B}$ their sizes.

RDS-I (or SH) estimator

Following classical notation (Tomas and Gile, 2011), we denote by $\hat{C}_{AB}$ the proportion of individuals recruited by members of group A who are members of group B; by $\hat{C}_{BA}$ the proportion of individuals recruited by members of group B who are members of group A; and by $\hat{\bar{D}}_{g}$ an estimate of the mean degree in group g (g = A or B):

$$\hat{\bar{D}}_{g} = \frac{n_{g}}{\sum_{i \in s_{g}} \frac{1}{\tilde{d}_{i}}}$$

The RDS-I estimator for $P_{A}$ is given by

$$\hat{P}_{A} = \frac{\hat{C}_{BA}}{\hat{C}_{BA} + \hat{C}_{AB} \left( \frac{\hat{\bar{D}}_{A}}{\hat{\bar{D}}_{B}} \right)}$$
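A minimal Python sketch of the two formulas above, assuming that the reported degrees of each subsample and the group of every recruiter–recruit pair are already available. The function names and the input layout are illustrative only, not those of RDSAT or of the R package RDS.

```python
def mean_degree(degrees):
    """Estimated mean degree of a group: n_g / sum(1 / d_i) over its respondents."""
    if any(d <= 0 for d in degrees):
        raise ValueError("reported degrees must be positive")
    return len(degrees) / sum(1 / d for d in degrees)


def recruitment_proportion(pairs, src, dst):
    """Proportion of individuals recruited by members of `src` who belong to `dst`."""
    recruits = [recruit for recruiter, recruit in pairs if recruiter == src]
    if not recruits:
        raise ValueError(f"no recruitments by members of group {src!r}")
    return sum(recruit == dst for recruit in recruits) / len(recruits)


def rds1(degrees_a, degrees_b, pairs):
    """RDS-I (SH) estimate of P_A from reported degrees and recruitment pairs."""
    c_ab = recruitment_proportion(pairs, "A", "B")
    c_ba = recruitment_proportion(pairs, "B", "A")
    ratio = mean_degree(degrees_a) / mean_degree(degrees_b)
    return c_ba / (c_ba + c_ab * ratio)


# Hypothetical sample of 7: one seed (in B) and six recruitments.
degrees_a = [10, 10, 5]    # estimated mean degree of A: 3 / 0.4 = 7.5
degrees_b = [4, 4, 4, 4]   # estimated mean degree of B: 4 / 1.0 = 4
pairs = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "A"), ("B", "B"), ("B", "B")]
print(round(rds1(degrees_a, degrees_b, pairs), 3))  # 0.348, i.e. 0.5 / (0.5 + 0.5 * 7.5 / 4)
```

The group mean degree is a harmonic mean of the reported degrees, which is what the $n_{g} / \sum 1/\tilde{d}_{i}$ form amounts to.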

RDS-II (or VH) estimator

The RDS-II estimator for $P_{A}$ is given by

$$\hat{P}_{A} = \frac{\sum_{i \in s_{A}} \frac{1}{\tilde{d}_{i}}}{\sum_{i \in s} \frac{1}{\tilde{d}_{i}}}$$
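The RDS-II formula is an inverse-degree-weighted proportion. Below is a minimal sketch with a hypothetical sample and illustrative function name (not RDSAT's implementation):

```python
def rds2(degrees, in_group_a):
    """RDS-II (VH) estimate of P_A: each respondent is weighted by 1 / reported degree."""
    if any(d <= 0 for d in degrees):
        raise ValueError("reported degrees must be positive")
    weighted_a = sum(1 / d for d, a in zip(degrees, in_group_a) if a)
    return weighted_a / sum(1 / d for d in degrees)


# Hypothetical sample: 3 respondents in group A, 4 in group B.
degrees = [10, 10, 5, 4, 4, 4, 4]
in_group_a = [True, True, True, False, False, False, False]
print(round(rds2(degrees, in_group_a), 3))                # 0.286 (= 0.4 / 1.4)
print(round(sum(in_group_a) / len(in_group_a), 3))        # 0.429 (= 3 / 7), unweighted
```

Respondents with large networks are down-weighted because, under the estimator's assumptions, a respondent's probability of being sampled is taken to be proportional to his or her degree.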

Using simulations that compare the mean squared errors of the two estimators, Gile and Handcock (2010) showed that RDS-II almost always performed better than RDS-I.

RDS-II is currently the estimator used in RDSAT (Respondent-Driven Sampling Analysis Tool), a free program for analyzing data from RDS surveys; its variance is estimated using a bootstrap method (Salganik, 2006). The RDS-II estimator has been shown to be asymptotically unbiased under the following assumptions (Volz and Heckathorn, 2008):

1. The sample is selected with replacement.

2. The sampling fraction is small.

3. Each individual recruits only one individual (number of coupons = 1).

4. Respondents report their degree in the network accurately.

5. The recruitment of each individual (including seeds) is random.

6. Relationships are reciprocal (undirected network).

7. The population consists of one connected component (every individual can be reached by a finite path from any other individual).

The estimator’s lack of bias therefore rests on a priori assumptions, and it is legitimate to ask how the estimator behaves when one or more of them are violated. Since RDS is a complex process, the estimator’s properties are studied through simulations rather than analytically. Several recent publications have measured the performance of RDS-II and shown that this estimator can be biased in some circumstances (see subsection “Performances of the RDS-II estimator”).

Gile (2011) proposed a new estimator called RDS-SS. It is based on successive sampling, which is equivalent to sampling with probability proportional to size without replacement. RDS-SS iteratively estimates both the degree distribution and the inclusion probabilities. Gile shows that the RDS-SS estimator offers an interesting alternative to RDS-II with respect to the bias related to the sampling fraction and to the ratio of the mean degree of infected persons to that of non-infected persons. That said, this estimator, like others, can be biased; these biases are summarized in a table in Tomas and Gile (2011). Recently, these three RDS estimators have been implemented in RDS (Handcock et al., 2009) and RDS Analyst (Handcock et al., 2013), packages for the free statistical software R (R Core Team). However, the great majority of researchers have used the publicly available software application RDSAT, in which only the RDS-II estimator is implemented. We therefore return to the latter’s performance.

Performances of the RDS-II estimator

Gile and Handcock (2010) measured how the performance of RDS-II depends on the seed-selection procedure (non-infected, random, or infected seeds), on respondent behavior (weak or strong homophily, i.e. the extent to which individuals recruit within their own group), on the sampling fraction, and on the mean degree according to disease status (infected or non-infected). The authors show that bias is induced by seed selection and by the level of homophily. The true (simulated) prevalence is underestimated when the seeds are non-infected, and this underestimation is greater when homophily is strong: non-infected seeds then tend to recruit non-infected individuals, who in turn recruit non-infected individuals, so that the sample ends up composed essentially of non-infected persons. Bias is close to zero when seeds are selected randomly, regardless of the level of homophily. When the seeds are infected, prevalence is overestimated, and the stronger the homophily, the greater the overestimation.
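The seed effect described above can be illustrated with a stylized, hypothetical model, not the simulation design of Gile and Handcock (2010). In this model, a single chain runs in which each respondent recruits exactly one person, and all respondents have the same degree, so RDS-II reduces to the sample proportion. Recruitment follows a two-group Markov chain whose long-run share of infected persons equals the true prevalence p, while an assumed parameter h (0 ≤ h < 1) controls homophily. The function computes the expected share of infected respondents exactly, without simulation.

```python
def expected_sample_prevalence(p, h, chain_length, seed_infected_prob):
    """Expected share of infected respondents in one recruitment chain (seed included)
    under a hypothetical two-group Markov recruitment model.

    p: true prevalence; h: homophily parameter in [0, 1);
    seed_infected_prob: 0.0 (non-infected seed), 1.0 (infected seed) or p (random seed).
    """
    infected_recruits_infected = p + h * (1 - p)  # P(next is infected | current infected)
    noninfected_recruits_infected = (1 - h) * p   # P(next is infected | current non-infected)
    prob_infected = seed_infected_prob
    total = 0.0
    for _ in range(chain_length):
        total += prob_infected
        prob_infected = (prob_infected * infected_recruits_infected
                         + (1 - prob_infected) * noninfected_recruits_infected)
    return total / chain_length


for h in (0.2, 0.8):
    for label, seed in (("non-infected", 0.0), ("random", 0.3), ("infected", 1.0)):
        est = expected_sample_prevalence(p=0.3, h=h, chain_length=10, seed_infected_prob=seed)
        print(f"h={h}, {label} seed: {est:.3f}")
```

With p = 0.3 and chains of 10 respondents, a non-infected seed gives an expected sample prevalence of about 0.26 when h = 0.2 but about 0.17 when h = 0.8. An infected seed gives about 0.61 when h = 0.8, and a seed drawn at random (infected with probability p) gives exactly p in expectation. In this model the expected infected share at step w is p + (seed probability − p)·h^w, so the influence of the seed fades more slowly as homophily increases.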

Gile and Handcock also showed that the bias of the estimator depends on the sampling fraction and on the ratio of the mean degree of infected persons to that of non-infected persons: bias increases as both the sampling fraction and this ratio increase. Thus, when infected persons have a higher mean degree than non-infected persons, prevalence is increasingly underestimated as the sampling fraction grows. Another study used in-depth simulations to test the violation of each assumption (Lu et al., 2012). Its main findings indicate a major bias when the network is directed (i.e. when one individual may know another individual but not vice versa) or when respondents preferentially recruit individuals whose characteristics are correlated with the variable of interest (such as disease status). On the other hand, the authors consider the estimator robust to sampling without replacement, a low response rate, some degree-reporting errors, and the seed-selection criterion. These conclusions differ from those of Gile and Handcock regarding sampling without replacement and seed selection.
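The interaction between sampling fraction and degree ratio can be sketched under a deliberately simplified model. This model is an assumption and ignores network structure and homophily: respondents are drawn one at a time without replacement, each with probability proportional to degree among those not yet sampled. This successive-sampling view is the one underlying RDS-SS (Gile, 2011). The population below is hypothetical, with infected persons having twice the degree of non-infected persons.

```python
import random


def rds2(degrees, infected):
    """RDS-II estimate: inverse-degree-weighted share of infected respondents."""
    weighted_infected = sum(1 / d for d, a in zip(degrees, infected) if a)
    return weighted_infected / sum(1 / d for d in degrees)


def successive_sample(weights, n, rng):
    """Draw n distinct units one at a time, each with probability proportional
    to its weight among the units not yet drawn (sampling without replacement)."""
    remaining = list(range(len(weights)))
    drawn = []
    for _ in range(n):
        k = rng.choices(range(len(remaining)), weights=[weights[i] for i in remaining])[0]
        drawn.append(remaining.pop(k))
    return drawn


def mean_rds2(degrees, infected, fraction, reps, seed=1):
    """Average RDS-II estimate over `reps` successive samples of the given fraction."""
    rng = random.Random(seed)
    n = round(fraction * len(degrees))
    total = 0.0
    for _ in range(reps):
        idx = successive_sample(degrees, n, rng)
        total += rds2([degrees[i] for i in idx], [infected[i] for i in idx])
    return total / reps


# Hypothetical population of 300: true prevalence 0.3, infected degree 20, others 10.
degrees = [20] * 90 + [10] * 210
infected = [True] * 90 + [False] * 210

for fraction in (0.05, 0.25, 0.5, 1.0):
    est = mean_rds2(degrees, infected, fraction, reps=50)
    print(f"sampling fraction {fraction:.2f}: mean RDS-II {est:.3f}")
```

At a sampling fraction of 1.0 every person is sampled, so the estimate is deterministic: (90/20) / (90/20 + 210/10), about 0.18 rather than 0.3. At small fractions the draws behave almost like sampling with replacement, and the mean estimate stays close to the true prevalence.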

A recent simulation-based study showed significant bias when degrees are inaccurately reported (Mills et al., 2014). The authors demonstrated that obtaining correct degrees for individuals reporting low degrees is particularly important, because these individuals have higher weights and are less likely to be infected. It is thus crucial to elicit accurate degrees through well-designed questions, which remains a real challenge.
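The following small hypothetical example shows why errors in low reported degrees matter. With RDS-II weights of 1/degree, a respondent reporting degree 2 carries five times the weight of one reporting degree 10, so overstating low degrees shifts the estimate noticeably. The data are invented for illustration, and only the four low-degree, non-infected respondents misreport.

```python
def rds2(degrees, infected):
    """RDS-II estimate: inverse-degree-weighted share of infected respondents."""
    weighted_infected = sum(1 / d for d, a in zip(degrees, infected) if a)
    return weighted_infected / sum(1 / d for d in degrees)


# Hypothetical sample of 10: 3 infected (degree 10), 4 non-infected with degree 2,
# and 3 non-infected with degree 10.
infected = [True] * 3 + [False] * 7
true_degrees = [10, 10, 10, 2, 2, 2, 2, 10, 10, 10]
# The four low-degree respondents report 4 instead of 2; everyone else is accurate.
reported_degrees = [10, 10, 10, 4, 4, 4, 4, 10, 10, 10]

print(f"with true degrees:     {rds2(true_degrees, infected):.3f}")
print(f"with reported degrees: {rds2(reported_degrees, infected):.3f}")
```

The estimate moves from about 0.12 (0.3/2.6) to about 0.19 (0.3/1.6), a relative change of more than 60%, even though only four degrees were misreported.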

The properties of the RDS-II estimator can also be considered in terms of the design effect. Using real data, Goel and Salganik (2010) showed that the design effect could be sizable: based on their data, they obtained design effects ranging from 5.7 to 58.3, with a median of 11. Compared with epidemiological surveys using more traditional designs, this indicates that the variance of the estimator is high. This variance increases as the number of coupons and homophily increase (Lu et al., 2012). More recently, although design effects varied across countries and populations, researchers recommended assuming a design effect between 2 and 4 when calculating the sample size of RDS studies (Johnston et al., 2013; Wejnert et al., 2012).
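The following worked example shows how a design effect enters sample-size planning. The sketch uses the standard normal-approximation formula for estimating a proportion, n = deff × z²p(1 − p)/e², where p is the anticipated prevalence and e the desired margin of error. The planning values (p = 0.3, e = 0.05, 95% confidence) are hypothetical.

```python
import math


def rds_sample_size(p, margin, deff, z=1.96):
    """Sample size to estimate a proportion p within +/- margin (normal approximation),
    inflated by the design effect deff."""
    n_srs = z ** 2 * p * (1 - p) / margin ** 2  # size needed under simple random sampling
    return math.ceil(deff * n_srs)


for deff in (1, 2, 4):
    print(deff, rds_sample_size(p=0.3, margin=0.05, deff=deff))
```

With these inputs, a design effect of 2 roughly doubles the size required under simple random sampling (from 323 to 646), and a design effect of 4 raises it to 1,291. With design effects in the range of 5.7 to 58.3 reported by Goel and Salganik, the required samples would be far larger still.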

Violation of assumptions

The simulation studies show that the bias and variance of the RDS-II estimator depend on a set of assumptions (listed in subsection “Estimators”) that cannot be controlled by the person in charge of the survey. It can therefore be expected that, in practice, these assumptions will not always hold, leading to potential biases. The literature shows that some of the assumptions (assumptions 3–6) are indeed often violated: more than one coupon is used (Johnston et al., 2008; Malekinejad et al., 2008), respondents have difficulty stating their degree precisely (Marsden, 2005), recruitment is not random (Frost et al., 2006; Liu et al., 2012; Wang et al., 2005), and not all relationships are reciprocal (Abramovitz et al., 2009; Iguchi et al., 2009; Ma et al., 2007; Paquette et al., 2011).

Practical considerations for RDS studies of PWID

RDS has probably been used in more studies of PWID than of all other “hard-to-reach” populations combined, and a fair amount of practical knowledge about conducting RDS studies of PWID populations has accumulated (Malekinejad et al., 2008; Rudolph et al., 2011). Whether the RDS assumptions noted above hold in any specific study of PWID will depend on a number of practical concerns as well as on the underlying theory.

First, how can researchers determine that the social structure of the population to be studied conforms to the RDS assumption of a fully networked structure? Specifically, are there divisions within the population that would strongly affect recruitment? There may be critical divisions within the local PWID population that would greatly reduce the likelihood that people in one subgroup recruit people in another. Examples include groups that inject different drugs, members of different racial/ethnic groups, or PWID who live in different geographic areas of the same city (Linton et al., 2015).

Qualitative/ethnographic research can often be used to identify potential subgroups within the overall local PWID population between which recruitment would be very unlikely. If this appears to be the case, it may be best to reformulate the research as two studies of two different PWID populations. This will, of course, require a sample size for each subgroup large enough for the needed statistical analyses, and it will greatly increase the cost of the study. There is no ironclad rule for using qualitative data to decide whether to conduct separate RDS studies for different groups within a local PWID population.

Second, is the sample size large enough to reach “equilibrium”? An RDS sample reaches equilibrium when its important characteristics (gender, HIV status, age, race/ethnicity, drugs injected, etc.) remain stable over successive waves of recruitment. Equilibrium indicates that recruitment is no longer determined by the characteristics of the initial seeds. Sample sizes of several hundred are often required to know that equilibrium has been reached. Failure to reach equilibrium creates a strong suspicion that the estimators are biased.
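One simple way to examine convergence is to track the cumulative sample proportion of a characteristic wave by wave. The sketch below rests on hypothetical choices: a binary characteristic, unweighted proportions, and an arbitrary stability rule requiring a spread of at most 0.02 across the last three waves. Diagnostics proposed in the literature, such as convergence plots of the cumulative estimate (Gile et al., 2015), follow the same idea with weighted estimates.

```python
from collections import defaultdict


def cumulative_by_wave(records):
    """records: iterable of (wave, has_characteristic) pairs.
    Returns [(wave, cumulative proportion with the characteristic up to that wave)]."""
    counts = defaultdict(lambda: [0, 0])  # wave -> [number with characteristic, total]
    for wave, has_characteristic in records:
        counts[wave][0] += int(has_characteristic)
        counts[wave][1] += 1
    series, with_char, total = [], 0, 0
    for wave in sorted(counts):
        with_char += counts[wave][0]
        total += counts[wave][1]
        series.append((wave, with_char / total))
    return series


def looks_stable(series, last=3, tol=0.02):
    """Hypothetical rule: the last `last` cumulative proportions differ by at most `tol`."""
    values = [p for _, p in series[-last:]]
    return len(values) == last and max(values) - min(values) <= tol


# Hypothetical data: (wave, respondents in wave, respondents with the characteristic).
waves = [(0, 5, 0), (1, 10, 3), (2, 20, 7), (3, 40, 12), (4, 60, 18), (5, 80, 24)]
records = [(w, i < k) for w, n, k in waves for i in range(n)]

series = cumulative_by_wave(records)
for wave, prop in series:
    print(wave, round(prop, 3))
print("stable over last 3 waves:", looks_stable(series))
```

Here the seeds (wave 0) all lack the characteristic. The cumulative proportion climbs from 0 to about 0.29 by wave 3 and then changes by less than 0.01, so the rule is met; applied after wave 2, it would not be.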

Third, the study needs to be able to handle large numbers of subjects at once. One of the virtues of RDS is that it typically recruits many subjects within a short time. RDS recruitment is geometric: the number of potential subjects (persons with coupons who meet the study eligibility criteria) grows quickly. If each subject recruits an average of two additional persons who wish to participate in the study, the number of persons wanting to participate doubles with each recruitment wave. Working with large numbers of subjects then requires scheduling research appointments. This can be done when the coupons are given to each subject (the coupon is then valid only for a specific time on a specific date) or by asking subjects to come to the research site to schedule an appointment. Because people who use drugs are usually not very good at keeping precise appointments, strict scheduling means that some potential subjects will not participate because they did not present at the research site at the scheduled time. There may be important differences between subjects who do and subjects who do not keep tight appointments, creating another possible source of bias in the study. Allowing some flexibility to take subjects who do not present at the proper time would reduce such bias, but this becomes quite difficult when study staff are trying to process large numbers of subjects.
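A short calculation makes the geometric growth concrete. The number of seeds, the average number of successful recruits per respondent, and the target sample size below are hypothetical planning values; real chains vary in length, and many die out.

```python
def wave_sizes(seeds, mean_recruits, n_waves):
    """Expected number of respondents in waves 0..n_waves (wave 0 = seeds)."""
    return [seeds * mean_recruits ** w for w in range(n_waves + 1)]


def waves_to_reach(target, seeds, mean_recruits, max_waves=100):
    """First wave at which the expected cumulative sample reaches `target`,
    or None if it is not reached within `max_waves` (e.g. when mean_recruits < 1)."""
    cumulative = 0.0
    for w in range(max_waves + 1):
        cumulative += seeds * mean_recruits ** w
        if cumulative >= target:
            return w
    return None


print(wave_sizes(seeds=10, mean_recruits=2, n_waves=5))  # [10, 20, 40, 80, 160, 320]
print(waves_to_reach(500, seeds=10, mean_recruits=2))    # 5
print(waves_to_reach(500, seeds=10, mean_recruits=0.5))  # None
```

In this example, 320 new participants would be expected in wave 5 alone, more than in all earlier waves combined. This is why appointment capacity has to be planned before recruitment accelerates.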

Furthermore, to prevent and detect duplicate participation, some surveys use a combination of biometric measures for each respondent (e.g. the length of each forearm) (Heckathorn et al., 2002; Uuskula et al., 2011) or other specific identifiers (e.g. mother’s maiden name, birth date) (Rudolph et al., 2011).
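Below is a minimal sketch of duplicate screening. It assumes that hypothetical fields (birth date, mother's maiden name, and forearm lengths in centimeters) are recorded at each interview and combined into a single hashed key for matching. An unsalted hash of such low-variety identifiers offers little privacy protection on its own. Rounding the measurements to the nearest centimeter is a crude, assumed tolerance for measurement error, and values near a rounding boundary can still fail to match, so some manual review would likely still be needed.

```python
import hashlib


def participant_key(birth_date, mother_maiden_name, left_forearm_cm, right_forearm_cm):
    """Hash of normalized identifiers (hypothetical choice of fields)."""
    parts = [
        birth_date.strip(),
        mother_maiden_name.strip().lower(),
        str(round(left_forearm_cm)),   # crude tolerance for measurement error
        str(round(right_forearm_cm)),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def find_duplicates(participants):
    """participants: list of (coupon_id, (birth_date, maiden_name, left_cm, right_cm)).
    Returns (first_coupon_id, repeat_coupon_id) pairs that share a key."""
    first_seen = {}
    duplicates = []
    for coupon_id, fields in participants:
        key = participant_key(*fields)
        if key in first_seen:
            duplicates.append((first_seen[key], coupon_id))
        else:
            first_seen[key] = coupon_id
    return duplicates


participants = [
    ("C001", ("1980-04-12", "Martin", 26.4, 26.1)),
    ("C002", ("1975-11-03", "Dubois", 27.9, 28.2)),
    ("C003", ("1980-04-12", " martin ", 26.2, 25.8)),
]
print(find_duplicates(participants))  # [('C001', 'C003')]
```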

This section has not attempted to address all of the practical issues that frequently arise in RDS studies of PWID. Rather, it attempts to broaden the discussion beyond the theoretical assumptions to some of the practical issues that may be equally important for conducting and interpreting an RDS study. In general, these issues arise from the success of RDS as a method for rapidly recruiting large numbers of subjects within a scientific framework that, if the theoretical assumptions can be met and the practical problems minimized, can combine the most sophisticated sampling method with the greatest cost-efficiency for studying hard-to-reach populations such as PWID.

Discussion

Recent publications show two major thrusts in research on RDS. The first is methodological research: there is a need to continue studying the properties of existing estimators and to improve the estimators and the estimation of their variances. The second is applied research, which consists of verifying the assumptions when a survey is conducted and addressing the practical issues in conducting RDS studies. These assumptions have often been ignored in the past, making it difficult to assess the accuracy of the results produced. Very recent articles show that this research is ongoing, with a growing realization that the method has to be evaluated from different angles (Dombrowski et al., 2013; Lansky et al., 2012; Liu et al., 2012; McCreesh et al., 2011, 2012; Rudolph et al., 2013; Salganik, 2012; Wylie and Jolly, 2013).

However, some questions remain unanswered for anyone wanting to carry out an RDS study. As regards estimators, the question is whether to continue using the RDS-II estimator and its bootstrap variance or to use the RDS-SS estimator instead, given that the latter has only recently been implemented, and only in the statistical software R. From a more practical standpoint, this raises further questions for the survey investigator: Are the conditions right (i.e. do the assumptions hold) for using this sampling method? Should preliminary studies be carried out to determine the characteristics of the social network? Should this type of survey be ruled out in some cases? Salganik (2012) calls for data from RDS studies to be made available so that the evaluation and development of this sampling method can be completed. White et al. (2012) have proposed a set of items to be reported in RDS studies, adapted from the STROBE guidelines developed for cross-sectional studies (Von Elm et al., 2007). Finally, diagnostic tools and practical recommendations have recently been proposed for use before, during, and after data collection to improve RDS sampling and inference (Gile et al., 2015).

This seems all the more important since the method is being extended to areas other than the study of hard-to-reach populations, in particular to telephone surveys (Lee et al., 2011), web surveys (Schonlau et al., 2014; Stein et al., 2014), and even the recruitment of participants in clinical trials assessing the effectiveness of HIV-prevention measures (Solomon et al., 2013).

RDS can be used simply as a time-efficient method of recruiting “hard-to-reach” populations. In populations with strong social networks, paying subjects to recruit other subjects will usually be more cost-effective than paying research staff to recruit subjects. There will be many occasions, however, when RDS does not work as well as the researchers hoped. There may be very high positive homophily and a failure to reach equilibrium, suggesting that there is not a single network in the population of interest (a “small world”) but rather that the population is fragmented into two or more subpopulations that need to be considered separately. It is also possible that the population of interest is not socially networked. For example, commercial sex workers who use the Internet to attract customers may not be sufficiently networked with each other to sustain recruitment chains. Men who have sex with men who meet in anonymous locations, such as parks or restrooms, also may not be able to recruit each other. For PWID, one of the keys to success is to recruit seeds who have large networks and whom other people trust; trust is crucial for improving the recruitment of PWID with RDS.

In such situations, where either the underlying assumptions of RDS theory do not hold or RDS recruitment does not produce large numbers of subjects, the data may still be analyzed as a convenience sample, with the understanding that the sample is not representative of the underlying population of interest. Most importantly, if RDS does not work for a given population of interest, the reasons why it did not work will usually provide important insights for designing health-promoting interventions for that population. Both RDS and many health-promoting interventions rely on positive peer relationships, and if RDS does not work, these interventions are also unlikely to work.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Author biographies

Lucie Léon is a biostatistician at the Direction of Infectious Diseases of Santé publique France, the French National Public Health Agency. She holds a Master of Science and is currently working on her doctoral thesis in Biostatistics/Epidemiology.

Don Des Jarlais, Ph.D. is Director of Research for the Baron Edmond de Rothschild Chemical Dependency Institute at Beth Israel Medical Center, a Senior Research Fellow with the National Development and Research Institutes, Inc. and a Guest Investigator at Rockefeller University in New York. As a leader in the fields of AIDS and injecting drug use, Dr. Des Jarlais has published extensively on these topics.

Marie Jauffret-Roustide is a Ph.D. sociologist at Santé publique France and at the French Institute of Health and Medical Research. She leads sero-epidemiological and sociological studies on drug use practices and on the social processes of at-risk practices among people who inject drugs and crack users.

Yann Le Strat is a Ph.D. biostatistician at the Direction of Infectious Diseases of Santé publique France, the French National Public Health Agency.

References

Abramovitz, Volz, Strathdee, et al. (2009) Using respondent-driven sampling in a hidden population at risk of HIV infection: Who do HIV-positive recruiters recruit? Sexually Transmitted Diseases 36(12): 750–756. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-73949137536&partnerID=40&md5=ad170ecf764cbd76b1ed109c405745b0

Birnbaum and Sirken (1965) Design of sample surveys to estimate the prevalence of rare diseases: Three unbiased estimates. Vital and Health Statistics 2(11): 1–8.

Centers for Disease Control and Prevention (CDC) (2009) HIV-associated behaviors among injecting-drug users—23 cities, United States, May 2005-February 2006. Morbidity and Mortality Weekly Report 58: 329–332.

Chow and Thompson (2003) Estimation with link-tracing sampling designs: A Bayesian approach. Survey Methodology 29(2): 197–205.

Crawford, Aronow, Zeng, et al. (2015) Identification of homophily and preferential recruitment in respondent-driven sampling (Unpublished work). Available at: https://arxiv.org/abs/1511.05397

Dombrowski, Khan, Moses, et al. (2013) Assessing respondent driven sampling for network studies in ethnographic contexts. Advances in Anthropology 3(1): 1–9.

Félix-Medina and Thompson (2004) Combining link-tracing sampling and cluster sampling to estimate the size of hidden populations. Journal of Official Statistics 20(1): 19–38.

Frost SDW, Brouwer, Firestone Cruz, et al. (2006) Respondent-driven sampling of injection drug users in two U.S.-Mexico border cities: Recruitment dynamics and impact on estimates of HIV and syphilis prevalence. Journal of Urban Health 83(7 Suppl.): i83–i97. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-33845654037&partnerID=40&md5=04f1fb2665808d750c313682b6d9d490

Gile (2011) Improved inference for respondent-driven sampling data with application to HIV prevalence estimation. Journal of the American Statistical Association 106(493): 135–146. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-79954461374&partnerID=40&md5=beb8e610516c8027f776ee9dec7bc6d6

Gile and Handcock (2010) Respondent-driven sampling: An assessment of current methodology. Sociological Methodology 40(1): 285–327. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-77949340082&partnerID=40&md5=05636a2fae2eeb516971b0d27e0cf5ac

Gile, Johnston and Salganik (2015) Diagnostics for respondent-driven sampling. Journal of the Royal Statistical Society: Series A 178(1): 241–269.

Goel and Salganik (2010) Assessing respondent-driven sampling. Proceedings of the National Academy of Sciences USA 107(15): 6743–6747.

Goodman (1961) Snowball sampling. Annals of Mathematical Statistics 32(1): 148–170.

Granovetter (1976) Network sampling: Some first steps. American Journal of Sociology 81: 1267–1303.

Handcock, Fellows and Gile (2013) RDS analyst: Analysis of respondent-driven sampling data. Working paper.

Handcock, Gile and Neely (2009) RDS: R Functions for Respondent-Driven Sampling. R Package version 0.10.

Heckathorn (1997) Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems 44(2): 174–199. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-0003992773&partnerID=40&md5=f004da404fee33b6cde964af2bfbdcaa

Heckathorn (2002) Respondent-driven sampling II: Deriving valid population estimates from chain-referral samples of hidden populations. Social Problems 49(1): 11–34.

Heckathorn, Semaan, Broadhead, et al. (2002) Extensions of respondent-driven sampling: A new approach to the study of injection drug users aged 18–25. AIDS and Behavior 6(1). Available at: http://www.respondentdrivensampling.org/reports/steering.pdf

Iguchi, Ober, Berry, et al. (2009) Simultaneous recruitment of drug users and men who have sex with men in the United States and Russia using respondent-driven sampling: Sampling methods and implications. Journal of Urban Health 86(Suppl. 1): S5–S31. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-68349084634&partnerID=40&md5=ab7b971d61fec7ba7b3883847b1a2413

Johnston and Sabin (2010) Sampling hard-to-reach populations with respondent driven sampling. Methodological Innovations Online 5(2): 38–48.

Johnston, Chen, Silva-Santisteban, et al. (2013) An empirical examination of respondent driven sampling design effects among HIV risk groups from studies conducted around the world. AIDS and Behavior 17(6): 2202–2210. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-84879028371&partnerID=40&md5=e373905562557b90aa2b5e704dcfd53f

Johnston, Malekinejad, Kendall, et al. (2008) Implementation challenges to using respondent-driven sampling methodology for HIV biological and behavioral surveillance: Field experiences in international settings. AIDS and Behavior 12(Suppl. 1): S131–S141. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-46149085847&partnerID=40&md5=f455fb9cdecde405f993194d07214e45

Karon and Wejnert (2012) Statistical methods for the analysis of time-location sampling data. Journal of Urban Health 89(3): 565–586.

Killworth and Bernard (1978) The reversal small-world experiment. Social Networks 1: 159–192.

Klovdahl (1989) Urban social networks: Some methodological problems and possibilities. In: Kochen (ed.) The Small World. Norwood, NJ: Ablex Publishing, pp. 176–210.

Lansky, Drake, Wejnert, et al. (2012) Assessing the assumptions of respondent-driven sampling in the national HIV Behavioral Surveillance System among injecting drug users. Open AIDS Journal 6: 77–82.

Le Vu, Le Strat, Barin, et al. (2010) Population-based HIV-1 incidence in France, 2003–08: A modelling analysis. The Lancet Infectious Diseases 10(10): 682–687.

Lee, Ranaldi, Cummings, et al. (2011) Given the increasing bias in random digit dial sampling, could respondent-driven sampling be a practical alternative? Annals of Epidemiology 21(4): 272–279.

Linton, Cooper, Kelley, et al. (2015) HIV infection among people who inject drugs in the United States: Geographically explained variance across racial and ethnic groups. American Journal of Public Health 105(12): 2457–2465.

Liu, et al. (2012) Assessment of random recruitment assumption in respondent-driven sampling in egocentric network data. Social Networking 1(2): 13–21.

Lu, Bengtsson, Britton, et al. (2012) The sensitivity of respondent-driven sampling. Journal of the Royal Statistical Society Series A: Statistics in Society 175(1): 191–216. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-84855955571&partnerID=40&md5=8171bf9fa45db898315965e6f8dff954

Ma, Zhang, He, et al. (2007) Trends in prevalence of HIV, syphilis, hepatitis C, hepatitis B, and sexual risk behavior among men who have sex with men. Results of 3 consecutive respondent-driven sampling surveys in Beijing, 2004 through 2006. Journal of Acquired Immune Deficiency Syndromes 45(5): 581–587.

McCreesh, Frost SDW, Seeley, et al. (2012) Evaluation of respondent-driven sampling. Epidemiology 23(1): 138–147. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-81255204005&partnerID=40&md5=0bcbb318f0a1ed8bf3fe201ef885f3b9

McCreesh, Johnston, Copas, et al. (2011) Evaluation of the role of location and distance in recruitment in respondent-driven sampling. International Journal of Health Geographics 10: 56.

MacKellar, Valleroy, Karon, et al. (1996) The Young Men’s Survey: Methods for estimating HIV seroprevalence and risk factors among young men who have sex with men. Public Health Reports 111(Suppl. 1): 138–144.

Magnani, Sabin, Saidel, et al. (2005) Review of sampling hard-to-reach and hidden populations for HIV surveillance. AIDS 19(Suppl. 2): S67–S72.

Malekinejad, Johnston, Kendall, et al. (2008) Using respondent-driven sampling methodology for HIV biological and behavioral surveillance in international settings: A systematic review. AIDS and Behavior 12(4 Suppl): S105–S130.

Marsden (2005) Recent developments in network measurement. In: Carrington, Scott and Wasserman (eds) Models and Methods in Social Network Analysis. New York: Cambridge University Press, pp. 8–30.

Mills, Johnson, Hickman, et al. (2014) Errors in reported degrees and respondent driven sampling: Implications for bias. Drug and Alcohol Dependence 142: 120–126.

Montealegre, Johnston, Murrill, et al. (2013) Respondent driven sampling for HIV biological and behavioral surveillance in Latin America and the Caribbean. AIDS and Behavior 17(7): 2313–2340.

Muhib, Lin, Stueve, et al. (2001) A venue-based method for sampling hard-to-reach populations. Public Health Reports 116(Suppl. 1): 216–222.

Paquette and Bryant (2011) Use of respondent-driven sampling to enhance understanding of injecting networks: A study of people who inject drugs in Sydney, Australia. International Journal of Drug Policy 22(4): 267–273.

Rudolph, Crawford, Latkin, et al. (2011) Subpopulations of illicit drug users reached by targeted street outreach and respondent-driven sampling strategies: Implications for research and public health practice. Annals of Epidemiology 21(4): 280–289.

Rudolph, Fuller and Latkin (2013) The importance of measuring and accounting for potential biases in respondent-driven samples. AIDS and Behavior 17(6): 2244–2252.

Ruiz, O’Rourke and Allen (2016) Using capture-recapture methods to estimate the population of people who inject drugs in Washington, DC. AIDS and Behavior 20: 363–368.

Salganik (2006) Variance estimation, design effects, and sample size calculations for respondent-driven sampling. Journal of Urban Health 83(6 Suppl.): i98–i112.

Salganik (2012) Commentary: Respondent-driven sampling in the real world. Epidemiology 23(1): 148–150. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-83655183647&partnerID=40&md5=6a865f6ae1f465b8389c378ecc8789eb

Schonlau, Weidmer and Kapteyn (2014) Recruiting an internet panel using respondent-driven sampling. Journal of Official Statistics 30(2): 291–310.

Semaan (2010) Time-space sampling and respondent-driven sampling with hard-to-reach populations. Methodological Innovations Online 5(2): 60–75.

Solomon, Lucas, Celentano, et al. (2013) Beyond surveillance: A role for respondent-driven sampling in implementation science. American Journal of Epidemiology 178(2): 260–267. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-84880546484&partnerID=40&md5=22f8419fccc8b8fe2b47978f6bcc0242

Spreen (1992) Rare populations, hidden populations, and link-tracing designs: What and why? Bulletin of Sociological Methodology 36: 34–58.

Stein, van Steenbergen, Chanyasanha, et al. (2014) Online respondent-driven sampling for studying contact patterns relevant for the spread of close-contact pathogens: A pilot study in Thailand. PLoS ONE 9(1): e85256.

Sudman, Sirken and Cowan (1988) Sampling rare and elusive populations. Science 240(4855): 991–996.

Thompson and Frank (2000) Model-based estimation with link-tracing sampling designs. Survey Methodology 26(1): 87–98.

Thompson and Seber (1996) Adaptive Sampling. New York: John Wiley & Sons.

Tomas and Gile (2011) The effect of differential recruitment, non-response and non-recruitment on estimators for respondent-driven sampling. Electronic Journal of Statistics 5: 899–934. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-83655205915&partnerID=40&md5=14a36018c4b1d1b3f64b9801798560cf

Tyldum and Johnston (2014) Applying Respondent Driven Sampling to Migrant Populations. New York: Palgrave Macmillan.

Uuskula, Des Jarlais, Kals, et al. (2011) Expanded syringe exchange programs and reduced HIV infection among new injection drug users in Tallinn, Estonia. BMC Public Health 11: 517.

Volz and Heckathorn (2008) Probability based estimation theory for respondent driven sampling. Journal of Official Statistics 24(1): 79–97.

Von Elm, Altman, Egger, et al. (2007) The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: Guidelines for reporting observational studies. PLoS Med 4(10): e296.

Wang, Carlson, Falck, et al. (2005) Respondent-driven sampling to recruit MDMA users: A methodological assessment. Drug and Alcohol Dependence 78(2): 147–157.

Watters and Biernacki (1989) Targeted sampling: Options for the study of hidden populations. Social Problems 36(4): 416–430.

Wejnert, Pham, Krishna, et al. (2012) Estimating design effect and calculating sample size for respondent-driven sampling studies of injection drug users in the United States. AIDS and Behavior 16(4): 797–806. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-84863717003&partnerID=40&md5=1b3f117e7f19f8dd458d1e2bb2739dc5

White, Hakim, Salganik, et al. (2015) Strengthening the reporting of observational studies in epidemiology for respondent-driven sampling studies: “STROBE-RDS” statement. Journal of Clinical Epidemiology 68(12): 1463–1471.

White, Lansky, Goel, et al. (2012) Respondent driven sampling–where we are and where should we be going? Sexually Transmitted Infections 88(6): 397–399.

Wylie and Jolly (2013) Understanding recruitment: Outcomes associated with alternate methods for seed selection in respondent driven sampling. BMC Medical Research Methodology 13: 93.