Abstract
Adolescents often engage in behaviors such as substance use and risky sexual activity that can lead to negative health and psychological consequences for themselves and others. Accurate measurement of these behaviors in surveys is challenging given that the behaviors are often viewed as undesirable and/or are illegal, so it is important to test the psychometric properties of instruments used to assess adolescent risk behaviors. The current study aimed to assess the test-retest reliability of a widely used measure of youth risk-taking behavior, the Youth Risk Behavior Survey (YRBS). A sample of 156 at-risk adolescents aged 16–18 years (81% male; 61% White) completed the YRBS retrospectively across intervals ranging from 3 to 12 days during their stay in a residential program at which they were under close supervision and had limited ability to engage in new risk behaviors. Participants were asked to complete the YRBS based on their “typical” (pre-program) behavior at both administrations, which were 10–14 weeks into their stay. The reliability of responses was assessed using kappa and weighted kappa analyses. Findings indicate moderate to substantial reliability for nearly all items, suggesting that at-risk youth reliably reported their engagement in health risk behaviors across multiple administrations and supporting the psychometric strength of the YRBS measure for use with this population.
Keywords
Survey data suggest that adolescents often engage in health risk behaviors (e.g., substance use, risky sexual behaviors) which can lead to negative physical and emotional consequences for themselves and others (Achhab et al., 2016; McPherson et al., 2013). The leading causes of death among 15- to 19-year-olds in the U.S. are unintentional injuries (36.3%), intentional self-harm (22.9%), and assault (16.9%; Heron, 2019). These deaths, many of which may be preventable, have been linked to health risk behaviors including drinking alcohol, engaging in unprotected sexual activities, smoking cigarettes, and carrying weapons (Achhab et al., 2016; Brener et al., 2003; Kann et al., 2018; Rosario et al., 2014). Substance use is a particularly important contributor to health risk behaviors as it can lower inhibitions and increase the likelihood that youths will engage in other dangerous behaviors, such as driving while intoxicated or riding in a vehicle where the driver is intoxicated (Yellman et al., 2020), and being involved in sexual violence or nonconsensual sexual activity (Basile et al., 2020). It is therefore important to determine the utility of instruments designed to collect information about self-reported substance use and other health risk behaviors among adolescents by evaluating the psychometric properties of these measures. This is an important first step to understanding the larger phenomenon of adolescent health risk behavior and determining ways to mitigate its effects.
Health risk behaviors related to substance use and unprotected sex are of particular concern given their high prevalence rates among adolescents (Johnston et al., 2020). In the 2019 Monitoring the Future survey of over 42,000 youths, the most commonly used substances among adolescents within the last 30 days were alcohol (18.2%), marijuana (15.6%), and vaping nicotine (18.1%; Johnston et al., 2020). Some adolescents begin engaging in substance-related risk-taking behaviors in early adolescence (ages 12–13 years), with prevalence of use increasing with age across substances (Charles et al., 2017; Johnston et al., 2020; Substance Abuse and Mental Health Services Administration, 2017). For instance, in the 2019 Monitoring the Future adolescent survey, rates of 30-day alcohol (29.3%), marijuana (22.3%), and vaping nicotine (25.5%) use among 12th graders were higher than rates of use among eighth graders (7.9%, 6.6%, and 9.6%, respectively; Johnston et al., 2020). Individuals who begin using substances at an earlier age are at an increased risk of developing substance use disorders, with one national study indicating that 14 years is the median age of onset for a DSM-IV-TR diagnosis of substance abuse (Swendsen et al., 2012). The prevalence of youth engaging in risky sexual behaviors (e.g., multiple sexual partners, no contraceptive use) is also high, which may contribute to long-term health consequences including unintended pregnancy, sexually transmitted infections, or HIV/AIDS (Szucs et al., 2020). Additionally, risky sexual behaviors may increase the likelihood of sexual violence victimization during childhood and adolescence (Basile et al., 2020). Importantly, the prevalence rates of health risk behaviors are not evenly distributed in the population.
At-risk adolescents, including those who have a) participated in delinquent activity, b) been charged with a crime, c) dropped out of school, or d) come from socioeconomically disadvantaged backgrounds (Boyce et al., 2008; Mason et al., 2010; Wang & Fredricks, 2014), are more likely to engage in these behaviors than are youths from other backgrounds. Thus, determining the psychometric properties of self-report measures of these behaviors among at-risk youths is especially important to understanding the nature of their health risk behaviors.
Health risk behaviors are typically measured via retrospective self-report questionnaires, and these reports usually cannot be verified in a way that is ethical, feasible, or cost-effective (Rosenbaum, 2009). The accuracy of these self-reports may also be compromised by several factors. Some health risk behaviors are sensitive in nature; therefore, many respondents may not be willing to explicitly report them (Norwood et al., 2016). In addition, some behaviors may be difficult to accurately recall, leading to an underreporting of behaviors (Nyitray et al., 2010). Recall difficulty may be especially significant when respondents are asked to remember their substance use, as being under the influence of substances may impair the respondent’s memory (Norwood et al., 2016). Finally, some behaviors may be purposefully under- or over-reported by adolescents because engaging in such behaviors is considered socially undesirable or desirable, respectively (Norwood et al., 2016). Such factors can make it difficult to properly interpret and apply results from health risk behavior self-report measures.
Evaluating the properties of measures of adolescent self-reported health risk behaviors is useful for many reasons. For example, it can increase confidence in data collected on the prevalence of, and trends in, health risk behaviors, which can inform health policies and practices at national, regional, and international levels (Bae et al., 2010; Brener et al., 2003). Additionally, results based on these measures can assist in identifying target populations who may benefit from prevention and intervention efforts (Baheiraei et al., 2012). One way to examine the confidence that researchers can have in self-reported health risk behaviors is to examine the test-retest reliability of assessment instruments. Test-retest reliability is the extent to which a measure yields consistent results when administered across two timepoints (Polit, 2014). It is often assessed using intraclass correlation coefficients (ICCs) for continuous data and kappas or weighted kappas for ordinal and dichotomous data collected across timepoints (Cicchetti et al., 1990; Polit, 2014). For measures asking about lifetime substance use and other historical information, responses should be similar when the measure is given multiple times within a relatively short interval.
Test-retest reliability has been examined across multiple health risk behavior studies among adolescents, with far more research focusing on measures of substance use than on measures of risky sexual behavior. One measure of substance use that has been examined in this manner is the CRAFFT, which is a 6-item screener named for the mnemonic that covers the topics in the six items. It is used to help identify adolescents who are at risk for engaging in problematic substance use (Levy et al., 2004). Results from prior studies indicate that the CRAFFT has acceptable (i.e., ICC > 0.70; De Vet et al., 2011; κ ≥ .41; Cohen, 1960) test-retest reliability among adolescents in outpatient primary care settings when assessed at 1-week time intervals (ICC = 0.93; Levy et al., 2004). Other substance use measures that have been tested in similar studies include the Form 90 Drug and Alcohol (ICC = 0.71; Slesnick & Tonigan, 2004), the Problem Oriented Screening Instrument for Teenagers (ICCs = 0.72–0.88; Knight et al., 2001), and the Substance Abuse Subtle Screening Inventory (ICCs = 0.81–0.92; Feldstein & Miller, 2007), all of which have demonstrated acceptable test-retest reliability in adolescent samples assessed across 1- and 2-week time intervals. Though less research exists on the test-retest reliability of measures of risky sexual behaviors, previous research indicates that adolescents tend to provide adequately reliable information regarding these behaviors. Using the Sexual Risk Behavior Assessment Schedule for Homosexual Youths interview, Schrimshaw and colleagues (2006) found that sexual minority youths reliably (ICCs = 0.77–0.97) reported risky sexual behaviors after a 2-week period. Another study found that adolescent females reliably reported information about risky sexual behaviors over a 1- to 2-week period (ICC = 0.74; Sieving et al., 2005).
Similar results have been found when testing sexual risk behavior reporting among racial and ethnic minority adolescent samples (ICCs = 0.65–1.00; Sneed et al., 2001).
In terms of more comprehensive measures of engagement in high-risk behaviors, the Adolescent Risk-Taking Questionnaire demonstrated acceptable test-retest reliability on most scales when adolescents were tested after a 1-week recall period (rs = 0.35-0.80; Gullone et al., 2000). Flisher and colleagues (2004) also found that adolescents reliably (κs ≥ .41) recalled information regarding various risk behaviors when asked in a dichotomous (i.e., “yes” or “no”) format.
One of the most common and well-established self-report measures of health-risk behaviors among adolescents in the United States is the Youth Risk Behavior Survey (YRBS), a health risk assessment measure that was developed in 1989 by the Centers for Disease Control and Prevention (CDC; Kolbe et al., 1993). The YRBS focuses on risky behaviors that typically develop during adolescence and young adulthood. Sample behaviors include tobacco use, drug use, alcohol use, self-harming behaviors, and risky sexual behaviors. Respondents provide information on their engagement in the behavior, including the frequency, age of onset, and consequences of the behavior. The YRBS has demonstrated acceptable or greater test-retest reliability (i.e., kappa mean scores at 58.1% or higher) among community middle (Zullig et al., 2006) and high school (Raghupathy & Hahn-Smith, 2012) samples when assessed at 2-week intervals. In addition, an adapted version of the YRBS used in Korea demonstrated moderate to excellent test-retest reliability on a majority of items assessed (Bae et al., 2010). However, another study found greater stability in estimates over time for items related to sex, drugs, alcohol, and tobacco as compared to other risk behaviors (e.g., weight control behaviors; Rosenbaum, 2009). Other research has found that several items on the YRBS significantly differed across a 2-week period (Brener et al., 2002). Taken together, these findings provide promising indicators of the test-retest reliability of the YRBS but also suggest a need for more research on this measure, especially in diverse populations that have not been well-represented in previous studies. Expanding research can help determine the confidence with which the YRBS can be used to accurately estimate the prevalence of risk behaviors in a variety of adolescent populations.
The present study aims to evaluate the test-retest reliability of the YRBS among at-risk adolescents over a short period of time (3–12 days). Specifically, the study compares responses to 32 health-risk related questions involving five types of risk behaviors (i.e., alcohol use, marijuana use, other drug use, tobacco use, and risky sexual behaviors) using kappa (dichotomous items) and weighted kappa (ordinal items) analyses. Kappa and weighted kappa were chosen as they are appropriate analyses for these dichotomous and ordinal data (vs. ICC; Cicchetti et al., 1990). Based on prior literature and expectations for sound psychometrics, it is hypothesized that all YRBS items will demonstrate acceptable or better test-retest reliability.
Methods
Participants
Demographic Variables.
Measures
Demographics
Participants were asked basic demographic questions including their age, gender, ethnicity, arrest history, and who they lived with before enrolling in YCA.
Youth Risk Behavior Survey
Table 2. Test-Retest Reliability Across Health Behaviors on the YRBS.
Note: The underlined kappa on the first line of each subsection represents the mean kappa for that category. *denotes dichotomous items that were analyzed using kappa. Remaining items were ordinal items analyzed using weighted kappa. The Time 1/2 percentages reflect % endorsing the higher risk categories on the ordinal scale.
Procedure
This study received approval from the Institutional Review Board of the corresponding author’s institution. Informed consent to recruit participants and conduct the study at YCA was obtained from the program director, who serves as guardian ad litem for adolescents during their stay. All youths who were residents at the time of data collection were eligible to participate in the study. Groups of residents who shared living quarters and general schedules were brought to a classroom where trained research assistants explained the study procedures and goals to the entire group. This was repeated for the multiple groups in residence at times convenient for each group’s schedule. Participants provided informed assent (younger than 18) or consent (18 years old) before participation. Participation was voluntary, and youths’ status in the program was unaffected by their decision to participate in the study. Program staff did not receive results for specific participants, and information about which residents did or did not participate in the study was not shared with the program. Data were collected in group testing sessions via computers equipped with Qualtrics survey software over an approximately 2-week period as part of a larger battery of self-report measures. The mean length of time that youths had been in the program at the first testing session was 12.4 weeks (range: 10–14 weeks). The YRBS survey was administered twice, with the amount of time between administrations ranging from 3 to 12 days.
Results
The sample consisted of 156 participants who were administered the YRBS survey on two separate days with a range of 3 to 12 days (M = 8.28, SD = 3.13) between administrations. Data were screened for missing or problematic values and for outliers. No issues were observed. Kappa (16 dichotomous items) and weighted kappa (16 ordinal items) analyses were conducted in SPSS 25 to determine the test-retest reliability estimates for each item, for each category, and across all items measured (Cohen, 1968). To examine whether reliability over a very short test-retest window differed from reliability over a slightly longer window, analyses were repeated without participants who had only 3 days between administrations (n = 24; the remaining participants had 6–12 days between administrations). The reliability of items was similar (e.g., same classification of level of agreement) when participants with 3 days between administrations were included as compared to when they were excluded. As such, all participants were retained.
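To make the two statistics concrete, the following is a minimal Python sketch of kappa and weighted kappa for a single item answered at two administrations. It is an illustration only and not the procedure used here, which was run in SPSS 25. The function name, the example responses, and the choice of linear disagreement weights are assumptions, because the weighting scheme is not stated in the text.

```python
def weighted_kappa(time1, time2, categories, weights="linear"):
    """Chance-corrected agreement between two administrations of one item.

    categories: response options in ordinal order (lowest to highest).
    weights: None for unweighted kappa; "linear" or "quadratic" for
             weighted kappa (the scheme is an assumption of this sketch).
    """
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear', or 'quadratic'")
    if len(time1) != len(time2) or not time1:
        raise ValueError("need two non-empty response lists of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two response categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(time1)

    # Cross-tabulate Time 1 (rows) against Time 2 (columns).
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(time1, time2):
        counts[index[a]][index[b]] += 1
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]

    def penalty(i, j):
        # Disagreement weight: 0 on the diagonal, larger for distant categories.
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(penalty(i, j) * counts[i][j] / n for i, j in cells)
    expected = sum(penalty(i, j) * row[i] * col[j] for i, j in cells)
    if expected == 0:
        return float("nan")  # undefined when all responses fall in one category
    return 1 - observed / expected


# Hypothetical dichotomous item (1 = yes, 0 = no), ten respondents.
ever_used_t1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
ever_used_t2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa_dichotomous = weighted_kappa(ever_used_t1, ever_used_t2, [0, 1], weights=None)

# Hypothetical ordinal item with three ordered response options.
freq_t1 = [0, 0, 1, 1, 2, 2]
freq_t2 = [0, 1, 1, 1, 2, 0]
kappa_ordinal = weighted_kappa(freq_t1, freq_t2, [0, 1, 2], weights="linear")
```

With only two response categories, the linear weights reduce to the unweighted case, so the same function covers both item types.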
Information about the prevalence of high-risk behavior is presented in Table 2 along with the reliability data described below. Ordinal items (e.g., age at first use) were collapsed into two groups of responses based on being relatively higher versus lower risk behavior. The percentages in the Time 1/Time 2 columns in Table 2 thus reflect the percentage of participants who endorsed the ordinal responses deemed higher risk (e.g., use <13 years old).
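The collapsing step can be sketched as follows. This is a hedged illustration: the response labels, the cutoff set, and the handling of missing answers are hypothetical and are not taken from the YRBS items or from Table 2.

```python
def percent_higher_risk(responses, higher_risk):
    """Percentage of non-missing responses that fall in the higher-risk set."""
    valid = [r for r in responses if r is not None]
    if not valid:
        raise ValueError("no non-missing responses")
    return 100 * sum(r in higher_risk for r in valid) / len(valid)


# Hypothetical "age at first use" labels; first use before age 13 is
# treated as the higher-risk group, following the example in the text.
HIGHER_RISK_AGES = {"8 or younger", "9-10", "11-12"}

time1 = ["11-12", "13-14", "never", "9-10"]
time2 = ["11-12", "13-14", "never", "13-14"]

pct_time1 = percent_higher_risk(time1, HIGHER_RISK_AGES)
pct_time2 = percent_higher_risk(time2, HIGHER_RISK_AGES)
```

The collapsed percentages describe prevalence only; the weighted kappa for an ordinal item is still computed on the full set of ordered response options.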
Reliability of the YRBS
The test-retest reliability estimates of all items used from the YRBS survey are displayed in Table 2. Dichotomous items were analyzed using kappa and ordinal items were analyzed using weighted kappa. None of the items had kappas between .81 and 1.00 (almost perfect agreement). However, there were 15 items (46.9%) that had kappas between .61 and .80 (substantial agreement) and 16 items (50.0%) that had kappas between .41 and .60 (moderate agreement). One item (“During your life, how many times have you taken steroid pills or shots without a doctor’s prescription?”) had fair agreement (i.e., kappa between .21 and .40; κ = .36). The average kappa across all items was in the substantial agreement range (κ = .63).
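The agreement bands used above can be written as a small lookup, sketched here in Python. The cut points are the ones given in the text; the "below fair" label for values under .21 and the rounding to two decimals before classifying are assumptions of this sketch. The kappas and item counts are those reported in this section.

```python
def agreement_band(kappa):
    """Map a kappa value to the agreement label used in the text."""
    k = round(kappa, 2)  # kappas are reported to two decimals
    if k >= 0.81:
        return "almost perfect"
    if k >= 0.61:
        return "substantial"
    if k >= 0.41:
        return "moderate"
    if k >= 0.21:
        return "fair"
    return "below fair"  # not defined in the text; label is an assumption


# Kappas reported in this section.
reported = {
    "steroids (lifetime)": 0.36,
    "ecstasy or MDMA (lifetime)": 0.60,
    "average across all items": 0.63,
}
bands = {item: agreement_band(k) for item, k in reported.items()}

# Share of the 32 items in each band, from the reported item counts.
item_counts = {"substantial": 15, "moderate": 16, "fair": 1}
total_items = sum(item_counts.values())
shares = {band: round(100 * n / total_items, 1) for band, n in item_counts.items()}
```

Computing the shares directly from the item counts is a quick consistency check on the percentages quoted alongside them.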
Reliability estimates were also examined by each category. Overall, items related to marijuana use were most reliably reported (κ = .77) followed by the sexual behaviors category (κ = .67), the cigarette and tobacco category (κ = .61), and the alcohol category (κ = .61). All of these categories had test-retest reliability in the substantial agreement range. The items related to other drug use had moderate test-retest reliability (κ = .51).
All items within the sexual behaviors category had kappas ranging from .52 to .79 indicating moderate to substantial agreement. Within the sexual behaviors category, the most consistently reported items related to the sex of partners with whom they have had sexual contact during their life (κ = .79), how many people they have had sexual intercourse with during their life (κ = .75), and which sexual orientation best describes them (κ = .75).
Within the cigarettes/tobacco category, items had kappas ranging between .43 and .73, which indicates moderate to substantial agreement. The most consistently reported items within the cigarettes/tobacco category were related to whether they have ever tried cigarette smoking (κ = .73), how many cigarettes they smoke per day on the days they smoke (κ = .70), and how old they were the first time they smoked a whole cigarette (κ = .64).
Items in the alcohol category had kappas ranging from .52 to .66, indicating moderate to substantial agreement. Within the alcohol category, the most consistently reported item was how often they have five or more drinks of alcohol in a sitting (κ = .66) and the least consistently reported item was about their age at first drink (κ = .52). Items within the marijuana category had very similar reliability values (κs = .76–.78). Within the other drugs category, items had kappas ranging from .36 to .60, indicating fair to moderate agreement. The items that were most consistently reported within this category related to how many times within their life they have used ecstasy or MDMA (κ = .60), hallucinogenic drugs (κ = .58), and methamphetamine (κ = .58). The least reliably reported item related to steroid use (κ = .36).
Discussion
The current study assessed the test-retest reliability of three categories (i.e., alcohol and other drugs, tobacco, and risky sexual behaviors) of the Youth Risk Behavior Survey (YRBS; CDC, 2018) on a sample of “at-risk” adolescents (ages 16–18) in a National Guard Youth ChalleNGe Program site located in the southeastern United States. Overall, findings indicate moderate to substantial test-retest reliability for all items except one about lifetime steroid use, suggesting that at-risk youth reliably reported health risk behaviors over short intervals. The average kappa across all items was in the substantial agreement range. An examination of each category revealed that all items within the tobacco, alcohol, and sexual behaviors categories displayed moderate to substantial agreement, whereas items in the marijuana category were all in the substantial range and those in the other drugs category were in the fair to moderate range. Overall, the test-retest reliability of the YRBS was acceptable in this study, though not as high for other drug use as it was for other categories of risk behavior. This indicates that adolescents in the YCA program were relatively consistent in how they reported a range of risk behaviors.
Based on these findings, the YRBS demonstrated test-retest reliability that is comparable to that of similar risk behavior assessment instruments such as the CRAFFT (Levy et al., 2004), Form 90 Drug and Alcohol (Slesnick & Tonigan, 2004), Substance Abuse Subtle Screening Inventory (Feldstein & Miller, 2007), and Sexual Risk Behavior Assessment Schedule for Homosexual Youths (Schrimshaw et al., 2006). It should be noted that although all of these scales have been found to have at least adequate test-retest reliability, the statistical tests used to determine this (ICCs) in prior studies of these instruments are not appropriate for use with the dichotomous and ordinal data produced by the YRBS, so the values cannot be compared directly across studies. All but one item in this study had κs ≥ .41, indicating moderate or substantial reliability, and this is comparable to the results of a study of the test-retest reliability of a dichotomous (yes/no) risk behavior survey among adolescents (Flisher et al., 2004). Notably, the YRBS performed as well as other risk behavior measures while being tested in a unique, residential sample with greater engagement in risk behaviors than the general population. Additionally, the test-retest reliability estimates for the YRBS found in the current study were similar to those found when the YRBS was administered to community middle (mean κ = .63; Zullig et al., 2006) and high school students (mean κ = .58; Raghupathy & Hahn-Smith, 2012), as well as a national sample of adolescents in the United States (median tetrachoric correlation = .87; Rosenbaum, 2009). Therefore, results support the notion that the YRBS can successfully be implemented as a tool to assess for a variety of risk behaviors in settings where such behaviors are likely more prevalent than in the community.
This is important to consider given that the YRBS is a comprehensive measure of risk behaviors and its use may be more cost-effective than the use of multiple narrowband instruments.
In the current study, higher test-retest reliability was found among items assessing risky sexual behaviors, tobacco, alcohol, and marijuana use relative to those assessing other drug use. There are a few potential explanations as to why participants may have reported other drug use less consistently across time points. First, although confidentiality was explained to all adolescents before they participated, some individuals may have felt reluctant to disclose information regarding use of “hard drugs” that, if known by others, could result in serious legal repercussions (Brener et al., 2003). Second, research has shown that substance use during adolescence may affect brain development and cognition (Squeglia et al., 2009), so it is possible that adolescents who use a wider variety of substances or who use stronger substances have genuine difficulty recalling their use. Third, participants may have been confused by some items. For example, two of the items with the lowest test-retest reliability in this study were “During your life, how many times have you taken steroid pills or shots without a doctor's prescription?” and “During the past 30 days, on how many days did you smoke cigars, cigarillos, or little cigars?” Participants may have been unsure what these less commonly used substances were or whether they had used them, so they answered differently at different timepoints. Additionally, any disagreement between the two timepoints would have a stronger impact on reliability statistics for these low base rate behaviors than it would for behaviors that were more common in the sample.
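The base rate point can be illustrated with a short Python sketch. The two 2 × 2 tables below are hypothetical and are not data from this study. Each has 100 respondents and the same 10 disagreements between administrations, but the behavior is common in one table and rare in the other.

```python
def kappa_2x2(yes_yes, yes_no, no_yes, no_no):
    """Cohen's kappa for a dichotomous item from its Time 1 x Time 2 counts."""
    n = yes_yes + yes_no + no_yes + no_no
    p_observed = (yes_yes + no_no) / n
    p_yes_t1 = (yes_yes + yes_no) / n
    p_yes_t2 = (yes_yes + no_yes) / n
    # Agreement expected by chance from the marginal endorsement rates.
    p_expected = p_yes_t1 * p_yes_t2 + (1 - p_yes_t1) * (1 - p_yes_t2)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical common behavior: about half of respondents endorse it.
kappa_common = kappa_2x2(yes_yes=45, yes_no=5, no_yes=5, no_no=45)

# Hypothetical rare behavior: about 10% endorse it, same 10 disagreements.
kappa_rare = kappa_2x2(yes_yes=5, yes_no=5, no_yes=5, no_no=85)
```

Both tables show 90% raw agreement, yet kappa is about .80 for the common behavior and about .44 for the rare one, because chance agreement is much higher when nearly everyone answers "no."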
This study evaluated test-retest reliability of the YRBS using kappa and weighted kappa estimates across a short recall period (3–12 days). Additionally, the sample was fairly large and more racially diverse than those in previous studies assessing risk behavior. The study also involved an “at-risk” sample of primarily male adolescents, which is an important group to study, though results may not generalize to other populations. Another limitation of this study is that self-reported results on the YRBS could not be verified and may have been subject to reporting biases. Despite these limitations, the results of this study suggest that the YRBS generally demonstrates adequate test-retest reliability and can be used with confidence in populations like the one in this study. Future research should continue to focus on the psychometrics of measures of self-reported health risk behaviors so that researchers are able to use the most reliable and valid assessments to identify individuals and target populations who may benefit from prevention and intervention efforts. Additionally, it would be beneficial for future studies to examine whether the terminology used for different substances on the YRBS matches the names that adolescents are familiar with and whether all substances and methods used by adolescents (e.g., vaping) are adequately covered by this measure. It is possible that updating the items to more closely match adolescents’ experiences would improve test-retest reliability as well as the validity of results.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
