Abstract
Anxiety disorders (ADs) frequently lead to significant impairment across important domains of youth functioning. Yet until recently, clinical research and assessment have largely neglected the measurement of anxiety-related impairment. In this article, we review the evidence for five extant rating scales of youth anxiety-related impairment, guided by widely used evaluative criteria. Emerging psychometric data show the potential utility of these rating scales for achieving different assessment functions. Of the five scales, the Child Anxiety Impact Scale, particularly the parent-report version, has been the most researched one. Promising psychometric data support its use for assessing anxiety-related impairment in school, social, and family/home domains of functioning. We conclude with recommendations for growing this research base and for incorporating these rating scales into the youth AD clinical and research assessment process.
Approximately one in three young people meet diagnostic criteria for an anxiety disorder (AD) by the time they reach 18 years of age, leading to significant impairment across school, social/peer, and family/home domains of functioning (Merikangas et al., 2010). For example, separation anxiety disorder (SAD) often leads to school attendance problems, contributing to impairment in youth’s academic performance. Social anxiety disorder (SoAD) often leads to social isolation, contributing to impairment in youth’s peer relationships. Generalized anxiety disorder (GAD) often leads youth to seek excessive parental reassurance, contributing to impairment in youth’s family relationships. Anxiety-related impairment presents many challenges in youth’s day-to-day lives and is often observable and concerning to those involved in youth’s lives, including parents and teachers. As such, impairment has been conceptualized as a socially and ecologically valid indicator of psychopathological conditions, including ADs, and constitutes a major reason for treatment referral and marker of treatment outcome (Fabiano & Pelham, 2016).
The assessment of impairment is also a formal part of the diagnostic process for ADs and other mental health conditions. Since the third edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III; American Psychiatric Association [APA], 1980), impairment has been conceptualized as a distinct criterion, separate from symptoms, needed for the diagnosis of most disorders. In the current version, DSM-5-TR (APA, 2022), the impairment criterion for the most common ADs of childhood—SAD, SoAD, and GAD—is that the fear or anxiety “causes clinically significant distress or impairment in social, academic, occupational, or other important areas of functioning.” For selective mutism, impairment in the school or social domain is required for diagnosis (“The disturbance interferes with educational or occupational achievement or with social communication”). Beyond the DSM, the International Classification of Functioning, Disability, and Health (ICF; World Health Organization, 2001) also emphasizes the role of impairment in diagnosis (Üstün et al., 2010). In sum, anxiety-related impairment is a socially and ecologically valid indicator of disorder, a marker of treatment need and prognosis, and an integral aspect of diagnosis.
Impairment Versus Severity
Despite the emphasis on impairment in the DSM and ICF, it has been largely neglected in youth AD research, including in randomized clinical trials (RCTs). Instead, focus has been on anxiety symptom severity and its assessment (Dickson et al., 2022). Symptom severity has often been used to convey impairment. That is, the severity of an individual’s anxiety symptoms is used to infer degree of impairment (Rapee et al., 2012). However, symptom severity and impairment are separate, albeit related, constructs. Symptom severity refers to the frequency, extent, and duration of behavioral or emotional symptoms for a given clinical disorder. Impairment refers to the way in which symptoms interfere with daily functioning and activities. For example, the GAD symptom of excessive worry about multiple topics for more days than not, even if severe, does not capture how much these worries get in the way of a child’s schooling, social life, or family relations. For many youth and their caregivers, the latter would be the impetus for treatment as opposed to the symptom in and of itself.
The importance of assessing anxiety severity and impairment separately, rather than conflating them, also has empirical support. Meta-analyses show no more than moderate correlations between anxiety symptom severity and functional impairment (e.g., r = .34; McKnight et al., 2016). In addition, many children and adolescents experience substantial anxiety-related impairment despite not meeting DSM diagnostic criteria for an AD (Angold et al., 1999). For example, Comer et al. (2012) found that approximately 15% of treatment-seeking anxious youth (N = 650, ages 5–19 years) did not meet diagnostic criteria for a specified AD, mainly because they did not meet symptom counts. Youth AD RCTs also show that high baseline impairment levels are associated with poor treatment outcomes, including following cognitive-behavioral therapy (CBT), even when controlling for anxiety symptom severity (Compton et al., 2014; Wergeland et al., 2016). Altogether, these data show that symptom severity and impairment are complementary but separate constructs and highlight the likely utility of assessing both.
Current Practices in Anxiety Assessment
Youth anxiety symptom severity measures (youth and parent versions) are a primary treatment outcome in all youth AD RCTs. The early trials of the 1990s relied largely on the Revised Children’s Manifest Anxiety Scale (Reynolds & Richmond, 1978), with later trials relying more on multidimensional scales such as the Multidimensional Anxiety Scale for Children (March et al., 1997), Screen for Child Anxiety Related Emotional Disorders (SCARED; Birmaher et al., 1997), and Spence Children’s Anxiety Scale (Spence, 1998). We recently conducted reviews of these youth self-report (Etkin, Shimshoni, et al., 2021) and parent-report (Etkin, Lebowitz, & Silverman, 2021) anxiety symptom rating scales using a set of criteria developed for evaluating measures’ psychometric properties (Hunsley & Mash, 2008; Youngstrom et al., 2017). Our reviews revealed that most extant anxiety symptom rating scales had supportive evidence for multiple assessment functions (e.g., screening, discrimination; Silverman & Kurtines, 1996). The youth- and parent-report SCARED (SCARED-C/P; Birmaher et al., 1999) stood out because it is publicly available, relatively brief (41 items), and has evidence of good-to-excellent norms, internal consistency, test-retest reliability, content validity, construct validity, discriminative validity, treatment sensitivity, and validity generalization.
In addition to youth and parent anxiety symptom rating scales, most youth AD RCTs use a diagnostic interview to determine inclusion and exclusion criteria and/or remission following treatment. The Anxiety Disorders Interview Schedule for Children and Parents (ADIS-C/P; Silverman & Albano, 1996) is the “gold standard” interview for differential diagnosis (Creswell et al., 2021). Symptom severity and impairment are rated on separate scales, but as per the DSM, final determination of whether a child meets diagnostic criteria for a given disorder integrates information about severity and impairment. In an earlier review, we found the ADIS-C/P has evidence of excellent interrater reliability, test-retest reliability, content validity, construct validity, validity generalization, and clinical utility for differentiating between ADs and ADs from related disorders (Byrne et al., 2018).
Relatively few youth AD RCTs have included a measure of anxiety impairment alongside symptom severity rating scales and diagnostic interviews (Dickson et al., 2022). Those that did typically included a general functional impairment measure, not one specific to anxiety. The most widely used general impairment measure is the clinician-rated Children’s Global Assessment Scale (CGAS; Shaffer et al., 1983). With respect to impairment measures that are specific to anxiety, the clinician-rated Pediatric Anxiety Rating Scale (PARS; Research Units on Pediatric Psychopharmacology Anxiety Study Group, 2002) is the most widely used one. Both the CGAS and PARS are unidimensional or “global,” yielding a total impairment score. Global measures quantify the severity of impairment independent from symptoms/diagnoses. However, global measures do not provide information about anxiety-related impairment in specific domains of functioning (i.e., school, social, family). There are also measures that assess impairment in specific domains of functioning, but not specifically related to anxiety (see Villarreal et al., 2021, for a review).
Present Study
We focus here on reviewing measures that assess impairment specifically related to anxiety. This article thereby extends previous reviews of the more widely used anxiety symptom severity scales (Etkin, Lebowitz, & Silverman, 2021; Etkin, Shimshoni, et al., 2021), diagnostic interviews and global clinician-rated impairment measures (Byrne et al., 2018; Rapee et al., 2012), and non-anxiety-specific impairment measures (Villarreal et al., 2021). Anxiety-related impairment measures may provide richer information than global impairment measures to guide anxiety treatment planning and assess treatment progress/outcome. Anxiety-related impairment measures that moreover assess different domains of functioning may provide especially rich and clinically useful information (e.g., to aid in selecting treatment targets in impaired domains). Our focus is on rating scales, which have high pragmatic utility. They are brief and simple to administer, often available for multiple informants and at no cost, and versatile for achieving the assessment goals of screening, treatment planning, and monitoring (Silverman & Kurtines, 1996). We utilize the criteria developed by Hunsley and Mash (2008) and Youngstrom et al. (2017) as a guide to preliminarily evaluate these scales’ psychometric properties. We acknowledge that the ratings we assign are not static and will evolve over time as additional research accumulates and as the criteria themselves are refined. Our aim is therefore to summarize these scales’ currently available psychometric data, identify areas for additional research, and inform potential use of anxiety-related impairment rating scales in clinical and research endeavors.
Method
We first conducted literature searches for the youth and parent rating scales that assess anxiety-related impairment; specifically, the Child Anxiety Impact Scale (Langley et al., 2004), Child Sheehan Disability Scale (Whiteside, 2009), Child Anxiety Life Interference Scale (Lyneham et al., 2013), Adolescent Life Interference Scale for Internalizing Symptoms (Schniering et al., 2021), and the Overall Anxiety Severity and Impairment Scale for Youth (Comer et al., 2022). We searched for the names of these five scales in Google Scholar to identify articles examining their psychometric properties, which underlie their suitability for performing different assessment functions.
We used the set of criteria described earlier to guide our evaluation of these scales’ psychometric properties. Specifically, we evaluated their norms, internal consistency, test-retest reliability, content validity, construct validity, discriminative validity, and treatment sensitivity (Hunsley & Mash, 2008; Youngstrom et al., 2017). The first author rated each of these psychometric properties as “adequate,” “good,” or “excellent” based on current research; co-authors reviewed the research and corroborated ratings. Specific benchmarks for these ratings are found in Table 1. Psychometric details of each study for each scale, on which the evaluative ratings are based, are found in Table 2. Importantly, the small total number of psychometric studies for each scale limited our stringent application of criteria that require evidence from multiple studies to make a rating. With this caveat in mind, we first summarize the ratings for each scale, next synthesize the state of the evidence base, and conclude with preliminary recommendations regarding each scale’s appropriateness for performing different assessment functions and future directions (De Los Reyes & Langer, 2018).
Table 1. Rubric of Criteria for Evaluating Measures’ Psychometric Properties (De Los Reyes & Langer, 2018).
Note. AUC = Area under the curve; M = Mean; SD = Standard deviation.
Table 2. Anxiety-Related Impairment Rating Scales’ Psychometric Properties and Evaluations.
Note. AD = Anxiety disorders; GAD = Generalized anxiety disorder; SAD = Separation anxiety disorder; SoAD = Social anxiety disorder; OCD = Obsessive-compulsive disorder; RCT = Randomized controlled trial; CBT = Cognitive-behavioral therapy; IOP = Intensive outpatient anxiety program; ROC = Receiver operating characteristics; AUC = Area under the curve; ICC = Intraclass correlation; CFA = Confirmatory factor analysis; EFA = Exploratory factor analysis; ANCOVA = Analysis of covariance; PPP = Positive predictive power; ANOVA = Analysis of variance; HLM = Hierarchical linear modeling; CAIS-C/P = Child Anxiety Impact Scale-Child and Parent; CBCL = Child Behavior Checklist; ADIS-C/P = Anxiety Disorders Interview Schedule for Children and Parents; SASC-R = Social Anxiety Scale for Children - Revised; MASC = Multidimensional Anxiety Scale for Children; SDQ = Strengths and Difficulties Questionnaire; SCARED = Screen for Child Anxiety Related Emotional Disorders; PARS = Pediatric Anxiety Rating Scale; CGAS = Children's Global Assessment Scale; SDS = Sheehan Disability Scale; SCAS = Spence Children's Anxiety Scale; BASC = Behavioral Assessment System for Children; CGI = Clinical Global Impressions Scale; CESD = Center for Epidemiological Studies Depression Scale; PANAS = Positive and Negative Affect Scale.
Results
Child Anxiety Impact Scale
The Child Anxiety Impact Scale was developed by Langley et al. (2004) as the first measure to assess anxiety-related impairment across domains of school, social, and home/family. It was designed for use in treatment as a baseline and outcome measure and to identify treatment targets. It was developed originally as a parent-report scale (CAIS-P); a youth self-report version (CAIS-C) was developed later by Langley et al. (2014). Both youth and parent versions contain 27 items in response to the prompt, “In the past month, how much trouble has your child [you] had doing the following things because of feeling nervous, anxious or afraid?” Examples of items include “Getting to school on time” (school domain; 10 items), “Making new friends” (social domain; 11 items), and “Getting ready for bed” (family/home domain; 6 items). Items are rated on a four-point Likert-type scale ranging from 0 (not at all) to 3 (very much).
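The scale structure described above (27 items rated 0 to 3, grouped into school, social, and family/home domains) can be illustrated with a minimal scoring sketch. It assumes that subscale and total scores are simple sums of item ratings, and the item positions assigned to each domain are hypothetical placeholders rather than the published scoring key, which should be consulted for actual use. The function name `score_cais` is likewise introduced only for illustration.

```python
# Hypothetical scoring sketch for a 27-item impairment scale rated 0-3.
# ASSUMPTIONS: scores are item sums, and the item positions below are
# placeholders, NOT the published item-to-domain key.
DOMAIN_ITEMS = {
    "school": range(0, 10),   # 10 items (assumed positions)
    "social": range(10, 21),  # 11 items (assumed positions)
    "home": range(21, 27),    # 6 items (assumed positions)
}


def score_cais(responses):
    """Return domain subscale sums and the total score for one respondent."""
    if len(responses) != 27:
        raise ValueError("expected 27 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be an integer from 0 to 3")
    scores = {
        domain: sum(responses[i] for i in positions)
        for domain, positions in DOMAIN_ITEMS.items()
    }
    scores["total"] = sum(responses)
    return scores


# Made-up responses for illustration only.
example = [1] * 10 + [2] * 11 + [0] * 6
print(score_cais(example))
```

Keeping the domain key in a single mapping makes it straightforward to substitute the published key, or a different summary such as item means, without changing the scoring function.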
The CAIS-P is rated good for norms given that it has descriptive data from clinical samples and one matched community sample (Ns = 92–488). The CAIS-C is rated adequate for norms, as there are descriptive data from clinical samples only, not community. Internal consistency is rated good for the CAIS-C/P, as most Cronbach alphas are above .80 for both the total score and subscale scores. Content validity for the CAIS-C/P is rated adequate, as test developers created a representative set of items for each domain assessed but did not utilize external judges (e.g., experts, pilot participants). Construct validity is rated excellent for the CAIS-C/P, as they demonstrate convergent, divergent, and incremental validity. Confirmatory factor analysis further supported the hypothesized three-factor solution for the CAIS-C/P; the specified factor structure provided an adequate fit to the data. Discriminative validity is rated excellent for the CAIS-P; parent-report mean scores distinguish clinical and nonclinical samples, and receiver operating characteristics (ROC) analyses show that scores identify youth recovery from any AD, SAD, SoAD, and GAD with area under the curve scores (AUCs) above .75. The CAIS-C, however, did not achieve AUCs above .70, and therefore has not yet been shown to have adequate discriminative validity. Treatment sensitivity for the CAIS-C/P is rated excellent, as they have evidence of significant change in scores from baseline to post-treatment in several RCTs of different youth anxiety treatment approaches (e.g., individual and group CBT; medication; Creswell et al., 2017; Taylor et al., 2018). No study has yet evaluated the test-retest reliability of the CAIS-C/P.
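To make the AUC values reported in ROC analyses more concrete, the following sketch computes an AUC as the probability that a randomly chosen youth who still meets diagnostic criteria has a higher impairment score than a randomly chosen recovered youth, with ties counted as one half. This is the standard rank-based interpretation of the AUC and is not necessarily the procedure used in the reviewed studies; the function name and the scores are made up for illustration.

```python
def auc(scores_pos, scores_neg):
    """AUC as the proportion of (positive, negative) pairs in which the
    positive case scores higher; tied pairs count as 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up post-treatment total scores, for illustration only.
still_diagnosed = [30, 25, 18, 22]
recovered = [10, 18, 8, 20]
print(auc(still_diagnosed, recovered))
```

An AUC of .50 corresponds to chance-level discrimination and 1.00 to perfect separation of the two groups, which is why thresholds such as .70 are used as benchmarks.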
Child Sheehan Disability Scale
The Child Sheehan Disability Scale youth (CSDS) and parent versions (CSDS-P) were developed by Whiteside (2009) by modifying the Sheehan Disability Scale (SDS; Sheehan, 1986). The SDS is a widely used and validated measure of adult functional impairment due to psychiatric disorders; it was extended downward by rewording items and generating more youth impairment domains. At the time, the 27-item CAIS-P was the only available domain-specific youth anxiety-related impairment rating scale. Whiteside (2009) therefore developed the CSDS/P to fulfill the need for a briefer (3–5 items) domain-specific rating scale with corresponding youth and parent versions. The CSDS consists of three total items assessing anxiety-related impairment in school (“How much have your fears and worries messed things up with school and homework?”), social (“How much have your fears and worries messed things up with friends?”), and family/home domains (“How much have your fears and worries messed things up at home?”). The CSDS-P includes the same three items plus two additional items assessing the degree that the child’s symptoms interfere with the parent’s work and social functioning. As such, the CSDS-P has two subscales, parent and child interference, and the CSDS has one total scale only. For both versions, items are rated on an 11-point Likert-type scale ranging from 0 (not at all) to 10 (very, very much).
The CSDS/P are rated good for norms, as there are descriptive data for community and different types of clinical samples (i.e., outpatient, intensive outpatient, residential; Ns = 107–1,481). They are also rated good for internal consistency, as Cronbach’s alphas are above .80 for total scale scores (CSDS-P subscale alphas have not been reported). The CSDS-P is rated good for test-retest reliability given a correlation of .89 over 6 weeks; the CSDS is rated adequate (r = .44). The CSDS/P are rated adequate for content validity, as they were modified versions of an adult measure and did not undergo additional scale development procedures (e.g., pilot testing). The CSDS/P also are rated good for construct validity, demonstrated by concurrent, convergent, divergent, and incremental validity. To date, their factor structures have been evaluated with principal components analysis (exploratory procedures), not confirmatory factor analysis. Results supported the hypothesized two components for the parent version and one component for the youth version. Discriminative validity is rated adequate for the CSDS and good for the CSDS-P. They both can discriminate among clinical samples (e.g., outpatient and residential) and clinical from community samples with acceptable sensitivity, specificity, and positive predictive power. Only the CSDS-P, not the CSDS, could discriminate among comorbid and noncomorbid anxiety diagnoses in youth. The CSDS/P also show evidence of significant change from baseline to post-treatment in two psychometric studies but are rated adequate for treatment sensitivity because they have not been used widely in RCTs or shown consistent evidence of change in different treatment modalities (e.g., Brennan et al., 2022; Schneider et al., 2018).
Child Anxiety Life Interference Scale
The Child Anxiety Life Interference Scale was developed by Lyneham et al. (2013) as a youth-report (CALIS-C) and parent-report (CALIS-P) measure of anxiety-related impairment in domains of functioning both within and outside the home. The authors also sought to capture the extent that youth anxiety symptoms impair their parents’ functioning in addition to their own, thereby increasing the scope of impairment relative to the CAIS-P. In addition, the authors sought to develop a measure that, unlike the CSDS/P, was not a downward extension of an adult impairment measure. The CALIS-C/P contain an initial question asking for a rating of upset/distress caused by anxiety, followed by the prompt, “How much do fears and worries make it difficult for you [your child] to do the following things?” Four items assess situations outside of the home (e.g., “Being with friends outside of school”), and five items assess situations at home (e.g., “Daily activities such as getting ready for school, going to sleep and homework”). The CALIS-P contains an additional prompt, “How much do your child’s fears and worries interfere with your everyday life in the following areas.” Seven items assess interference the child’s anxiety causes in the parent’s life (e.g., “Your relationship with friends”; “Your ability to go out to activities/events without your child”). Items are rated on a five-point Likert-type scale ranging from 0 (not at all) to 4 (a great deal).
The CALIS-C/P are rated adequate for norms based on data from a two-site clinical sample (N = 622) and a smaller community sample (N = 40). Internal consistency for the CALIS-C/P is rated good, as Cronbach’s alphas are above .80 for all total and subscales, except for the youth- and parent-rated at-home interference subscales (αs = .70–.78). The CALIS-C/P are also rated good for test-retest reliability, with correlations mostly above .70 over the span of 2–3 months. Content validity is rated adequate, as items represent all domains authors intended to assess, but authors did not utilize additional content-development procedures (e.g., pilot testing). The CALIS-C/P are rated adequate for construct validity. They have demonstrated convergent, divergent, and incremental validity, but to date, they have been subjected to exploratory factor analysis, not confirmatory. Results support the hypothesized two-factor solution for the youth version and three-factor solution for the parent version. Discriminative validity for the CALIS-C/P is also rated adequate because mean scores significantly differed between a matched clinical and community sample. However, discriminative validity has not yet been tested in more clinically realistic conditions (i.e., not just comparing clinical and community samples; see Table 1) or with ROC analyses. The CALIS-C/P are rated excellent for treatment sensitivity. Lyneham et al. (2013) showed significant effects of treatment arm and significant differences in mean scores when comparing waitlist and group CBT. They have also been used in other RCTs of different treatment types and modalities, showing significant and positive change from baseline to post-treatment (e.g., acceptance and commitment therapy; in-person and online CBT; Hancock et al., 2018; Rapee et al., 2021).
Adolescent Life Interference Scale for Internalizing Symptoms
The Adolescent Life Interference Scale for Internalizing Symptoms (ALIS-I) was developed by Schniering et al. (2021) as the first multidimensional measure of impairment related to anxiety and depression symptoms specifically for adolescents (ages 11–18 years; self-report only). The ALIS-I contains 26 items; adolescents rate the frequency with which they experience life interference due to their internalizing problems in four areas: withdrawal/avoidance (e.g., “Stayed away from activities”; 9 items), somatic symptoms (e.g., “Felt sick”; 3 items), peer problems (e.g., “Been left out of groups”; 4 items), and problems with study/work (e.g., “Struggled to do my work”; 6 items). Items are rated on a five-point Likert-type scale ranging from 0 (not at all) to 4 (all the time).
The ALIS-I is rated adequate for norms, as there are descriptive data for a clinical sample (N = 266) and a smaller community sample (N = 63) from three clinic sites. Internal consistency, assessed with McDonald’s omega, is rated good, with values ranging from .76 (somatic symptoms scale) to .94 (total scale). Test-retest reliability is also rated good. The scale was administered at baseline and post-treatment assessments for a waitlist group (n = 31), and intraclass correlations (ICCs) for the total scale and subscales ranged from .48 (peer problems scale) to .73 (withdrawal/avoidance scale). Content validity is rated excellent based on the scale-development procedures and item representation. An initial item pool was generated based on the literature and clinical experience of authors, which was then subjected to expert opinion from four senior clinical psychologists in the field and revised. Pilot testing with adolescents was then conducted with the revised form of the questionnaire, using the double-interview method to assess comprehension (Foddy, 1993). Items deemed too complex or redundant in content were excluded before the final version’s psychometric properties were tested. The ALIS-I is rated adequate for construct validity. An exploratory factor analysis demonstrated the hypothesized four factors, and convergent validity was shown through positive correlations with other internalizing measures (rs = .68–.81, ps < .001). Discriminative validity is also rated adequate, as independent samples t-tests showed that the mean total and subscale scores significantly differed for the clinical and community participants, with medium-to-large effect sizes. Treatment sensitivity has not yet been evaluated, and it has not yet been used in clinical trials.
Overall Anxiety Severity and Impairment Scale for Youth
The Overall Anxiety Severity and Impairment Scale for Youth (OASIS-Y) was developed by Comer et al. (2022) as a downward extension of the adult self-report OASIS (Norman et al., 2006). The OASIS is a five-item measure assessing anxiety severity and frequency and anxiety-related avoidance and impairment over the past week. The OASIS-Y is a parent-report measure only (i.e., parents rate their perspective of their child’s anxiety); there is no corresponding youth-report version. The OASIS-Y, similar to the original adult version, has one item assessing anxiety severity (“When your child feels anxious, how intense or severe is their anxiety?”), one item assessing frequency (“How often does your child feel anxious?”), and one item assessing avoidance (“How often does your child avoid situations, places, objects, or activities because of anxiety or fear?”). Two items, respectively, assess the degree to which anxiety interferes with youth’s (a) social life and relationships and (b) schoolwork or school/camp attendance. The authors further added two impairment items not adapted from the adult version assessing the degree to which youth’s anxiety interferes with (a) the family’s ability to function and (b) the parent’s/caregiver’s own personal functioning, work performance, or quality of life. Each of these seven items, which together comprise a total scale, is rated on a five-point Likert-type scale with scores ranging from 0 to 4, with corresponding anchors for severity, frequency, avoidance, and impairment.
The OASIS-Y is rated adequate for norms, as there are descriptive data available for one ethnically diverse clinical sample (N = 132), although no community sample data have yet emerged. It is rated good for internal consistency, given an alpha above .80 for the total scale. Content validity is rated adequate, as items were adapted from an adult version of the scale and not subjected to any further scale-development procedures (e.g., pilot testing). It also is rated adequate for construct validity. The hypothesized one-factor solution was supported with confirmatory factor analysis, and there is also support for convergent, divergent, and incremental validity. Finally, the OASIS-Y is rated adequate for treatment sensitivity, as scores significantly declined from baseline to post-assessment during behavioral anxiety treatment and at a steeper rate for treatment responders than for treatment non-responders. Since the OASIS-Y has been only recently developed, it has not yet been used in other clinical trials. Test-retest reliability and discriminative validity for the OASIS-Y also have not yet been evaluated.
Discussion
Our review covers the five youth- and parent-report rating scales currently available to assess anxiety-related impairment in children and adolescents. We found that each scale operationalizes domains of impaired functioning differently. The CAIS-C/P measures impairment in the school, social, and home/family domains of functioning, each represented by a subscale. The ALIS-I also contains school and peer impairment subscales, in addition to somatic symptoms and withdrawal/avoidance subscales, which are conceptualized as other areas of impairment. The CSDS/P and OASIS-Y include items assessing impairment in school, social, and family domains, but also other items related to youth anxiety (i.e., items assessing severity, frequency, and avoidance for the OASIS-Y; parent impairment for the CSDS-P), and each yields total/global scores only. Finally, the CALIS-C/P measures impairment in the domains of home, outside the home, and parents’ lives. On the one hand, the measures’ differences may reflect reasonable, diverse perspectives of what may constitute impairment relating to youth ADs. On the other hand, the differences may reflect the need for improved clarity about how anxiety-related impairment is conceptualized and measured. This will likely require research testing the incremental validity or utility of assessing specific domains of impairment over each other and over global assessment of impairment.
Although based on limited research, our criteria-driven evaluation reveals promising initial support for these anxiety-related impairment scales’ psychometric properties. We note the research is limited because only eight psychometric studies across the five scales exist at the time of writing this article. As such, the ratings we assigned to each scale are preliminary, as they are based on 1–3 studies per scale. The research is promising, though, because the data initially support each scale’s use for different assessment functions, including screening, treatment planning, and monitoring. We next provide a summary of the psychometric properties and corresponding recommendations for assessment functions.
Summary of Psychometric Properties
Norms
Of the five scales, only the CSDS/P and the CAIS-P were rated good for norms; the CAIS-C, CALIS-C/P, ALIS-I, and OASIS-Y were rated adequate. As research on these scales grows, we hope to see more studies include larger, more diverse samples from both community and clinical settings. Such studies would include descriptive data broken down by sociodemographic factors (e.g., youth gender and age), clinical factors (e.g., diagnoses and treatment setting), and total and subscale scores. These data will be instrumental in helping clinicians and researchers more accurately interpret scores. Nationally standardized measures designed to assess impairment in other populations/clinical presentations have been published (Goldstein & Naglieri, 2016) and may serve as a guide in this pursuit.
Internal Consistency
All five scales were rated good for internal consistency based on Cronbach’s alphas, or McDonald’s omega in the case of the ALIS-I, for total scale scores and subscale scores where applicable (i.e., for CAIS-C/P, CALIS-C/P, and ALIS-I). Our prior evaluative reviews also found evidence to support ratings of good or excellent internal consistency for all anxiety symptom rating scales (Etkin, Lebowitz, & Silverman, 2021; Etkin, Shimshoni, et al., 2021). Internal consistency is especially important to research in which study aims rest on reliably evaluating associations between anxiety-related impairment and symptom measures and other variables of interest. Given its importance, we note the caveat that ratings simply capture the magnitude of internal consistency and not the level of nuance involved in interpreting these estimates. Indeed, for common indices of internal consistency—namely Cronbach’s alpha—lower estimates can be expected for scales targeting broad constructs/multiple facets or with few items, and high estimates might indicate a narrow assessment that lacks sufficient content validity (McNeish, 2018). As such, these ratings should be interpreted in light of measures’ other characteristics and psychometric properties.
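Because the interpretation of Cronbach’s alpha depends on the number of items and their intercorrelations, it may help to see how the coefficient is computed. The sketch below uses the standard formula, alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), with sample variances; the function name and the response data are made up for illustration and do not come from any of the reviewed scales.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(rows) < 2:
        raise ValueError("need at least 2 respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least 2 items")
    if any(len(r) != k for r in rows):
        raise ValueError("all respondents must answer the same number of items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of respondents' total scores.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up ratings: 4 respondents by 3 items, for illustration only.
ratings = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 3]]
print(round(cronbach_alpha(ratings), 3))
```

The k / (k − 1) term and the ratio of item variance to total-score variance show why brief scales or scales spanning heterogeneous content can yield lower estimates even when each item is measured well.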
Test-Retest Reliability
Three of the five scales, the CSDS/P (rated adequate/good), CALIS-C/P (rated good), and ALIS-I (rated good), have evidence that scores are stable across time in community or clinical samples. Therefore, these scales are currently the best candidates for inclusion in longitudinal research on youth anxiety. The CAIS-C/P and OASIS-Y have not yet undergone test-retest reliability evaluation. Interestingly, research on test-retest reliability was similarly sparse for anxiety symptom rating scales, suggesting a future direction for the youth anxiety assessment literature as a whole (Etkin, Lebowitz, & Silverman, 2021; Etkin, Shimshoni, et al., 2021). As data accumulate, an important consideration will be the expected stability versus variability of the constructs assessed by each scale, which directly affects the interpretation and adequacy of test-retest reliability estimates.
Content Validity
Four of the five scales were rated adequate for content validity because items reflected the constructs authors set out to measure. Only the ALIS-I was rated excellent because its development involved experts and pilot participants who independently evaluated and refined items to ensure they were representative of content domains. While this is an important aspect of content validity, there are other aspects not captured in the criteria and long-standing debates about the best way to define and assess content validity (see Sireci, 1998). Regarding the scales’ items, the CSDS/P and OASIS-Y have the fewest items (3 and 7, respectively), with one item per domain of functioning assessed (e.g., both scales have one item to assess school impairment) that contributes to the total scale score. In contrast, both the CAIS-C/P and CALIS-C/P contain multiple items assessing each domain of functioning, which are reflected in subscale scores (beyond the total scale score). The CAIS-C/P and CALIS-C/P may be especially useful for treatment planning and case conceptualization purposes because they assess specific aspects of functioning within the three larger domains. For example, they can be used to answer questions such as: Is a child’s impairment in school related to completing schoolwork, attendance, or both? In contrast, the shorter CSDS/P and OASIS-Y scales may be useful for frequent progress monitoring to ascertain whether impairment is changing more generally across domains.
Construct Validity
Ratings were variable across the five scales regarding construct validity, and there was likewise substantial variability related to the methods used to assess this criterion. The CAIS-C/P received the only rating of excellent because they have supportive data from more than one study for the multiple types of validity, including incremental. Importantly, the CAIS-C/P also have support for their construct validity from confirmatory factor analysis. Although not specified in the criteria, confirmatory factor analysis is an optimal way to establish a scale’s construct validity because it identifies and confirms the factors underlying a hypothesized construct and the patterns of item-factor relationships (Brown, 2015). This is especially important for impairment rating scales given past conflation of impairment with other aspects of anxiety, namely severity (Rapee et al., 2012).
The CSDS/P were rated good because they have supportive data from more than one study of multiple types of validity, including incremental (see Table 2). However, they do not yet have evidence of construct validity from confirmatory factor analysis (exploratory only). The CALIS-C/P, ALIS-I, and OASIS-Y were each rated adequate, despite not yet having independent replication, because they showed other important aspects of construct validity. The OASIS-Y has supportive data for multiple types of validity within the one published study and a confirmatory factor analysis supporting its hypothesized one-factor structure (Comer et al., 2022). Of note, this factor contains not only impairment items but also items assessing anxiety frequency, severity, and avoidance. To avoid conflating these distinct indicators of anxiety, it would be beneficial for future research to investigate the reliability, validity, and utility of the four impairment items as a distinct factor/subscale. The CALIS-C/P have evidence of convergent and divergent validity, and the ALIS-I has evidence of convergent validity only. These scales’ structures are also supported by exploratory factor analysis but have not yet been subjected to confirmatory factor analysis. Moving forward, construct validity should be assessed with the utmost consideration for conceptual issues that have plagued the construct of impairment. For example, researchers could think carefully about whether associations with symptom severity scales represent convergent or divergent validity. Further research evaluating these scales’ incremental validity will also help determine their relative utility, although it was promising to see that some of this evidence already exists.
Discriminative Validity
The CAIS-P was the only scale rated excellent because it could distinguish youth within a clinical sample with and without different AD diagnoses using ROC analyses. The CSDS-P was rated good because it could distinguish youth receiving different levels of anxiety treatment (e.g., outpatient vs. residential treatment), as well as youth with comorbid and noncomorbid ADs. These studies provide impressive examples of assessing a scale’s discriminative validity within different clinical settings where youth likely suffer from differing degrees of impairment. The CALIS-C/P, CSDS, and ALIS-I each were rated adequate, as they could distinguish clinically anxious and community youth based on mean scale scores. For these three scales, research using ROC analyses could raise their ratings and would provide additional confidence in their use for screening purposes. Finally, in the only study to examine discriminative validity of the CAIS-C with ROC analyses, impairment scores did not distinguish youth with and without AD diagnoses. Future research is needed to replicate and extend this study to determine whether anxiety-related impairment assessed with the CAIS-C can predict clinically meaningful distinctions.
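As an illustration of what ROC analyses add beyond a comparison of group means, the sketch below computes the area under the ROC curve (AUC) and the sensitivity and specificity at a single cutoff. It is a minimal example under stated assumptions: the scores, the cutoff of 21, and the function names are hypothetical and are not taken from any of the reviewed scales or studies.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(positive case scores higher than negative case), ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(scores_pos, scores_neg, cutoff):
    """Classify scores at or above the cutoff as screen-positive."""
    sensitivity = sum(s >= cutoff for s in scores_pos) / len(scores_pos)
    specificity = sum(s < cutoff for s in scores_neg) / len(scores_neg)
    return sensitivity, specificity


# Hypothetical impairment total scores (not real data).
with_diagnosis = [18, 22, 25, 30]
without_diagnosis = [10, 15, 20, 24]

auc = roc_auc(with_diagnosis, without_diagnosis)
sens, spec = sensitivity_specificity(with_diagnosis, without_diagnosis, cutoff=21)
```

With these made-up scores, the AUC is 0.8125, and a cutoff of 21 gives sensitivity and specificity of 0.75 each. Unlike a mean difference, these indices describe how well individual youth would be classified, which is the information needed to judge a scale’s usefulness for screening.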
Treatment Sensitivity
Our evaluation of treatment sensitivity derived primarily from the psychometric studies covered herein, but we also considered the extent to which these scales were used in treatment studies (for reviews, see Becker et al., 2011; Dickson et al., 2022). The CAIS-C/P and CALIS-C/P were rated excellent for treatment sensitivity and are highly recommended for inclusion in clinical trials to assess changes in impairment from baseline to post-treatment (see also Creswell et al., 2021). The CSDS/P were rated adequate because the psychometric evidence shows mean scores significantly decreased following treatment, but more clinical trial research is needed. The ALIS-I and OASIS-Y have no psychometric research on treatment sensitivity, and since they were only recently developed, they have not yet been included in clinical trials.
Limitations and Future Directions
Each rating scale we reviewed has psychometric strengths and unique features that contribute to its utility for assessing anxiety-related impairment. We have also noted that the scales all vary in their conceptualizations of anxiety-related impairment, and so direct comparisons among them should be made with caution. We further caution readers not to conclude that scales with no rating or a rating of adequate (e.g., the lack of discriminative validity evidence for the CAIS-C) are not suitable for a given assessment purpose. Rather, this is pending determination as the research base grows. In addition, we noted throughout the article several caveats and limitations of the evaluative criteria. At the same time, the criteria offer a way to standardize evaluation of measures. This initial evaluation is needed to move youth anxiety assessment research forward, as it is still in its infancy relative to intervention research. The criteria themselves may likewise evolve over time as there is a greater opportunity and need to evaluate the research with additional nuance.
In using these criteria to assign ratings, we also acknowledge that psychometric properties are attributable only to scores within a particular test administration. Only if they have been widely replicated across populations and contexts can psychometric properties be safely considered static properties of the measure itself. We therefore recommend research studying the generalization validity of these rating scales within different demographic groups and settings (Hunsley & Mash, 2008; Youngstrom et al., 2017). While not a focus of this review, there are studies evaluating translations of these scales that offer some evidence of their generalization validity. For example, there are studies evaluating the CALIS-C/P in Spanish and Portuguese clinical and community samples (Marques et al., 2015; Orgilés et al., 2019; Orgilés et al., 2020; Pereira et al., 2015), the CSDS in a Swedish clinical sample (Soler et al., 2021), and the CAIS-P in a Japanese community sample (Okawa et al., 2023). Examining measurement invariance across different sample characteristics (e.g., gender, ethnicity; Pina et al., 2009) is another way to capture a measure’s generalization validity. One study we reviewed (Langley et al., 2014) tested measurement invariance by youth age and gender. We encourage additional such studies for all anxiety-related impairment scales. Beyond the criteria used herein, additional guidelines for conducting psychometric studies of impairment measures are available (Naglieri & McGoldrick, 2016). In our view, conducting additional psychometric research may be a more fruitful path than developing yet more measures that will only need to start from scratch with regard to their psychometric evaluations.
While our aim was to highlight and present data for anxiety-related impairment scales, anxiety evaluations are ideally a multi-method, multi-informant process. As such, the assessment of anxiety-related impairment should not stand on its own but should be interpreted in the context of these other clinical data. Beyond anxiety symptom rating scales and diagnostic interviews, measures of family functioning and impairment could be useful to gain a fuller picture/conceptualization of youth’s anxiety-related impairment and may also suggest treatment targets. This is because the level of impairment that youth experience is at least to some degree contingent on family members’ role (e.g., parents allowing a child to stay home from school). Behavioral and neurobiological approaches may also provide a sense of anxiety-related impairment and offer important benefits (for reviews, see Byrne et al., 2018; Silverman & Ollendick, 2005). A good example is assessing impairment and symptoms related to social anxiety by observing and rating youths’ ability to interact with confederates or complete speech tasks (e.g., Follet et al., 2023). Overall, though, the psychometric database for these methods is less developed than that for symptom measures and diagnostic interviews. These methods also have more utility in research than in clinical contexts. A future direction with potential for great clinical utility is to interface the current impairment measures with technology-based approaches (Holmes et al., 2018). Ecological momentary assessment, for example, may be an especially effective way for youth and their parents to report on anxiety-related impairment in real time. This is still a new area, however, and basic data capturing reliability and validity are needed.
Conclusion
This is the first article to summarize and preliminarily evaluate the psychometric data for youth anxiety-related impairment scales. Given the nascent state of the literature on these rating scales, it is encouraging that each measure has initial evidence of psychometric properties that earned good-to-excellent ratings. Notwithstanding limitations and recommendations for future research, there are some key points we wish to underscore. Synthesizing results from the current and prior reviews, we recommend the ADIS-C/P as a diagnostic interview, the SCARED-C/P as an anxiety symptom rating scale, and the CAIS-C/P as an anxiety-related impairment rating scale. This combination of measures would have utility for most clinical and research assessment purposes. We recommend the CAIS-C/P overall given that they (a) have the largest research base relative to other scales at this time, (b) assess the “key” domains of functioning (school, social, family/home) in separate subscales, and (c) earned our ratings of excellent for construct validity, discriminative validity (parent-report), and treatment sensitivity, for the subscale and total scale scores. Still, compared with youth- and parent-report anxiety symptom rating scales, there is much research needed to build the evidence base for youth anxiety-related impairment scales. We hope the current article serves as an impetus for this much-needed research.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by National Institutes of Health grants R01MH119299, R33MH115113, and R01DK117651. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
