Assessment of Depression in Adults and Youth

Abstract

This article selectively reviews the key issues and measures for the assessment of depressive disorders and symptoms in youth and adults. The first portion of the article addresses the nature and conceptualization of depression and some key issues that must be considered in its assessment. Next, the diagnostic interview and clinician- and self-administered rating scales that are most widely used to diagnose, screen for, and assess the severity of depression in adults and youth are selectively reviewed. In addition, the assessment of three transdiagnostic clinical features (anhedonia, irritability, and suicidality) that are frequently associated with both depression and other forms of psychopathology is discussed. The article concludes with some broad recommendations for assessing depression in research and clinical practice and suggestions for future research.

Keywords

assessment, depression, depressive disorders, depressive symptoms, review

Assessment of Depression

Depressive disorders are among the most common mental disorders (Kessler & Bromet, 2013) and are among the most burdensome disorders of any kind according to the World Health Organization’s metric of years lived with disability (James et al., 2018). Hence, depression is a top priority for clinical research, practice, and health care policy. Each of these domains requires reliable and valid assessment of depression.

This article focuses on the assessment of depressive disorders, particularly major depressive disorder (MDD) and persistent depressive disorder (PDD), from the perspective of the Diagnostic and Statistical Manual of Mental Disorders (5th ed., text rev.; DSM-5TR; American Psychiatric Association [APA], 2022). The classification of depression in the International Classification of Diseases, 11th edition (ICD-11; World Health Organization, 2019) is similar to the DSM-5TR in most respects, with analogous assessment-related issues and instruments.

In preparing this article, I first searched for publications in the past 10 years that provided broad reviews of the assessment of depression in adults and/or youth using Google Scholar. I systematically used combinations of terms such as “depression” and “mood disorders” with terms such as “assessment,” “measures,” “rating scales,” “questionnaires,” “inventories,” and “interviews.” I then scoured these sources for further references. Next, I used these sources to determine which issues and assessment instruments to include in the article. I then did another round of literature searches focusing on specific issues and instruments, with a particular emphasis on systematic reviews and meta-analyses. Where possible, I have relied on systematic reviews and meta-analyses in my appraisals and recommendations.

I begin by discussing the construct of depression and some key issues that should be considered in assessing depression. I will then review measures that are widely used to diagnose, screen for, and assess the severity of depression in adults and youth. I will also briefly discuss the assessment of some important transdiagnostic clinical features that are frequently associated with both depression and other forms of psychopathology. There are a vast number of measures to assess depression, and the literatures on many are quite extensive, so the review is selective, and I will not go into depth on any specific measure. Finally, I will make some recommendations for assessing depression in research and clinical practice and offer a few suggestions for future research.

The Construct of Depression

Descriptions of the phenomenology of depression prior to the advent of explicit diagnostic criteria include a number of symptoms that do not appear in the DSM-5TR criteria for depressive disorders, such as anxiety symptoms (e.g., worry, panic attacks), obsessions and compulsions, somatic symptoms (e.g., headaches, gastrointestinal problems), and depersonalization/derealization (Kendler, 2016). Beginning with the development of explicit diagnostic criteria in the early 1970s (Feighner et al., 1972) and continuing to the present day, the construct of depression was narrowed by eliminating symptoms that overlapped with anxiety and somatoform disorders to maximize its distinctiveness from other diagnoses (Kendler, 2016). As a result, the content of older depression rating scales often includes a number of symptoms that appear nonspecific from the vantage point of the DSM-5TR but were commonly viewed as part of the construct of depression prior to the Diagnostic and Statistical Manual of Mental Disorders (3rd ed.; DSM-III; APA, 1980). Conversely, symptoms such as hypersomnia and increased appetite/weight gain that were rarely noted in the older descriptive texts have been included in the diagnostic criteria and become part of the contemporary construct of depression (Kendler, 2016). Importantly, the subset of MDD symptoms included in DSM-5TR does not have greater empirical support than many of the symptoms that were not included (Fried, Epskamp, et al., 2016; McGlinchey et al., 2006).

The Feighner and DSM criteria specify the minimum number and duration of symptoms to qualify for diagnoses. The Feighner criteria required a minimum duration of 4 weeks for an episode of MDD. This criterion was reduced to 2 weeks in the DSM-III to allow for more rapid diagnosis of emergent episodes. However, it also broadened the diagnosis, likely increasing the number of cases with transient and relatively benign periods of dysphoria (Klein, in press; Wakefield & Schmitz, 2014). Notably, the symptom and duration thresholds are not based on data—rather they were established by consensus of a group of experts. Therefore, they should be considered “meaningful but arbitrary” (Turkheimer, 2017) and may or may not be optimal.

An important implication is that there is a distinction between diagnostic constructs and the criteria that are used to define them. The DSM-5 criteria index, but do not constitute, diagnostic constructs (Kendler, 2017). This distinction has important implications for assessment. If the content of measures is tied too closely to the DSM criteria, it is not possible to test whether other indicators are better, forfeiting the opportunity to strengthen the validity of both the criteria and measures as well as the underlying construct. Thus, in developing measures of depression, it is best to use a broad item pool that includes symptoms that are not in the DSM criteria (cf. Loevinger, 1957) and information on duration and prior course should be fine-grained enough to empirically determine optimal criteria and cutoffs (Klein, in press).

The categorical nature of DSM diagnoses implies that they are discrete entities. However, a variety of lines of research indicate that depression shades into nonpathological emotional states and individual differences (Ruscio, 2019). These findings indicate that depression is better conceptualized dimensionally, as falling along a continuum of severity. Indeed, individuals with subthreshold levels of depression can also experience significant functional impairment and are at increased risk for developing major depressive episodes compared with those with few or no depressive symptoms (Lee et al., 2019). However, even if our categorical diagnoses do not “carve nature at its joints,” they are a convenient and efficient way to communicate information about quantitative phenotypes if the cutoffs imposed on the underlying dimensions are associated with clinically meaningful correlates and outcomes. An implication is that measures assessing depression should provide continuous scores but also have meaningful, if arbitrary, cut-points.

If depression is a dimensional phenomenon, it raises the question of whether depression comprises a single dimension or multiple dimensions. Most depression scales sum all items to create a total score. If a scale has a multidimensional, rather than unidimensional, structure, total scores will reflect different contributions from each dimension, clouding interpretation (Fried et al., 2022).

Factor analytic studies of depression rating scales have yielded inconsistent findings regarding the number and nature of dimensions that best represent their items (Brouwer et al., 2013; Fried et al., 2022; Watson & O’Hara, 2017). However, careful scale development can produce depression measures with a coherent general scale that can be broken down into a number of highly correlated subscales (e.g., Watson & O’Hara, 2017). This suggests that depression can be viewed at several levels of resolution.

Most depression measures focus on assessing symptoms. However, the course of depression is at least as important for prognosis (e.g., episode duration and recurrence) and treatment planning (i.e., type, intensity, and duration of treatment; Klein, in press). Course can also be conceptualized dimensionally (Klein, 2008). For example, Pettit et al. (2009) identified a factor consisting of age of onset, and number and duration of episodes that had good predictive validity in an 11-year follow-up.

Although it is not a focus of this article, it is important to acknowledge that several alternative approaches to classification have emerged since the publication of DSM-5. The Hierarchical Taxonomy of Psychopathology (HiTOP; Kotov et al., 2017) includes a number of dimensions, derived via factor analyses of disorders, symptoms, and traits, that are arranged in a hierarchical fashion. Higher-order factors reflect shared variance (analogous to comorbidity) between lower-order factors. One of the HiTOP’s greatest strengths is that psychopathology can be conceptualized at multiple levels. However, the level that is most useful for any particular purpose remains to be determined and is an important research priority.

In HiTOP, the depressive disorders fall within the internalizing spectrum, and at the next level, the distress subfactor, which also includes generalized anxiety disorder (which is reminiscent of the broader pre-DSM-III conceptualization of depression), as well as aspects of posttraumatic stress disorder and borderline personality disorder. The level beneath subfactors consists of syndromes, which have not yet been well-defined. However, preliminary work suggests that the syndromes break down into depression and the specific anxiety disorders, although depression may further split into separate cognitive and vegetative syndromes (Waszczuk et al., 2017). HiTOP is still a work in progress, and measures designed to assess its constructs are currently being developed (Simms et al., 2022).

The Research Domain Criteria (RDoC; Kozak & Cuthbert, 2016) focuses on a core set of psychobiological domains, each with multiple constructs, whose neural circuitry has been at least partially delineated and which are relevant to psychopathology. These domains can be observed across species and assessed in a dimensional manner across multiple units of analysis (e.g., molecules, neural circuits, physiology, behavior, and self-report). RDoC intentionally does not address clinical phenotypes; for example, depression does not appear in this system. Phenotypes are to be discovered inductively by identifying the clinical manifestations associated with dysfunction in one or more core domains or constructs. However, certain domains and constructs (e.g., Loss in the Negative Valence Systems domain, and Reward Responsiveness in the Positive Valence Systems domain) are especially relevant to depression-related psychopathology. The RDoC website offers suggestions for measures reflecting these constructs across levels of analysis: https://www.nimh.nih.gov/research/research-funded-by-nimh/rdoc.

Both categorical (e.g., DSM-5TR) and dimensional (e.g., HiTOP) approaches to classification assume that psychopathology should be conceptualized as latent constructs, and that symptoms are fallible indicators of these constructs. However, another perspective, network theory (Borsboom, 2017), rejects this premise. Rather than assuming the latent construct of depression causes symptoms to manifest, this perspective posits that particular symptoms cause other symptoms, and that depression is an emergent phenomenon produced by a network of causal associations between symptoms (as well as nonsymptom influences). Network theory poses a major challenge to traditional approaches to assessment. Rather than assuming that symptoms are interchangeable and determining if the required number of symptoms is present to qualify for a diagnosis or summing symptom items to create a total severity score, the network perspective suggests that each symptom should be examined individually and the pattern of interrelationships between symptoms examined using between- or within-subjects designs (Fried et al., 2022). However, a common problem with network research is that it measures each symptom with a single item. Single items are notoriously unreliable. To measure individual symptoms reliably, it will be necessary to develop multi-item scales for each symptom or treat each symptom as a latent construct with multiple indicators.
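The reliability gain from aggregating items can be illustrated with the Spearman-Brown prophecy formula, which projects the reliability of a scale built from several parallel items. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the items are assumed to be parallel (equally reliable and measuring the same symptom), and the single-item reliability of 0.40 is an arbitrary illustrative value, not an estimate drawn from the literature reviewed here.

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Projected reliability of a scale made of n parallel items.

    Implements the Spearman-Brown prophecy formula:
        r_k = k * r / (1 + (k - 1) * r)
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    r = single_item_reliability
    return n_items * r / (1 + (n_items - 1) * r)


# Assumed single-item reliability (illustrative value only).
ASSUMED_ITEM_RELIABILITY = 0.40

# Projected reliability as more parallel items are used per symptom.
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(ASSUMED_ITEM_RELIABILITY, k), 2))
```

Under these assumptions, projected reliability rises steadily as items are added, which is the rationale for measuring each symptom with several items rather than one.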

Considerations in Assessing Depression

Development

The DSM-5TR criteria for MDD and PDD are identical in children and adolescents and in adults with two exceptions. First, irritability can substitute for depressed mood in MDD (but, curiously, not PDD) in youth. Irritability is a common, but relatively nonspecific, symptom of depression across the lifespan. However, as youth may have more difficulty reporting their subjective emotional states, the mood criterion was broadened for this group. Second, the duration required for PDD is 1 year in youth, rather than the 2 years required for adults. Presumably, this difference is because a year is a much larger proportion of the lifetime of a child or adolescent than of an adult. However, these differences have not received much empirical scrutiny.

Although research in this area is surprisingly limited, it appears that depressive disorders are generally characterized by similar symptoms throughout the lifespan (Klein et al., 2017). However, the loadings of depressive symptoms on a latent depression factor increase after age 12, suggesting greater coherence of the construct after the transition to adolescence (Morken et al., 2021). Genetic influences are also weaker in prepubertal than postpubertal depression, and the stability (or homotypic continuity) of depression increases after age 12 (Klein et al., 2017; Morken et al., 2021).

Older adults frequently have significant, often multiple, medical conditions. As a result, it can be challenging to distinguish depressive symptoms (e.g., fatigue, insomnia) from the effects of general medical disorders. For most older adults, depressive episodes are recurrences of an earlier onset condition. However, in a subgroup, particularly those with no prior history, depression may be secondary to cardiovascular disease. The putative subtype of “vascular depression” is characterized by a late-life onset, a negative family history of depression, apathy, poor executive functioning on neuropsychological tests, a history of hypertension, nonspecific findings of white matter hyperintensities in magnetic resonance imaging, a poor response to antidepressant medication, and an increased risk of developing dementia (Steffens, 2019). Thus, even though depression may be phenotypically similar across development, causal processes, mechanisms, and challenges in assessment differ across the lifespan.

Data Source

Most measures for assessing depression rely on clinicians’ evaluations of depressed individuals’ reports and behavior (e.g., semistructured diagnostic interviews and clinician rating scales) or individuals’ reports of their symptoms (self-rating scales). Self-ratings are highly economical, as they do not require clinicians to conduct the assessment. Clinicians’ evaluations have the advantage that raters can elaborate on questions, elicit examples, and probe inconsistencies to ensure that respondents understand what is being asked and to evaluate the clinical significance of their responses. In addition, clinicians can utilize observations of respondents’ behavior (which is particularly important for assessing psychomotor retardation or agitation) and obtain information that can be difficult to elicit with a highly structured format like a self-report questionnaire, such as past history and course of depression and suicide risk.

Agreement between clinician and self-rating scales is moderate, but higher when similar formats are employed. Patients tend to rate themselves as being more severely depressed than clinicians rate them. However, both sources of information contribute unique variance in predicting outcomes, even when identical scales are used (Dozois et al., 2020; Uher et al., 2012).

Informants’ reports are often useful, especially for youth and individuals who are psychotic or otherwise limited in their ability to provide valid information. Reports of informants, such as parents and teachers, are especially important for younger children, whose cognitive processes and language abilities are less developed than those of older youth and adults. Younger children have particular difficulty reporting on temporal characteristics; therefore, informants must be relied on for information on the onset and duration of symptoms and previous episodes (Dougherty et al., 2018). Nonetheless, it is still important to obtain children’s reports because informants may not be aware of the child’s feelings and thoughts. Indeed, children report higher levels of depressive symptoms than their parents and teachers report for them (Jensen et al., 1999; Makol & Polo, 2018).

Informants are less essential for adolescents, both because adolescents are more reliable reporters than children (Edelbrock et al., 1985) and because parents are less involved in the day-to-day lives of adolescents. However, informants can be useful in reporting on adolescents’ externalizing problems, which teens may minimize (Dougherty et al., 2018).

Although obtaining data from multiple sources is optimal, it is complicated by the fact that agreement between informants is only fair to moderate (Achenbach et al., 2005; De Los Reyes et al., 2015). Informants tend to agree more when they have the same relationship to the target, observe the target in the same context, and rate observable behaviors (e.g., externalizing, as opposed to internalizing, symptoms). However, informant discrepancies can provide meaningful information, such as the situational specificity of the target’s behavior (De Los Reyes et al., 2015). Moreover, youth, parent, teacher, and clinician ratings each account for unique variance in predicting outcomes (Cohen et al., 2019; Ferdinand et al., 2003; Verhulst et al., 1997).

Due to the modest agreement between sources, clinicians and researchers often seek to integrate the conflicting information. Approaches to integrating data from multiple sources include rating the feature or diagnosis as present if any source reports it or using only the variance shared by sources by treating each of their reports as indicators of a latent construct and considering nonshared variance as measurement error. The approach that most closely mirrors clinical practice is the “best-estimate” procedure, in which the assessor uses their clinical judgment to evaluate the credibility of each source’s report and weighs them accordingly in reaching a decision (Klein et al., 1994).
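The simplest of these integration approaches, rating a symptom as present if any source reports it, can be sketched in a few lines. The example below is illustrative only: the symptom names, informant labels, and endorsements are hypothetical, and the function names are not taken from any published scoring procedure. The best-estimate procedure is not sketched because it depends on clinical judgment rather than a fixed rule.

```python
from typing import Dict, List

# Hypothetical symptom endorsements from two sources (True = reported present).
reports: Dict[str, Dict[str, bool]] = {
    "youth": {"depressed_mood": True, "anhedonia": False, "insomnia": True},
    "parent": {"depressed_mood": False, "anhedonia": False, "insomnia": True},
}


def or_rule(reports: Dict[str, Dict[str, bool]]) -> Dict[str, bool]:
    """Rate each symptom as present if at least one source endorses it."""
    symptoms = set().union(*(r.keys() for r in reports.values()))
    return {
        s: any(r.get(s, False) for r in reports.values())
        for s in sorted(symptoms)
    }


def discrepancies(reports: Dict[str, Dict[str, bool]]) -> List[str]:
    """List symptoms on which the sources disagree."""
    symptoms = set().union(*(r.keys() for r in reports.values()))
    return sorted(
        s for s in symptoms
        if len({r.get(s, False) for r in reports.values()}) > 1
    )


print(or_rule(reports))
print(discrepancies(reports))
```

Keeping the list of discrepant symptoms alongside the combined rating preserves the information that the "or" rule would otherwise discard.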

However, as suggested above, discrepancies between sources can be meaningful, as they may reflect context-specific behavior (De Los Reyes et al., 2013). Thus, it can be informative to examine each informant’s data separately. De Los Reyes et al.’s (2013) Operations Triad Model provides a means of determining whether informant discrepancies reflect situation-specific behavior or measurement error.

Populations and Contexts

The assessment of depressive disorders is relevant for a wide variety of populations and contexts: inpatient and outpatient mental health facilities, primary and specialty medical care settings, schools, forensic contexts, and the community (e.g., population screening, epidemiological research). The severity of depression can differ across populations and contexts (e.g., milder in community and more severe in mental health settings). Measures (and items on the same measure) often vary in their sensitivity to the severity level of depression. Item response theory methods can provide estimates of where on the distribution of a latent trait each measure or item provides the most information. For example, Olino et al. (2012) found that in a community sample of older adolescents, the Center for Epidemiologic Studies Depression Scale (CES-D; Radloff, 1977) provided the most information at lower levels of depression, while the Beck Depression Inventory (BDI; Beck et al., 1988) provided more information at higher levels of depression. Thus, the CES-D may be more useful for epidemiological studies, while the BDI may be better suited for clinical samples. Ideally, measures should include items that are sensitive to different levels of severity so that together they assess the full range of severity of the construct.
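The idea that an item is most informative at a particular severity level can be illustrated with the item information function of a two-parameter logistic (2PL) model. This is a sketch under stated assumptions rather than a reproduction of any cited analysis: the 2PL model is chosen for simplicity (items with several response options would ordinarily call for a polytomous model), and the item parameters below are hypothetical values, not estimates for the CES-D or BDI.

```python
import math


def prob_endorse(theta: float, a: float, b: float) -> float:
    """2PL probability of endorsing an item at latent severity theta.

    a = discrimination, b = severity (location) parameter.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """2PL item information: a^2 * P * (1 - P); it peaks where theta == b."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical items: one tuned to mild, one to severe depression.
mild_item = {"a": 1.5, "b": -0.5}
severe_item = {"a": 1.5, "b": 2.0}

# Compare the information each item provides across the severity range.
for theta in (-1.0, 0.0, 1.0, 2.0, 3.0):
    print(
        theta,
        round(item_information(theta, **mild_item), 3),
        round(item_information(theta, **severe_item), 3),
    )
```

In this sketch the mild item contributes more information at low severity and the severe item at high severity, so a scale combining both kinds of items covers a wider range of the latent trait.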

An important question is whether the underlying latent construct assessed by a measure is comparable across groups, such as individuals of different genders, sexual orientations, races, ethnicities, socioeconomic positions, cultures, and geographic regions. This issue can be examined by using confirmatory factor analysis to test for measurement invariance (MI). If an instrument is examined in two groups and found to lack invariance, then it cannot be assumed to measure the same construct in each group. MI of depression measures has primarily been examined for self-rated scales. This literature indicates that depression measures are often invariant across a variety of populations (e.g., Lindley & Bauerband, in press; Mellick et al., 2019; Olino, 2020; Patel et al., 2019; but see Fried et al., 2022 for some exceptions). Future work should test MI in diagnostic interviews and clinician rating scales.

However, MI cannot determine whether existing measures omit symptoms that are important in characterizing depression in some groups but not others. For example, it has been posited that in males, depression can take the form of externalizing behaviors such as anger, aggression, risk-taking, and substance misuse that are not included in most depression measures, and that this omission accounts for the higher prevalence of depression in females (Martin et al., 2013). Similarly, there is evidence that depression is experienced as musculoskeletal pain, headaches, anger, and/or loneliness in some cultures (Haroz et al., 2017), raising questions about whether the measures developed by North American and European investigators and used in most cross-cultural research capture culture-specific idioms of distress that are analogous to the Western construct of depression (Kirmayer et al., 2022). The generalizability of Western conceptualizations and measures of depression to other contexts and cultures should be a priority for future research. In addition, there is a need for investigators in non-Western cultures to study how depression-related symptoms are conceptualized and assessed in communities that live in vastly different contexts.

Measures

Aims of Assessment

The major aims of clinical assessment include screening, diagnosis and prognosis, case conceptualization and treatment planning, and treatment monitoring and evaluation (see Dougherty et al., 2018 for a more extensive discussion). In screening, large numbers of individuals are assessed for early identification and intervention or inclusion in research. To minimize costs and burden, brief self-rating scales are typically utilized. Individuals scoring above the case threshold should then receive more in-depth assessment.

Diagnosis and prognosis involve assessing diagnostic criteria for depressive disorders and exclusionary diagnoses (e.g., bipolar disorder), severity of symptoms (including the presence of symptoms with particular significance for clinical decision-making such as suicidality and psychotic symptoms), prior history and course of depression, comorbid mental and general medical disorders, and treatment history. Diagnostic interviews, supplemented by clinician or self-rating scales, are generally the primary means of collecting this information.

These data are also required for case conceptualization and treatment planning. In addition, a comprehensive assessment of personal, interpersonal, and systemic factors is necessary to provide clues to the development and maintenance of symptoms and functional impairment and to determine the focus of treatment. I will not review the many constructs and measures that can be included in such a comprehensive assessment, but for helpful discussions see Dougherty et al. (2018) and Persons et al. (2018).

Treatment monitoring and evaluation involves systematically assessing the degree of change in target symptoms and impairments to determine whether treatment should be continued, intensified, augmented, changed, or terminated. Clinician and self-rating scales are typically used for this purpose. I will focus on measures of depressive symptoms; for assessing functioning see Dougherty et al. (2018), Persons et al. (2018), and Sheehan (2022). There is considerable evidence that administering rating scales throughout the course of treatment (i.e., measurement-based care) enhances patient outcomes (D’Avanzato & Zimmerman, 2017; Tasca et al., 2019).

Diagnostic Interviews

Structured diagnostic interviews were developed because of the limited interrater reliability of standard (i.e., unstructured) clinical interviews (e.g., Regier et al., 2013). Structured diagnostic interviews can be semistructured, where a trained interviewer (optimally a clinician, but often a trained research assistant) follows a set format but can ask follow-up questions, elicit examples, resolve inconsistencies, and observe the respondent’s behavior, and is ultimately tasked with rating items based on their best judgment. In contrast, in fully structured interviews, the interviewer adheres to a script and accepts the interviewee’s responses at face value. Fully structured interviews (e.g., the Composite International Diagnostic Interview; Kessler & Üstün, 2004) are primarily used in epidemiological research where costs preclude using clinically trained interviewers to assess large samples. This review is limited to semistructured interviews.

If interviewers are properly trained and supervised, the reliability of semistructured interviews appears to be substantially higher than that of unstructured interviews (e.g., Williams et al., 1992). In addition, semistructured interviews yield a greater number of diagnoses, suggesting that unstructured interviews overlook clinically significant psychopathology, and are more accurate compared with a gold-standard assessment procedure (Basco et al., 2000). Semistructured diagnostic interviews are widely used in research, but rarely employed in clinical practice, most likely because they take longer to administer than unstructured interviews and clinicians may not be reimbursed for the additional time. However, many semistructured interviews have a modular format, so clinicians can choose the modules that are most relevant for each patient.

Semistructured interviews typically assess the criteria necessary for diagnosing depression and the most common comorbid disorders. Thus, they are designed to yield categorical diagnoses, which, although clinically useful, have lower reliability and statistical power than dimensional measures (Markon et al., 2011). It is generally possible to sum symptom ratings to produce dimensional scores. However, due to the skip-out structure of most diagnostic interviews, if the symptoms assessed by the “gate” questions for a disorder (e.g., depressed mood and loss of interest/pleasure) are absent, the remaining symptoms are not rated, potentially underestimating scores of individuals without the core symptoms. At the same time, it is reasonable to question whether accessory symptoms are really reflections of the disorder if there is no evidence of the core symptoms. That is, should sleep or concentration problems in the absence of depressed mood and anhedonia “count” toward a depression score, or are they better viewed as indicators of other disorders or problems? Nonetheless, investigators are beginning to develop interviews with minimal skip-outs to provide both categorical and dimensional assessments (e.g., Shankman et al., 2018).
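The effect of a skip-out structure on summed symptom scores can be made concrete with a small sketch. Everything in it is hypothetical: the symptom list, the 0 to 2 rating scale, and the function names are illustrative assumptions and do not correspond to the scoring rules of any specific interview.

```python
from typing import Dict, Optional

# Hypothetical gate and accessory symptoms, each rated 0 (absent) to 2 (severe).
GATE = ("depressed_mood", "anhedonia")
ACCESSORY = ("sleep", "appetite", "concentration", "fatigue")


def administer_with_skip_out(true_ratings: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Return ratings as an interview with a skip-out would record them.

    If neither gate symptom is present, accessory symptoms are never asked
    and are recorded as None (not rated).
    """
    recorded: Dict[str, Optional[int]] = {s: true_ratings[s] for s in GATE}
    gate_present = any(true_ratings[s] > 0 for s in GATE)
    for s in ACCESSORY:
        recorded[s] = true_ratings[s] if gate_present else None
    return recorded


def symptom_sum(recorded: Dict[str, Optional[int]]) -> int:
    """Sum symptom ratings, counting unrated (None) items as 0."""
    return sum(v or 0 for v in recorded.values())


# A respondent with sleep and concentration problems but no core symptoms.
respondent = {
    "depressed_mood": 0, "anhedonia": 0,
    "sleep": 2, "appetite": 0, "concentration": 1, "fatigue": 0,
}

print(symptom_sum(respondent))                            # all symptoms rated
print(symptom_sum(administer_with_skip_out(respondent)))  # skip-out applied
```

For this respondent the two scores differ only because of the skip-out. Whether the higher score is the more valid one depends on the conceptual question raised above about accessory symptoms in the absence of core symptoms.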

A second limitation of most semistructured interviews is that their assessment of the course of psychopathology is cursory, despite its implications for prognosis and decisions regarding intensity and duration of treatment. To assess course in a more detailed fashion, it is very helpful to work collaboratively with the respondent to complete a timeline mapping the trajectory of the disorder since onset (e.g., see the procedure described by McCullough et al., 2016).

The most widely used semistructured diagnostic interview for adults is the Structured Clinical Interview for DSM-5 (SCID; First et al., 2015). The SCID was originally designed for researchers and the research version (SCID-5-RV) is the most widely used and best-studied version. In addition, there is a briefer version of the SCID for clinicians (SCID-5-CV), a version for assessing inclusion/exclusion criteria for clinical trials (SCID-5-CTV), and an even briefer version designed to be completed in 30 min or less (QuickSCID-5). Finally, there are separate SCID interviews for DSM-5 personality disorders (SCID-5-PD) and the DSM-5 alternative model for personality disorders (SCID-5-AMPD).

The research version of the SCID was designed for clinically trained interviewers. If additional sources of information (e.g., informants, case records) are available, the interviewer can take them into consideration in their ratings. The SCID begins with a screening module with gate questions for most disorders, the responses to which determine which modules for specific disorders will be administered subsequently. There are variants of the interview designed for patients unlikely to have psychotic disorders and for nonpatients. The DSM-5 research version of the SCID added new modules for several childhood disorders that can persist into adulthood (e.g., attention deficit hyperactivity disorder, separation anxiety disorder), an important feature that has been overlooked for too long.

Administration time for the SCID-5-RV ranges from 30 to 120 min depending on the breadth of the respondent’s psychopathology and the respondent’s and interviewer’s interview styles. Interrater reliability for depressive disorders is moderate to substantial (Williams et al., 1992; Zanarini et al., 2000). As a considerable portion of the research on depression in adults has used the SCID to establish diagnoses, the convergent, discriminant, and construct validity of the interview are inextricably intertwined with that of the diagnostic construct (see Klein, in press for a more extensive discussion).

The most widely used semistructured diagnostic interview for children and adolescents (6–18 years) is the Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS; Townsend et al., 2020). There are many modified versions (Ambrosini, 2000); hence, it cannot be assumed that all studies using the K-SADS have used the same interview. As with the SCID, there is now a streamlined version for clinical use (Townsend et al., 2020).

The K-SADS was designed to be used by clinically trained interviewers. It is administered separately to the youth and a parent. Differences between informants should be reconciled and final ratings are derived using a best-estimate approach. Similar to the SCID, the initial module consists of screening questions that determine the disorder-specific modules to be administered subsequently. Administration times for the parent and child interviews (research version) range from 30 min to 2.5 hr each, depending on the breadth of the child’s psychopathology and the styles of the respondents and interviewer. However, the time required for administration has been considerably reduced in a new computerized version (Townsend et al., 2020). Interrater reliability ranges from adequate to excellent for depressive disorders, and numerous sources of evidence support the K-SADS’ convergent and construct validity (Dougherty et al., 2018; Townsend et al., 2020).

The MINI International Neuropsychiatric Interview has versions for adults (MINI; Sheehan et al., 1997) and children (MINI-KID; Sheehan et al., 2010). Whereas the SCID and K-SADS assess both current and lifetime diagnoses, the MINI is limited to current diagnoses for most disorders. Compared with the SCID, the MINI takes about half the time to administer but provides less information. Both the MINI and MINI-KID show good interrater reliability and convergent validity (Sheehan, 2022).

Rating Scales

Clinician- and self-administered rating scales are useful for assessing severity and response to treatment. Typically, scores reflect the number and intensity of symptoms experienced during the previous 1 to 2 weeks. Clinician rating scales are interviews focusing on a circumscribed area (e.g., depression symptoms). They include a number of items that are rated by the interviewer, and often, but not always include a set of required or suggested questions and probes. Self-rating scales cover similar content; however, respondents directly report on their own symptoms (self-report) or informants (e.g., parents, teachers) report on another individual’s symptoms. Unlike diagnostic interviews, clinician and self-administered rating scales do not collect sufficient information to make diagnoses (e.g., duration and exclusion criteria are not assessed) or obtain information on development and course and comorbid conditions.

There are a number of issues in considering rating scales. First, rating scale scores are generally interpreted as indices of severity, but the construct of severity is contentious and poorly defined (Zimmerman et al., 2018). For example, should it refer only to symptoms, and if so, should they all be weighted equally or are some symptoms more indicative of severity than others (e.g., suicidality) and should therefore be weighted more heavily (Fried et al., 2022)? In addition, should other factors, such as psychosocial functioning, be included (Zimmerman et al., 2018)? The rating scales reviewed below focus solely on symptoms and sum all items weighting them equally.¹ However, this issue is an important area for future research.
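To make the weighting question concrete, the following minimal Python sketch contrasts an unweighted sum score with one possible weighted alternative. The item names, ratings, and weights are hypothetical and are not taken from any published scale; the weighted scheme is only an illustration of the idea, not a recommended scoring rule.

```python
def sum_score(item_ratings):
    """Unweighted total: every item contributes equally."""
    return sum(item_ratings.values())


def weighted_score(item_ratings, weights, default_weight=1.0):
    """Weighted total: items listed in `weights` count more (or less)."""
    return sum(
        rating * weights.get(item, default_weight)
        for item, rating in item_ratings.items()
    )


# Hypothetical item ratings on a 0-3 scale (illustrative only).
ratings = {
    "depressed_mood": 2,
    "anhedonia": 1,
    "insomnia": 3,
    "suicidal_ideation": 1,
}

# Hypothetical weights: suicidal ideation counts double.
weights = {"suicidal_ideation": 2.0}

total = sum_score(ratings)
weighted_total = weighted_score(ratings, weights)
print(total, weighted_total)
```

Under the equal-weighting convention, the two respondents with the same total are treated as equally severe regardless of which items were endorsed; the weighted variant shows how that assumption could be relaxed.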

Second, if a rating scale is used to monitor treatment response, it should be sensitive to change, a property that has been demonstrated for most widely used depression rating scales (Sheehan, 2022). In principle, the scale should also demonstrate MI over time (i.e., the structural properties of the measures should be comparable across repeated assessments). However, many widely used clinician and self-rating scales fail to show MI across repeated assessments, probably due to restriction of range and regression to the mean (Fried, van Borkulo, et al., 2016). Unfortunately, this problem may be unavoidable in many contexts in which rating scales are needed (e.g., most patients seeking treatment will have elevated scores, and high scores are generally inclusion criteria for clinical trials). In addition, some studies in unselected community samples have reported attenuation effects in which scores diminish simply as a function of repeated assessments, making it difficult to distinguish the effects of treatment and time from methodological artifact (Dougherty et al., 2018; Dozois et al., 2020). These problems are priority areas for future conceptual and empirical work.

Finally, a number of different conventions and cutoffs have been applied to rating scales to define clinically important concepts such as response, remission, relapse, recovery, and recurrence (Sheehan, 2022). Unfortunately, empirical support for these definitions is often absent or even negative (de Zwart et al., 2019).
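As an illustration of how such conventions are typically operationalized, the sketch below classifies an outcome from baseline and follow-up total scores. It assumes one commonly used convention (response as a reduction of at least 50% from baseline, remission as a score at or below a scale-specific cutoff). The function names and the cutoff value are placeholders chosen for the example, not validated thresholds, and the caveat above about limited empirical support applies to them.

```python
def percent_reduction(baseline, current):
    """Percentage decrease in total score relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - current) / baseline


def classify_outcome(baseline, current, remission_cutoff, response_threshold=50.0):
    """Label an outcome using assumed, convention-based thresholds.

    remission_cutoff and response_threshold are assumptions supplied by
    the caller; they differ across scales and studies.
    """
    if current <= remission_cutoff:
        return "remission"
    if percent_reduction(baseline, current) >= response_threshold:
        return "response"
    return "nonresponse"


# Hypothetical scores: baseline of 24 and three possible follow-up totals.
outcomes = [classify_outcome(24, score, remission_cutoff=7) for score in (6, 11, 20)]
print(outcomes)
```

Changing either threshold changes who counts as a responder or remitter, which is one reason the choice of definition matters when comparing studies.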

Clinician-Administered Rating Scales

The three most widely used clinician rating scales for depression in adults are the Hamilton Rating Scale for Depression (HAM-D; Hamilton, 1960), the Montgomery-Åsberg Depression Rating Scale (MADRS; Montgomery & Åsberg, 1979), and the Inventory of Depressive Symptomatology—Clinician version (IDS-C; Rush et al., 1996) or its briefer version, the Quick IDS-C (QIDS-C; Rush et al., 2003). The HAM-D is the most widely used clinician rating scale and takes 20 to 30 min to administer. It predates the development of explicit diagnostic criteria and includes a number of symptoms that are not DSM MDD criteria due to their nonspecificity. At the same time, the original and some subsequent versions of the HAM-D did not assess a number of DSM MDD criteria. The original version included 17 items. Most items focused on behavioral, somatic, and vegetative symptoms with fewer items reflecting cognitive and affective symptoms. In addition, the scale included four diagnostically nonspecific items (e.g., paranoid and obsessional symptoms) that were not intended to be counted toward the total score but often are, resulting in 21 items. Factor analytic studies of the HAM-D have not produced consistent results.

There are also a number of alternative versions of the HAM-D (for reviews see Carrozzino et al., 2020; Dozois et al., 2020; Sheehan, 2022). Some of these versions have expanded coverage of depressive symptoms, including a 24-item HAM-D that adds three cognitive symptoms and a 29-item version that includes reverse vegetative symptoms (e.g., increased appetite/weight and sleep). In addition, some investigators have developed short versions of 6 to 8 items.

The original HAM-D consisted of rating scales for each item but did not have suggested questions or probes, which limited interrater reliability. However, a number of semistructured interview versions of the HAM-D have been developed and show greater interrater reliability than unstructured versions (e.g., Williams, 1988; Williams et al., 2008). The HAM-D has a number of conceptual and psychometric limitations, including uneven coverage of depressive symptoms, a varying number of response options, lack of interval consistency across items, poor interrater reliability of some items, and items that fail to load on a latent severity dimension in item-response theory analyses (Bagby et al., 2004; Dozois et al., 2020). However, many of these problems have been rectified in more recent versions of the scale (Carrozzino et al., 2020).

The MADRS is a 10-item scale that was developed empirically to maximize sensitivity to the effects of antidepressant medication and is widely used in clinical trials (Sheehan, 2022). It takes approximately 10 min to administer. The content of the MADRS only partially overlaps with DSM-5 MDD criteria. It exhibits good psychometric properties (Dozois et al., 2020; Sheehan, 2022). However, some studies have found that the MADRS is less sensitive to change than some versions of the HAM-D (Carrozzino et al., 2020). Like the HAM-D, it does not have a standard set of probes, which hampers interrater reliability. However, a semistructured interview version has been developed (Williams & Kobak, 2008).

The IDS-C was developed to provide more comprehensive coverage of depression symptoms than the HAM-D and MADRS. It has a semistructured interview format and 30 items. The IDS-C has been extensively evaluated, and there is substantial support for its interrater reliability and convergent validity (Rush et al., 1996). The QIDS-C includes 16 items covering the DSM MDD criteria; it retains the IDS-C’s favorable psychometric properties (Rush et al., 2003).

The most widely used clinician rating scale for youth is the 17-item Children’s Depression Rating Scale—Revised (CDRS-R; Poznanski et al., 1979; Poznanski & Mokros, 1999). The CDRS was developed to assess the severity of depression in children aged 6–12 years. It is administered separately to the child and an adult informant, with the clinician subsequently integrating the data using clinical judgment. Its psychometric properties are generally favorable (Dougherty et al., 2018). The CDRS is often used with adolescents as well, although evidence for its reliability and validity in teens is lacking (Stallwood et al., 2021).

Self-Administered Rating Scales

There are numerous self-rated scales for depression in adults and youth, a number of which have been shown to possess strong psychometric properties in a variety of samples and most of which are highly intercorrelated (see Dozois et al., 2020; Fried et al., 2022). Given space constraints, I will focus on the four scales for adults and two scales for children that are most widely used. Other frequently used scales that are not discussed here include the CES-D (Radloff, 1977) and the Depression Anxiety Stress Scales (Lovibond & Lovibond, 1995), both of which provide only limited coverage of the range of depression symptoms. I also will not discuss self-rated depression scales that were designed for specific contexts and populations, such as the postpartum period (see Sultan et al., 2022 for a review) and older adults (see Balsamo et al., 2018 for a review). Finally, I will not discuss omnibus rating scales designed to assess a much broader range of symptoms and impairments (e.g., Achenbach et al., 2017; Kraus et al., 2005). Although these measures capture a great deal of information in an efficient manner, their value for assessing depression specifically is less clear.

Currently, some of the most widely used and best-studied self-rated scales for adults are the Beck Depression Inventory, second edition (BDI-II; Beck et al., 1996), the self-report versions of the IDS (IDS-SR; Rush et al., 1996) and QIDS (QIDS-SR; Rush et al., 2003), the Patient Health Questionnaire-9 (PHQ-9; Kroenke et al., 2001), and the expanded version of the Inventory of Depression and Anxiety Symptoms (IDAS-II; Watson et al., 2012).

The BDI-II (Beck et al., 1996) is a 21-item self-rating scale that covers most DSM MDD symptoms. It takes 5 to 10 min to complete and has good internal consistency, test–retest reliability, and convergent validity and acceptable discriminant validity; however, evidence for its sensitivity to change is mixed (D’Avanzato & Zimmerman, 2017; Dozois et al., 2020; Sheehan, 2022; Wang & Gorenstein, 2013).

The IDS-SR and QIDS-SR include the same items as their respective clinician-rated versions, which facilitates comparison and integration of data from both sources. Like the clinician-rated versions, the self-report versions have good psychometric properties (Rush et al., 1996, 2003).

The PHQ-9 (Kroenke et al., 2001) has nine items, each reflecting a DSM MDD symptom criterion. There is also a two-item version (PHQ-2) tapping just depressed mood and loss of interest or pleasure. The PHQ-9’s brevity is a significant advantage, but it provides less information and may be less sensitive to change than other self-rated depression scales (D’Avanzato & Zimmerman, 2017; Sheehan, 2022). The PHQ-9 and PHQ-2 are the most widely used measures for screening in psychiatric and general medical samples. Systematic reviews and meta-analyses indicate that both exhibit moderate to good accuracy in identifying cases of MDD (Costantini et al., 2021; Negeri et al., 2021), but can produce a number of false positives compared to semistructured interview diagnoses (Levis et al., 2020). Hence, positive cases should be followed with further evaluation.
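The reason a reasonably accurate screener can still yield many false positives follows from Bayes' rule: the positive predictive value depends on the base rate of the disorder as well as on sensitivity and specificity. The Python sketch below illustrates this with hypothetical values; the sensitivity, specificity, and prevalence figures are assumptions chosen for the example and are not estimates for the PHQ-9 or any other instrument.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive screen is a true case (Bayes' rule)."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    if true_positives + false_positives == 0:
        raise ValueError("no positive screens are possible with these inputs")
    return true_positives / (true_positives + false_positives)


# Hypothetical screener with 85% sensitivity and 85% specificity,
# applied in a low-prevalence and a higher-prevalence setting.
ppv_low = positive_predictive_value(0.85, 0.85, prevalence=0.10)
ppv_high = positive_predictive_value(0.85, 0.85, prevalence=0.40)
print(round(ppv_low, 2), round(ppv_high, 2))
```

With these assumed values, fewer than half of the positive screens in the low-prevalence setting would be true cases, which is consistent with the recommendation that positive screens be followed by further evaluation.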

The IDAS-II (Watson et al., 2012) is unique among the measures reviewed here due to its multilevel, transdiagnostic design. It includes 99 items reflecting 18 factor-analytically derived dimensions of internalizing symptoms (depression, mania, anxiety disorders, and posttraumatic stress and obsessive-compulsive disorders). The 18 scales can be reduced to three higher-order dimensions: distress, fear/obsessions, and positive mood. In addition, a 20-item General Depression scale can be extracted that covers most DSM MDD symptoms. There is extensive evidence supporting the IDAS-II’s internal consistency and structural, convergent, discriminant, predictive, and construct validity (e.g., Watson et al., 2012; Watson & O’Hara, 2017). However, information on its sensitivity to change is limited, making it more useful for characterizing the nature of psychopathology than monitoring and evaluating treatment.

Two of the most frequently used self-rated scales for children and adolescents are the Children’s Depression Inventory, 2nd edition (CDI-2; Kovacs, 2011) and the Mood and Feelings Questionnaire (MFQ; Costello & Angold, 1988). Both the CDI-2 and MFQ have child self-report and parent-report versions, and the CDI-2 also has a teacher-report version.

The CDI-2 is designed for youth ages 7 to 17 and includes 28 items covering a broad range of depression and associated symptoms. There is also a 12-item short form. In addition, the CDI-2 has a 17-item version for parents and a 12-item version for teachers to complete about the child. The differing number of items for the child, parent, and teacher forms reflects each informant’s unique perspectives. However, along with differences in rating formats, this makes it more challenging to compare results across informants. The CDI-2 exhibits good internal consistency, test–retest stability, and convergent and construct validity, as well as acceptable discriminant validity, and it is sensitive to change (Dougherty et al., 2018; Kovacs, 2011).

The MFQ (Costello & Angold, 1988) assesses depression in 8- to 18-year-olds. It consists of 33 items covering the DSM MDD criteria and associated symptoms. There is also a 34-item version for parents and 13-item short forms for children and parents (SMFQ; Angold et al., 1995). Unlike the CDI-2, the MFQ and SMFQ child and parent versions include similar items, facilitating comparisons between informants; however, there is no teacher version. The MFQ has been shown to have excellent internal consistency, adequate to good test–retest stability, and good convergent and discriminant validity (Dougherty et al., 2018; Jarbin et al., 2020). The SMFQ is often used for screening in community and clinical samples, with accuracy varying by age, gender, and population type (Jarbin et al., 2020).

Transdiagnostic Features

As exemplified by the transdiagnostic structure of the IDAS-II (Watson & O’Hara, 2017), most depression symptoms are also evident in other disorders. In this section, I briefly discuss the assessment of three selected transdiagnostic features: anhedonia, irritability, and suicidality. Although each of these features is assessed by one or more items in most of the diagnostic interviews and rating scales discussed above, for more reliable and detailed information, measures focusing specifically on these constructs are required. In addition to the measures described below, a number of the IDAS-II subscales may be used to assess transdiagnostic constructs (e.g., dysphoria, insomnia, suicidality, and ill-temper).

Anhedonia is a cardinal symptom of depression and has been shown to predict poor response to antidepressant pharmacotherapy (Rizvi et al., 2016; Sheehan, 2022). However, it is also characteristic of other disorders, particularly schizophrenia spectrum disorders. Anhedonia is thought to reflect dysfunction of neural reward circuitry, and can be decomposed into a number of aspects including deficits in anticipatory and consummatory pleasure (Khazanov et al., 2020; Rizvi et al., 2016). Most measures of anhedonia use a self-rating format, but there are also a number of behavioral and neural paradigms assessing various aspects of reward dysfunction (Khazanov et al., 2020; Rizvi et al., 2016). The most widely used self-rating measure is the Snaith-Hamilton Pleasure Scale (Snaith et al., 1995), which focuses on consummatory pleasure deficits. The self-rated Temporal Experience of Pleasure Scale (Gard et al., 2006) is also widely used, and assesses both anticipatory and consummatory pleasure.

The experience of pleasure may differ in different contexts (e.g., food/taste, physical or perceptual sensations, social interactions; but see Khazanov et al., 2020 for contrary evidence). Two measures that focus specifically on anhedonia in social contexts are the Revised Social Anhedonia Scale (RSAS; Mishlove & Chapman, 1985) and the Anticipatory and Consummatory Interpersonal Pleasure Scale (ACIPS; Gooding & Pflum, 2014). The RSAS is most frequently used in research on schizophrenia spectrum disorders and focuses on anticipatory pleasure. The ACIPS is used in research on a variety of disorders, covers both anticipatory and consummatory pleasure, and has child, adolescent, and adult versions.

In the DSM, irritability is a cardinal symptom of depression in youth and is common in depressed adults (Sheehan, 2022). However, it is highly transdiagnostic in that irritability or closely related constructs are included in the DSM criteria for bipolar, anxiety, impulse-control, stress-related, and personality disorders (Klein et al., 2021). Recently, investigators have shown that irritability includes two separable but highly correlated dimensions: tonic, referring to irritable, grouchy mood and being easily annoyed, and phasic, referring to temper outbursts. Studies of children and adolescents have found that tonic irritability uniquely predicts internalizing disorders and phasic irritability uniquely predicts externalizing disorders (Klein et al., 2021).

A number of measures have been developed to assess irritability, or closely related constructs such as anger, in children (Althoff & Ametti, 2021) and adults (Saatchi et al., 2023; Toohey & DiGiuseppe, 2017). One of the most commonly used measures is the Affective Reactivity Index (Stringaris et al., 2012), which was initially developed as a rating scale for children and adolescents, with self- and parent-report forms, but has been extended to adults (Mulraney et al., 2014) and is available as a clinician rating scale (Haller et al., 2020). Existing measures do not distinguish tonic from phasic irritability, so there is a need for further scale development in this area (Klein et al., 2021).

Depression is arguably the leading risk factor for suicide and suicide attempts, but suicide rates are also elevated in most other mental disorders. Although suicide is extremely difficult to predict due to its low prevalence (Franklin et al., 2017), assessment of the spectrum of suicidal ideation and behavior is critical for treatment planning and monitoring. Numerous measures of suicidal ideation and behavior for youth and adults have been developed (for reviews see Carter et al., 2019; Runeson et al., 2017; Thom et al., 2020).

Two of the most frequently used measures are the Columbia Suicide Severity Rating Scale (C-SSRS; Posner et al., 2011) and the Scale for Suicide Ideation (SSI; Beck et al., 1997). The C-SSRS has clinician and self-rating versions, as well as a computer-automated version using interactive voice technology. It includes subscales for severity of ideation, intensity of ideation, behavior, and lethality. The SSI is also available in clinician- and self-administered formats. More recently, there has been growing interest in using ecological momentary assessment (Sedano-Capdevila et al., 2021) and other approaches to ambulatory assessment, such as physiology and physical and geospatial activity (Kleiman et al., 2021) to assess suicidality and suicide risk, although this work is still in its infancy.

Recommendations

In this section, I make some broad recommendations of measures for adults and youth for the various assessment goals. These recommendations draw on traditional aspects of reliability (internal consistency, test–retest stability, and interrater reliability, as relevant) and validity (convergent, discriminant, construct, and as relevant, sensitivity to change). I also took adequacy of coverage of the domain (or content validity) into account. Where appropriate and data were available, I considered factor structure and MI. Finally, I looked for data on incremental validity and treatment utility and took into account the few such data that are available. However, it is important to note that the psychometric properties of measures can vary as a function of the population being assessed. Hence it is important to consult the literature for data on the measure in the target population of interest.

For diagnosis and prognosis, I recommend using a semistructured diagnostic interview (SCID-5 for adults; K-SADS for youth and a parent), supplemented by a timeline to elucidate the course of depression (McCullough et al., 2016). In clinical contexts, the MINI or the clinician versions of these interviews are recommended. It may also be desirable to assess transdiagnostic features using the IDAS-II or more focused measures of specific features.

The information from the diagnostic interviews and transdiagnostic measures discussed above is also critical for case conceptualization and treatment planning. In addition, it is necessary to assess the severity of depression and a range of personal, interpersonal, and systemic factors that may contribute to development and maintenance and provide treatment targets (Dougherty et al., 2018; Persons et al., 2018). Optimally, both clinician and self-rating scales should be used to assess severity, although in many contexts only the latter will be feasible. For adults, the IDS-C and IDS-SR have the advantage of being complementary instruments; however, a semistructured version of the HAM-D and the BDI-II or IDAS-II are also good choices. In clinical practice, the QIDS-C and QIDS-SR are recommended or, if that is too burdensome, only the QIDS-SR or BDI-II. For children, the CDRS is recommended. For adolescents, a well-validated clinician rating scale is not available. Provisionally, one might use the CDRS for younger adolescents and a semistructured version of the HAM-D or the IDS-C/QIDS-C for older adolescents. In addition, for children and adolescents, both the child and a parent should complete the CDI-2 or the MFQ.

For treatment monitoring and evaluation, the same clinician and self-rating scales should continue to be used throughout the course of treatment, although in many instances only self-rated scales will be feasible. Unfortunately, I am not aware of evidence-based guidelines for the frequency of these assessments; however, attenuation effects, feasibility, and patient burden must all be considered. Finally, in contexts requiring screening, the PHQ-9 or PHQ-2 and the SMFQ are reasonable choices for adults and youth, respectively.

Additional Areas Needing Research

A number of conceptual and methodological challenges and gaps in the literature have been noted throughout this article. I will not repeat them here, but I will briefly comment on several other issues that have not been raised. First, for over two decades, investigators have been pointing to the need for research on the incremental validity and treatment utility of assessment tools, and have outlined a number of study designs that can provide this information (e.g., Hunsley & Meyer, 2003; Nelson-Gray, 2003). Although I have noted several important examples of such work (e.g., the incremental validity of semistructured diagnostic interviews; the treatment utility of measurement-based care), there are still very few data on the incremental validity and treatment utility of most depression measures (Dougherty et al., 2018).

Second, advances in technology have provided new opportunities and challenges. Several of the rating scales reviewed above are available in voice-activated automated formats, and many semistructured interviews can now be computer-administered. Furthermore, diagnostic interviews and clinician- and self-rating scales can be administered via the internet. However, the field is still in the early stages of assessing the equivalence of these newer formats to traditional means of administration (D’Avanzato & Zimmerman, 2017; Sheehan, 2022).

Relatedly, the use of ambulatory assessment, both active (e.g., ecological momentary assessment [EMA]) and passive (e.g., wearable devices monitoring physiology, activity, and geolocation) has mushroomed (Stange et al., 2019). In addition to offering new approaches for between-persons research, it provides exciting opportunities for within-person designs, which may have greater clinical relevance (Wright & Woods, 2020). However, the field has been slow to develop standardized, well-validated EMA assessments for depression, perhaps because surveys must be brief and the optimal time frame for assessing depressive symptoms is not uniform (e.g., mood can be assessed frequently throughout the day, but sleep and weight/appetite require longer time frames). In addition, the incremental value and clinical utility of passive ambulatory assessments are unclear (Kleiman et al., 2021).

Finally, there have been significant advances in the analysis of facial and vocal characteristics (Girard & Cohn, 2015) and natural language, both in social media and interpersonal interactions (e.g., Eichstaedt et al., 2018). These methods may ultimately provide new tools for diagnosis, screening, and monitoring clinical status, but it is still necessary to demonstrate incremental validity and clinical utility and address ethical concerns such as unintended biases (Martinez-Martin, 2019).

Limitations and Conclusion

The literature on the assessment of depression is vast and includes numerous approaches and measures. I have attempted to summarize some of the key issues, such as the nature of the depression construct and general considerations in assessment, and to review some of the most widely used diagnostic interviews, clinician and self-rating scales, and measures of relevant transdiagnostic constructs in adults and youth. I have also offered some recommendations for measures for diagnosis and prognosis, conceptualization and treatment planning, and treatment monitoring and evaluation in research and clinical contexts. It is important to note that the review is selective, rather than systematic, and that my recommendations are subjective, albeit informed by past narrative reviews and systematic reviews and meta-analyses where available. Despite the considerable effort and expense devoted to assessing depressive disorders and symptoms over the past 50 to 60 years, there is still much that is uncertain and unknown. Moreover, as alternative approaches to conceptualizing psychopathology, such as HiTOP, RDoC, and network models, develop, and as new technologies and applications emerge, the ways in which depression is conceptualized and assessed are likely to change as well.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Supported by National Institute of Mental Health grant RO1 MH069942.

ORCID iD

Daniel N. Klein

Notes

References

Achenbach, T. M., Ivanova, M. Y., & Rescorla, L. A. (2017). Empirically based assessment and taxonomy of psychopathology for ages 1½-90+ years: Developmental, multi-informant, and multicultural findings. Comprehensive Psychiatry, 79, 4–18.

Achenbach, T. M., Krukowski, R. A., Dumenci, & Ivanova, M. Y. (2005). Assessment of adult psychopathology: Meta-analyses and implications of cross-informant correlations. Psychological Bulletin, 131(3), 361–382.

Althoff, R. R., & Ametti (2021). Measurement of dysregulation in children and adolescents. Child and Adolescent Psychiatric Clinics, 30(2), 321–333.

Ambrosini, P. J. (2000). Historical development and present status of the schedule for affective disorders and schizophrenia for school-age children (K-SADS). Journal of the American Academy of Child & Adolescent Psychiatry, 39(1), 49–58.

American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.).

American Psychiatric Association. (2022). Diagnostic and statistical manual of mental disorders (5th ed., text rev.). American Psychiatric Publishing.

Angold, Costello, E. J., Messer, S. C., & Pickles (1995). Development of a short questionnaire for use in epidemiological studies of depression in children and adolescents. International Journal of Methods in Psychiatric Research, 5, 237–249.

Bagby, R. M., Ryder, A. G., Schuller, D. R., & Marshall, M. B. (2004). The Hamilton Depression Rating Scale: Has the gold standard become a lead weight? The American Journal of Psychiatry, 161(12), 2163–2177.

Balsamo, Cataldi, Carlucci, Padulo, & Fairfield (2018). Assessment of late-life depression via self-report measures: A review. Clinical Interventions in Aging, 13, 2021–2044.

Basco, M. R., Bostic, J. Q., Davies, Rush, A. J., Witte, Hendrickse, & Barnett (2000). Methods to improve diagnostic accuracy in a community mental health setting. The American Journal of Psychiatry, 157(10), 1599–1605.

Beck, A. T., Brown, G. K., & Steer, R. A. (1997). Psychometric characteristics of the Scale for Suicide Ideation with psychiatric outpatients. Behaviour Research and Therapy, 35(11), 1039–1046.

Beck, A. T., Steer, R. A., & Brown, G. K. (1996). BDI-II: Beck Depression Inventory Manual (2nd ed.). Psychological Corporation.

Beck, A. T., Steer, R. A., & Carbin, M. G. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8(1), 77–100.

Borsboom (2017). A network theory of mental disorders. World Psychiatry, 16(1), 5–13.

Brouwer, Meijer, R. R., & Zevalkink (2013). On the factor structure of the Beck Depression Inventory–II: G is the key. Psychological Assessment, 25(1), 136–145.

Carrozzino, Patierno, Fava, G. A., & Guidi (2020). The Hamilton rating scales for depression: A critical review of clinimetric properties of different versions. Psychotherapy and Psychosomatics, 89(3), 133–150.

Carter, Walker, G. M., Aubeeluck, & Manning, J. C. (2019). Assessment tools of immediate risk of self-harm and suicide in children and young people: A scoping review. Journal of Child Health Care, 23(2), 178–199.

Cohen, J. R., F. K., Young, J. F., Hankin, B. L., & Lee, B. A. (2019). Youth depression screening with parent and self-reports: Assessing current and prospective depression risk. Child Psychiatry & Human Development, 50(4), 647–660.

Costantini, Pasquarella, Odone, Colucci, M. E., Costanza, Serafini, . . . Amerio (2021). Screening for depression in primary care with Patient Health Questionnaire-9 (PHQ-9): A systematic review. Journal of Affective Disorders, 279, 473–483.

Costello, E. J., & Angold (1988). Scales to assess child and adolescent depression: Checklists, screens, and nets. Journal of the American Academy of Child & Adolescent Psychiatry, 27(6), 726–737.

D’Avanzato & Zimmerman (2017). The diagnosis and assessment of mood disorders. In R. J. DeRubeis & D. R. Strunk (Eds.), The Oxford handbook of mood disorders (pp. 95–110). Oxford University Press.

De Los Reyes, Augenstein, T. M., Wang, Thomas, S. A., Drabick, D. A., Burgers, D. E., & Rabinowitz (2015). The validity of the multi-informant approach to assessing child and adolescent mental health. Psychological Bulletin, 141, 858–900.

De Los Reyes, Thomas, S. A., Goodman, K. L., & Kundey, S. M. (2013). Principles underlying the use of multiple informants’ reports. Annual Review of Clinical Psychology, 9, 123–149.

de Zwart, P. L., Jeronimus, B. F., & de Jonge (2019). Empirical evidence for definitions of episode, remission, recovery, relapse and recurrence in depression: A systematic review. Epidemiology and Psychiatric Sciences, 28(5), 544–562.

Dougherty, L. R., Klein, D. N., & Olino, T. M. (2018). Depression in children and adolescents. In Hunsley & Mash (Eds.), A guide to assessments that work (2nd ed., pp. 99–130). Oxford University Press.

Dozois, D. J. A., Wilde, J. L., & Dobson, K. S. (2020). Depressive disorders. In M. M. Antony & D. H. Barlow (Eds.), Handbook of assessment and treatment planning for psychological disorders (pp. 335–378). Guilford Publications.

27.

Edelbrock, Costello, A. J., Dulcan, M. K., Kalas, & Conover, N. C. (1985). Age differences in the reliability of the psychiatric interview of the child. Child Development, 56, 265–275.

28.

Eichstaedt, J. C., Smith, R. J., Merchant, R. M., Ungar, L. H., Crutchley, Preoţiuc-Pietro, . . . Schwartz, H. A. (2018). Facebook language predicts depression in medical records. Proceedings of the National Academy of Sciences of the United States of America, 115(44), 11203–11208.

29.

Feighner, J. P., Robins, Guze, S. B., Woodruff, R. A., Winokur, & Munoz (1972). Diagnostic criteria for use in psychiatric research. Archives of General Psychiatry, 26(1), 57–63.

30.

Ferdinand, R. F., Hoogerheide, K. N., van der Ende, Visser, J. H., Koot, H. M., Kasius, M. C., & Verhulst, F. C. (2003). The role of the clinician: Three-year predictive value of parents’, teachers’, and clinicians’ judgments of childhood psychopathology. Journal of Child Psychology and Psychiatry, 44, 867–876.

31.

First, M. B., Williams, J. B. W., Karg, R. S., & Spitzer, R. L. (2015). Structured Clinical Interview for DSM-5—Research Version (SCID-5 for DSM-5, Research Version; SCID-5-RV). American Psychiatric Association.

32.

Franklin, J. C., Ribeiro, J. D., Fox, K. R., Bentley, K. H., Kleiman, E. M., Huang, . . . Nock, M. K. (2017). Risk factors for suicidal thoughts and behaviors: A meta-analysis of 50 years of research. Psychological Bulletin, 143(2), 187–232.

33.

Fried, E. I., Epskamp, Nesse, R. M., Tuerlinckx, & Borsboom (2016). What are “good” depression symptoms? Comparing the centrality of DSM and non-DSM symptoms of depression in a network analysis. Journal of Affective Disorders, 189, 314–320.

34.

Fried, E. I., Flake, J. K., & Robinaugh, D. J. (2022). Revisiting the theoretical and methodological foundations of depression measurement. Nature Reviews Psychology, 1, 358–368.

35.

Fried, E. I., van Borkulo, C. D., Epskamp, Schoevers, R. A., Tuerlinckx, & Borsboom (2016). Measuring depression over time … Or not? Lack of unidimensionality and longitudinal measurement invariance in four common rating scales of depression. Psychological Assessment, 28(11), 1354–1367.

36.

Gard, D. E., Gard, M. G., Kring, A. M., & John, O. P. (2006). Anticipatory and consummatory components of the experience of pleasure: A scale development study. Journal of Research in Personality, 40, 1086–1102.

37.

Girard, J. M., & Cohn, J. F. (2015). Automated audiovisual depression analysis. Current Opinion in Psychology, 4, 75–79.

38.

Gooding, D. C., & Pflum, M. J. (2014). The assessment of interpersonal pleasure: Introduction of the Anticipatory and Consummatory Interpersonal Pleasure Scale (ACIPS) and preliminary findings. Psychiatry Research, 215, 237–243.

39.

Haller, S. P., Kircanski, Stringaris, Clayton, Bui, Agorsor, . . . Brotman, M. A. (2020). The Clinician Affective Reactivity Index: Validity and reliability of a clinician-rated assessment of irritability. Behavior Therapy, 51(2), 283–293.

40.

Hamilton (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23(1), 56–62.

41.

Haroz, E. E., Ritchey, Bass, J. K., Kohrt, B. A., Augustinavicius, Michalopoulos, . . . Bolton (2017). How is depression experienced around the world? A systematic review of qualitative literature. Social Science & Medicine, 183, 151–162.

42.

Hunsley & Meyer, G. J. (2003). The incremental validity of psychological testing and assessment: Conceptual, methodological, and statistical issues. Psychological Assessment, 15(4), 446–455.

43.

James, S. L., Abate, K. H., Abay, S. M., Abbafati, Abbasi, . . . Briggs, A. M. (2018). Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017. The Lancet, 392(10159), 1789–1858.

44.

Jarbin, Ivarsson, Andersson, Bergman, & Skarphedinsson (2020). Screening efficiency of the Mood and Feelings Questionnaire (MFQ) and Short Mood and Feelings Questionnaire (SMFQ) in Swedish help seeking outpatients. PLOS ONE, 15(3), Article e0230623.

45.

Jensen, P. S., Rubio-Stipec, Canino, Bird, H. R., Dulcan, M. K., Schwab-Stone, M. E., & Lahey, B. B. (1999). Parent and child contributions to diagnosis of mental disorder: Are both informants always necessary? Journal of the American Academy of Child & Adolescent Psychiatry, 38, 1569–1579.

46.

Kendler, K. S. (2016). The phenomenology of major depression and the representativeness and nature of DSM criteria. The American Journal of Psychiatry, 173(8), 771–780.

47.

Kendler, K. S. (2017). DSM disorders and their criteria: How should they interrelate? Psychological Medicine, 47(12), 2054–2060.

48.

Kessler, R. C., & Bromet, E. J. (2013). The epidemiology of depression across cultures. Annual Review of Public Health, 34, 119–138.

49.

Kessler, R. C., & Üstün, T. B. (2004). The World Mental Health (WMH) survey initiative version of the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI). International Journal of Methods in Psychiatric Research, 13(2), 93–121.

50.

Khazanov, G. K., Ruscio, A. M., & Forbes, C. N. (2020). The Positive Valence Systems Scale: Development and validation. Assessment, 27(5), 1045–1069.

51.

Kirmayer, L. J., Jarvis, G. E., & Gomez-Carillo (2022). Depression across cultures. In C. B. Nemeroff, A. F. Schatzberg, Rasgon, & S. M. Strakowski (Eds.), The American Psychiatric Association Publishing textbook of mood disorders (pp. 837–867). American Psychiatric Association Publishing.

52.

Kleiman, E. M., Bentley, K. H., Maimone, J. S., Lee, H. I. S., Kilbury, E. N., Fortgang, R. G., . . . Nock, M. K. (2021). Can passive measurement of physiological distress help better predict suicidal thinking? Translational Psychiatry, 11(1), 611.

53.

Klein, D. N. (2008). Classification of depressive disorders in DSM-V: Proposal for a two-dimension system. Journal of Abnormal Psychology, 117, 552–560.

54.

Klein, D. N. (in press). Diagnosis and classification of depressive disorders. In J. W. Pettit & T. M. Olino (Eds.), APA handbook of depression. American Psychological Association.

55.

Klein, D. N., Dougherty, L. R., Kessel, E. M., Silver, & Carlson, G. A. (2021). A transdiagnostic perspective on youth irritability. Current Directions in Psychological Science, 30, 437–443.

56.

Klein, D. N., Goldstein, B. L., & Finsaas (2017). Depressive disorders. In T. P. Beauchaine & S. P. Hinshaw (Eds.), Child and adolescent psychopathology (3rd ed., pp. 610–641). John Wiley.

57.

Klein, D. N., Ouimette, P. C., Kelly, H. S., Ferro, & Riso, L. P. (1994). Test-retest reliability of team consensus best-estimate diagnoses of Axis I and II disorders in a family study. The American Journal of Psychiatry, 151, 1043–1047.

58.

Kotov, Krueger, R. F., Watson, Achenbach, T. M., Althoff, R. R., Bagby, R. M., . . . Zimmerman (2017). The Hierarchical Taxonomy of Psychopathology (HiTOP): A dimensional alternative to traditional nosologies. Journal of Abnormal Psychology, 126(4), 454–477.

59.

Kovacs (2011). Children’s Depression Inventory 2 (CDI 2) (2nd ed.). Multi-Health Systems.

60.

Kozak, M. J., & Cuthbert, B. N. (2016). The NIMH Research Domain Criteria initiative: Background, issues, pragmatics. Psychophysiology, 53, 286–297.

61.

Kraus, D. R., Seligman, D. A., & Jordan, J. R. (2005). Validation of a behavioral health treatment outcome and assessment tool designed for naturalistic settings: The Treatment Outcome Package. Journal of Clinical Psychology, 61(3), 285–314.

62.

Kroenke, Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606–613.

63.

Lee, Y. Y., Stockings, E. A., Harris, M. G., Doi, S. A. R., Page, I. S., Davidson, S. K., & Barendregt, J. J. (2019). The risk of developing major depression among individuals with subthreshold depression: A systematic review and meta-analysis of longitudinal cohort studies. Psychological Medicine, 49(1), 92–102.

64.

Levis, Benedetti, Ioannidis, J. P., Sun, Negeri, & Thombs, B. D. (2020). Patient health questionnaire-9 scores do not accurately estimate depression prevalence: Individual participant data meta-analysis. Journal of Clinical Epidemiology, 122, 115–128.

65.

Lindley & Bauerband (in press). Measurement invariance of the Depression, Anxiety, and Stress Scale (DASS-21) across cisgender sexual minority and transgender and nonbinary individuals. Psychology of Sexual Orientation and Gender Diversity. Advance online publication. https://doi.org/10.1037/sgd0000554

66.

Loevinger (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635–694.

67.

Lovibond, P. F., & Lovibond, S. H. (1995). The structure of negative emotional states: Comparison of the Depression Anxiety Stress Scales (DASS) with the Beck Depression and Anxiety Inventories. Behaviour Research and Therapy, 33(3), 335–343.

68.

Makol, B. A., & Polo, A. J. (2018). Parent-child endorsement discrepancies among youth at chronic-risk for depression. Journal of Abnormal Child Psychology, 46, 1077–1088.

69.

Markon, K. E., Chmielewski, & Miller, C. J. (2011). The reliability and validity of discrete and continuous measures of psychopathology: A quantitative review. Psychological Bulletin, 137(5), 856–879.

70.

Martin, L. A., Neighbors, H. W., & Griffith, D. M. (2013). The experience of symptoms of depression in men vs women: Analysis of the National Comorbidity Survey Replication. JAMA Psychiatry, 70(10), 1100–1106.

71.

Martinez-Martin (2019). What are important ethical implications of using facial recognition technology in health care? AMA Journal of Ethics, 21(2), E180–E187.

72.

McCullough, J. P., Jr., Clark, S. W., Klein, D. N., & First, M. B. (2016). Introducing a clinical course-graphing scale for DSM-5 mood disorders. American Journal of Psychotherapy, 70(4), 383–392.

73.

McGlinchey, J. B., Zimmerman, Young, & Chelminski (2006). Diagnosing major depressive disorder VIII: Are some symptoms better than others? The Journal of Nervous and Mental Disease, 194(10), 785–790.

74.

Mellick, Hatkevich, Venta, Hill, R. M., Kazimi, Elhai, J. D., & Sharp (2019). Measurement invariance of depression symptom ratings across African American, Hispanic/Latino, and Caucasian adolescent psychiatric inpatients. Psychological Assessment, 31(6), 833–838.

75.

Mishlove & Chapman, L. J. (1985). Social anhedonia in the prediction of psychosis proneness. Journal of Abnormal Psychology, 94(3), 384–396.

76.

Montgomery, S. A., & Åsberg (1979). A new depression scale designed to be sensitive to change. The British Journal of Psychiatry, 134(4), 382–389.

77.

Morken, I. S., Viddal, K. R., Ranum, & Wichstrøm (2021). Depression from preschool to adolescence–five faces of stability. Journal of Child Psychology and Psychiatry, 62(8), 1000–1009.

78.

Mulraney, M. A., Melvin, G. A., & Tonge, B. J. (2014). Psychometric properties of the Affective Reactivity Index in Australian adults and adolescents. Psychological Assessment, 26(1), 148–155.

79.

Negeri, Z. F., Levis, Sun, Krishnan, . . . Thombs, B. D. (2021). Accuracy of the Patient Health Questionnaire-9 for screening to detect major depression: Updated systematic review and individual participant data meta-analysis. British Medical Journal, 375, n2183.

80.

Nelson-Gray, R. O. (2003). Treatment utility of psychological assessment. Psychological Assessment, 15(4), 521–531.

81.

Olino, T. M. (2020). Clinical applications of measurement invariance. Journal of Personality Assessment, 102(5), 727–729.

82.

Olino, T. M., Klein, D. N., Rohde, Seeley, J. R., Pilkonis, P. A., & Lewinsohn, P. M. (2012). Measuring depression using item response theory: An examination of three measures of depressive symptomatology. International Journal of Methods in Psychiatric Research, 21(1), 76–85.

83.

Patel, J. S., Rand, K. L., Cyders, M. A., Kroenke, & Stewart, J. C. (2019). Measurement invariance of the Patient Health Questionnaire-9 (PHQ-9) depression screener in U.S. adults across sex, race/ethnicity, and education level: NHANES 2005–2016. Depression and Anxiety, 36(9), 813–823.

84.

Persons, J. B., Fresco, D. M., & Ernst, J. S. (2018). Adult depression. In Hunsley & E. J. Mash (Eds.), A guide to assessments that work (2nd ed., pp. 131–151). Oxford University Press.

85.

Pettit, J. W., Lewinsohn, P. M., Roberts, R. E., Seeley, J. R., & Monteith, L. L. (2009). The long-term course of depression: Development of an empirical index and identification of early adult outcomes. Psychological Medicine, 39(3), 403–412.

86.

Posner, Brown, G. K., Stanley, Brent, D. A., Yershova, K. V., Oquendo, M. A., . . . Mann, J. J. (2011). The Columbia–Suicide Severity Rating Scale: Initial validity and internal consistency findings from three multisite studies with adolescents and adults. The American Journal of Psychiatry, 168(12), 1266–1277.

87.

Poznanski, E. O., Cook, S. C., & Carroll, B. J. (1979). A depression rating scale for children. Pediatrics, 64, 442–450.

88.

Poznanski, E. O., & Mokros, H. B. (1999). Children’s Depression Rating Scale–Revised (CDRS-R). Western Psychological Services.

89.

Radloff, L. S. (1977). The CES-D Scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401.

90.

Regier, D. A., Narrow, W. E., Clarke, D. E., Kraemer, H. C., Kuramoto, S. J., Kuhl, E. A., & Kupfer, D. J. (2013). DSM-5 field trials in the United States and Canada, Part II: Test-retest reliability of selected categorical diagnoses. The American Journal of Psychiatry, 170(1), 59–70.

91.

Rizvi, S. J., Pizzagalli, D. A., Sproule, B. A., & Kennedy, S. H. (2016). Assessing anhedonia in depression: Potentials and pitfalls. Neuroscience & Biobehavioral Reviews, 65, 21–35.

92.

Runeson, Odeberg, Pettersson, Edbom, Jildevik Adamsson, & Waern (2017). Instruments for the assessment of suicide risk: A systematic review evaluating the certainty of the evidence. PLOS ONE, 12(7), Article e0180292.

93.

Ruscio, A. M. (2019). Normal versus pathological mood: Implications for diagnosis. Annual Review of Clinical Psychology, 15, 179–205.

94.

Rush, A. J., Gullion, C. M., Basco, M. R., Jarrett, R. B., & Trivedi, M. H. (1996). The Inventory of Depressive Symptomatology (IDS): Psychometric properties. Psychological Medicine, 26(3), 477–486.

95.

Rush, A. J., Trivedi, M. H., Ibrahim, H. M., Carmody, T. J., Arnow, Klein, D. N., Markowitz, J. C., Ninan, P. T., Kornstein, Manber, Thase, M. E., Kocsis, J. H., & Keller, M. B. (2003). The 16-item Quick Inventory of Depressive Symptomatology (QIDS) Clinician Rating (QIDS-C) and Self-Report (QIDS-SR): A psychometric evaluation in patients with chronic major depression. Biological Psychiatry, 54, 573–583.

96.

Saatchi, Agbayani, C. J. G., Clancy, S. L., & Fortier, M. A. (2023). Measuring irritability in young adults: An integrative review of measures and their psychometric properties. Journal of Psychiatric and Mental Health Nursing, 30, 35–53.

97.

Sedano-Capdevila, Porras-Segovia, Bello, H. J., Baca-García, & Barrigon, M. L. (2021). Use of ecological momentary assessment to study suicidal thoughts and behavior: A systematic review. Current Psychiatry Reports, 23(7), 41.

98.

Shankman, S. A., Funkhouser, C. J., Klein, D. N., Davila, Lerner, & Hee (2018). Reliability and validity of severity dimensions of psychopathology assessed using the Structured Clinical Interview for DSM-5 (SCID). International Journal of Methods in Psychiatric Research, 27(1), e1590.

99.

Sheehan, D. V. (2022). Rating scales and structured diagnostic interviews for mood disorders. In C. B. Nemeroff, A. F. Schatzberg, Rasgon, & S. M. Strakowski (Eds.), The American Psychiatric Association Publishing textbook of mood disorders (pp. 55–90). American Psychiatric Association Publishing.

100.

Sheehan, D. V., Lecrubier, Sheehan, K. H., Janavs, Weiller, Keskiner, . . . Dunbar, G. C. (1997). The validity of the Mini International Neuropsychiatric Interview (MINI) according to the SCID-P and its reliability. European Psychiatry, 12(5), 232–241.

101.

Sheehan, D. V., Sheehan, K. H., Shytle, R. D., Janavs, Bannon, Rogers, J. E., . . . Wilkinson (2010). Reliability and validity of the Mini International Neuropsychiatric Interview for Children and Adolescents (MINI-KID). The Journal of Clinical Psychiatry, 71(3), 313–326.

102.

Simms, L. J., Wright, A. G., Cicero, Kotov, Mullins-Sweatt, S. N., Sellbom, . . . Zimmermann (2022). Development of measures for the Hierarchical Taxonomy of Psychopathology (HiTOP): A collaborative scale development project. Assessment, 29(1), 3–16.

103.

Snaith, R. P., Hamilton, Morley, Humayan, Hargreaves, & Trigwell (1995). A scale for the assessment of hedonic tone: The Snaith-Hamilton Pleasure Scale. The British Journal of Psychiatry, 167, 99–103.

104.

Stallwood, Monsour, Rodrigues, Monga, Terwee, Offringa, & Butcher, N. J. (2021). Systematic review: The measurement properties of the Children’s Depression Rating Scale–Revised in adolescents with major depressive disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 60(1), 119–133.

105.

Stange, J. P., Kleiman, E. M., Mermelstein, R. J., & Trull, T. J. (2019). Using ambulatory assessment to measure dynamic risk processes in affective disorders. Journal of Affective Disorders, 259, 325–336.

106.

Steffens, D. C. (2019). Vascular depression: Is an old research construct finally ready for clinical prime time? Biological Psychiatry, 85(6), 441–442.

107.

Stringaris, Goodman, Ferdinando, Razdan, Muhrer, Leibenluft, & Brotman, M. A. (2012). The Affective Reactivity Index: A concise irritability scale for clinical and research settings. Journal of Child Psychology and Psychiatry, 53(11), 1109–1117.

108.

Sultan, Ando, Elkhateb, George, R. B., Lim, Carvalho, . . . O’Carroll (2022). Assessment of patient-reported outcome measures for maternal postpartum depression using the consensus-based standards for the selection of health measurement instruments guideline: A systematic review. JAMA Network Open, 5(6), e2214885.

109.

Tasca, G. A., Angus, Bonli, Drapeau, Fitzpatrick, Hunsley, & Knoll (2019). Outcome and progress monitoring in psychotherapy: Report of a Canadian Psychological Association Task Force. Canadian Psychology, 60(3), 165–177.

110.

Thom, Hogan, & Hazen (2020). Suicide risk screening in the hospital setting: A review of brief validated tools. Psychosomatics, 61(1), 1–7.

111.

Toohey, M. J., & DiGiuseppe (2017). Defining and measuring irritability: Construct clarification and differentiation. Clinical Psychology Review, 53, 93–108.

112.

Townsend, Kobak, Kearney, Milham, Andreotti, Escalera, . . . Kaufman (2020). Development of three web-based computerized versions of the Kiddie Schedule for Affective Disorders and Schizophrenia child psychiatric diagnostic interview: Preliminary validity data. Journal of the American Academy of Child & Adolescent Psychiatry, 59(2), 309–325.

113.

Turkheimer (2017). The hard question in psychiatric nosology. In K. S. Kendler & Parnas (Eds.), Philosophical issues in psychiatry IV: Classification of psychiatric illness (pp. 27–44). Oxford University Press.

114.

Uher, Perlis, R. H., Placentino, Dernovšek, M. Z., Henigsberg, Mors, . . . Farmer (2012). Self-report and clinician-rated measures of depression severity: Can one replace the other? Depression and Anxiety, 29(12), 1043–1049.

115.

Verhulst, F. C., Dekker, M. C., & van der Ende (1997). Parent, teacher, and self-reports as predictors of signs of disturbance in adolescents: Whose information carries the most weight? Acta Psychiatrica Scandinavica, 96, 75–81.

116.

Wakefield, J. C., & Schmitz, M. F. (2014). Predictive validation of single-episode uncomplicated depression as a benign subtype of unipolar major depression. Acta Psychiatrica Scandinavica, 129, 445–457.

117.

Wang, Y. P., & Gorenstein (2013). Psychometric properties of the Beck Depression Inventory-II: A comprehensive review. Brazilian Journal of Psychiatry, 35, 416–431.

118.

Waszczuk, M. A., Kotov, Ruggero, Gamez, & Watson (2017). Hierarchical structure of emotional disorders: From individual symptoms to the spectrum. Journal of Abnormal Psychology, 126(5), 613–634.

119.

Watson & O’Hara, M. W. (2017). Understanding the emotional disorders: A symptom-level approach based on the IDAS-II. Oxford University Press.

120.

Watson, O’Hara, M. W., Naragon-Gainey, Koffel, Chmielewski, Kotov, . . . Ruggero, C. J. (2012). Development and validation of new anxiety and bipolar symptom scales for an expanded version of the IDAS (the IDAS-II). Assessment, 19(4), 399–420.

121.

Williams, J. B. (1988). A structured interview guide for the Hamilton Depression Rating Scale. Archives of General Psychiatry, 45(8), 742–747.

122.

Williams, J. B., Gibbon, First, M. B., Spitzer, R. L., Davies, Borus, . . . Wittchen, H. U. (1992). The structured clinical interview for DSM-III-R (SCID): II. Multisite test-retest reliability. Archives of General Psychiatry, 49(8), 630–636.

123.

Williams, J. B., & Kobak, K. A. (2008). Development and reliability of a structured interview guide for the Montgomery Asberg Depression Rating Scale (SIGMA). The British Journal of Psychiatry, 192(1), 52–58.

124.

Williams, J. B., Kobak, K. A., Bech, Engelhardt, Evans, Lipsitz, . . . Kalali (2008). The GRID-HAMD: Standardization of the Hamilton Depression Rating Scale. International Clinical Psychopharmacology, 23(3), 120–129.

125.

World Health Organization. (2019). International statistical classification of diseases and related health problems, eleventh revision.

126.

Wright, A. G., & Woods, W. C. (2020). Personalized models of psychopathology. Annual Review of Clinical Psychology, 16, 49–74.

127.

Zanarini, M. C., Skodol, A. E., Bender, Dolan, Sanislow, Schaefer, . . . Gunderson, J. G. (2000). The collaborative longitudinal personality disorders study: Reliability of axis I and II diagnoses. Journal of Personality Disorders, 14(4), 291–299.

128.

Zimmerman, Morgan, T. A., & Stanton (2018). The severity of psychiatric disorders. World Psychiatry, 17(3), 258–275.