The Strengths and Difficulties Questionnaire (SDQ) [1] is a brief screening measure that is increasingly employed to identify behavioural and emotional problems in children and adolescents. Developed in the UK, the instrument produces scores for each of five subscales: conduct problems; hyperactivity; emotional symptoms; peer problems; and prosocial behaviour. Each of these consists of five items. A ‘total difficulties’ score is calculated by totalling the four deficit-focused subscales (i.e. all except prosocial behaviour), and an impact score is produced from five items, such as: ‘Do the difficulties upset or distress your child?’ Parent and teacher forms of the SDQ are available for 3–16-year-olds, while a youth-report form is available for 11–16-year-olds. At less than one-quarter the length of the Child Behaviour Checklist (CBCL) [2], and with evidence of comparable reliability and validity [3], the SDQ's expedience and parsimony appear to underlie its increasing popularity. Within Australia, interest in the SDQ has escalated following the recent inclusion of the measure in the Mental Health Outcomes and Assessment Training (MH-OAT) protocol, a state-wide data collection protocol used with patients in New South Wales mental health services. Data from the SDQ and other key measures are collected on admission, review, and discharge, and are routinely reported to the NSW Department for Health.
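The scoring rule described above can be sketched in Python as follows. This is a minimal illustration, not the official scoring procedure: it assumes each item has already been scored 0, 1 or 2 (with any reverse-worded items recoded), so that each subscale ranges from 0 to 10 and total difficulties from 0 to 40. The dictionary keys and function names are hypothetical.

```python
# Hypothetical subscale labels; prosocial is deliberately excluded from total difficulties.
DIFFICULTY_SUBSCALES = ("conduct", "hyperactivity", "emotional", "peer")


def subscale_score(items):
    """Sum five item scores, each assumed to be 0, 1 or 2 after any reverse-scoring."""
    if len(items) != 5 or any(i not in (0, 1, 2) for i in items):
        raise ValueError("expected five item scores of 0, 1 or 2")
    return sum(items)


def total_difficulties(subscales):
    """Total the four deficit-focused subscales, leaving out prosocial behaviour."""
    return sum(subscales[name] for name in DIFFICULTY_SUBSCALES)


# Toy responses for one child (invented for illustration).
child = {
    "conduct": subscale_score([1, 0, 2, 0, 1]),        # 4
    "hyperactivity": subscale_score([2, 1, 1, 2, 0]),  # 6
    "emotional": subscale_score([0, 0, 1, 0, 0]),      # 1
    "peer": subscale_score([0, 1, 0, 0, 1]),           # 2
    "prosocial": subscale_score([2, 2, 1, 2, 2]),      # 9, not counted below
}
print(total_difficulties(child))  # 13
```

Keeping prosocial out of the sum reflects the fact that higher prosocial scores indicate strengths, so adding them would dilute a difficulties total.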
While current interest in the SDQ attests to the demand for such brief measures, the properties of this relatively new instrument can be understood only through continued evaluation. Furthermore, the appropriateness of the measure for use in different cultures is an issue of current relevance, with the SDQ now available worldwide in over 40 languages. The SDQ has been evaluated within the UK [4] as well as with samples of Dutch (n = 562) [5], Arabic (n = 322) [6], Swedish (n = 900) [7], Bangladeshi (n = 261) [8], German (n = 273) [9] and Finnish (n = 703) [10] children. These studies vary considerably in which properties of the instrument they evaluated.
Studies of the SDQ conducted with samples in the UK [4], Holland [5] and Sweden [7] have offered largely consistent support for the original five-factor structure of the SDQ in parent, teacher and youth-report formats. Unexpected findings common to at least two of these studies include numerous secondary loadings on the prosocial factor [4], [7] and a limited association between the item ‘Generally obedient…’ and the ‘conduct problems’ factor with which it is theoretically associated [4], [5], [7]. Only one available study, conducted with Arabic children [6], has failed to replicate the predicted factor structure. The authors concluded that while the original factors were somewhat evident in that sample, the individual subscales appeared to be more heterogeneous or multifactorial than observed in other populations.
UK data on the original five-factor structure of the SDQ have generally suggested sound internal reliability. Goodman [4] reported a mean Cronbach's α coefficient of 0.73 across the respective scales. In this sample, the lowest coefficient was reported for youth self-report of peer problems (0.41), while parent- and teacher-report coefficients ranged from 0.57 (parent-report peer problems) to 0.88 (teacher-report hyperactivity).
The internal consistency of the SDQ has also been assessed in the Dutch [5], Swedish [7], Arabic [6] and Finnish [10] studies. Findings across these populations have generally supported the internal reliability of the instrument (Finnish: α = 0.63–0.86; Swedish: α = 0.51–0.75; Dutch: α = 0.45–0.80), with the only questionable support coming from the Arabic sample [6], in which the authors attributed low to moderate coefficients (α = 0.18–0.65) to the unexpected factor structure observed.
Research into the test–retest reliability of the SDQ is limited, and the available findings appear mixed. Only one UK study appears to have reported such evidence, with a community sample of 34 parents completing the measure 3–4 weeks after the initial administration [11]. Intra-class correlations ranged from 0.44 (the ‘burden’ item from the impact scale) to 0.85 (total difficulties); however, coefficients for the five SDQ subscales were not reported.
The study reported by Muris, Meesters, and van den Berg [5] appears to include the only formal assessment of test–retest reliability in a sample outside the UK. Parent and youth reports on the SDQ were obtained for 91 children from the study's original sample (n = 562), 2 months after the initial administration. With the exception of the prosocial subscale of the youth-report form (ICC = 0.59), the intra-class correlations of all subscales were higher than 0.70.
Support for the concurrent and discriminant validity of the SDQ is available from a range of UK studies [1], [3], [4], [11]. Evidence of good concurrent validity against established self-report measures is also available from the Dutch [5], German [9] and Finnish [10] studies. Outside the UK, evidence of the discriminant validity of the SDQ subscales has to date been available only from Bangladeshi [8] and German [9] samples. In the German sample, the parent- and teacher-report subscales of the SDQ and CBCL performed comparably in distinguishing between community and clinic samples, while the SDQ total difficulties scale discriminated more accurately than that of the CBCL. The SDQ also demonstrated some advantage over the CBCL in discriminating between diagnoses within the clinical sample, being significantly better at predicting hyperactivity [9]. Less impressive, however, were the findings from the Bangladeshi sample, in which the total difficulties scale and the peer problems and prosocial subscales failed to distinguish between clinic and community groups [8]. As the SDQ was the only measure employed in this study, it is unclear whether these findings reflect properties of the SDQ itself or characteristics of the sample.
In summary, while evidence to date has been largely supportive of the reliability and validity of the SDQ in various populations [5], [7]–[10], mixed findings (e.g. [6], [8]) highlight the need for continued evaluation. Given the notable lack of evidence pertaining to younger children, further attention to such samples appears particularly warranted. The aim of this study was to assess the basic psychometric properties of the parent-report form of the SDQ in a large community sample of young Australian children (aged 4–9 years). As Goodman [4] noted, studies evaluating the psychometric properties of the SDQ have been characterized by potentially unrepresentative samples, small sample sizes, or the absence of independent psychiatric diagnoses as validating criteria. The current study attempts to address each of these criticisms while assessing the internal consistency, stability and external validity of the measure and providing normative data and cut-offs.
Method
Sample
A sample of 1359 children aged 4–9 years was recruited through 11 primary schools in Brisbane, Australia's third largest city. The schools were chosen to represent a range of inner-city and suburban locations of differing socioeconomic status. Family income ranged from less than $20 000 (4%), through $20 000–30 000 (8%) and $30 000–70 000 (50%), to over $70 000 (38%). Parental education ranged from elementary school (1%) through a mode of ‘finished high school’ (25%) to university education (20%). The majority of families had two caregivers; 13% were sole-parent families. Children with complete data sets were distributed as follows: 4–6-year-old boys (n = 404); 7–9-year-old boys (n = 302); 4–6-year-old girls (n = 398); and 7–9-year-old girls (n = 255).
Permission to conduct the research was obtained from the Griffith University Human Research Ethics Committee, Education Queensland, and Catholic Education Queensland. The test battery (with attached information and consent forms) was distributed through the schools to all children within the relevant age ranges and sent home to parents. Return rates ranged from 32.5% to 74.8% across schools.
Of the original 1359 children, 900 were randomly selected to complete follow-up assessments at 12 months; 780 (86.7%) completed and returned the measure and 450 of these were randomly selected for telephone diagnostic interviews. Of these, 327 (72.6%) were successfully contacted and interviewed.
Measures
In addition to the parent-report SDQ, standard demographic details were collected, and school teachers completed a rating form for each participating child. This measure consists of 5-point Likert scales rating the child on dimensions of anxiety (shy, nervous, afraid, inhibited), aggression, impulsivity-hyperactivity, and language, reading and writing problems. Previous research has supported the ability of teachers to report accurately on these dimensions (Strauss, Frame and Forehand, 1987), and our previous research has supported the validity of this specific system in the context of large school-based studies [12].
Diagnostic telephone interviews were conducted using the Diagnostic Interview Schedule for Children, Adolescents, and Parents (DISCAP) [13], a semi-structured interview based on DSM-IV criteria with good reliability [14]. The DISCAP is used to assign DSM-IV diagnoses and identify subclinical features of DSM-IV disorders, both of which are assigned severity ratings on a six-point scale (1 = minimal impairment in functioning/symptoms rarely problematic, to 6 = very severe impairment in functioning/symptoms always problematic). As such, the diagnostic data collected during follow-up assessments were both categorical (presence or absence of each DSM-IV diagnosis) and continuous (severity of the diagnosis or of subclinical diagnostic features). In the current study, a cut-off of 4 on the severity rating scale was used as the criterion for clinically significant problems warranting diagnosis.
Interviews were conducted by clinical psychologists using the DISCAP and were completed for 327 (72.6%) of the 450 children selected. Twenty-five percent of interviews were conducted by two interviewers, positioned on separate telephone lines and kept blind to each other's written notes and diagnoses, in order to check the interrater reliability of diagnoses. Interrater reliability for DISCAP interviews was high, with 100% agreement on externalizing disorders and only one disagreement on internalizing disorders (κ = 0.87). The correlation between raters’ severity ratings for the primary diagnosis was r = 0.96.
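For readers less familiar with the κ statistic reported above, Cohen's κ corrects the observed proportion of agreement for the agreement expected by chance from each rater's marginal frequencies. The Python sketch below is a generic illustration for two raters making categorical judgements; the ratings are invented rather than taken from the study, and the text does not state exactly how κ was computed here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of cases on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data: 1 = internalizing diagnosis assigned, 0 = not assigned (ten hypothetical cases).
interviewer_1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
interviewer_2 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(round(cohens_kappa(interviewer_1, interviewer_2), 2))  # 0.74
```

With one disagreement in ten toy cases, observed agreement is 0.90 but chance agreement is already 0.62, so κ is about 0.74; this is why κ can sit well below raw percentage agreement when one category (here, no diagnosis) dominates.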
Results and discussion
Statistical analyses revealed no differences in the demographic and adjustment profiles of participants according to the participation rates achieved in each school. Analyses were also conducted to check for demographic or adjustment differences between participants selected for the follow-up phases, participants who completed these measures, and the larger pool. A series of ANOVAs using sample as the independent variable and demographic measures and time 1 SDQ scores as dependent variables confirmed the equivalence of the samples.
Table 1 shows coefficient alphas for each of the five SDQ subscales and for the total difficulties and impact scales. These range from 0.59 (peer problems) to 0.80 (hyperactivity), indicating moderate to strong internal reliability across the subscales. These alphas are very similar to those reported by Goodman [4]; in both studies, internal reliability of the parent report was strongest for the hyperactivity subscale and weakest for the peer problems subscale.
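Cronbach's α compares the sum of the item variances with the variance of the scale total: α = k/(k − 1) × (1 − Σσ²_item / σ²_total) for a scale of k items. The Python sketch below implements this standard formula as a generic illustration, not as the software routine used in the study; the toy responses are invented.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of k item scores per respondent."""
    k = len(responses[0])
    if k < 2 or any(len(r) != k for r in responses):
        raise ValueError("need at least two items and the same number of items per respondent")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in responses]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: three respondents answering three perfectly parallel items gives alpha = 1.
print(round(cronbach_alpha([[0, 0, 0], [1, 1, 1], [2, 2, 2]]), 3))  # 1.0
```

When items do not covary, the total variance equals the sum of the item variances and α falls to zero; for a fixed average inter-item correlation, a short scale such as a five-item SDQ subscale also yields a lower α than a longer one.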
Table 1. Mean scores and banding for SDQ subscales and total difficulties and impact scales
Factor structure
The factor structure of the SDQ was examined using principal components analysis with oblimin rotation in SPSS, performed separately for girls and boys. Table 2 shows the pattern matrix after a five-factor solution was forced. The five factors produced are consistent with the original subscales of the SDQ. For boys, the hyperactivity factor accounted for the largest proportion of the total variance (22.45%), while conduct problems explained the least (5.09%). A different pattern was observed in girls, with the prosocial factor accounting for the largest proportion of variance (19.71%) and peer problems the least (5.11%).
Table 2. SDQ factor analysis
For both genders, most items loaded moderately to strongly onto their predicted factors, with factor loadings for boys generally stronger than those for girls. Common to both boys and girls were cross-loadings for the item ‘Generally obedient…’. For boys, the item loaded most strongly (albeit negatively) onto the prosocial factor (−0.38), and almost as strongly onto the hyperactivity factor (0.34). For girls, it loaded most strongly onto the prosocial factor (0.46). In girls, cross-loadings were also observed for the peer problems item ‘Solitary, plays alone…’, which loaded weakly onto its predicted factor (0.39) and almost as strongly onto the emotional symptoms factor (0.30). The hyperactivity item ‘Thinks things out before acting’ loaded moderately onto its predicted factor (0.48) and to a lesser extent onto the prosocial factor (0.39).
Table 3 presents correlations between the five subscales of the SDQ. Each scale correlated significantly (p < 0.01) with every other scale, with correlations ranging in magnitude from −0.14 (prosocial and emotional symptoms) to 0.52 (conduct problems and hyperactivity). While this pattern indicates mutual associations across the five SDQ subscales, the strengths and directions of the correlations are conceptually meaningful and consistent with current knowledge of comorbidity. For example, the conduct problems and hyperactivity subscales each correlated most strongly with the other (0.52), while prosocial correlated most strongly with conduct problems, in the expected negative direction (−0.46).
Table 3. Correlations between SDQ scales and teacher ratings
Table 3 also presents the correlations between teacher ratings of child behaviour and parent report on the SDQ. These reveal consistent cross-informant agreement between subscales covering common symptom areas. Teacher-rated aggression, for example, correlated positively with SDQ conduct problems (0.35, p < 0.01) and negatively with prosocial behaviour (−0.24, p < 0.01). As would be expected, the SDQ peer problems subscale correlated positively with teacher ratings of aggression (0.15, p < 0.05) and hyperactivity (0.19, p < 0.05). With regard to internalizing symptoms, teacher ratings of anxiety correlated positively with SDQ emotional symptoms (0.21, p < 0.01).
Stability
SDQ scores at time 1 and time 2 were used to examine test–retest reliability. A 12-month interval is clearly too long for a traditional estimate of measurement stability: over such a period, correlations will reflect both measurement instability and real changes in the child's behaviour due to maturation, environmental changes and the like. Correlations between time 1 and time 2 scores should therefore be regarded as lower-end estimates of stability. We nonetheless report them because of the high values obtained for the current sample: hyperactivity, r = 0.77; conduct problems, r = 0.65; emotional symptoms, r = 0.71; peer problems, r = 0.61; prosocial, r = 0.64; total difficulties, r = 0.77; impact, r = 0.63. These coefficients show that parents’ ratings on the SDQ are remarkably stable over a 12-month period, and are only marginally lower than those reported for test–retest intervals of 1–2 months [5], [11].
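Assuming the retest coefficients above are Pearson correlations between paired time 1 and time 2 scores (as the reported r values suggest), the Python sketch below illustrates why real change over 12 months does not lower them in every case: a shift that moves every child's score by the same amount leaves r unchanged, whereas change that differs between children lowers it. The scores are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired scores, e.g. time 1 and time 2 SDQ totals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


time1 = [4, 8, 12, 16, 20]
time2_uniform = [s - 2 for s in time1]  # every child improves by 2 points
time2_uneven = [5, 7, 13, 14, 21]       # children change by different amounts
print(round(pearson_r(time1, time2_uniform), 3))  # 1.0
print(round(pearson_r(time1, time2_uneven), 3))   # 0.975
```

Because r is insensitive to a common shift in scores, the 12-month coefficients mainly register changes in children's relative standing, together with measurement error.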
Validation against clinical diagnoses
DSM-IV diagnoses assigned to the interviewed sample were separated into four diagnostic groupings, three of which corresponded to the main symptom subscales of the SDQ: conduct disorders (conduct disorder and/or oppositional defiant disorder); hyperactivity (attention-deficit/hyperactivity disorders); and internalizing disorders (separation anxiety, specific phobia, overanxious disorder, generalized anxiety disorder, panic disorder, social phobia). Diagnoses of disorders not subsumed by these categories (elimination disorders, adjustment disorder, school refusal, nightmare disorder) formed a fourth, ‘other disorders’ category.
Table 4 presents the prevalence of diagnoses when cases were grouped as high or low risk on the basis of SDQ subscale and total scores. Cases scoring within the most extreme 10% on each scale were regarded as ‘high risk’, while those scoring below the 90th percentile were regarded as low risk. The exception was the prosocial subscale, on which the lowest-scoring 10% were considered most likely to exhibit psychopathology. The diagnoses examined for each SDQ score were those most closely associated with the respective scale. For the emotional symptoms, conduct problems, and hyperactivity subscales, these were the diagnoses in the three corresponding diagnostic categories described earlier (internalizing disorders, conduct disorders, and hyperactivity, respectively). As the total difficulties, peer problems, prosocial, and impact scores were thought relevant to a range of diagnoses, the prevalence of any diagnosis was examined for each. This method is based on Goodman's [4] analysis of concordance between the SDQ and clinical diagnoses, and was therefore considered optimal for comparing data from the current study with previous SDQ research. Following Goodman [4], the discrete SDQ scores were divided as closely as possible into groups of 10% and 90% (because of the discrete nature of the scores, the split was at times 12% and 88%, for example).
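The banding rule can be sketched in Python as below. The text says only that groups were divided "as closely as possible" into 10% and 90%, so this sketch assumes the cut-off is the observed score whose tail proportion lies closest to 10%; the tie-breaking rule (keep the lowest candidate score) is an illustrative choice, and the function name and scores are hypothetical.

```python
def high_risk_cutoff(scores, target=0.10, low_is_risk=False):
    """Return (cut-off, tail proportion) for the discrete cut-off whose tail is closest to `target`.

    For difficulty scales the high-risk tail is scores >= cut-off; for the
    prosocial subscale (low_is_risk=True) it is scores <= cut-off.
    """
    n = len(scores)
    best = None
    for c in sorted(set(scores)):
        if low_is_risk:
            tail = sum(s <= c for s in scores) / n
        else:
            tail = sum(s >= c for s in scores) / n
        # Strict '<' keeps the first (lowest) candidate when two are equally close.
        if best is None or abs(tail - target) < abs(best[1] - target):
            best = (c, tail)
    return best


# Toy conduct problems scores (0-10) for 25 children; no cut-off gives exactly 10%.
conduct = [0] * 6 + [1] * 5 + [2] * 5 + [3] * 4 + [4] * 2 + [5] * 2 + [7]
print(high_risk_cutoff(conduct))  # (5, 0.12)

# Toy prosocial scores: the lowest-scoring tail is treated as high risk.
prosocial = [10] * 8 + [9] * 7 + [8] * 5 + [7] * 3 + [4] * 2
print(high_risk_cutoff(prosocial, low_is_risk=True))  # (4, 0.08)
```

In the toy conduct data the closest achievable split is 12% versus 88%, the same kind of compromise the text describes for discrete SDQ scores.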
Table 4. Prevalence of DSM-IV diagnoses within high-risk (extreme 10% of sample) and low-risk (90% of sample) groups based on SDQ scores
For each SDQ scale and subscale, the prevalence of the relevant diagnoses differed significantly between the high- and low-risk groups (p < 0.05 for each), indicating that higher scores were associated with a greater probability of being assigned a DSM-IV diagnosis. The highest odds ratio (30.5) was observed for the conduct problems subscale in relation to ODD/CD diagnoses, and the lowest (2.3) for the prosocial subscale in relation to any diagnosis.
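The odds ratios above compare the odds of a diagnosis in the high-risk group with the odds in the low-risk group. The Python sketch below computes the ratio from the four cells of a 2×2 table; the function name is hypothetical and the counts are invented for illustration, not taken from Table 4.

```python
def odds_ratio(dx_high, n_high, dx_low, n_low):
    """Odds ratio for a diagnosis in the high- vs low-risk SDQ group (2x2 table)."""
    a, b = dx_high, n_high - dx_high  # high risk: diagnosed, not diagnosed
    c, d = dx_low, n_low - dx_low     # low risk: diagnosed, not diagnosed
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts: 6 of 30 high-risk and 9 of 270 low-risk children diagnosed.
print(odds_ratio(6, 30, 9, 270))  # 7.25
```

In this example the prevalence is 20% versus about 3.3%, a ratio of 6, while the odds ratio is 7.25; the two measures diverge further as the prevalence in the high-risk group rises.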
The concurrent validity of the SDQ was also evaluated against diagnostic interviews by correlating SDQ scores with the severity of primary Axis I diagnostic features. Ratings of symptom severity were collapsed into the same diagnostic variables described earlier (conduct disorders, hyperactivity, internalizing disorders), in addition to ratings of symptom severity for any disorder (i.e. those within these categories as well as those in the ‘other disorders’ category). Results are shown in Table 5.
Table 5. Correlations between SDQ scales and severity of diagnostic features
The severity of the sample's overall primary features, as rated by clinicians, correlated strongly with SDQ impact (0.57, p < 0.01) and total difficulties (0.47, p < 0.01) scores. The conduct problems, hyperactivity, and emotional symptoms scales of the SDQ correlated moderately to strongly with clinical assessments of related diagnostic features, with coefficients ranging from 0.33 (p < 0.01) for emotional symptoms with internalizing disorder features to 0.51 (p < 0.01) for hyperactivity scores with hyperactivity diagnoses. As would be expected, SDQ ratings of peer problems correlated positively with the severity of both conduct disorder features (0.12, p < 0.05) and internalizing disorder features (0.14, p < 0.05), as well as with the severity of any diagnostic features (0.28, p < 0.01).
Participant treatment status was examined to further evaluate the discriminant and predictive validity of the SDQ. ANOVAs revealed that children reported by parents to be currently receiving treatment for emotional/behavioural problems scored significantly higher on SDQ total difficulties (mean = 15.0, SD = 6.5) than children not receiving treatment (mean = 8.0, SD = 5.1; F(1, 1398) = 171.21, p < 0.001).
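A one-way ANOVA F statistic can be recomputed from group summaries when raw data are unavailable. The Python sketch below does this for independent groups given their sizes, means and standard deviations; with two groups, F equals the square of the pooled-variance t statistic. The function name is hypothetical, and because the group sizes are not reported in the text, the example uses toy summaries rather than the study's figures.

```python
def one_way_f_from_summary(groups):
    """One-way ANOVA F from (n, mean, sd) summaries of independent groups.

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares recovered from each group's sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy summaries: two groups of four with means 10 and 6 and SD 2.
print(one_way_f_from_summary([(4, 10, 2), (4, 6, 2)]))  # (8.0, 1, 6)
```

Here the pooled-variance t statistic is about 2.83, and its square reproduces F = 8.0; note that the within-groups degrees of freedom equal the total number of children minus the number of groups.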
Table 1 shows the means and the 90th and 95th percentile cut-offs for each of the five SDQ subscales and for the total difficulties and impact scales. These cut-offs were derived from the total sample (n = 1359) separately for each gender and for two age groups (4–6 years and 7–9 years). Because higher prosocial scores indicate strengths rather than difficulties, the cut-offs presented for this subscale are the scores at the 10th and 5th percentiles.
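The stratified cut-offs described above can be sketched in Python as follows. The text does not state which percentile definition was used; this sketch assumes the nearest-rank definition, which always returns an observed score and so suits the SDQ's discrete scale. The function names, stratum labels and scores are hypothetical.

```python
from collections import defaultdict


def nearest_rank_percentile(scores, p):
    """Nearest-rank percentile: smallest score with at least p% of scores at or below it.

    `p` is assumed to be an integer percentage (e.g. 90, 95, 10, 5).
    """
    ordered = sorted(scores)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding in the rank.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


def cutoffs_by_stratum(records, p_values=(90, 95)):
    """records: iterable of (gender, age_band, score); returns {(gender, age_band): {p: cut-off}}."""
    strata = defaultdict(list)
    for gender, age_band, score in records:
        strata[(gender, age_band)].append(score)
    return {key: {p: nearest_rank_percentile(vals, p) for p in p_values}
            for key, vals in strata.items()}


# Toy records for two strata of 20 children each.
records = [("boy", "4-6", s) for s in range(1, 21)]
records += [("girl", "4-6", s) for s in [0] * 10 + list(range(1, 11))]
print(cutoffs_by_stratum(records))
```

For the prosocial subscale, the same functions can be called with `p_values=(10, 5)` to obtain low-tail cut-offs.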
A number of similarities are evident between the current findings and those reported for Goodman's [4] nationwide UK epidemiological sample. For example, using the same criteria as Goodman [4], the prevalence of diagnoses in the high-risk groups was comparable across the two samples for emotional symptoms (UK: 20.5%; Australia: 17.5%) and ADHD (UK: 17.5%; Australia: 20.8%). Also consistent with Goodman [4], odds ratios in the current sample identified the prosocial subscale as the SDQ subscale with the weakest association with DSM-IV diagnoses (UK: 3.4; Australia: 2.3).
One notable difference between the current study and that reported by Goodman [4] was the discrepancy in the prevalence of ODD and CD within the conduct problems high-risk groups of each sample (UK: 25.7%; Australia: 9.8%). This difference, however, is better explained by differences in the overall prevalence of CD and ODD diagnoses in the respective samples (UK: 4.7%; Australia: 1.5%) than by differences in the distribution of diagnoses between risk groups.
Conclusion
The aim of this study was to assess the basic psychometric properties of the parent-report SDQ in a large community sample of young Australian children (aged 4–9 years). Moderate to strong internal reliability and stability were exhibited across all SDQ subscales. Adequate validity was evident in the relationships of these scales to one another, while the pattern of correlations between the SDQ subscales, teacher ratings and diagnostic interviews demonstrated sound external validity. External validity was further supported by the relationship of SDQ total difficulties scores to concurrent treatment status.
While the original five-factor structure of the SDQ was generally confirmed, one conduct problems item (‘Generally obedient…’) loaded more strongly onto the prosocial factor for both boys and girls. Although this finding may seem conceptually confusing, it is consistent with previous research. Goodman [4], for example, reported loadings on the prosocial factor for this item (among others), prompting the suggestion that the prosocial subscale might be described as a ‘positives’ factor. Studies outside the UK have also noted unexpected factor loadings for this item [5], [7]. The findings of the current study therefore add to existing evidence questioning the utility of this item as an indicator of conduct problems in young children. Although this has not yet been evaluated, the item might show a more specific relationship with conduct problems if it were negatively worded (e.g. ‘Generally disobedient, usually refuses/ignores what adults request’).
One limitation of the current study should be noted. While teacher ratings of child behaviour and difficulties were obtained, the teacher version of the SDQ was not used for this purpose; its inclusion would have allowed a comparison of cross-informant reports on the instrument. The present study nonetheless supports the utility of the parent-report SDQ as a measure of psychopathology in young Australian children. As the properties of the teacher and youth-report SDQs remain unknown in Australian samples, further research into these forms of the instrument would be an important contribution to the growing knowledge of the measure.
Acknowledgements
This research was supported by the National Health and Medical Research Council of Australia and Catholic Education of Queensland. Thanks to all the families who participated.
