Abstract
The Strengths and Difficulties Questionnaire (SDQ) is one of the most broadly used questionnaires to evaluate children’s psychological adjustment; however, its internal structure has been a target of ongoing controversy. Recent studies have suggested a three-factor structure of the SDQ, but data are still scarce. The present study used multitrait-multimethod (MTMM) analysis to examine the construct-related validity of the SDQ with three and five dimensions, using ratings provided by children, their parents and teachers. A total of 415 participants were recruited from a Portuguese community sample. Both SDQ versions presented good convergent-related validity, with higher values for the five-dimension version. Findings from this study suggest that the SDQ with three dimensions could be more suitable as a screening measure of children’s psychological adjustment in a low-risk community sample. Nevertheless, the SDQ still needs further psychometric improvements in order to properly collect information from multi-source samples about the prevalence of children’s psychological adjustment.
The Strengths and Difficulties Questionnaire (SDQ) is one of the most broadly used brief questionnaires to evaluate psychological adjustment in children and adolescents aged between 2 and 17 years (Achenbach et al., 2008). The SDQ has been translated into more than 80 languages and can be completed by teachers and parents of children from 4 to 16 years old, and by children/adolescents from 11 to 16 years old. It is composed of 25 items that can be organized into five dimensions, namely Emotional Problems, Peer Problems, Conduct Problems, Hyperactivity/Inattention and Prosocial Behaviours. Goodman and colleagues (2010) later proposed another version in which the SDQ is organized into three dimensions, namely Externalizing, Internalizing and Prosocial Behaviours. In this version the Emotional Problems and Peer Problems dimensions merge into a subscale named ‘Internalizing Problems’, and the Conduct Problems and Hyperactivity/Inattention dimensions merge into an ‘Externalizing Problems’ subscale. The two merged scales are more suitable for assessing children’s psychosocial adjustment in community samples, whereas the five separate scales could provide more useful information in high-risk and/or clinical samples (Goodman et al., 2010).
Nevertheless, the SDQ internal structure has been a target of ongoing controversy. Several studies with children/adolescents, parents and teachers as independent informants have reported adequate support for the five-factor structure version (e.g. Stone et al., 2010), whilst other studies have either found only marginal support (e.g., Hill & Hughes, 2007) or no support for this SDQ version (e.g., Dickey & Blumberg, 2004). Some studies showed difficulties in confirming the SDQ five-factor structure and revealed poor construct validity (Pechorro et al., 2011). Other studies with SDQ five-factor structure and with three informants (self-report, teacher and parent) have found no evidence of SDQ discriminant-related validity (e.g. Marzocchi et al., 2004).
The three-factor SDQ structure has been supported in some exploratory analyses in the US (parents), Belgium (parents and children), and Finland (youth SDQ) (Dickey & Blumberg, 2004; Koskelainen et al., 2001; Van Leeuwen et al., 2006). In Portugal, Costa and colleagues’ (2020) study showed acceptable internal consistencies for the three dimensions and evidence of discriminant-related validity, but poor convergent-related validity. The question of which SDQ version has the best psychometric properties should therefore be investigated in detail.
Another enduring issue is whether different informants are truly indicating the same construct (Munkvold et al., 2009). A study with a multinational sample of children aged 6 to 11 years from seven European countries found only moderate agreement between parent and teacher SDQ ratings (Cheng et al., 2018), whilst Hill and Hughes (2007) found evidence of convergence among parent, teacher and peer ratings of first graders at risk of educational failure.
Construct validity is an important aspect of validity assessment of a measurement, and it can be evaluated through convergent- and discriminant-related validities (Byrne, 2010). Research about construct validity usually focuses on the extent to which data exhibit evidence of convergent-related validity (the extent to which different methods correspond in their assessment of the same trait), discriminant-related validity (the extent to which independent methods differ in their assessment of different traits) and method effects (an extension of the discriminant-related validity issue; Campbell & Fiske, 1959). Convergent- and discriminant-related validities can be more robustly evaluated by using the multitrait–multimethod (MTMM) method (Widaman, 1985).
Summary of Studies Assessing Strengths and Difficulties Questionnaire with a Multitrait-Multimethod Approach.
Despite useful findings from the aforementioned studies, data are still inconsistent for the SDQ five-factor structure and scarce for the three-factor version. The main purpose of this study is to assess the construct validity of the SDQ with three and five dimensions, using ratings provided by children, their parents and teachers, within the framework of a multitrait-multimethod (MTMM) design. The specific goals are to evaluate the internal consistency and the convergent- and discriminant-related validity of the SDQ with three dimensions and with five dimensions. Both versions of the SDQ were explored because the three-dimension version could be more appropriate for evaluating psychological adjustment in low-risk community samples. Goodman et al. (2010) also stressed the importance of using multiple approaches to assess construct validity in order to obtain more complete information about SDQ performance; however, to date very few studies have explored SDQ construct-related validity with a sample of children and their respective parents and teachers simultaneously, within an MTMM design.
Method
Participants
This community sample comprised a total of 415 participants, including 136 children, 142 parents (96 mothers, 46 fathers) and 137 teachers. Inclusion criteria were having Portuguese nationality and having at least one child aged between 10 and 15 years. Children were in the 5th to 9th grade in primary school, their mean age was 12 years (SD = 1.709), and more than half were boys (54.4%). Parents’ ages ranged from 36 to 63 years (M = 44.970; SD = 4.479), and the large majority were married, employed full time, held a college degree, and lived in an urban area (for more details see the table in the Appendix).
Measures
Before completing the study measures, parents were asked to complete a brief sociodemographic questionnaire which included individual questions (age, relationship with the children, marital status, educational level, professional status, partner’s educational level, partner’s professional status and residential area) and family characteristics (child’s age and gender, child relationship with the partner and household composition).
Strengths and Difficulties Questionnaire (SDQ)
Children, parents, and teachers completed the self-report, parent and teacher versions, respectively, of the Portuguese version of the Strengths and Difficulties Questionnaire (Fleitlich et al., 2005). The SDQ assesses children’s psychosocial adjustment, relationships, emotions and behaviours. It is composed of 25 items and was originally created with five scales, namely Emotional Problems [EP], Peer Problems [PP], Conduct Problems [CP], Hyperactivity [HY], and Prosocial Behaviours [PB]. Recent studies have suggested a three-scale version as being more suitable for low-risk community samples, composed of Internalizing Problems [I] (combining the EP and PP scales), Externalizing [E] (combining the CP and HY scales) and Prosocial Behaviours [PB]. Each item is scored on a three-point scale (0 = Not true; 1 = Somewhat true; 2 = Certainly true), so each of the five original scales ranges from 0 to 10 and the total difficulties score ranges from 0 to 40. Items 7, 11, 14, 21 and 25 were reverse-scored. The internal consistency of each SDQ version is reported in the Results section.
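The scoring rules described above can be illustrated with a minimal Python sketch. The item-to-scale key below is an assumption taken from the standard published SDQ scoring key; the text itself only names the scales and the reverse-scored items. The function name `score_sdq` and the input format are hypothetical and are not part of the study's procedure.

```python
# Item-to-scale key: ASSUMED from the standard published SDQ scoring key;
# the article only names the scales and the reverse-scored items.
SCALES = {
    "EP": [3, 8, 13, 16, 24],    # Emotional Problems
    "CP": [5, 7, 12, 18, 22],    # Conduct Problems
    "HY": [2, 10, 15, 21, 25],   # Hyperactivity
    "PP": [6, 11, 14, 19, 23],   # Peer Problems
    "PB": [1, 4, 9, 17, 20],     # Prosocial Behaviours
}
REVERSED = {7, 11, 14, 21, 25}   # reverse-scored items, as stated in the text


def score_sdq(responses):
    """Score one informant's SDQ.

    responses: dict mapping item number (1-25) to a raw answer of 0, 1 or 2.
    Returns (five-scale scores, three-scale scores, total difficulties).
    """
    def item(i):
        raw = responses[i]
        return 2 - raw if i in REVERSED else raw

    five = {name: sum(item(i) for i in items) for name, items in SCALES.items()}
    three = {
        "I": five["EP"] + five["PP"],   # Internalizing
        "E": five["CP"] + five["HY"],   # Externalizing
        "PB": five["PB"],               # Prosocial Behaviours (unchanged)
    }
    total_difficulties = three["I"] + three["E"]   # excludes PB, range 0-40
    return five, three, total_difficulties


# Hypothetical respondent who answers "Certainly true" (2) to every item.
five, three, total = score_sdq({i: 2 for i in range(1, 26)})
```

Under this key the merged Internalizing and Externalizing scales each range from 0 to 20, which is why they carry more items per dimension than the five original scales.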
Procedures
Participants were recruited through non-probabilistic intentional sampling in private schools, after-school educational centres and soccer learning centres in the Lisbon metropolitan area. Participants completed the questionnaires on paper or online, as some schools and parents requested an online version. During sample recruitment, some schools and learning centres returned only the parents’ questionnaires (n = 55), and those participants were therefore excluded from the present study, since the purpose was to match children with their respective parent and teacher. Data from parents who did not respond to the sociodemographic measures that allowed us to match them with the child and teacher were also excluded from the study. In 14 cases it was not possible to recruit all three informants; the questionnaires from the two available informants were nevertheless included. The final sample comprised 129 triads (children and their respective parents and teachers) and 14 dyads (6 children and their parents; 1 child and their teacher; 7 parents and teachers of the same children). Sample recruitment occurred between November 2018 and September 2019. Following the Declaration of Helsinki, all participants were given the opportunity to clarify any questions related to the study’s content and procedures. All participants signed a consent form prior to their participation. The study was approved by *****’s Ethics Committee (approval number D/001/03/2018).
Data analysis
The sample size for the present study was determined according to Saris and Gallhofer (2014). These authors state that the sample size of a three-group design should be higher than 300 in order to obtain design efficiency, regardless of the method variance. Descriptive statistics were calculated for all items of the SDQ using SPSS (v. 25, SPSS Inc. Chicago, IL). Item sensitivity was evaluated through skewness (Sk) and kurtosis (Ku) analysis. Values of |Sk| and |Ku| greater than three and seven, respectively, were considered a severe violation of the normality assumption (Marôco, 2014). SDQ internal consistency was evaluated through the standard Cronbach’s alpha coefficient (Marôco, 2014).
In order to test for evidence of SDQ construct-related validity, convergent- and discriminant-related validity were tested within the framework of a multitrait-multimethod (MTMM) design in which multiple traits (SDQ dimensions) were measured by multiple methods (children, parents and teachers). The MTMM analysis was conducted at both the matrix level and the individual parameter level, and was performed for the SDQ with three dimensions and with five dimensions. The following steps were performed according to Byrne’s (2010) guidelines, and all MTMM analyses were conducted using the AMOS program (v. 18, SPSS Inc. Chicago, IL). Four models were created using SDQ dimensions as traits and the three informants (children, parents and teachers) as methods: Model 1, the hypothesized general CFA model (freely correlated traits; freely correlated methods); Model 2 (no traits; correlated methods); Model 3 (perfectly correlated traits; freely correlated methods); and Model 4 (freely correlated traits; uncorrelated methods).
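For readers who wish to reproduce the reliability and screening steps outside SPSS, the following is a minimal Python sketch. It assumes complete data arranged as one row of item scores per respondent, and it treats an item as problematic when either cut-off is exceeded, which is one reading of the criterion above. The function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Standard Cronbach's alpha.

    rows: list of respondents, each a list of k item scores (no missing data).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(rows[0])
    item_variances = sum(variance(col) for col in zip(*rows))
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


def severe_nonnormality(skew, kurt, sk_cut=3.0, ku_cut=7.0):
    """Flag an item whose |Sk| or |Ku| exceeds the cut-offs of 3 and 7.

    ASSUMPTION: exceeding either cut-off is enough to flag the item.
    """
    return abs(skew) > sk_cut or abs(kurt) > ku_cut


# Hypothetical two-item scale answered identically by three respondents:
# the items are perfectly consistent, so alpha equals 1.
alpha = cronbach_alpha([[0, 0], [1, 1], [2, 2]])

# Screening with two pairs of values reported in the Results section.
flag_parent_item_12 = severe_nonnormality(6.820, 49.904)
flag_child_maximum = severe_nonnormality(2.762, 6.696)
```

Because alpha depends only on the ratio of summed item variances to total-score variance, using sample or population variances gives the same coefficient as long as the choice is consistent.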



MTMM matrix-level analyses
The goodness-of-fit indices (χ2 and CFI) of Model 1 were compared with those of the other MTMM models to evaluate evidence of convergent- and discriminant-related validity at the matrix level. To determine convergent-related validity, or the extent to which independent measures of the same trait are correlated (e.g., parent-rated and self-rated prosocial behaviours), Model 1 and Model 2 were compared using the difference in χ2 and CFI values. A significant difference in χ2 (Δχ2) and a ΔCFI greater than .01 provide evidence of convergent-related validity (Byrne, 2010). Discriminant-related validity was evaluated for traits and methods. To examine the discriminant-related validity of traits (the extent to which independent measures of different traits are correlated), Model 1 and Model 3 were compared. A large Δχ2 and/or a substantial ΔCFI provides support for discriminant-related validity. To examine the discriminant-related validity of methods, Model 1 and Model 4 were compared. A small Δχ2 and/or a small ΔCFI indicates discriminant-related validity.
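The convergent-validity decision rule can be sketched in Python as follows. This is an illustration only and not the AMOS procedure used in the study: the chi-square tail probability is computed with a closed-form expression that holds for integer degrees of freedom, and the .05 significance level and function names are assumptions.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate, integer df >= 1."""
    t = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-t) * sum_{k=0}^{df/2 - 1} t^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= t / k
            total += term
        return math.exp(-t) * total
    # Odd df: erfc(sqrt(t)) + exp(-t) * sum_k t^(k + 1/2) / Gamma(k + 3/2)
    total = math.erfc(math.sqrt(t))
    for k in range((df - 1) // 2):
        total += math.exp(-t) * t ** (k + 0.5) / math.gamma(k + 1.5)
    return total


def convergent_evidence(delta_chi2, delta_df, delta_cfi, alpha=0.05):
    """True when the chi-square difference is significant and the CFI
    difference exceeds .01 (Model 1 versus Model 2).

    ASSUMPTION: alpha = .05 is used as the significance level.
    """
    p_value = chi2_sf(delta_chi2, delta_df)
    return p_value < alpha and delta_cfi > 0.01


# Values reported for the three-dimension SDQ (Model 1 versus Model 2).
result = convergent_evidence(95.782, 12, 0.348)
```

The same `chi2_sf` helper can be applied to the Model 1 versus Model 3 and Model 1 versus Model 4 comparisons, where the interpretation of a large or small difference follows the rules given above.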
MTMM Parameter-level analyses
A more precise evaluation of trait- and method-related variance can be determined by examining individual parameter estimates of the factor loadings and factor correlations of the hypothesized model (Model 1). Convergent-related validity was assessed through the factor loadings. The magnitude of the trait loadings reflects the convergent-related validity and an overall comparison of trait and method loadings reveals the proportion of method variance that may exceed the trait variance. If this proportion is significant, convergent-related validity could be weakened. Discriminant-related validity bearing on specific traits and methods is determined by examining the factor correlation matrices. In order to suggest discriminant-related validity, the correlations between traits should be negligible. Regarding method’s factor correlations, their discriminability is related to the extent to which they are maximally dissimilar (correlations should be also negligible).
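The comparison of trait and method loadings can be made concrete with a short Python sketch. It assumes standardized loadings from a model in which each measure loads on one trait factor and one method factor, with trait and method factors uncorrelated with each other, so that squared loadings can be read as variance proportions. The loadings shown are hypothetical and are not taken from this study's tables.

```python
def variance_split(loadings):
    """Decompose each measure's variance into trait, method and error parts.

    loadings: dict mapping a measure name to a tuple
              (standardized trait loading, standardized method loading).
    ASSUMPTION: trait and method factors are mutually uncorrelated, so the
    squared standardized loadings are variance proportions.
    """
    result = {}
    for name, (trait_loading, method_loading) in loadings.items():
        trait = trait_loading ** 2
        method = method_loading ** 2
        result[name] = {
            "trait": trait,
            "method": method,
            "error": 1.0 - trait - method,
            # Convergent validity is weakened when method variance dominates.
            "method_exceeds_trait": method > trait,
        }
    return result


# Hypothetical loadings, for illustration only.
example = variance_split({
    "child_PB": (0.30, 0.60),
    "teacher_E": (0.80, 0.40),
})
```

In this hypothetical example the child-rated measure would be flagged because its method variance exceeds its trait variance, whereas the teacher-rated measure would not.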
These analyses were performed for the SDQ with three dimensions and for the SDQ with five dimensions.
Results
Missing data
There were 84 scores randomly missing from different participants out of a total of 8715 scores, so missing values amounted to approximately 1%. Expectation maximization (EM) was used to impute the missing data.
Descriptive statistics of strengths and difficulties questionnaire individual items
Children’s descriptive statistics for the SDQ items indicated that the full three-point Likert scale, with answers ranging from zero to two, was used for almost all items, except for items 12 (“I fight a lot”) and 17 (“I am kind to younger children”). The answers “Certainly true” and “Not true” were not used by any child for items 12 and 17, respectively. Skewness (.095 < |Sk| < 2.762) and kurtosis (.343 < |K| < 6.696) values did not show evidence of violations of the normal distribution (Kline, 2016). Average scores for SDQ items ranged from .10 (SD = .305) to 1.83 (SD = .376).
Parents’ descriptive statistics for SDQ items indicated that the full three-point Likert scale was used for almost all items, except for item 1 (“Considerate of other people’s feelings”), for which no parent used the answer “Not true”. The distribution of parents’ SDQ items did not present acceptable skewness and kurtosis for items 12 (|Sk| = 6.820; |K| = 49.904), 17 (|Sk| = 4.223; |K| = 18.867), and 22 (|Sk| = 6.765; |K| = 49.319). The other items presented acceptable skewness (.018 < |Sk| < 2.470) and kurtosis (.042 < |K| < 5.522) values (Kline, 2016). Average scores for SDQ items ranged from .04 (SD = .222) to 1.93 (SD = .285).
Teachers’ descriptive statistics for SDQ items indicated that the full three-point Likert scale was used for all items. The distribution of teachers’ SDQ items did not present acceptable skewness and kurtosis for items 11 (“Has at least one good friend”) (|Sk| = 3.298; |K| = 49.904) and 22 (“Steals from home, school or elsewhere”) (|Sk| = 3.813; |K| = 14.854). The other items presented acceptable skewness (.296 < |Sk| < 2.208) and kurtosis (.026 < |K| < 4.187) values (Kline, 2016). Average scores for SDQ items ranged from .09 (SD = .340) to 1.71 (SD = .513).
Strengths and difficulties questionnaire with three dimensions
Internal consistency
Teachers’ ratings presented the highest internal consistencies, especially for E and PB (αInternalizing = .798; αExternalizing = .857; αProsocial = .803). Parents’ ratings presented low internal consistencies for all dimensions (αInternalizing = .732; αExternalizing = .773; αProsocial = .653). Children’s ratings presented the lowest internal consistencies, with low values for I and E (αInternalizing = .675; αExternalizing = .694) and very low (unacceptable) internal consistency for PB (αProsocial = .579).
Convergent and discriminant-related validity: MTMM matrix-level analyses
Summary of Goodness-of-Fit Indices for SDQ MTMM Models.
aRespecified model with an equality constraint imposed between E7, E8 and E9.
To evaluate convergent- and discriminant-related validity at the matrix level, Model 1 was compared with Models 2, 3 and 4. For Model 1 and Model 2, the Δχ2 was highly significant (Δχ2(12) = 95.782, p < .001), and the difference in practical fit (ΔCFI = .348) was well above .01, which suggests strong evidence of convergent-related validity. The comparison between Model 1 and Model 3 yielded a statistically significant Δχ2 (Δχ2(3) = 27.308, p < .001) and a fairly large difference in practical fit (ΔCFI = .101), suggesting modest evidence of discriminant-related validity for the SDQ traits. The comparison between Model 1 and Model 4 yielded a small but statistically significant Δχ2 (Δχ2(3) = 31.59, p < .001) and a small difference in practical fit (ΔCFI = .024), which indicates good discriminant-related validity for the methods.
Convergent and discriminant-related validity: MTMM parameter-level analyses
Trait and Method Loadings for MTMM Model 1 (Correlated Traits; Correlated Methods). a
aStandardized estimates.
bNot statistically significant (p = .826).
cNot statistically significant (p = .218).
dNot statistically significant (p = .144).
Trait and Method Correlations for MTMM Model 1 (Correlated Traits; correlated Methods). a
aStandardized estimates.
bNot statistically significant (p = .847).
Strengths and difficulties questionnaire with five dimensions
Internal consistency
Teachers’ ratings presented the highest internal consistencies, especially for HY and PB (αEmotional = .735; αConduct = .651; αHyperactivity = .865; αPeer = .684; αProsocial = .803). Parents’ ratings presented good internal consistency for HY (αHyperactivity = .770), low internal consistency for EP, PP and PB (αEmotional = .646; αPeer = .635; αProsocial = .653), and very low internal consistency for CP (αConduct = .545). Children’s ratings presented the lowest internal consistencies, with an unacceptable value for CP and very low values for EP, PB and PP (αEmotional = .574; αConduct = .452; αPeer = .643; αProsocial = .579). HY presented a low value of internal consistency (αHyperactivity = .706).
Convergent and discriminant-related validity: MTMM matrix-level analyses
Summary of Goodness-of-Fit Indices for SDQ.
To evaluate convergent- and discriminant-related validity, Model 1 was compared with Models 2, 3 and 4. For Model 1 and Model 2, the Δχ2 was highly significant (Δχ2(25) = 169.297, p < .001), and the difference in practical fit (ΔCFI = .269) was well above .01, which suggests strong evidence of convergent-related validity. For Model 1 and Model 3, the comparison yielded a statistically significant Δχ2 (Δχ2(10) = 68.363, p < .001) and a large difference in practical fit (ΔCFI = .109), suggesting modest evidence of discriminant-related validity for the traits. For Model 1 and Model 4, the comparison yielded a small but significant Δχ2 (Δχ2(3) = 10.974, p = .010) and a small difference in practical fit (ΔCFI = .015), which argues for good discriminant-related validity for the methods.
Convergent and discriminant-related validity: MTMM parameter-level analyses
Trait and Method Loadings for SDQ 5 dimensions MTMM Model 1 (Correlated Traits; Correlated Methods). a
aStandardized estimates.
bNot statistically significant (p = .249).
cNot statistically significant (p = .171).
Trait and Method Correlations for SDQ 5 dimensions MTMM Model 1 (Correlated Traits; correlated Methods). a
aStandardized estimates.
bNot statistically significant (p = .159).
cNot statistically significant (p = .452).
dNot statistically significant (p = .186).
eNot statistically significant (p = .572).
fNot statistically significant (p = .732).
Discussion
The present study examined the construct-related validity of the three- and five-dimension versions of the SDQ in a sample of children (10–15 years), parents and teachers, within the framework of a multitrait-multimethod (MTMM) design. The first goal was to explore evidence for the internal consistency and the convergent- and discriminant-related validity of the SDQ with three dimensions. Regarding the internal consistency of the SDQ dimensions, the main concern was the low to very low values for self-ratings. Findings suggested that, for this sample, the SDQ is more reliable for parents and teachers than for children. Similarly, an Italian study (Di Riso et al., 2010) with children (8–10 years) reported unacceptable to good reliability values for the self-report SDQ with three dimensions. Results may differ for other age ranges. Convergent- and discriminant-related validity were moderate, mainly due to parents’ and children’s ratings of prosocial behaviours. Prosocial behaviours also seemed to interfere with discriminant-related validity, especially for parents’ and teachers’ ratings. Issues with this dimension have already been reported in other studies (e.g. Palmieri & Smith, 2007) and were expected by Goodman et al. (2010). The prosocial behaviour items use a different response format and are interspersed with the items of the other dimensions (e.g. Achenbach et al., 2008).
Another goal was to explore evidence for the internal consistency and the convergent- and discriminant-related validity of the SDQ with five dimensions. Internal consistency values were unacceptable for parents’ ratings and self-ratings. For teachers’ ratings the values were very good, except for the Conduct Problems dimension. Since the children in this study belong to a community sample, problematic behaviours may be absent, or certain problems may go unidentified because the children may lack sufficient self-reflection about their own behaviours. Teachers probably have more privileged access to children’s behaviours across multiple life areas than parents, who usually observe them at home and in family moments. Therefore, the SDQ could be faithfully reflecting different behaviours observed in different contexts from different informants’ perspectives.
The findings revealed good convergent-related validity and low discriminant-related validity. As in Gomez (2014), Conduct Problems and Hyperactivity/Inattention contributed to the decrease in discriminant-related validity. Emotional Problems and Peer Problems also contributed to this decrease, suggesting that these dimensions are confounded with each other, possibly because these children are not expected to manifest disruptive behaviours (lying or stealing) or clinical emotional problems.
In sum, although the reliability of the SDQ with three dimensions is not satisfactory for all informants, its internal consistency values were more suitable for this sample than those of the SDQ with five dimensions, which presented unacceptable values for self-ratings and parents’ ratings. Also, combining SDQ traits into higher-order dimensions helps improve discriminant-related validity, indicating that the SDQ with three dimensions could be more appropriate for evaluating psychological adjustment in non-clinical samples. Goodman et al. (2010) also found higher convergent- and discriminant-related validity for internalizing and externalizing problems than for the five-factor SDQ version. These dimensions have the advantage of reducing measurement error, since they comprise a greater number of items. These findings seem to corroborate the literature indicating that the three-factor SDQ structure could be more suitable for community samples with minimal risk and also more appropriate as explanatory or outcome variables in epidemiological studies (Costa et al., 2020; Goodman et al., 2010). Nevertheless, studies should further investigate whether these internal consistencies are due to cultural differences in parental and youth perceptions of the latent constructs. With the appropriate psychometric modifications, the SDQ with three dimensions could be used to collect information from multi-source samples about the prevalence of children’s psychological adjustment, with the purpose of examining the need for prevention and intervention programs (Hill & Hughes, 2007).
Results should be considered in light of the study’s limitations. The recruitment procedures and families’ characteristics could have influenced the findings, since it is not possible to determine how representative our sample is of the Portuguese population. Future studies should aim to recruit more diverse samples. Future studies should also explore SDQ psychometric qualities in both clinical and low-risk community samples to confirm whether the SDQ is suited to assigning clinical diagnoses and/or to serving as a screening instrument. Evidence on an external criterion for the SDQ should be assessed in subsequent work. As far as we know, this was the first study to assess both SDQ versions simultaneously regarding their construct-related validity through an MTMM design and to explore Goodman and colleagues’ hypothesis about the most suitable SDQ version for low-risk samples.
Footnotes
Authors’ contributions
FC recruited the sample and performed all the statistical analyses and wrote all article sections. PC contributed to the study design, analysis plan, and reviewed the article. IL contributed to the study design and reviewed the article. All authors reviewed the manuscript and contributed to it in a meaningful way.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Following the Declaration of Helsinki, all participants were given the option to elucidate any questions related to study’s content and procedures. All participants have signed a consent form prior to their participation.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by one grant (SFRH/BD/118226/2016) from Science and Technology Foundation (FCT) awarded to the first author.
Ethical approval
The study was approved by ISPA University Institute’s Ethics Committee (approval number D/001/03/2018).
Appendix
Characteristics of the Caregiver’s Sample (n = 142).
| | Frequency | % |
|---|---|---|
| Relationship Status | | |
| Married | 107 | 75.4 |
| In a relationship | 4 | 2.8 |
| Single | 9 | 6.3 |
| Divorced | 18 | 12.7 |
| Widowed | 1 | .7 |
| Educational level | | |
| High school or less | 40 | 28.1 |
| At least a college degree | 99 | 69.7 |
| Partner’s educational level | | |
| No partner | 22 | 15.5 |
| High school or less | 35 | 24.6 |
| At least a college degree | 76 | 53.5 |
| Professional status | | |
| Full-time employment | 120 | 84.5 |
| Part-time employment | 2 | 1.4 |
| Freelance/Independent work | 6 | 4.2 |
| Unemployment | 6 | 4.2 |
| Retirement | 1 | .7 |
| Partner’s professional status | | |
| No partner | 22 | 15.5 |
| Full-time | 101 | 71.1 |
| Part-time | 5 | 3.5 |
| Unemployment | 5 | 3.5 |
| Residential area | | |
| Urban/big city | 81 | 57.0 |
| Urban/suburbs of a big city | 46 | 32.4 |
| Semi-urban/small city | 3 | 2.1 |
| Rural | 7 | 4.9 |
| Number of people in the household | | |
| 1 | 5 | 3.5 |
| 2 | 8 | 5.6 |
| 3 | 41 | 28.9 |
| 4 | 59 | 41.5 |
| 5 | 17 | 12.0 |
| 6 | 4 | 2.8 |
| 7 | 2 | 1.4 |
