Abstract
An important question for clinical researchers is why the positive findings from controlled trials of treatment do not always generalize to the ‘real world’ setting. In the case of attention-deficit/hyperactivity disorder (ADHD) there can be little doubt that psychostimulant medication given in adequate doses leads to a statistically significant improvement in the short to medium term on a range of outcome measures [1]. Whether these improvements are of clinical importance is less certain. Most placebo-controlled trials of psychostimulant medication have adopted a cross-over design, and have used continuous as opposed to dichotomous measures of improvement. The use of continuous outcome measures is congruent with the view that ADHD is better conceptualized as a dimensional rather than a categorical disorder. However, continuous outcome data are more difficult to interpret with respect to clinical efficacy than are dichotomous outcome data. For example, pooled effect size estimates for the benefit of psychostimulant medication compared with placebo derived from meta-analyses range from 0.75 to 0.90 standard deviations [1]. Such a magnitude of treatment effect is within the range usually considered to be moderate to large according to the criteria established by Cohen [2], but determining the clinical significance of any given effect size is, essentially, a clinical matter. We may consider quite small effect sizes to be clinically important if the condition being treated is severe and life-threatening, while we may require very large effect sizes to be convinced of clinical importance when a condition leads to only mild or moderate disability. Effect size estimates are more clinically meaningful if we relate the standard deviation of a sample on a given scale to measurement units. For example, the standard deviation of baseline scores on a Conners Parent Rating Scale that can range from 0 (‘not at all’) to 3 (‘very much’) may be 0.50 units. A treatment effect of 0.80 standard deviations therefore translates to an average shift of 0.40 units on the scale.
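The conversion from standardized effect size to raw scale units is a single multiplication. The following Python sketch restates the worked example, assuming the illustrative baseline standard deviation of 0.50 units; the function name is hypothetical and not part of the study.

```python
def effect_size_to_units(effect_size_sd: float, scale_sd: float) -> float:
    """Convert a standardized effect size into raw rating-scale units.

    effect_size_sd: treatment effect in standard deviation units.
    scale_sd: standard deviation of baseline scores on the scale
              (0.50 is the illustrative value used in the text).
    """
    return effect_size_sd * scale_sd


# Worked example from the text: 0.80 SD on a scale with SD = 0.50 units.
print(round(effect_size_to_units(0.80, 0.50), 2))  # 0.4

# The pooled meta-analytic range (0.75 to 0.90 SD) under the same assumed SD.
print(effect_size_to_units(0.75, 0.50), effect_size_to_units(0.90, 0.50))
```

Under these assumptions the pooled effect sizes correspond to a shift of well under half a point on a 0 to 3 item scale, which is why the raw-unit view can be easier to judge clinically.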
An alternative metric for determining the magnitude of treatment effect is percentage change in symptoms from the baseline or placebo condition. In a review of selected randomized controlled trials of methylphenidate, all of which yielded a statistically significant treatment effect, we found that percentage improvement on various versions of the Conners Parent Rating Scale ranged from 24% [3] to 75% [4]. Some researchers convert percentage improvement into a dichotomous outcome by nominating a specific cut point to indicate clinical improvement. Elia et al. [5] nominated 20%, while Varley and Trupin [6] nominated 25% reduction in Conners Parent and Teacher Rating Scales scores as an indication of clinical improvement in response to psychostimulant medication. While easier to interpret than continuous outcome data, such cutoff criteria are, again, somewhat arbitrary. A variant of this approach is to report the number of treated subjects whose scores on outcome ratings ‘normalize’, that is, achieve the threshold score defining the normal population's range of functioning [7], [8].
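The dichotomization described here can be sketched in a few lines of Python. The function names and the example scores are hypothetical; only the 20% and 25% cut points come from the studies cited [5], [6].

```python
def percent_improvement(baseline: float, follow_up: float) -> float:
    """Percentage reduction in a symptom score relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def is_improved(baseline: float, follow_up: float, cut_point_pct: float) -> bool:
    """Dichotomize the outcome: True if the reduction meets the cut point."""
    return percent_improvement(baseline, follow_up) >= cut_point_pct


# Hypothetical child: mean item score falls from 2.0 to 1.55 (about 22.5%).
baseline, follow_up = 2.0, 1.55
print(is_improved(baseline, follow_up, 20.0))  # meets the 20% criterion
print(is_improved(baseline, follow_up, 25.0))  # does not meet the 25% criterion
```

The same child counts as improved under one published criterion and unimproved under the other, which illustrates why the choice of cut point is consequential.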
The aim of the present study was to compare the desired and actual reduction in Conners Parent Rating Scale scores from baseline to follow-up in a naturalistic sample of children and adolescents who had been treated with psychostimulant medication. We sought to reference these changes in ratings against global impressions of treatment benefit (categorized into Low, Moderate and High), and to estimate the magnitude of treatment effect associated with each category of improvement. The research design permitted exploration of a second question concerning the relationship between treatment expectation and estimates of treatment benefit.
Method
Sample
Participants were a stratified subsample of parents of children and adolescents under the age of 19 years resident in the Hunter region of New South Wales (defined by postcode), Australia, who had previously responded to a postal survey concerning various aspects of the child's management with psychostimulant medication [9], [10]. In the context of the study parents had been asked their global impression of treatment benefit (‘Overall, how much has the use of medication improved life for your child?’) rated on a 5-point scale. A random selection of 50 parents reporting ‘low’ treatment benefit (‘not at all’ or ‘a little’), 50 parents reporting ‘moderate’ treatment benefit (‘moderately’) and 50 parents reporting ‘high’ treatment benefit (‘considerably’ or ‘extremely’) were invited to participate in the study. As the original sample was skewed in favour of positive treatment benefit we effectively oversampled for children with poor treatment response. The time on treatment could not be reliably estimated owing to discontinuities in treatment for some children. However, treatment had been initiated on average about 2.5 years prior to the survey [9]. Many children had received adjunctive treatment, including other pharmacotherapies [9].
Procedure
As part of a study concerning qualitative aspects of treatment with psychostimulant medication (methylphenidate or dexamphetamine) which comprised a telephone interview and postal survey, parents were asked to complete and return an abbreviated version of the 48-item Conners Parent Rating Scale [11] which included only the items making up the Conduct Problems, Impulsivity-Hyperactivity, Hyperactive Index and Learning Problems subscales. Each item on the questionnaire is scored on a 4-point scale ranging from ‘not at all’ (score = 0) to ‘very much’ (score = 3). Parents were asked to respond to the questionnaire three times, reporting: (i) their child's symptoms prior to the introduction of psychostimulant medication; (ii) the level of symptoms they desired or hoped would be achieved following treatment; and (iii) their child's current level of symptoms.
Data analyses
Subscale scores for the Conners questionnaire were derived by dividing the aggregate score for each scale by the number of scale items, yielding a range from zero to three. In what follows, A, B and C denote ratings (i), (ii) and (iii) described under Procedure: symptoms prior to treatment, desired symptoms and current symptoms, respectively. The effect of psychostimulant treatment was estimated by calculating the difference between the subscale scores for ‘current’ functioning and for functioning prior to the introduction of treatment (C–A). The expectation of treatment effect was estimated by calculating the difference between the subscale scores for ‘desired’ functioning and for functioning prior to the introduction of treatment (B–A). Mean percentage change was calculated by dividing the mean difference score by the mean baseline score and multiplying by one hundred. Effect sizes for treatment effect and treatment expectation were estimated by dividing the mean difference scores by the grand standard deviation for the relevant variable. Between-group comparisons were conducted using Pearson Chi-square analyses for categorical variables and one-way analysis of variance (ANOVA) with Scheffé follow-up tests for continuous variables. A Bonferroni correction was applied to adjust for multiple analyses. The cut point for determining statistical significance was set at p < 0.0125, while the cut point for determining a statistically non-significant trend was set at p < 0.025. Data were analysed using BMDP statistical software [11].
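The scoring steps above can be illustrated with a short Python sketch. It is not the BMDP analysis used in the study. Several points are assumptions: the data are invented, reductions are computed as baseline minus follow-up so that improvement is positive, the ‘grand standard deviation’ is taken to be the sample standard deviation of the difference scores across all respondents, and the 0.0125 threshold is read as 0.05 divided by the four subscales.

```python
from statistics import mean, stdev


def subscale_score(item_ratings):
    """Aggregate item ratings (each 0-3) divided by the number of items."""
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("each item must be rated 0, 1, 2 or 3")
    return sum(item_ratings) / len(item_ratings)


def summarize_change(baseline, follow_up):
    """Mean reduction, mean percentage change and standardized effect size.

    baseline and follow_up are per-child subscale scores in the same order.
    Assumption: the effect size divides the mean reduction by the SD of the
    difference scores pooled over all children.
    """
    reductions = [a - c for a, c in zip(baseline, follow_up)]
    mean_reduction = mean(reductions)
    return {
        "mean_reduction": mean_reduction,
        "percent_change": 100.0 * mean_reduction / mean(baseline),
        "effect_size": mean_reduction / stdev(reductions),
    }


# Hypothetical subscale scores for four children (not study data).
prior = [2.0, 2.5, 1.5, 2.0]     # rating (i): before medication
current = [1.0, 1.5, 1.0, 0.5]   # rating (iii): current symptoms

summary = summarize_change(prior, current)
print(summary["mean_reduction"], summary["percent_change"])  # 1.0 50.0

# Bonferroni-adjusted threshold, assuming four subscale comparisons.
alpha = 0.05 / 4
print(alpha)  # 0.0125
```

Passing the ‘desired’ ratings in place of the ‘current’ ratings gives the corresponding summary for treatment expectation.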
Ethics
The study was conducted with the approval of the Human Research Ethics Committee, University of Newcastle, and the Hunter Area (Health Service) Research Ethics Committee.
Results
Response rates and characteristics of the sample are summarized in Table 1. Only gender of the child was statistically different across the three groups (χ² (df = 2) = 10.13, p < 0.01). Gender was included as a covariate in all subsequent analyses. Actual and desired treatment effect reflected in change scores on the four Conners subscales are presented in Table 2. There was greater disparity between actual and desired treatment effect for the ‘low’ treatment benefit group than for the ‘high’ treatment benefit group. Actual improvement in conduct problems scores ranged from 26% (0.73 standardized effect size units [ES]) for the low group to 52% (ES = 1.44) for the high group, while there were no statistically significant differences in desired improvement for each of the groups (low 56% [ES = 1.61], moderate 52% [ES = 1.33], high 51% [ES = 1.48]). Actual improvement in impulsivity-hyperactivity scores ranged from 25% (ES = 0.73) for the low group to 52% (ES = 1.60) for the high group, while there were no statistically significant differences in desired improvement for each of the groups (low 53% [ES = 1.71], moderate 53% [ES = 1.73], high 50% [ES = 1.69]). Actual improvement in hyperactive index scores ranged from 28% (ES = 1.01) for the low group to 56% (ES = 2.11) for the high group, while there were no statistically significant differences in desired improvement for each of the groups (low 55% [ES = 2.06], moderate 55% [ES = 1.97], high 52% [ES = 2.05]). Actual improvement in learning problems scores ranged from 25% (ES = 1.16) for the low group to 54% (ES = 2.56) for the high group, while there were no statistically significant differences in desired improvement for each of the groups (low 56% [ES = 2.74], moderate 53% [ES = 2.48], high 54% [ES = 2.74]).
Table 1. Characteristics of the sample
Table 2. Actual and desired reduction in Conners subscale scores† following psychostimulant treatment
∗p < 0.001; †Expressed as units (maximum = 3), with higher scores indicating a greater reduction; ‡statistically non-significant trend.
Discussion
Parent reports of percentage of actual improvement on each of the four Conners subscales equalled or exceeded the cut points nominated in the studies of Elia et al. [5] and Varley and Trupin [6] even for the parents selected on the basis of reporting little or no benefit of psychostimulant treatment. This would suggest that percentage cut points used in previous research to indicate clinical improvement are too low, and could therefore confound estimates of treatment benefit. An inappropriately low threshold may lead to a false conclusion of treatment benefit if significantly more subjects on active treatment reach threshold than controls, even though the improvement is not of sufficient magnitude to be considered of clinical benefit. Conversely, treatment benefit may be obscured if a substantial number of placebo-treated individuals reach the low threshold, even though the magnitude of improvement is, on average, much greater in the actively treated subjects. In the present study mean improvement in the high group was 50% or more, and was congruent with the magnitude of improvement desired by the parents. The findings suggest that percentage cut points used for the Conners rating scales to indicate clinical improvement should, in future research, be adjusted upward.
We tested the impact of raising the threshold for clinical improvement in a study examining the benefit of augmenting psychostimulant treatment with clonidine for the management of hyperactive and aggressive symptoms in children with ADHD [12]. Using a criterion for improvement of a 25% reduction in scores from baseline, significantly more children treated with clonidine than with placebo improved on both the conduct problems and hyperactive index subscales of the Conners Parent Rating Scale. However, when a 38% cut point was applied to the conduct problems subscale and a 43% cut point to the hyperactive index subscale (representing mean percentage improvement for those in the moderate improvement group in the present study), statistically significant differences in response rate between the clonidine and placebo-treated groups persisted only on the conduct problems subscale. The finding makes intuitive sense as the sample was already treated with a psychostimulant, and it would be surprising if the addition of clonidine had a substantive impact on hyperactive symptoms.
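The effect of moving the cut point on response rates can be sketched as follows. The percentage improvements below are invented for illustration and are not data from the clonidine study [12]; only the 25% and 43% cut points are taken from the text, and the function name is hypothetical.

```python
def response_rate(percent_improvements, cut_point_pct):
    """Proportion of subjects whose improvement meets the cut point."""
    responders = sum(1 for p in percent_improvements if p >= cut_point_pct)
    return responders / len(percent_improvements)


# Hypothetical percentage improvements on one subscale (not study data).
active = [50, 45, 30, 28, 10]
placebo = [44, 26, 12, 5, 0]

for cut_point in (25, 43):
    print(
        cut_point,
        response_rate(active, cut_point),
        response_rate(placebo, cut_point),
    )
# At 25% the rates are 0.8 vs 0.4; at 43% they are 0.4 vs 0.2.
```

In this invented example the absolute gap between arms halves when the threshold is raised. A formal comparison would still require a test of proportions, which is omitted here.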
Effect size estimates in the present study were also somewhat higher than those reported in previous meta-analyses of psychostimulant treatment benefit. The estimates are not directly comparable, however, because the effect sizes calculated in the present study were for change from baseline while the meta-analyses focused on differences between the placebo condition and active treatment.
Limitations
An important limitation of the present study was that the data were obtained retrospectively, and may therefore be subject to recall bias. In addition, since length of treatment was heterogeneous, there was variability in the period over which parents were asked to recall the child's baseline functioning. We were able, however, to demonstrate concurrent validity for the measure of treatment effect as it conformed with the global rating of improvement. As most subjects in the present study had been receiving treatment for some time, maturation effects may have led to some improvement in Conners Parent Rating Scale scores independent of any treatment effect.
Clinical implications
In the present study the reported changes on the Conners Parent Rating Scale were consistent with the global ratings of improvement. In contrast, there was no difference in the magnitude of treatment effect desired by the parents from the low, moderate and high groups. The findings suggest that expectation of treatment benefit is unlikely to contribute to variation in treatment response. At least for ADHD, the threshold for improvement in treatment studies may have been set too low, and could account for the lack of generalizability of research findings to the clinical setting. Treatment researchers should establish and use clinically meaningful, as opposed to arbitrary, criteria for improvement.
Acknowledgements
The research was supported by the New South Wales Health Department as a Health Outcomes Project.
