Abstract
Attention-deficit/hyperactivity disorder (ADHD) is a neurodevelopmental disorder characterized by developmentally inappropriate and impairing inattention, hyperactivity, and impulsivity, mostly seen and diagnosed in children and adolescents.1,2 According to the most recent meta-analysis of Polanczyk et al. 3 of 41 studies in 27 countries from every world region, the prevalence of ADHD in children and adolescents is estimated around 3.4%. Academic failure, poor self-esteem, troublesome peer and family relationships, substance abuse, and delinquent behavior are associated with ADHD, and patients are often diagnosed with one or more co-occurring psychiatric disorders.1,2,4,5 The majority of children and adolescents diagnosed continue to have impairing symptoms into adulthood.1,2,5 The negative impact of ADHD within and beyond the health system during childhood and the long-lasting impact into adulthood result in significant long-term personal and societal costs.6–9
The major treatments to mitigate the related (economic) burden of ADHD are medication management and behavioral treatment, alone or in combination. 10 Jensen et al. evaluated the cost-effectiveness of these treatments using data from the National Institute of Mental Health’s (NIMH) Multimodal Treatment Study of Children with ADHD (MTA study), 11 in which 579 children with ADHD were assigned to 14 months of controlled medication management, behavioral treatment, the combination of medication management and behavioral treatment (also referred to as combined treatment), or routine community care (control condition).10–12 The cost-effectiveness of medication management proved superior to that of behavioral treatment at the end of the 14-month trial. 11 The combined treatment was less cost-effective than medication management because of the considerable increase in costs associated with behavioral treatment. 11 We build on this study by focusing on cost-effectiveness beyond the trial period of the MTA study. To this end, we developed a decision model to evaluate the 10-year cost-effectiveness of the treatments of the MTA study.
In contrast to previous Markov models for ADHD treatment evaluation,13–16 we propose a different model structure. First, ADHD is a chronic condition, which makes a full remission state unlikely. Second, the level of ADHD symptoms tends to be highly persistent over time.17,18 Thus, decision models with disease states based on ADHD symptoms end up with extremely low or high transition probabilities, which limits the applicability of the model. Third, treatments for ADHD are mostly evaluated in terms of symptomatic outcomes, while the (economic) burden of ADHD often extends to society at large.6–9 For example, antisocial behaviors and delinquency associated with ADHD result in significant costs for society.19–21 Specifically, D’Amico et al. demonstrated that conduct disorders in childhood are associated with a two- to threefold increase in early adulthood costs, mainly driven by criminal acts and judicial contacts. 22 Therefore, we defined delinquency states for our decision model. Importantly, delinquency is a distinct indicator of children’s behavior and participation in society. Also, robust correlations between delinquency and the level of ADHD symptoms are found in the literature.23–25 Fourth, we followed common practice and based the extrapolation on data within the trial period of the MTA study. Subsequently, in contrast to previous studies, we used follow-up data to assess the accuracy of the model’s long-term prediction to ensure the reliability of modeling estimates in future economic evaluations. Finally, the previous Markov models for ADHD treatment evaluation consider discrete time periods.13–16 Consequently, changes in states occur only at the beginning or end of predefined time intervals. 26 We relaxed this assumption by building our model in continuous time. This relaxation was previously shown to result in more accurate estimates. 27
Methods
NIMH’s Multimodal Treatment Study of Children With ADHD (MTA Study)
For this study, we used data from the MTA study, a multi-site randomized controlled trial that was conducted in the United States and was designed to evaluate the major forms of ADHD treatment.10,12 Children had been randomly assigned to one of the three active treatments—medication management, behavioral treatment, or the combination thereof (hereafter combined treatment)—or routine community care. Routine community care is the control condition and reflects the nature of less intensive (and less costly) community-delivered treatment. The MTA study involved 14 months of controlled treatment in 579 children with ADHD, aged 7 to 10 years, with naturalistic follow-ups for up to 16 years after the end of the trial period. Follow-up assessments were carried out during childhood (2 and 3 years after baseline), (late-)adolescence (6, 8, and 10 years after baseline), and adulthood (12, 14, and 16 years after baseline). We used the follow-up data of childhood and late-adolescence periods, since our modeled outcome variable (delinquency) was not assessed in adulthood. Summary statistics of the baseline characteristics age, gender, comorbidity, intelligence, ethnic background, and occupation-based socioeconomic family status are presented in Table 1.
Table 1. Baseline Characteristics of the Dropped and Selected Sample
IQ, intelligence quotient; SD, standard deviation; WISC, Wechsler Intelligence Scale for Children.
Inference: * indicate significant differences at the 5%/1% level based on the mean differences of the two samples, assessed with a t test.
Comorbidity is a dummy variable that equals 1 for the presence of anxiety and/or depression. Intelligence is the child’s total intelligence quotient (IQ) measured with the Wechsler Intelligence Scale for Children–III (WISC-III). Ethnic background is a dummy variable that equals 1 for children from a non-Caucasian background. Finally, occupation-based socioeconomic family status is a dummy variable that equals 1 for children from a high socioeconomic family status. Further details on the four treatment modes of the MTA study and other baseline characteristics are available in previous publications.10,12,28,29 All study procedures had been approved by institutional review boards and were carried out in accordance with the Declaration of Helsinki. Participants and parents were informed of the procedures and provided written informed consent. 10
Delinquency
In this study, a six-point scale on delinquency was used as the primary outcome variable, 29 coded ordinally from two parent-report measures, the DISC-IV-CD Module and the Parent DSM-IV Aggression and Conduct Disorder Rating Scale, 30 and two self-report measures: the Self-Reported Antisocial Behavior questionnaire 31 at the 2-year assessment and the Self-Reported Delinquency questionnaire 32 at the 3-year assessment. Using all available measures, participants were (retrospectively) assigned a delinquency classification code at each assessment point.33–36 The coding scheme of the Pittsburgh Youth Study was used to assign items to each code.35,36
The delinquency scale was then categorized as follows: 0 = no delinquency; 1 = minor delinquency only at home (e.g., theft of less than $5 or vandalism); 2 = minor delinquency outside of the home (e.g., vandalism, cheating someone, shoplifting less than $5); 3 = moderately serious delinquency (e.g., vandalism, theft of $5 or more, carrying a weapon); 4 = serious delinquency (e.g., breaking and entering, drug selling, attacking someone with the intent to seriously hurt or kill, rape); 5 = engagement in two or more different level 4 offences. This variable was assessed at baseline, after the 14-month trial period, and at the follow-up assessments after 2, 3, 6, and 8 years.
Markov Model
To predict the trajectories of delinquent behavior during adolescence in relation to the four treatment modes of the MTA study, we developed a continuous-time Markov model 26 based on three delinquency states (Figure 1): no delinquency (state 1), minor to moderate delinquency (state 2), and serious delinquency (state 3). The states were discerned based on the delinquency scale mentioned above, in which a 0 score was considered no delinquency, 1 to 3 scores minor to moderate delinquency, and scores 4 and 5 were considered serious delinquency.
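The grouping of the six-point scale into three model states can be written as a small lookup. The following is a minimal sketch for illustration only; the function name `delinquency_state` is hypothetical and not taken from the original analysis code.

```python
def delinquency_state(score: int) -> int:
    """Map a delinquency score (0-5) to a model state (1-3).

    State 1: no delinquency (score 0)
    State 2: minor to moderate delinquency (scores 1-3)
    State 3: serious delinquency (scores 4-5)
    """
    if score not in range(6):
        raise ValueError(f"delinquency score must be 0-5, got {score!r}")
    if score == 0:
        return 1
    if score <= 3:
        return 2
    return 3


# Apply the mapping to every possible score on the scale.
print([delinquency_state(s) for s in range(6)])  # [1, 2, 2, 2, 3, 3]
```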

Figure 1. Schematic Representation of the Markov Model.
The specification, parameter estimation, and evaluation of this model were conducted using a vertical modeling formulation 37 based on a previously suggested framework. 38 Briefly, this includes the specification of the Markov process by means of two main parameters: 1) sojourn time distributions and 2) the probabilities of the next state visited (also referred to as the future state probabilities). The parameters are subsequently estimated with the corresponding parametric survival and multinomial regression models, in which the treatment indicator can easily be incorporated. Finally, the modeled outcomes can be evaluated using Monte Carlo simulation of the whole procedure.
To assess the cost-effectiveness of the treatments through prevention of serious delinquent behavior, we considered serious delinquency to be an absorbing state. This assumption does not allow participants either to make a transition out of this state (i.e., to decrease the delinquency level after entering the serious delinquency state) or to start in this state (i.e., to enroll in this study with delinquency level 4 or 5). Consequently, of the total sample of 579 children, our Markov model was built on the delinquency data from 448 children. The remaining 131 children were excluded because they either already had a delinquency score of 4 or 5 when enrolling in the study or had missing follow-up data.
Exponential survival models were subsequently used to estimate the cumulative distribution functions of the sojourn time in both states 1 and 2
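For reference, an exponential sojourn-time model has the standard form sketched below. The notation is an assumption made for illustration: T_i denotes the sojourn time in state i, \lambda_i the corresponding exit rate, and x a vector of treatment indicators entering through a log-linear rate. The exact parameterization used in the original estimation may differ.

```latex
F_i(t) = \Pr(T_i \le t) = 1 - \exp(-\lambda_i t), \qquad t \ge 0, \quad i \in \{1, 2\},
\qquad \lambda_i(x) = \exp\!\left(\beta_{i0} + \beta_i^{\top} x\right)
```

Under this form, the mean sojourn time in state i equals 1/\lambda_i, so a treatment that lowers the rate lengthens the expected stay in that state.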
We developed our model based on the individual trajectories of delinquency seriousness within the trial period and used the 10-year follow-up data to validate the accuracy of the long-term prediction. The outcome of our model is the average time without reaching the serious delinquency state, which we refer to as the life-years (LYs) of serious delinquent behavior prevented. Model performance was subsequently internally validated by comparing the predicted probability of serious delinquent behavior prevented (averaged across 100,000 simulation runs and based on the Kaplan-Meier estimate) with the observed empirical survival curves for the same outcome variable. 39 The simulated results reflected a 10-year time horizon. The available follow-up data in the MTA study provide the unique opportunity to internally validate the model prediction at 10-year follow-up for the four different treatments.
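The simulation step can be illustrated with a minimal Monte Carlo sketch of a three-state continuous-time process with exponential sojourn times and an absorbing state 3. All numerical values in `RATES` and `NEXT_STATE` are hypothetical placeholders, not estimates from the MTA data, and the function names are illustrative only.

```python
import random

HORIZON_YEARS = 10.0

# Hypothetical parameters for illustration only (NOT estimated from MTA data).
RATES = {1: 0.30, 2: 0.50}  # exit rate per year from each transient state
NEXT_STATE = {              # probabilities of the next state visited
    1: {2: 0.95, 3: 0.05},
    2: {1: 0.85, 3: 0.15},
}


def simulate_time_free(start_state, rng, horizon=HORIZON_YEARS):
    """Simulate one trajectory; return years before absorption in state 3,
    truncated at the time horizon."""
    t, state = 0.0, start_state
    while state != 3:
        # Draw an exponential sojourn time in the current transient state.
        t += rng.expovariate(RATES[state])
        if t >= horizon:
            return horizon
        # Draw the next state from the future state probabilities.
        targets = list(NEXT_STATE[state])
        weights = [NEXT_STATE[state][s] for s in targets]
        state = rng.choices(targets, weights=weights)[0]
    return t


def mean_time_free(start_state, n_runs=100_000, seed=0):
    """Average time without serious delinquency across simulated runs."""
    rng = random.Random(seed)
    total = sum(simulate_time_free(start_state, rng) for _ in range(n_runs))
    return total / n_runs


print(mean_time_free(1, n_runs=10_000))
```

Because state 3 is absorbing, each trajectory stops either on entering that state or on reaching the horizon, so the average is bounded by 10 years.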
Economic Evaluation
We included treatment costs in which the following three components were taken into account: medication cost, visit cost for teachers and aides, and cost of psychiatrist, psychologist, and pediatrician. Per treatment group, we converted the longitudinal costs into daily costs. The respective resulting daily costs were $0.52 for routine community care, $0.62 for medication management, $3.18 for behavioral treatment, and $3.53 for the combined treatment. Total treatment costs were calculated by multiplying the LYs of serious delinquent behavior prevented and the daily costs. The annual rate of discounting was set at 3% for both cost and effectiveness outcomes. 40 We compared the cost-effectiveness outcomes among the treatments in terms of a net-monetary benefit (NMB) framework. 41 The willingness-to-pay (WTP) threshold was set equal to the annual cost associated with serious juvenile delinquency in children with ADHD retrieved from previous research in the United States. 6 Specifically, criminal history was assessed through self-report, including crimes, juvenile detention, probation, and jail. The costs of crimes incurred by victims and costs to the criminal justice system were estimated based on information from the Bureau of Justice Statistics, the Federal Bureau of Investigation, and the Criminal Justice Institute. The mean total criminal costs were $12,868 and $498 for children with and without ADHD, respectively. Hence, we incorporated the adjusted difference of $12,370 as WTP threshold. As such, we incorporated the cost avoided in the serious delinquency state in the economic evaluation. The cost-effectiveness can then easily be calculated within this framework as follows:
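In the standard NMB framework, the calculation for treatment k takes the form sketched below. The symbols are assumptions made for illustration: \mathrm{WTP} is the willingness-to-pay threshold ($12,370 in the base case), E_k the discounted LYs of serious delinquent behavior prevented, c_k the daily treatment cost, and d the number of days per year (assumed here to be 365).

```latex
\mathrm{NMB}_k = \mathrm{WTP} \cdot E_k - C_k, \qquad C_k = d \cdot c_k \cdot E_k \tag{1}
```

The treatment with the highest \mathrm{NMB}_k is the preferred strategy at the given threshold.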
The WTP threshold is a key parameter for the NMB analysis and therefore determines the conclusions drawn from the economic evaluation. For this reason, we reevaluated Equation (1) in deterministic sensitivity analyses with linearly increasing WTP thresholds between $0 and $50,000.
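A deterministic sensitivity analysis of this kind can be sketched as a simple sweep over WTP values. The sketch uses the daily costs reported above and the discounted LYs reported in the Results; the 365-day conversion and the standard NMB form (WTP times LYs minus cost) are assumptions, so the values it produces need not match those in the article's tables.

```python
DAYS_PER_YEAR = 365  # assumption used to convert daily to annual costs

# Daily treatment costs (USD) as reported in the text.
DAILY_COST = {
    "routine community care": 0.52,
    "medication management": 0.62,
    "behavioral treatment": 3.18,
    "combined treatment": 3.53,
}

# Discounted LYs of serious delinquent behavior prevented, as reported.
DISCOUNTED_LY = {
    "routine community care": 8.10,
    "medication management": 7.86,
    "behavioral treatment": 7.90,
    "combined treatment": 8.17,
}


def nmb(treatment, wtp):
    """Net monetary benefit under the assumed form: WTP * LYs - cost."""
    ly = DISCOUNTED_LY[treatment]
    cost = ly * DAYS_PER_YEAR * DAILY_COST[treatment]
    return wtp * ly - cost


def best_treatment(wtp):
    """Return the treatment with the highest NMB at a given WTP."""
    return max(DISCOUNTED_LY, key=lambda k: nmb(k, wtp))


# Sweep the WTP threshold linearly from $0 to $50,000.
for wtp in range(0, 50_001, 10_000):
    print(wtp, best_treatment(wtp))
```

Under these assumptions, the ranking can only change at a WTP where the gain in LYs of one treatment offsets its additional cost, which is why the sweep is informative about robustness.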
Further Analyses
We used logistic regression models to control for sample selection and the likelihood of absorbance. In the first model, we controlled for the effect of sample selection by estimating odds ratios (ORs) of model exclusion conditional on the relevant covariates age, gender, comorbidity, intelligence, ethnic background, and occupation-based socioeconomic family status. We established whether the covariates mentioned above were independent predictors of model exclusion at a 5% significance level. In the second model, we focused on the likelihood of absorbance in the serious delinquency state. We compared the adjusted OR of absorbance in the serious delinquency state with that of moving out of this state at a 5% significance level for both the included and excluded children. All of the above statistical analyses were performed with R 3.2.4 (R Foundation; https://www.r-project.org/foundation/) and STATA/SE 15.0 (STATA; https://www.stata.com/).
Results
Model Validation
The model parameter estimation results are presented in Table 2. As is shown in Figure 2, the predicted survival curves obtained from our Markov model closely resembled the observed survival curves. The only exception was the curve for children who were assigned to medication management, in which a clear overestimation of the probability of preventing serious delinquent behavior was detected. Our model provided excellent predictions for children assigned to routine community care and the combined strategy of medication management and behavioral treatment. Specifically, the mean difference in percentage of LYs of serious delinquent behavior prevented between the modeled and observed trajectories is 8.5% for medication management, 7.8% for behavioral treatment, 5.8% for the combined treatment, and 5.7% for routine community care.
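Survival curves of the kind compared here are typically obtained with the Kaplan-Meier estimator. The following is a minimal sketch with hypothetical input data; the function name `kaplan_meier` and the example values are illustrative and not taken from the MTA data.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times:  follow-up time of each child (e.g., years since baseline)
    events: True if serious delinquency was observed at that time,
            False if the child was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_at_t = 0
        # Group all observations that share the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_at_t += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        # Events and censored cases both leave the risk set after time t.
        at_risk -= n_at_t
    return curve


# Hypothetical example: five children, two of whom reach serious delinquency.
print(kaplan_meier([2, 3, 6, 8, 10], [True, False, True, False, False]))
```

A predicted curve from simulated trajectories and an observed curve from follow-up data can both be computed this way and then compared point by point.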
Table 2. The Specification and the Results of the Parameter Estimation for the Markov Model
Standard errors in parentheses; T2, T3, T4, respectively, represent routine community care, medication management, behavioral treatment and combined treatment, with routine community care as the reference category for

Figure 2. Results of the model validation for the four treatment modes: (a) routine community care, (b) medication management, (c) behavioral treatment, (d) combined treatment; solid lines represent observed data, and dashed lines represent the model prediction.
Economic Evaluation
Thirty-two of the 448 children (7%) who started at baseline in the no delinquency or minor to moderate delinquency state reached the serious delinquency state within 10 years. For policy makers this is a substantial percentage, given the delinquency levels associated with this state and the annual cost of $12,370 associated with serious juvenile delinquency in children with ADHD. 6 Table 3 presents the cost-effectiveness results stratified by the four treatment modes considered in the MTA study. We performed both undiscounted and discounted analyses.
Table 3. Results of the Economic Evaluation Stratified by Treatment Mode
LY, life-year; NMB, net monetary benefit.
Over the 10-year horizon, the discounted average LYs of serious delinquent behavior prevented were 7.86 for medication management, 7.90 for behavioral treatment, 8.17 for the combined treatment, and 8.10 for routine community care. Although the combined treatment had the highest LYs prevented, routine community care turned out to be the optimal strategy in terms of cost-effectiveness, with the highest mean NMB due to the substantial difference in treatment cost.
Figure 3 illustrates that the difference in NMB between routine community care and the two active treatments medication management and behavioral treatment increases as the WTP increases, while the difference in NMB between routine community care and the combined treatment decreases. The latter occurs because, at higher WTP values, the difference in treatment cost carries relatively less weight in the NMB results.

Figure 3. NMB plot with linearly increasing WTP thresholds between $0 and $50,000.
Further Analyses
Table 4 demonstrates that the results are not driven by sample selection, as none of the ORs of model exclusion conditional on the relevant covariates age, gender, comorbidity, intelligence, ethnic background, or occupation-based socioeconomic family status were statistically significant at a 5% level.
Table 4. Logistic Regression Results on Model Exclusion
IQ, intelligence quotient; WISC, Wechsler Intelligence Scale for Children.
Furthermore, we determined the likelihood of absorbance in the serious delinquency state once a child entered this state. For the 448 included children, we found an adjusted log OR of continuation in the serious delinquency state of 1.471 (P < 0.001), against an adjusted log OR of moving out of the serious delinquency state of −1.423 (P < 0.001). Similarly, for the 131 excluded children, we found an adjusted log OR of continuation in the serious delinquency state of 0.927 (P < 0.001), against an adjusted log OR of moving out of the serious delinquency state of −0.953 (P < 0.001). Hence, the serious delinquency state is more likely absorbing than transient for both groups.
Discussion
In this study we assessed the long-term cost-effectiveness of the three major forms of ADHD treatment and routine community care beyond the 14-month trial period of the MTA study. 12 For this purpose, we developed a Markov model with an innovative model structure. We considered the high-cost serious delinquency state to be absorbing in order to predict the LYs of serious delinquent behavior prevented over a period of 10 years. The availability of long-term follow-up data in the MTA study enabled us to assess the accuracy of the model prediction. Modeled and observed outcomes matched closely, with a mean difference of 6.9%. Hence, the model delivers reliable estimates when used in future economic evaluations. By setting the WTP equal to the cost avoided in the serious delinquency state, we calculated cost-effectiveness in terms of NMB. 41 Results of the economic evaluation revealed that the combined treatment was the only active treatment mode that further decreased serious delinquent behavior compared with routine community care. However, the substantial difference in treatment cost makes routine community care the optimal treatment strategy in terms of NMB. Moreover, results are robust to varying the WTP threshold in deterministic sensitivity analyses.
These findings are in line with previous research with the MTA study.28,29 Molina et al. demonstrated that, while after 3 years the active ADHD treatments had improved the ADHD symptoms of the children relative to the control condition, no differences were found with respect to delinquent behavior.28,29 Although the MTA study was not originally designed to examine delinquency, we argue that it is relevant for policy makers to see whether the pursued medical effects of the treatments translate into positive effects on the behavior of these children in society. Clearly, the economic impact of ADHD often extends to society beyond the health system,6–9 and treatment effects on ADHD symptoms and on broader societal aspects of the disorder differ substantially.28,29 Additionally, Schawo et al. 15 argue for model-based studies in ADHD that use empirical data for model validation and broader societal outcomes as the modeled outcome. As such, the form of evaluation used in this study has the potential to be extended to a broader perspective.
One innovative feature of the Markov model in this study is that a state that is more or less transient in real life is treated as an absorbing state in the model. This enables model validation according to the guidelines for simulating a continuous-time Markov model. 26 Without an absorbing state, the simulation under our vertical modeling approach would not terminate on its own; a simulated trajectory would end only once its elapsed time exceeded our 10-year time horizon. Furthermore, with this model structure we were able to evaluate the treatments of the MTA study through prevention of serious delinquent behavior. The latter is highly relevant for policy makers, as serious delinquency is associated with substantial societal cost.19–21 A limitation of this assumption is that we had to exclude 22.6% of the original sample. However, we demonstrated that this assumption did not affect the modeled outcomes, as model exclusion was not predicted by the relevant child and family characteristics of age, gender, comorbidity, intelligence, ethnic background, and occupation-based socioeconomic family status. Moreover, results revealed that the serious delinquency state is more likely absorbing than transient for both included and excluded children.
In this study we included additional covariates to check the sensitivity of the results to our modeling assumptions. Exploring the effect of these child and family characteristics on juvenile delinquency in children with ADHD is beyond the scope of this study, but remains an interesting topic for further research.
Conclusions
This study assessed the cost-effectiveness of treatments for ADHD in children using a continuous-time Markov model. The structure of the model allowed us to evaluate the treatments through prevention of serious delinquent behavior over a period of 10 years. The three major forms of ADHD treatment had a lower NMB than routine community care, which confirms the necessity for policy makers to evaluate ADHD treatments in terms of broader societal outcomes before implementation.
Footnotes
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Data and/or research tools used in the preparation of this article were obtained and analyzed from the controlled access datasets distributed from the NIMH-supported National Database for Clinical Trials (NDCT).
NDCT is a collaborative informatics system created by the National Institute of Mental Health to provide a national resource to support and accelerate discovery related to clinical trial research in mental health. Dataset identifier: NCT00000388 (Clinical Trial ID). Furthermore, this work was supported by the Dutch Child and Adolescent Psychiatry Centre Accare. Finally, the findings and views reported in this article are those of the authors and should not be attributed to individuals or organizations mentioned here.
