Abstract
Background and aim
Research has demonstrated that a variety of treatments can reduce or eliminate self-injurious behavior (SIB) in individuals with autism and/or intellectual disabilities, but evidence suggests that not all treatments are equally effective.
Methods
We used multi-level meta-analysis to synthesize the results of 137 single-case design studies on SIB treatment for 245 individuals with autism and/or intellectual disabilities. Analyses compare the effects of various behavioral and medical treatments for SIB and assess associations between treatment effects and participant- and study-level variables.
Results
Findings suggest differential reinforcement, punishment, and treatment packages with reinforcement and punishment components resulted in the largest SIB reductions.
Conclusions
Results indicate that overall, treatment for SIB is highly effective and that participant and study characteristics do not moderate treatment effects.
Implications
Based on results and in line with current practice recommendations, we encourage use of reinforcement-based procedures in all cases of SIB. In the event that reinforcement-only treatments have failed or if SIB poses a serious, immediate threat to the health and well-being of an individual, our results suggest that overcorrection paired with reinforcement may be the most effective, as well as a less invasive, alternative.
Introduction
Self-injurious behavior (SIB) is a broad term encompassing behaviors that cause unintentional, self-inflicted, socially unacceptable physical injury to the individual’s own body (Yates, 2004). Examples of SIB topographies include head hitting, hand mouthing, hair pulling, eye gouging, and hitting self with objects (e.g., Matson & LoVullo, 2008). In addition to causing physical injury, SIB can have serious social repercussions. Distress among caregivers and severely limited community acceptance and participation are commonly associated with SIB (Fisher, Piazza, Bowman, Hanley, & Adelinis, 1997; Schalock, 2004; Tate & Baroff, 1966).
The prevalence of SIB is estimated at 50% for people diagnosed with an autism spectrum disorder (ASD) and in the range of 4%–24% for people diagnosed with intellectual disability (ID; Baghdadli, Pascal, Grisi, & Aussilloux, 2003; Prangnell, 2009; Richards, Oliver, Nelson, & Moss, 2012). Furthermore, nearly half (47%) of all individuals with SIB exhibit these behaviors for 10 years or more (Totsika, Toogood, Hastings, & Lewis, 2008). The persistence of SIB over time, high prevalence rates, and the potential for severe physical and social consequences necessitate the identification of efficient, effective, and lasting treatments for SIB.
Theoretical frameworks for emergence and treatment of SIB
The fields of medicine and behavior analysis offer differing theoretical frameworks for the emergence of SIB. The medical perspective emphasizes biochemical processes internal to individuals, while behavior analytic theories focus on relations between environmental conditions and individuals’ behaviors.
Authors from the field of medicine have hypothesized that engaging in SIB precipitates the release of endorphins, hormones, or neurotransmitters, which act to reinforce and maintain the behavior (Garcia & Smith, 1999). Treatments for SIB based on these hypotheses involve administering psychotropic drugs that interrupt key biochemical processes. For example, the drug naltrexone blocks opioid receptors and is used in an effort to prevent experience of a potentially reinforcing “endorphin rush” during SIB episodes (e.g., Sandman, Barron, & Colman, 1990). Anti-depressant medications (e.g., Sertraline and Paroxetine; Hellings, Kelley, Gabrielli, Kilgore, & Shah, 1996; Davanzo, Belin, Widawski, & Bryan, 1997) and serotonin-rich foods (e.g., bananas, Gedye, 1990) have been administered to alter individuals’ baseline state and may render SIB and its biochemical effects of less interest or reinforcing value. Further, antipsychotic drugs have been used to suppress motor activity and thereby limit incidence of SIB (e.g., Risperidone and Clozapine; Cohen, Ihrig, Lott & Kerrick, 1998; Hammock, Schroeder, & Levine, 1995).
Alternatively, authors from the field of behavior analysis have argued that, while the physical sensation a person experiences during SIB could reinforce the behavior, SIB could additionally function as a form of communication and/or means to solicit behavior in others (Sigafoos, Arthur, & O’Reilly, 2003). For example, a person may engage in SIB to obtain attention from a caregiver, gain access to tangible items (e.g., food and toys), or avoid participation in a non-preferred activity. In this framework, people with ASD and/or ID, who often have limited proficiency in communication and/or mobility with which to manipulate their environment, understandably resort to challenging behaviors such as SIB to express their preferences and needs. Conversely, the ability to effectively communicate requests and refusals with speech or alternative communication methods may serve as a protective factor against SIB (Schroeder, Schroeder, Smith, & Dalldorf, 1978).
Landmark studies conducted by Iwata, Pace, Cowdery, and Miltenberger (1982/1994) established functional analysis (FA) as a reliable, experimental method for identifying the function(s) of individuals’ SIB. In the years since, behavior analysts have embraced FA as the gold standard for determining why an individual exhibits a behavior and consider the method an essential step in selecting appropriate and maximally effective interventions (Cooper, Heron, & Heward, 2007). Behavior analytic treatments for SIB generally involve manipulation of environmental conditions and teaching of skills. For example, in cases in which FA identifies the function of SIB as obtaining a particular toy or form of attention, interventions may include reinforcement of alternative communicative behaviors with provision of the particular toy or form of attention (e.g., Hanley, Piazza, Fisher, & Adelinis, 1997). For SIB with an avoidance function, interventionists may teach individuals new ways to request or initiate release from non-preferred stimuli, and then provide structured opportunities to reinforce the new behaviors with release (e.g., Kahng, Iwata, DeLeon, & Worsdell, 1997). Punishments for SIB are also common methods of intervention that are intended to break associations between SIB and its reinforcing consequences. Following instances of SIB, aversive stimuli may be applied (e.g., ammonia tablet broken under nose; Singh, Dawson, & Gregory, 1980) or reinforcers may be withheld (e.g., attention or toy withdrawn; Lucero, Frieman, Spoering, & Fehrenbacher, 1976; further movement restrained; Fisher et al., 1997). Typically, interventions involve multiple components, such as a punishment contingency, instruction in leisure or communication skills, or new routines, plus several reinforcement procedures.
Effects of medical treatments for SIB
Reviews of the research have reported conflicting results regarding the effectiveness of anti-depressants, opioid antagonists, and antipsychotics in the treatment of SIB. Symons, Thompson, and Rodriguez (2004) conducted a quantitative synthesis and narrative review of studies on naltrexone use in individuals with ID. Results indicated that 80% of subjects showed reductions in SIB during short-term treatment and that males required lower doses and responded more favorably than females. Based on their narrative review, Mahatmya, Zobel, and Valdovinos (2008) also found that anti-depressants, opioid antagonists, and antipsychotics are generally effective for treating SIB in individuals with ASD, but that several variables appear to moderate the effects of medication (i.e., dosage, treatment duration, sex, severity of SIB, and variables maintaining SIB). In contrast, results from Gormez, Rana, and Varghese’s (2014) narrative review of pharmacological treatments for SIB in adults with ID suggested that no active drug was more effective than a placebo. Authors of these reviews and others on pharmacological treatments for challenging behaviors (e.g., Matson et al., 2000) have noted a variety of difficulties in drawing conclusions confidently regarding the effects of specific drugs, for example, individual differences in reactivity, interactions with other medications or extra-experimental changes in other medications, and, at the study level, a lack of consistency in measurement and reporting on side effects (e.g., suppression of learning or communication).
Effects of behavioral treatments for SIB
Syntheses of behavior analytic research similarly report varying results. Findings of a quantitative review of studies on behavioral treatments for SIB published between 1965 and 2002 suggest that increased use of FA over time coincides with a marked increase in the use of reinforcement-based treatments (e.g., differential reinforcement of alternative behavior) and a gradual decrease in the use of punishment procedures (Kahng, Iwata, & Lewin, 2002). Kahng et al.’s (2002) analysis of treatment effects depicted all methods as capable of substantial reductions of SIB, favored punishment and preventative methods (e.g., manipulation of antecedents, mechanical or manual blocking of SIB), and suggested combinations of treatments were slightly more effective than single component interventions.
Prangnell (2009) reviewed studies on behavioral treatments for SIB published from 1998 to 2008 and, in contrast to Kahng et al. (2002), noted a substantial proportion of studies reported the use of aversive treatments. Prangnell’s (2009) synthesis findings suggested treatment effects were highly variable within particular categories/types of interventions, punishment procedures typically resulted in rapid and substantial reductions in SIB, and combinations of behavioral treatments were more effective than single treatments. Prangnell (2009) cautioned that conclusions about the long-term effects of treatments could not be drawn due to a lack of follow-up or maintenance data reported across studies.
Additionally, Denis, Van den Noortgate, and Maes (2011) conducted a multi-level meta-analysis of non-aversive and non-intrusive, reinforcement-based SIB treatment effects for participants with profound ID. Results showed that non-aversive and non-intrusive behavioral treatments resulted in large reductions in SIB. Participant sensory impairment was identified as a moderator of treatment effects, while medication, motor impairment, setting, age, gender, and matching of treatment with behavioral function were not found to account for variations in treatment effects.
Limitations of previous research syntheses on SIB treatment effects
While a number of reviews have synthesized research on effects of behavioral treatment on SIB, each limited their sample of studies to focus on a single disability category: ASD or ID. Behavior analytic tradition suggests, with regard to SIB and other forms of challenging behavior, disability condition is negligible and greatly surpassed in relevance by an individual’s history of contingencies (Watson, 1913). Thus, previous syntheses’ omission of large portions of the research literature on behavioral treatments may have skewed their findings.
As described above, the majority of reviews on SIB treatment have focused on either medical or behavioral treatments. Only one narrative review (Mahatmya et al., 2008) included studies on both medical and behavioral treatments for SIB in individuals with ASD. However, authors limited their sample to function-based behavioral treatments and thereby excluded several decades of research from before the establishment of functional assessment methods.
Additionally, the scope and internal validity of previous reviews’ findings are limited by their synthesis methods. With the exception of the quantitative analyses conducted by Symons et al. (2004), Kahng et al. (2002), and Denis et al. (2011), each of which focused on narrow selections of treatments and/or disability conditions, all other syntheses employed narrative review methods. While narrative review methods have a variety of strengths (White, 1987), they do not permit precision in assessment or comparison of treatment effects nor analysis of associations between treatment effects and potential moderators. Also, in recent years, methodological researchers have greatly advanced understandings of and techniques for meta-analysis of single-case research data (i.e., the primary research design employed in research on treatment of SIB; “Methodological Dilemmas,” April 2012; “Methodological Issues,” April 2013; “Handling Methodological Issues,” April 2014; “Single-Case Experimental Designs,” April 2015). In empirical research and theoretical commentary, authors have identified systematic biases and inaccuracies in summary statistics used in previous syntheses (e.g., Mean Baseline Reduction, standardized difference between means), which are associated with common features of single-case research data (e.g., variability across datasets in numbers of repeated measures and/or outcome metrics, variability across observations in behavior levels; Beretvas & Chung, 2008; Pustejovsky, 2015).
Limitations in previous reviews’ samples and methods warrant further meta-analytic work in the area of SIB treatments. The present review was designed to address these limitations. We sampled the entire body of literature on SIB treatment and synthesized research on all types of SIB treatment, involving individuals with ID and/or ASD, using the most current and methodologically sound quantitative procedures.
Meta-analysis of single-case research
Meta-analysis allows estimation and aggregation of treatment effects (i.e., effect sizes [ESs]), analysis of associations between ESs and independent variables, and calculation of confidence intervals and significance testing for treatment effects and associations (Cooper & Hedges, 1994). By synthesizing bodies of literature, meta-analysis can yield new knowledge about educational and behavioral practices (Jenson, Clark, Kircher, & Kristjansson, 2007). The practices are well established in fields of psychology and education that employ group-design research methods, although development and uptake of quantitative techniques for synthesizing results of single-case design research are still in progress. Despite general agreement that meta-analysis of single-case experimental data is useful and should be undertaken, there has been considerable debate regarding the validity of various methods (e.g., Allison & Gorman, 1993; Beretvas & Chung, 2008; Ferron, 2002; Kratochwill & Levin, 2014; Salzberg, Strain, & Baer, 1987; Shadish, Rindskopf, & Hedges, 2008; Scruggs, Mastropieri, & Casto, 1987; Scruggs & Mastropieri, 2013). Criticisms primarily focus on the use of non-parametric summary statistics and methods developed for group-design research, but also pertain to methodological challenges that result from variations in standards and rigor in primary single-case research (Kratochwill & Levin, 2010; Kratochwill et al., 2012).
Non-parametric statistics and standardized mean differences
Non-parametric summary statistics (e.g., percentage of non-overlapping data (PND; Scruggs et al., 1987), percentage of zero data (PZD; Scotti, Evans, & Meyer, 1991), and mean baseline reduction (Kahng et al., 2002)) and application of the Standardized Mean Difference (SMD; Busk & Serlin, 1992) to single-case data carry the risks of systematic bias and misrepresentation of data phenomena. For example, PND has an inverse relationship with the number of baseline data points (i.e., higher PND values are associated with fewer baseline data points; Allison & Gorman, 1994). PZD has an inverse relationship with the number of observation sessions conducted during treatment phases beyond the first observation of a zero level of behavior (i.e., longer treatment phases are associated with lower PZD scores; Beretvas & Chung, 2008). Comparison and aggregation of mean baseline reduction (MBR) statistics across studies are confounded by variations in the levels of baseline data, such that different levels of baseline data produce statistics on different metrics (i.e., 50% reduction in a rate of 10 behaviors per minute does not equate to 50% reduction in a rate of 100 behaviors per minute, in terms of magnitude of reduction or practical value of the treatment outcome). Low PND, PZD, and MBR values can also result from slow acquisition rates, even when treatments are ultimately effective in changing or eliminating behaviors over time (Allison & Gorman, 1994; Scotti et al., 1991; White, 1987).
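The sensitivities described above can be made concrete with a minimal sketch. The function names, the session counts, and the specific operational definitions (PND counted as treatment points strictly below the lowest baseline point, PZD counted from the first zero onward) are assumptions chosen for illustration, not the procedures of any cited study.

```python
from statistics import mean


def pnd(baseline, treatment):
    """Percentage of non-overlapping data for a behavior targeted for reduction:
    share of treatment points strictly below the lowest baseline point."""
    floor = min(baseline)
    return 100.0 * sum(y < floor for y in treatment) / len(treatment)


def pzd(treatment):
    """Percentage of zero data: share of zeros among treatment points,
    counted from the first zero observation onward."""
    if 0 not in treatment:
        return 0.0
    tail = treatment[treatment.index(0):]
    return 100.0 * sum(y == 0 for y in tail) / len(tail)


def mbr(baseline, treatment):
    """Mean baseline reduction, as a percentage of the baseline mean."""
    return 100.0 * (mean(baseline) - mean(treatment)) / mean(baseline)


# Hypothetical SIB counts per session (illustrative values only).
treatment = [8, 6, 5, 3, 0, 2, 0, 0]
short_baseline = [10, 12, 11]
long_baseline = [10, 12, 11, 4, 13, 12]  # longer baseline that happens to include one low session

print(pnd(short_baseline, treatment))  # 100.0
print(pnd(long_baseline, treatment))   # 62.5
print(pzd(treatment))                  # 75.0
# MBR reports the same value for very different absolute reductions.
print(mbr([10, 10], [5, 5]), mbr([100, 100], [50, 50]))  # 50.0 50.0
```

In this sketch the same treatment data yield a lower PND when the baseline is longer, because a single low baseline session lowers the comparison floor, and MBR returns 50% whether the absolute reduction is 5 or 50 behaviors.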
Similarly, trends in data confound SMD values by introducing error in variance estimates and skewing means (Marquis, Horner, & Carr, 2000). Further, standardization of treatment effects with the variance of phase data does not logically correspond to the import of variability in behavior levels (Pustejovsky, 2015). In the tradition of behavior analysis, variability in behavior levels purely represents unreliability of measurement procedures, created by uncontrolled variables, whose moderating effects are unknown and immeasurable. Presumably, effects of uncontrolled variables vary in composition and intensity across observation sessions. In group-design research, variability in measures of individuals represents the distribution of a trait, plus some amount of error due to uncontrolled variables. While standardization of mean differences with variances scales effects from group research in terms of meaningful distributions, approximately, such standardization in single-case research scales effects in terms of error and the presence of confounds (i.e., effects are compressed or inflated according to the degree of unreliability of the measurement system).
Additional bias and misrepresentation of phenomena can result from autocorrelation (i.e., serial dependence in repeated measures) and variations in outcome metrics (e.g., frequency counts, percent of partial intervals). Autocorrelation has the potential to interfere with accurate estimation of variances (Baek & Ferron, 2013). When used as a standardization factor, inaccurate variances confound ESs. Variations in outcome metrics across studies typically necessitate standardization of data prior to aggregation (Van den Noortgate & Onghena, 2003). Given no methods of standardization exist for non-parametric summary statistics, aggregation of data from disparate metrics yields results with very low internal validity.
Recognition of these and other methodological problems has led authors to caution against use of non-parametric summary statistics and methods developed for group-design research. While their flaws differ, each statistical method is compromised in terms of accuracy and bias in estimation and aggregation of treatment effects (e.g., Allison & Gorman, 1994) and moderator analyses (e.g., Heyvaert, Saenen, Maes, & Onghena, 2015).
Multi-level modeling
Multi-level modeling (MLM) constitutes a viable alternative to non-parametric summary statistics for synthesizing single-case data. MLM estimation (a) is not biased by differences in the number of data points collected in each phase or levels of behavior across individuals, (b) can model trends in time-series data, and (c) is robust to the presence of autocorrelation (Baek & Ferron, 2013; Jenson et al., 2007; Raudenbush & Bryk, 2002; Singer & Willett, 2003; Van den Noortgate & Onghena, 2003, 2008). MLM also can identify differential effects when distributions of statistics vary or are not known, and allows for accurate moderator analyses. When used with datasets comparable in study design specifics or in conjunction with a rigorous ES summary statistic, mounting evidence suggests MLM can be a sound and useful tool for identifying evidence-based practices and generating new knowledge from research literatures (Baek & Ferron, 2013; Baek et al., 2014; Moeyaert, Ferron, Beretvas, & Van den Noortgate, 2014; Moeyaert, Ugille, Ferron, Beretvas, & Van den Noortgate, 2013; Moeyaert, Ugille, Ferron, Beretvas, & Van den Noortgate, 2014; Ugille, Moeyaert, Beretvas, Ferron, & Van den Noortgate, 2012).
Only one previous synthesis used MLM to investigate SIB treatments (i.e., the effects of non-aversive and non-intrusive reinforcement-based treatments in participants with profound ID; Denis et al., 2011). However, Denis et al.’s (2011) results may have been confounded by use of an SMD ES. The current study contributes to SIB research literature by (a) analyzing data on all types of treatment for SIB, from studies with participants with ASD and/or ID; (b) making use of a more robust ES; and (c) assessing the relations between treatment effects and a variety of subject characteristics (e.g., disability diagnosis), study characteristics (e.g., duration of treatment), and intervention methods (e.g., matching treatments to SIB function, form of punishment).
The following research questions guided preliminary analyses (1), the formulation and estimation of multi-level models (2, 3, 4), and post-hoc analyses (5, 6):
1. Across categories of each participant- and study-level independent variable (e.g., disability condition, metric for measurement of SIB), are there apparent, systematic differences in ESs’ distributions and means? For which variables do data justify their inclusion as explanatory variables in meta-analytic models?
2. In an unconditional model, involving aggregation of individual ESs within studies and then combination of study aggregates:
(a) Overall, what is the expected effect of treatment on SIB level?
(b) Across studies, do aggregate ESs differ?
(c) Within studies, do individual ESs differ?
(d) What proportions of the total variance occur between studies and within studies (i.e., between individual ESs in studies)?
3. In a first conditional model, including treatment type as an explanatory variable for study aggregate ESs:
(a) For each treatment type, what is the expected effect of treatment on SIB level?
(b) Do data suggest any treatments are more effective than others?
(c) After controlling for treatment type, do study aggregate ESs differ? In other words, is there any variation that remains to be explained?
4. In a second conditional model, including treatment type, as well as whether interventions were matched to SIB function, as explanatory variables for study aggregate ESs:
(a) For each treatment type, what is the difference between expected effects for interventions matched to function and those not matched to function?
(b) Do data suggest the practice of matching interventions to function is more effective than not matching interventions to function?
(c) Does modeling whether treatments were matched to SIB function result in greater explanatory power in the model (i.e., greater precision in effect estimates, less unexplained variance in study aggregate ESs)?
5. In the subset of studies that investigated punishment procedures, do data suggest effects differ across negative punishment (i.e., withdrawal of preferred stimuli), positive punishment (i.e., application of aversive stimuli) that relates to SIB function, and positive punishment that is not known to relate to SIB function?
6. In the subset of studies that investigated positive punishments, do data suggest effects differ across irritants, movement suppression, overcorrection, and interventions that couple these components with reinforcement procedures?
Method
Literature search
We searched the PsycINFO, ProQuest, and Web of Science databases using combinations of the following keywords: (a) treatment, therapy, training, or intervention, (b) self-injur*, SIB, self-destruct*, or self-harm, and NOT suicide, and (c) autis*, ASD, autism spectrum, intellectual disabilit*, retard*, or mental deficien*. We limited results to peer-reviewed journal articles published in English, but did not constrain by date of publication. Our database search yielded 679 articles for potential inclusion.
Inclusion criteria
In examination of the 679 articles, we used the following criteria to select studies or datasets for inclusion: (a) the experimental study used a single-case research design, beginning with a baseline phase that was followed by a treatment phase; (b) the dependent variable was a quantitative measure of SIB (e.g., frequency of head-hitting); (c) the independent variable was a treatment which targeted reduction in SIB; (d) participants had ASD and/or ID; (e) outcome data were presented graphically or numerically in a table for individual measurements/observation sessions; and (f) authors reported at least two data points per phase. After screening all identified articles, 137 studies (reported in 131 articles) that included a total of 245 unique participants were selected for inclusion in our analysis (see Appendix 1 for reference list of included studies).
Independent variables
Based on information available in articles, we selected independent variables that we hypothesized had potential to influence ESs at two levels (i.e., the individual participant level and the study level). For each variable, we operationally defined coding categories to capture the diversity across studies. Coding category definitions were expanded and refined at regular research team meetings, as needed to unambiguously sort all studies and participants. After initial training on coding conventions, two team members coded each article independently and then compared all coding decisions for agreement. All coding inconsistencies were resolved during research team meetings by group consensus.
Participant characteristics
At the participant level, we coded information on three characteristics (with variable categories in parentheses): (a) diagnosis (ASD, ID, and dual diagnosis); (b) communication limitations (no form of communication, some form of communication, verbal communication, and not specified); and (c) physical limitations (non-ambulatory, uses wheelchair, ambulatory, restraints, and not specified). We did not include participant age as an independent variable because behavior analytic treatments have proven effective with all age groups (The National Autism Center, 2009), and we did not include IQ because reporting on participant IQ was missing from the majority of included original studies.
Study characteristics
At the study level, we coded information on four characteristics pertinent to treatment of SIB (with variable categories in parentheses): (a) assessment of SIB function (none/informal observation, FA, indirect assessment scale (e.g., Functional Analysis Screening Tool and/or Motivation Assessment Scale), and mixed assessment (e.g., FA plus indirect assessment scale)); (b) treatment matched to function (yes and no (e.g., providing verbal praise statements for behaviors that are maintained by attention)); (c) hypothesized function of SIB, as identified by authors or inferred during coding process from information provided in article (attention, escape/demand, tangible, automatic, multiply maintained, and not specified/other); and (d) treatment type (medication, functional communication training (FCT), differential reinforcement, extinction + reinforcement treatment packages, extinction, punishment, punishment + reinforcement treatment packages, and punishment + punishment treatment packages). Within categories of treatments, there were various forms of heterogeneity in intervention specifics, most notably among medications. Decisions to group treatments into the eight categories were driven by interventions’ commonalities in procedures and principles, and the analytical benefits of maximizing sample sizes in categories, as well as the meta-analysis norm of tolerating minor forms of heterogeneity across studies (Cooper & Hedges, 1994) and the statistical capacity of MLM to model and test heterogeneity.
We also coded three study-level characteristics that we hypothesized could introduce bias or confound analysis of ESs, but are irrelevant to practical outcomes of treatment. These included (a) dependent variable metric (rate, frequency, percent of intervals, percent of trials, and duration); (b) length of data collection sessions (0–5 min, 6–10 min, 11–20 min, 21–45 min, 46–90 min, 91+ min, and not specified); and (c) baseline condition/treatment phase comparison (no treatment/no contingency in place, pretreatment routine, FA condition, participant-perceived reinforcement contingency, and participant-perceived punishment contingency).
Analysis procedures
Data extraction
After completing coding for the independent variables, we extracted dependent measure data from included studies. We used a web application, WebPlotDigitizer (Rohatgi, 2016), to define the coordinate space of graphs, pinpoint coordinates of data points, and tabulate coordinate values. Investigation of the use of WebPlotDigitizer to extract data from single-case research graphs has found the program to yield highly reliable coordinates for data points (Moeyaert, Maggin, & Verkuilen, 2016). After extracted data were tabulated, we visually checked coordinate values against graphical displays for accuracy and then transferred the data into a spreadsheet.
ES calculation
We used Pustejovsky’s measurement-comparable log-ratio ES measure to quantify treatment effects (Pustejovsky, 2015). In contrast to other available summary statistics, Pustejovsky’s ES is not subject to systematic bias related to sample sizes or baseline levels of behavior, and aggregates are not confounded by differences in outcome metrics across datasets. Also, because behavior changes, in theory, are multiplicative, as opposed to additive, linear aggregation of ESs for behavior is inappropriate and risks misrepresentation of phenomena in composite. The logarithmic form of Pustejovsky’s ES bolsters the internal validity of ES aggregates and meta-analytic results.
Figure 1 illustrates graphically the steps in calculation of a log-ratio ES and the corresponding percent reduction. ESs were estimated using the following equation:
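A minimal reconstruction, assuming the basic log response ratio form described by Pustejovsky (2015): the symbols $\bar{y}_A$ and $\bar{y}_B$ (mean SIB level in the baseline and treatment phases, respectively) and the label LR are notation chosen here for illustration.

```latex
\[
\mathrm{LR} = \ln\!\left(\frac{\bar{y}_B}{\bar{y}_A}\right),
\qquad
\text{percent reduction} = 100 \times \left(1 - e^{\mathrm{LR}}\right)
\]
```

Under this assumed form, a negative LR indicates that SIB was lower during treatment than during baseline; for example, a treatment mean equal to one quarter of the baseline mean gives LR = ln(0.25) ≈ −1.39, or a 75% reduction.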
Figure 1. Illustration of log ratio and percent reduction calculations. The top panel (a) shows an original data display (Kahng, Iwata, Thompson, & Hanley, 2000); the middle panel (b) shows mean lines and values for baseline and treatment phases; and the bottom panel (c) shows the formulae for the log ratio and percent reduction and illustrates the percent reduction area on the original graph.
In aggregate analyses, we weighted individual ESs by their precision. We calculated ES precision as the inverse of the conditional variance of the log-ratio:
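As a sketch only, assuming the delta-method approximation commonly paired with the log response ratio: $s_A^2$ and $s_B^2$ are the sample variances of the baseline and treatment phases, $n_A$ and $n_B$ the numbers of observations, and $\bar{y}_A$ and $\bar{y}_B$ the phase means. All symbols are notation introduced here, and the exact expression used in the analysis may differ.

```latex
\[
V_{\mathrm{LR}} \approx \frac{s_A^2}{n_A\,\bar{y}_A^{\,2}} + \frac{s_B^2}{n_B\,\bar{y}_B^{\,2}},
\qquad
w = \frac{1}{V_{\mathrm{LR}}}
\]
```

Under this approximation, cases with more observations per phase and less session-to-session variability relative to their means receive larger weights $w$.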
As recommended by Pustejovsky (2015) to facilitate interpretation and increase accessibility of results, we transformed ESs from analysis output to percent reduction figures by converting log-ratios into exponentiation values and multiplying by 100. In this study, percent reduction figures represented the proportion of baseline SIB levels eliminated during treatment phases.
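The calculation chain from raw session data to a percent reduction figure can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the log-ratio is taken as ln(treatment mean / baseline mean), its variance uses a delta-method approximation, percent reduction is computed as 100 × (1 − exp(log-ratio)), and all function names and data values are hypothetical. The simple precision-weighted mean stands in for aggregation only; it is not the multi-level model.

```python
import math
from statistics import mean, variance


def log_ratio(baseline, treatment):
    """Assumed form: natural log of treatment mean over baseline mean.
    Undefined when either phase mean is zero."""
    return math.log(mean(treatment) / mean(baseline))


def log_ratio_variance(baseline, treatment):
    """Delta-method approximation to the sampling variance of the log-ratio."""
    m_a, m_b = mean(baseline), mean(treatment)
    return (variance(baseline) / (len(baseline) * m_a ** 2)
            + variance(treatment) / (len(treatment) * m_b ** 2))


def percent_reduction(lr):
    """Back-transform a log-ratio into the percentage of baseline SIB eliminated."""
    return 100.0 * (1.0 - math.exp(lr))


def precision_weighted_mean(cases):
    """Average log-ratios across (baseline, treatment) cases, weighting by 1 / variance."""
    weights = [1.0 / log_ratio_variance(b, t) for b, t in cases]
    ratios = [log_ratio(b, t) for b, t in cases]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


# Hypothetical SIB counts per session for one participant.
baseline = [20, 22, 18, 20]   # mean 20
treatment = [6, 4, 5, 5]      # mean 5

lr = log_ratio(baseline, treatment)
print(round(lr, 3))                     # -1.386
print(round(percent_reduction(lr), 1))  # 75.0
print(round(1.0 / log_ratio_variance(baseline, treatment), 1))  # 120.0
```

Averaging on the log scale and back-transforming afterward keeps the aggregate consistent with the multiplicative view of behavior change described above.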
Multi-level analysis
Multi-level models in this meta-analysis were composed of linear regression equations that represented individual ESs as nested within studies. In the most basic model, at level-1, individual ESs were modeled as randomly varying around study average ESs. At level-2, study average ESs were modeled as randomly varying around an overall average ES. In subsequent models, explanatory variables (i.e., fixed effects) were added to level-2 equations to analyze associations between study-level variables and ESs. All models were estimated using SAS software (SAS Institute Inc., 2014). An alpha of .05 was selected for all statistical tests.
Our analysis began with preliminary inspections of means and distributions of ESs for each independent variable category (i.e., research question 1). Based on lack of apparent differences, we ruled out inclusion of participant and study characteristic variables.
To address research questions 2a–2d and establish a comparison for subsequent models, we estimated an unconditional model that included random effects at level-1 and level-2, to capture variation within studies (i.e., across individual ESs) and between study averages. The unconditional model included no fixed effects.
To investigate effects of different treatment types (i.e., research questions 3a–3c), we next estimated a conditional model that included fixed and random effects at level-2 for each treatment type, and a random effect at level-1. In the level-2 regression equation, we used dummy variables to represent each category of treatment. To test for differences in ESs between treatment types, we recoded dummy variables to rotate the reference category and re-ran the model to obtain contrasts between all pairs of treatments.
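The idea of rotating the reference category can be illustrated with a small Python sketch. It only builds the 0/1 indicator columns; the helper name and the toy data are hypothetical, and the actual models were fit in SAS.

```python
from typing import List, Tuple


def dummy_code(
    treatments: List[str], categories: List[str], reference: str
) -> Tuple[List[str], List[List[int]]]:
    """Build 0/1 indicator rows, omitting the reference category.

    With this coding, each coefficient in a regression is the contrast
    between one category and the reference category.
    """
    if reference not in categories:
        raise ValueError(f"unknown reference category: {reference}")
    columns = [c for c in categories if c != reference]
    rows = [[1 if t == c else 0 for c in columns] for t in treatments]
    return columns, rows


# Hypothetical toy data: one treatment label per effect size
categories = ["medication", "FCT", "differential reinforcement"]
observed = ["FCT", "medication", "differential reinforcement", "FCT"]

# Re-coding once per reference category yields every pairwise contrast
for reference in categories:
    columns, rows = dummy_code(observed, categories, reference)
    print(reference, columns, rows)
```

Each pass through the loop corresponds to one re-run of the model, so a design with k treatment categories needs at most k codings to cover all pairs.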
In a second conditional model, we evaluated the impact on effect magnitudes of matching treatments to SIB function (i.e., research questions 4a–4c). The model included a series of dummy variables that represented treatment types when not matched to function and differences in effect when matched to function (i.e., the additive or subtractive effect of matching to function). To obtain estimates of ESs for each treatment type when matched to function, we reverse coded dummy variables and re-ran the model.
After completing our planned investigations, we undertook two post-hoc analyses to explore questions about the differential effects of various forms of punishment interventions (i.e., research questions 5 and 6). We formulated hypotheses regarding variables that may account for variations in effects, operationally defined further categories of punishment interventions, and then recorded additional codes for all studies that involved punishment. As before, we estimated conditional models that included dummy variables for each category of punishment intervention. For these final two models, we forwent tests of differences between each form of punishment intervention and instead drew conclusions from the overlap of confidence intervals.
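A comparison based on interval overlap can be sketched as below. The function name is hypothetical, and the intervals in the example are the percent reduction CIs reported for the first conditional model, used purely as an illustration. Overlap is an informal heuristic: two intervals can overlap even when a formal test would detect a difference.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Return True when two closed intervals share at least one point."""
    a_low, a_high = a
    b_low, b_high = b
    return a_low <= b_high and b_low <= a_high


# Percent reduction CIs reported for the first conditional model
punishment = (81.5, 94.0)
differential_reinforcement = (85.0, 95.0)
medication = (-60.0, 67.0)

print(intervals_overlap(punishment, differential_reinforcement))
print(intervals_overlap(punishment, medication))
```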
Results
Descriptive statistics for independent variables not included in meta-analytic models.
Note. n = number of effect sizes for variable categories; SIB = self-injurious behavior
Overall average treatment effect
Parameter estimates and percent reduction for the multilevel analysis.
Note. FCT and Reinforcement + Ext. treatment packages were always matched to function; SE = standard error; CI = confidence interval. Matched-Unmatched values are the difference between effect size values for treatments matched to function of SIB and treatments not matched to SIB function.
p < .001.
p < .01.
p < .05.
The average log-ratio ES for overall treatment effect was −2.2 (SE = 0.16, df = 117, p < .001), which corresponds to an 89% reduction in SIB with a 95% CI [85, 92]. The random effect variance component representing variation in study average ESs was moderate and statistically significant, indicating that variation in study level ESs was greater than would be expected due to sampling error alone (
Treatment effect by treatment type
The first conditional model addressed research questions 3a–3c, which concerned the effects of different treatment types. We also compared the first conditional model against the unconditional model to determine whether including treatment type increased the explanatory value of the model. We included eight fixed effects for the eight treatment types: medication, FCT, differential reinforcement, extinction + reinforcement packages, extinction, punishment, punishment + reinforcement packages, and punishment + punishment packages (more than one punishment procedure used at the same time). Results for the first conditional model appear in Table 2. The fixed effects, γ0 through γ7, represent ES estimates for the eight treatment types; and the random effects,
Treatments with small average effects
Medication and extinction + reinforcement packages had small weighted ESs relative to the other six treatment types. The ES estimate for medication was −0.32 (SE = 0.4, df = 6, p = .43), which corresponds to a 27% reduction in SIB with a 95% CI [−60, 67]. The ES estimate for extinction + reinforcement packages was −0.87 (SE = 0.31, df = 20, p = .09), which corresponds to a 58% reduction in SIB with a 95% CI [24, 77]. The difference between effects of medication and extinction + reinforcement packages was not significant (p = .29).
Treatments with moderate to large effects
Extinction and FCT were associated with moderate to large weighted ESs, which were statistically significant. The ES estimate for extinction was −1.63 (SE = 0.5, df = 17, p = .01), which corresponds to an 80% reduction in SIB with a 95% CI [47, 93]. The ES estimate for FCT was −1.62 (SE = 0.62, df = 14, p = .04), which corresponds to an 80% reduction in SIB with a 95% CI [33, 94]. The difference between ES estimates for extinction and FCT was not statistically significant. Contrasts between medication and either extinction or FCT were not significant (p = .055 and p = .10, respectively), nor were contrasts between extinction + reinforcement packages and either extinction or FCT (p = .23 and p = .31, respectively).
Treatments with very large effects
Punishment, differential reinforcement, punishment + reinforcement packages, and punishment + punishment packages were associated with very large, statistically significant ESs. The ES estimate for punishment was −2.3 (SE = 0.29, df = 63, p < .0001), which corresponds to a 90% reduction in SIB with a 95% CI [81.5, 94]. The ES estimate for differential reinforcement was −2.5 (SE = 0.29, df = 60, p < .0001), which corresponds to a 91% reduction in SIB with a 95% CI [85, 95]. The ES estimate for punishment + reinforcement packages was −2.8 (SE = 0.56, df = 32, p = .0003), which corresponds to a 94% reduction in SIB with a 95% CI [82, 98]; and the ES estimate for punishment + punishment packages was −2.89 (SE = 0.56, df = 14, p = .0001), which corresponds to a 94.5% reduction in SIB with a 95% CI [83, 98]. Differences between the average ESs for these four treatments were not statistically significant. The ES estimate for extinction was substantially smaller than the ES estimates for the four treatments, but the differences were not statistically significant (.11 ≤ p ≤ .30). Contrasts were not significant between the ES estimates for the four treatments and FCT (.15 ≤ p ≤ .38). Contrasts were significant between ES estimates for the four treatments and medication (p ≤ .001 for all tests). Contrasts were significant between ES estimates for the four treatments and extinction + reinforcement (p ≤ .01 for all tests).
Variation between studies in treatment effects
The between-study variance estimates for extinction (
The between-study variance estimates for punishment (
Variation within studies in treatment effects
The variance component
Treatment effect by treatment type and matched to SIB function
Table 2 presents the results for the second conditional multilevel model, which addressed research questions 4a–4c, regarding whether matching treatment to function had an impact on SIB treatment effects. We also compared the second conditional model against the first model to determine whether including matched to function increased the overall explanatory value of the model. We ran the second conditional multilevel model twice, reversing the coding of the dummy variables, to obtain standard error estimates and calculate confidence intervals for treatments both matched and not matched to function. Figure 2 provides a visual representation of the percent reduction figures and confidence intervals for each treatment as estimated by the unconditional model, the first conditional model, and the second conditional model.
Mean percent reduction in SIB for the three multilevel models with 95% confidence intervals. The unconditional model includes the overall average effect of treatment. Conditional model 1 includes average percent reduction in SIB by treatment category. Conditional model 2 includes average percent reduction in SIB by treatment category matched to function (M) and unmatched (UM) to function. The lower bound of the confidence interval is cut off at 0 for medication in conditional model 1, and for extinction (UM), punishment + reinforcement (M), and punishment + punishment (M) in conditional model 2.
The fixed effects, γ20, γ40, γ50, γ60, and γ70, represent ES estimates for five treatment types when not matched to the function of SIB. ES estimates for medication, FCT, and extinction + reinforcement packages (γ00, γ10, and γ30) are excluded from this list because these treatments were always matched to function or, in the case of medication, never matched to function. The sums γ10 + γ11, γ20 + γ21, γ30 + γ31, γ40 + γ41, γ50 + γ51, γ60 + γ61, and γ70 + γ71 represent ES estimates for treatments when matched to function, and the fixed effects γ21, γ41, γ51, γ61, and γ71 represent the difference between matched and unmatched treatment effects.
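The relation among these coefficients, and the reason for re-running the model with reversed coding, can be sketched for a single treatment type k as follows. The indicator names M and U are introduced here for illustration only:

```latex
\begin{align}
\widehat{ES}_{k} &= \gamma_{k0} + \gamma_{k1}\, M, & M &\in \{0, 1\} \\
\widehat{ES}_{k} &= (\gamma_{k0} + \gamma_{k1}) - \gamma_{k1}\, U, & U &= 1 - M
\end{align}
```

With M = 1 for matched treatments, the intercept \gamma_{k0} is the unmatched estimate and \gamma_{k1} is the matched minus unmatched difference. Under the reversed indicator U, the intercept becomes \gamma_{k0} + \gamma_{k1}, so the matched estimate is obtained directly from the model output along with its own standard error.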
ES differences between matched and unmatched treatments
For each treatment type, ESs, ES differences, and percent reduction figures for treatments unmatched and matched to function are summarized in Table 2. ESs for unmatched treatments (γ20, γ40, γ50, γ60, and γ70) had greater magnitude than ESs for treatments matched to function (γ10 + γ11 through γ70 + γ71) across all treatment types for which contrasts were possible; however, none of the differences were statistically significant. Regarding effect magnitudes (i.e., differences from zero), the ESs for punishment and differential reinforcement were statistically significant when treatments were both matched and unmatched to function (matched punishment, ω = −1.55, SE = 0.65, df = 13, p = .02; unmatched punishment, ω = −0.87, SE = 0.73, df = 61, p = .24; matched differential reinforcement, ω = −2.27, SE = 0.41, df = 39, p < .001; unmatched differential reinforcement, ω = −0.36, SE = 0.59, df = 25, p = .55). ESs for punishment + reinforcement packages and punishment + punishment packages were statistically significant only when treatment was unmatched to function (matched punishment + reinforcement packages, ω = −1.87, SE = 1.66, df = 5, p = .29; unmatched punishment + reinforcement packages, ω = −1.06, SE = 1.76, df = 20, p = .56; matched punishment + punishment packages, ω = −0.96, SE = 3.10, p = .76; unmatched punishment + punishment packages, ω = −2.01, SE = 3.14, df = 13, p = .53). Finally, the ES for extinction was statistically significant only when treatment was matched to function (matched extinction, ω = −1.57, SE = 0.56, df = 18, p = .02; unmatched extinction, ω = −0.34, SE = 1.34, df = 4, p = .80).
Variation between studies in treatment effects
Similar to results of the first conditional model, random effect variance components remained significant for punishment (
Variation within studies in treatment effects
The lack of reduction in
Differences in the effects of punishment treatments
We were interested in examining whether the remaining variation in punishment effects could be explained by certain types of punishment being more effective than others. In a first post-hoc analysis, we investigated the possibility that positive punishment treatments were more effective than negative punishment treatments. Positive punishment is defined as the presentation of a stimulus that decreases behavior (e.g., administration of irritants) and negative punishment is defined as the removal of a stimulus that decreases behavior (e.g., time out; Cooper et al., 2007). Negative punishment may not be as effective as positive punishment because it relies on the removal of a stimulus. While positive punishments are highly effective, they are also generally more aversive and more ethically objectionable than negative punishments. In a second post-hoc analysis, we grouped positive punishment interventions by punishing stimuli (i.e., irritant, movement suppression, overcorrection/exercise) and intervention components (i.e., punishment alone or packaged with reinforcement procedures) to investigate the possibility that use of particular stimuli was associated with differences in treatment effects. As meta-analysts free from the ethical concerns of applied experimentation with punishments, we viewed the post-hoc analyses as a unique opportunity to investigate the relative efficacy of different forms of punishment in treatment for SIB.
Post-hoc analysis of positive and negative punishment treatments.
Note. n = number of effect sizes included in weighted average; ES = effect size; CI = confidence interval. P-values in table are for t-tests of effect sizes’ differences from zero generated by SAS software. Differences between effect sizes were not significant.
p < .001.
p < .01.
p < .05.
Post-hoc analysis of punishing stimuli.
Note. n = number of effect sizes included in weighted average; ES = effect size; CI = confidence interval. Where n > 1, p-values are for t-tests of weighted average effect sizes’ differences from zero generated by SAS software. Differences between groupings’ effect sizes were not significant. Where n = 1, p-values are for z-tests of individual effect sizes’ difference from zero.
p < .001.
p < .01.
p < .05.
Discussion
Relations between participant-level variables and treatment effects
Descriptive statistic results corroborate the behavior analytic perspective that environmental manipulations are more influential than participant characteristics (Cooper et al., 2007). For example, the percent reduction figures corresponding to weighted mean ESs were nearly identical for participants with differing physical abilities, including those in wheelchairs (75%), those who are ambulatory (75%), and those who are non-ambulatory (76%). Percent reduction figures were also similar across the communication ability categories with the exception of the verbal communication category.
Only three individuals in the sample had verbal communication abilities. The small number of verbal participants in the sample is consistent with the suggestion that an ability to manipulate the environment with speech may serve as a protective factor against SIB (Schroeder et al., 1978). The percent reduction figure associated with the weighted mean ES for participants with verbal skills (31% reduction in SIB) was much smaller than the percent reduction figures across the limited or no communication categories (79%–88%). Based on visual inspection of the three original graphs for verbal participants, it appears that SIB levels for one subject (Clauser & Gould, 1988) were very similar during the baseline and treatment (extinction) phases. The similarity of means across phases would have yielded a very small ES, and given the small sample size (n = 3), this one effect likely skewed the group average, giving the false impression that individuals with verbal skills have SIB that is somewhat more resistant to treatment.
An alternative explanation for the nonsignificant differences between weighted mean ESs across diagnosis categories is that changes in ASD and ID diagnostic criteria over time resulted in low internal validity in our coding categories (i.e., ASD, ID, dual diagnosis) and obscured potential differences. Participant diagnoses reported in original studies reflect the fields’ understanding and characterization of ASD and ID at the time of publication. Our sample of studies spans five decades and includes 38 participants in studies published before 1987, when autistic disorder first appeared in the DSM-III-R. ASD may have been under-identified or conflated with ID for participants in studies published before 1987. The validity of diagnosis coding for studies published after 1987 is also questionable given that criteria for ASD diagnosis changed again in subsequent versions of the DSM.
Additionally, we found that treatment of SIB had similar effects across all functions of behavior (i.e., attention, escape/demand, tangible, automatic, and multiply maintained). In other words, sorting by function (without regard for treatment type) did not reveal differences in the magnitude of SIB treatment effects.
Given we did not detect category differences in variables we coded, we assume we did not have access to information on participant-level characteristics that influenced treatment effects. For example, it is possible that interventionist expertise, interventionist-participant rapport, or variables extraneous to intervention procedures (e.g., uncontrolled setting events or discriminative stimuli) were responsible for the observed variability in treatment effects.
Effectiveness of different treatments for SIB
Our results indicate that one group of treatments yielded large reductions in SIB (differential reinforcement, punishment, punishment + reinforcement, and punishment + punishment), and a second group of treatments yielded relatively smaller reductions in SIB (medication, extinction, extinction + reinforcement, and FCT). Within the group of treatments associated with smaller effects, our data suggest that medication in particular is not an effective treatment for SIB. The distribution of medication ESs skewed towards no effect, and visual analysis of individual graphs reveals that some participants in the sample experienced slight increases in SIB while receiving medical treatment. It is important to note, however, that we aggregated the effects of different medications due to very small sample sizes (i.e., Naltrexone, n = 5; Clozapine, n = 1; Clozapine + Depakote, n = 1; and Carbamazepine, n = 1). Given the potential for individual differences in reactions to medications and issues related to dosages, doing so may have resulted in low internal validity in this category. Relatedly, it is important to remember that a much greater proportion of the variance in treatment effects was found at the participant level than at the study level in the meta-analysis.
It is also important to note that our choice of ES statistic may have privileged faster-acting treatments (i.e., punishment) over treatments that effect behavior change more slowly. The percent reduction figure is driven by phase mean differences without consideration of time. A quick reduction in SIB (i.e., level change) during treatment yields a large difference between baseline and treatment means and a correspondingly large ES. The more gradual slope associated with a slower decrease in SIB yields a smaller mean difference between phases and a smaller ES. Visual analysis of the set of graphical displays for FCT treatment in our dataset indicates that FCT was sometimes associated with a more gradual reduction in SIB. As a result, the magnitude of the ES associated with FCT may be smaller than ESs associated with other treatments not because FCT results in smaller reductions in SIB, but because FCT treatment reduces SIB more gradually.
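A small worked example makes the point concrete. The session values below are hypothetical, chosen so that both treatment phases end at the same level of SIB, and the log-ratio is computed from phase means as treatment over baseline (an assumed convention under which reductions are negative).

```python
import math
from statistics import mean


def log_ratio(baseline, treatment):
    """Log-ratio ES from phase means (treatment over baseline)."""
    return math.log(mean(treatment) / mean(baseline))


def percent_reduction(es):
    """Percent of the baseline level eliminated during treatment."""
    return (1.0 - math.exp(es)) * 100.0


# Hypothetical SIB counts per session
baseline = [10, 10, 10, 10, 10]
abrupt = [1, 1, 1, 1, 1]    # immediate drop to 1 per session
gradual = [9, 7, 5, 3, 1]   # slow decline, also ending at 1 per session

for label, phase in (("abrupt", abrupt), ("gradual", gradual)):
    es = log_ratio(baseline, phase)
    print(label, round(es, 2), round(percent_reduction(es), 1))
```

Although both hypothetical treatments reach the same final level, the abrupt series yields about a 90% reduction and the gradual series about 50%, because the statistic summarizes the whole treatment phase rather than its end point.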
Extinction and extinction + reinforcement may be associated with smaller SIB treatment effects because participants may have had extinction bursts, or increases in behavior in response to the withdrawal of expected consequences. In the event of an extinction burst, the increase in behavior following the start of treatment would raise the treatment-phase mean and thereby shrink the log-ratio ES. Further, whereas punishment + reinforcement has the potential to both suppress behavior and establish new behavior, extinction does not teach behavior, and reinforcement of a new behavior may take time to show results.
Differential reinforcement and punishment-based treatments were associated with the largest treatment effects. Although there was no measurable difference in the magnitude of differential reinforcement and punishment-based effects, research does suggest there are differences in how quickly and sustainably these treatments work to reduce SIB. Punishment may act quickly but may not effect long-term behavior change (e.g., Favell, McGimsey, & Schell, 1982). Differential reinforcement may result in a slower reduction in SIB, but reinforcement procedures can teach new behaviors and result in long-term behavior change (e.g., Favell, McGimsey, & Schell, 1982). Our results support use of punishment + reinforcement treatment packages in efforts to reduce SIB quickly. While we did not synthesize data on maintenance of treatment effects, our review did expose us to many studies suggesting that teaching new, adaptive behaviors sustained reductions in SIB (e.g., Bass & Speak, 2005; Iwata et al., 1994).
Current practice recommendations from a variety of government organizations advocate for use of reinforcement-based treatments (e.g., Department of Health, 2014; Epstein, Atkins, Cullinan, Kutash, & Weaver, 2008; Individuals with Disabilities Education Act, 2004; Royal College of Psychiatrists, British Psychological Society, & Royal College of Speech and Language Therapists, 2007). Similarly, applied behavior analysts assert that aversive punishment procedures should be reserved for situations in which other treatments have failed (e.g., Iwata, 1988; Matson & Taras, 1989; Wolery, Bailey, & Sugai, 1998). Yet, it can also be argued that allowing SIB to persist is unethical when using a fast-acting treatment could prevent serious harm (Van Houten et al., 1988). Any use of punishment-based treatments requires weighing ethical objections and potential adverse effects. Research suggests that an individual may respond to punishment by engaging in aggressive behaviors if aggression has been successful in escaping punishment in the past (e.g., Azrin & Holz, 1966). Additionally, punishment alone fails to teach a replacement behavior (e.g., Alberto & Troutman, 2012). In the absence of a replacement behavior, it is likely that SIB or a functionally equivalent behavior will reemerge, even if punishment initially eliminated the behavior (Cooper et al., 2007).
It is important to note that punishment is technically defined only as a consequence that reduces behavior. Ethical objections to punishment may stem from a belief that punishment is cruel or unjust, and may rest on the assumption that punishment is invasive and aversive. Of great relevance to debates on which punishments should be employed and when, results of our post-hoc analyses suggest that less aversive and invasive punishment procedures (e.g., overcorrection, time out) yield treatment effects similar to those of highly aversive and invasive punishments (e.g., electric shock, lemon juice sprays).
Effectiveness of matching treatment to function
Our analysis did not reveal differences in ESs for treatments matched and not matched to function. These results run counter to the common expectation that treatments matched to the function of behavior are more effective in behavior modification (e.g., as codified in requirements to conduct functional behavior assessments in US schools; Individuals with Disabilities Education Act, 2004). However, our results align with similar findings reported by Denis et al. (2011) and Machalicek, O’Reilly, Beretvas, Sigafoos, and Lancioni (2007).
The principle of matching interventions to needs identified by assessments is widely supported by research in other areas of education and development (e.g., curriculum-based assessment; Hosp, Hosp, & Howell, 2016). While the practice of matching treatment to SIB function is consistent with this time-honored and research-supported principle, our data suggest that further individual-level variables, as yet unrecognized or untargeted in established assessment methods, greatly influence the outcomes of intervention.
Limitations
Several limitations associated with our dataset should be noted. While we sampled the entire single-case SIB treatment literature, small sample sizes in several variable categories resulted in a number of unreliable estimates, namely those pertaining to medications, a few punishment categories in post-hoc analyses, and several matched and unmatched treatment categories. MLM typically provides robust estimates, but the precision and accuracy of estimates correlate positively with sample size (Raudenbush & Liu, 2000).
Many primary studies did not include detailed participant-level information or report maintenance data. The large proportion of within-study variance associated with our models led us to hypothesize that one or more unaccounted-for participant-level variable(s) may have influenced treatment effects. Access to more participant-level information might have allowed us to code and analyze additional and potentially meaningful participant-level variables. The inclusion of maintenance data might have decreased the magnitude of effects for treatments that work quickly to reduce SIB but do not result in long-term behavior change (i.e., punishment-only treatments).
We must also address study limitations related to our analysis. Over the five decades during which primary studies in our sample were published, behavior analysis and single-case research evolved (e.g., Odom, Brantlinger, Gesten, Horner, Thompson, & Harris, 2005). At different time points, study participants with ASD and ID would have had differing levels of access to care and educational opportunities; interventionists would have had different training experiences; and researchers would have been bound by different standards for ethics and rigor. Research in education and medicine has demonstrated that baseline and treatment values can ‘drift’ over time with changes in societal practices and environmental conditions (e.g., Lemons, Fuchs, Gilbert, & Fuchs, 2014). Though the potential effect of time on our dataset raises interesting questions, we did not model time in our analyses, both for the sake of parsimony and because of the diminishing benefits of greater complexity in statistical models.
Also, single-case research is an inductive method, while statistical modeling is inferential. In single-case designs, conditions are manipulated across time, and manipulations are replicated, in efforts to induce reliable patterns of behavior change. In group design research, conditions are manipulated across people, and from differences in groups’ behavior patterns, inferences are drawn regarding variable associations. As described in the introduction, much empirical research and wide-ranging consensus support the practice of meta-analysis of single-case data. However, this fundamental difference has an unknown impact on the internal validity of using inferential statistics with single-case data.
Implications for practice
Results suggest that treatments involving differential reinforcement, punishment, and reinforcement plus punishment treatments result in similar reductions in SIB. In line with current practice recommendations, we encourage use of reinforcement-based procedures in all cases of SIB. In the event that reinforcement-only treatments have failed or if SIB poses a serious, immediate threat to the health and well-being of an individual, our results suggest that overcorrection paired with reinforcement may be the most effective as well as less invasive alternative. Overcorrection can quickly reduce SIB, and by including a reinforcement component to teach more adaptive behavior, treatment can result in sustainable behavior change.
Results of the meta-analysis could be interpreted to support use of punishment procedures as sole components of interventions. We caution against this interpretation and discourage the practice of punishment in isolation. Research suggests that punishment overrides the effect of reinforcement from problem behavior, but the effects of punishment depend on the punishing stimulus being presented upon each occurrence of the challenging behavior. The common presence of obstacles to consistent delivery of punishing stimuli, along with the legal and ethical issues of punishment, makes use of reinforcement-based practices alone or in conjunction with less aversive and invasive punishments more reasonable, practical, and efficient.
As a final point, despite the lack of support for treatments that were matched to function over those that were not matched to function, it is advisable to follow current practice recommendations for assessing SIB and basing treatment decisions on functional assessment findings. In our study, we did not investigate prosocial behaviors. However, functional assessments can guide and support the selection of reinforcement-based procedures that target increases in socially acceptable behaviors, enhancements in an individual’s community involvement, and improvements in the quality of their relationships (Mace, 1994).
Conclusion
SIB is common and persistent, and it can have serious physical and social consequences. The present review used a rigorous multi-level meta-analysis to synthesize the large body of single-case design SIB treatment literature in order to investigate the relative effectiveness of various treatments for SIB. Results indicate that overall, treatment for SIB is highly effective and that participant and study characteristics do not moderate treatment effects. Treatment effects for punishment, differential reinforcement, and treatment packages including punishment and differential reinforcement were largest; effects for extinction and FCT were slightly smaller; and effects for medication and extinction + reinforcement treatment packages were smallest. Differences between treatments matched to function and treatments not matched to function were observed but were judged to be negligible.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
