Abstract
Students with challenging behaviors can put a strain on both teachers and other learners in the classroom. While responding to challenging behavior in an effective and educative manner is critical for successfully teaching school-age children, teachers are often under-trained in evidence-based behavior reduction techniques. Timeout (TO) is defined as a behavior reduction strategy that involves removing access to reinforcement and reinforcers. TO procedures have a long history of research regarding their effectiveness and have been used by teachers and other service providers for decades. Furthermore, TO procedures can be categorized into two broad types (exclusionary and non-exclusionary) with a variety of implementation methods for each. This analysis examined published studies on TO and TO procedures for students identified as needing behavioral support. Along with analyzing the published research, we calculated the effect sizes of the TO procedures used in each study using single case research design methodologies (i.e., within-case standardized mean difference and between-case standardized mean difference) as appropriate and group design methodology (i.e., standardized mean difference). We also evaluated publication bias in TO research for students who need behavioral support and analyzed the quality of each included study. Implications for research and practice are discussed.
Keywords
Teachers in today’s schools are responsible for creating environments that facilitate student learning while maintaining a positive and nurturing school climate that meets the diverse needs of many learners. Throughout the school day, teachers must simultaneously ensure that instruction is effectively and efficiently delivered, students are engaged in and learning academic material, and disruptive behaviors are averted (Martin, 2004; Sutherland et al., 2008). Students with challenging behaviors, including those with identified emotional and behavioral disorders (EBD), often exhibit a higher frequency of disciplinary issues than their non-disabled peers. Even so, students with identified disabilities are entitled to a Free Appropriate Public Education (FAPE) in the Least Restrictive Environment (LRE). Since the LRE is most often the general education classroom, the responsibility for providing these entitlements often falls to classroom teachers (Berkeley et al., 2020). This can strain teachers and other classroom learners (Maag, 2001).
Trends in behavior management over the past several decades have focused on providing multi-tiered positive, preventative, and restorative policies for addressing disruptive behavior in school settings (Sailor, 2015; D. B. Wilson et al., 2001; S. J. Wilson & Lipsey, 2007). Traditionally, however, punitive and exclusionary practices have been implemented to keep the behaviors of one student from negatively impacting others in the classroom (Twardawski et al., 2020). Over the past several decades, research has reflected this shift from punishment-oriented to positive practices.
There is a paucity of research detailing the more intensive yet efficacious practices for students with EBD who access their education in the general education environment. Brigham et al. (2016) suggest that inclusion models that insist on general classroom placement may hinder opportunities for academic and social gains due to reduced access to specialized instruction. While responding to challenging behavior in an effective and educative manner is critical for successfully educating school-age children, teachers are often under-trained in function-based and evidence-based behavior reduction techniques (Dutton Tillery et al., 2010; McKenna et al., 2021; Oliver & Reschly, 2017).
Punishment Principles, Behavior Reduction Strategies, and Time-Out
By definition, punishment procedures are behavior reduction techniques involving the presentation of an aversive or undesired event or stimulus (positive punishment) or removal of a preferred event or stimulus (negative punishment), contingent on exhibition of the target behavior, that decreases the future likelihood of exhibition of that behavior (Kazdin, 2001). Simply stated, punishment is a consequence-based contingency designed to quickly reduce occurrences of disruptive or otherwise unwanted behavior. One commonly utilized behavior reduction strategy is time-out.
Timeout (TO), also referred to as time-out from positive reinforcement, is a negative punishment procedure because it relies on the contingent removal of access to desirable items, activities, or people to effect a decrease in the future frequency of a target behavior (Hall et al., 1971; Simonsen et al., 2008). TO is more aversive than some behavior reduction procedures (e.g., differential reinforcement, extinction) but not as aversive as others (e.g., overcorrection, restitution; Vegas et al., 2007). Even so, time-out has a long-established history of being a first-choice disciplinary procedure, possibly because it creates a negative reinforcement contingency for the teacher by removing an aversive stimulus from the environment. Because the removal of the student acts as a reinforcer, the teacher will likely continue to remove the student from the environment under the same conditions in the future. Time-out procedures may also reduce aggressive acts in other students who observe a peer receiving TO (C. C. Wilson et al., 1979) and, when implemented correctly, are considered an evidence-based practice (Vegas et al., 2007).
Research has shown that teachers frequently implement TO procedures in inclusive classrooms. In an early study, 70% of pre-school through grade 12 teachers surveyed reported using time-out procedures (Zabel, 1986). Of the 730 teachers surveyed, preschool (88%), early elementary (78%), and intermediate elementary level teachers used time-out procedures most frequently; however, middle school (65%) and high school teachers (51%) also reported using TO as a disciplinary procedure. Similarly, in a sample of 20 elementary level teachers surveyed about negative individual behavior management strategies, all reported using a system that included a time-out procedure (Dutton Tillery et al., 2010).
Several variations to the TO strategy exist along a continuum of least to most intrusive. Harris (1985) identified three types of time out (non-exclusionary, exclusionary, and isolation). More recently, Cooper et al. (2020) included isolation time out as a type of exclusionary time out, and this work adopts Cooper’s description. Both non-exclusionary and exclusionary time out can take multiple forms.
Non-Exclusionary TO Strategies
Non-exclusionary time-out involves the implementation of TO procedures while the student is within the active reinforcing setting exemplified by the removal of one or more of four elements including (a) reinforcement opportunities, (b) the reinforcer itself, (c) barriers to observing preferred activities but no opportunity to participate, or (d) moving the individual to a different location in the time-in setting (Brantner & Doherty, 1983). This type of TO includes planned ignoring, contingent observation, termination of specific reinforcer contact, and partition/select space time-out (Payne et al., 2005; Ryan, Sanders, et al., 2007).
Planned Ignoring
Planned ignoring is the deliberate removal of social reinforcers (e.g., attention, interaction, engagement) contingent upon a student’s exhibiting undesired behavior. When implemented, planned ignoring does not remove the student from the environment or situation (Turner & Watson, 1999). Teachers can use planned ignoring when socially mediated attention maintains the challenging behavior (Kee et al., 1999). Furthermore, Wolf et al. (2006) suggested using planned ignoring for behavior that meets four criteria: the behavior (a) is mildly disruptive, (b) is exclusively for teacher attention, (c) is not for peer attention, and (d) is not harmful to people, self, or others. For example, a student yells out a teacher’s name and the teacher does not acknowledge the yelling. It is important for the teacher to ignore the undesirable behavior consciously and consistently while richly reinforcing the student’s engagement in desired behaviors that serve the same function (e.g., hand raising to get teacher attention).
Contingent Observation
Contingent observation, also known as “sit and watch,” is a time-out procedure in which access to the sources of reinforcement is removed for a specified time contingent upon the exhibition of undesired behavior (Rutherford & Nelson, 1982). During contingent observation, the student remains in the classroom but is moved to an area where they can still observe the class. For contingent observation to have the desired effect, the environment from which the student is removed must be reinforcing. In addition, to increase the reinforcing qualities of the “time in” setting, the teacher must reinforce the expected behaviors of the target student’s peers in the presence of the target student (Turner & Watson, 1999). An example of contingent observation includes having a student sit and watch their peers during recess for a short time after exhibiting aggressive play.
Termination of Specific Reinforcer Contact
The termination of specific reinforcer contact time-out strategy involves removing a desired object (i.e., positive reinforcers: food, toys, etc.) that is available to the student or in the student’s possession for a brief amount of time contingent on engaging in a target problem behavior (Alberto et al., 2002). The reinforcing stimulus is returned to the student immediately upon display of desired behavior (Harris, 1985). One specific strategy connected to this time-out procedure is the use of time-out ribbons. During the time-out ribbon procedure, all students are given a ribbon to keep at their desks or wear. The presence of the ribbon acts as a discriminative stimulus signaling that positive reinforcement of desired behaviors is available (e.g., behavior specific praise, non-contingent reinforcement). When any disruptive behavior occurs, the teacher removes the ribbon. Absence of the ribbon conveys the message that reinforcement is not available and will be withheld either for a specified period of time (e.g., 3 minutes) or until the misbehavior stops (Donaldson & Vollmer, 2011; Foxx & Shapiro, 1978). Once the student again possesses the ribbon, they resume earning reinforcement. A practical example of a TO ribbon procedure would involve a teacher targeting out-of-seat behavior. If the student leaves their seat without permission, the teacher removes the ribbon, signaling the inability to earn reinforcement. Once the student returns to their seat, they can have the ribbon back. The teacher would then contingently provide reinforcement for the student’s in-seat behavior.
Partition/Select Space Time-Out
Cooper et al. (2020) describe partition/select space time-out as “the student remains within the time-in setting, but his view within the setting is restricted by a panel or cubicle, or a select space is arranged to serve as the time-out area (i.e., a carpet, a corner)” (p. 395). Partition time-out procedures do not include physical removal of the student from the environment. For example, a desk with a study carrel facing away from the class limits the student’s view of the time-in environment. Similarly, portable partitions can be arranged in such a way as to create a separate TO space within the classroom, allowing the teacher to continue monitoring student behavior. Partition TO may be most effective when the activity is highly desirable and/or reinforcing for the student.
Exclusionary TO
In contrast to non-exclusionary TO, exclusionary TO involves removal of the student from a reinforcing setting, contingent on exhibition of undesirable behavior, for a specified period (Ryan, Peterson, et al., 2007). Forms of exclusionary time-out are (1) time-in setting removed from participant and (2) participant removed from time-in setting (Payne et al., 2005; Ryan, Peterson, et al., 2007). Wolery and colleagues (1988) defined exclusionary TO as any procedure that includes a student’s removal from instructional activities, without any opportunity to participate in those activities. That is, the individual is removed from the reinforcers or reinforcing setting or condition. Exclusionary procedures are considered more intrusive than non-exclusionary procedures (Ryan, Peterson, et al., 2007).
Exclusionary time-out has been used in hallway time-outs and time-out rooms. A hallway TO involves having the student stand in the hallway, near the time-in environment, for a short period of time (Cooper et al., 2020). The student can hear what is going on in the classroom (and look in) and the teacher is able to monitor the student’s behavior. This type of TO is effective when certain conditions are met, including (a) few distractions or reinforcers in the hall, (b) unobstructed sight lines for the teacher, and (c) escape is not the function of the inappropriate behavior. Time-out rooms are considered seclusion or isolation TO and are the most restrictive form of time-out (Mayerson, 2003). When students are removed to an isolation time-out room, they are placed in a room for a specified amount of time (Costenbader & Reading-Brown, 1995) and have no interaction with the teacher or peers. According to the Division of Emotional and Behavioral Health (DEBH; Freeman et al., 2023) of the Council for Exceptional Children, this form of TO requires the user to answer a few questions before implementation, including:
Are there local, state, and federal laws or ordinances that prohibit or guide use?
Has there been consideration of the appropriateness and impact of the TO room as a consequence to the target student, peers, personnel, and property?
Is there an appropriate number of personnel, and are they appropriately trained?
It is important to note here that the “Keeping All Students Safe Act” was introduced in Congress in May 2021. The bill would ban all seclusion and restraint practices in schools receiving federal funding. The bill was written in response to data from the 2017–18 academic year showing that 78% of all students subjected to seclusion and restraint were students with disabilities who were identified as Black/African American and boys. As of this writing, the bill has not progressed past the introductory phase.
Concerns With Time-Out Procedures
TO is an attractive intervention. It is easy to implement, widely considered appropriate, reduces unwanted behaviors quickly, and can be combined with other interventions. The ultimate goal of classroom management, however, is to create equitable learning environments for all students, not just those who conform to the existing norms of the prevailing school culture (Weinstein et al., 2004). Students who are perceived as physically, culturally, or cognitively different are at greater risk for having their “actions misperceived and judged unfairly” (Cartledge & Kourea, 2008, p. 352). Consequently, these students may be more likely to experience disciplinary time-out interventions.
Responding to challenging behavior in an effective, ethical, and educative manner is critical; however, the side effects related to punishment procedures may decrease their potency. In addition to the concerns regarding culturally responsive practices, functional concerns also exist. For instance, escape-maintained behaviors of students who are removed from inaccurately identified reinforcing environments may be inadvertently reinforced (Arthur, 2008; Connelly, 2017). It is also possible that the attention a misbehaving student receives from their teacher or peers is more reinforcing than the TO is punishing (Maag, 2001). Time away from instruction during TO may lead to decreased learning opportunities for the student (Simonsen et al., 2008), and there have been continuing concerns that exclusionary procedures violate the Individuals with Disabilities Education Act (IDEA; Jones & Felder, 2009; Osborne, 2001) and the right to a FAPE (Bon & Zirkel, 2014; Zirkel, 2013). As mentioned previously, the removal of a disruptive student from the environment also creates a negative reinforcement contingency for the teacher who escapes an aversive condition and, as a result, may continue to exhibit the exclusionary behavior.
Purpose of the Current Study and Research Questions
Current trends in education advocate for using positive and proactive approaches for managing student behavior in schools and classrooms. For students with disabilities (SWD), the reauthorization of the IDEA in 1997 further specified the requirement that schools utilize positive behavioral interventions, strategies, and supports to address behaviors that impede learning and to prevent behavior from reoccurring through the teaching and reinforcement of functional replacement behaviors (Turnbull et al., 2001). Despite this provision, punitive practices such as time-out may still be appropriate for reducing unwanted behaviors. TO has a long history of implementation and success as a behavior reduction technique for problematic or challenging behaviors. However, there have been surprisingly few syntheses of TO research to ascertain the overall effectiveness of time-out strategies for students with EBD collectively or individually.
The purpose of the present review and analysis is to review the research literature collectively focused on time-out procedures for students with EBD and conduct an analysis of published empirical studies, dissertations, and theses of the use of time-out for students with EBD in educational settings. To date, only one literature review (Wolf et al., 2006) and one meta-analysis (Vegas et al., 2007) have been conducted specifically in relation to TO procedures and both focused on typically developing students. Given that students with EBD are disproportionately subjected to exclusionary disciplinary practices, it is crucial that data informing the virtue of time out as an evidence-based practice be explored and disseminated.
The current review and analysis will focus on the following research questions:
What are the characteristics of research studies examining the use of time-out procedures for students identified with an EBD in educational settings?
What are the effect sizes of research studies examining the use of time-out procedures for students with EBD in educational settings?
What is the quality of research studies examining the use of time-out procedures for students with EBD in educational settings based on appropriate quality indicators?
What is the extent of publication bias within research literature examining the use of time-out procedures for students with EBD in educational settings?
Method
For the purposes of the current special issue, the authors proceeded through multiple stages and disclosures. First, the authors pre-registered the review and analysis methodology for peer-review prior to continuing the study. The disclosure includes providing data to support validity and reliability in our search procedures, inclusion criteria, study selection, data extraction, analysis of data, and interrater reliability. Second, we provided all relevant materials used in the study including coding, websites, calculators, and rubrics used for analysis.
Literature Search and Study Selection
A systematic search of the relevant literature was performed consistent with the PRISMA methodology to identify the extant research aligned to the focus of the review and analysis (Liberati et al., 2009). In an effort to gather all available research on time-out for students with EBD, the authors reviewed databases that house published and gray literature (i.e., theses, dissertations, pre-printed and pre-registered studies). Studies in this review were gleaned from systematic searches of the following databases: Academic Search Complete and Premier, Psychological and Behavioral Sciences Collection (i.e., PsycARTICLES and PsycINFO), Education Resources Information Center (ERIC), Open Access Theses and Dissertations (OATD), ProQuest Dissertations & Theses Global, EdArXiv, and PsyArXiv.
The following Boolean phrases were used in each database search, applied to the full text of the manuscript, for the population of interest (“emotion* OR/AND behave* disorders” OR/AND “emotion* disturb*” OR/AND “EBD” OR/AND “conduct disorder”) AND the intervention of interest (“time-out” OR “timeout” OR “exclusion*” OR “non-exclusion*” OR “seclusion*” OR “non-seclusion*” OR “ribbon” OR “timeout room” OR “contingent observation” OR “planned ignoring”). We did not include limiters during the search for potential studies. That is, date of publication or availability was not included as a factor during the search. Duplicate articles were removed.
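As an illustration of how such a search string can be assembled reproducibly, the following is a minimal sketch only. The helper names (`or_group`, `build_query`) are hypothetical, the population list is an illustrative subset, and joining terms with plain OR within a group is an assumption; individual databases may require different syntax.

```python
# Sketch: assemble a population-AND-intervention Boolean search string.
# Helper names and the plain-OR joining rule are assumptions for illustration.

def or_group(terms):
    """Join terms with OR, quoting multi-word phrases, and wrap in parentheses."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(population_terms, intervention_terms):
    """Combine the two term groups with AND."""
    return f"{or_group(population_terms)} AND {or_group(intervention_terms)}"


# Illustrative subset of the population terms listed in the text.
population = ["emotion* disturb*", "EBD", "conduct disorder"]

# Intervention terms as listed in the text.
intervention = [
    "time-out", "timeout", "exclusion*", "non-exclusion*",
    "seclusion*", "non-seclusion*", "ribbon", "timeout room",
    "contingent observation", "planned ignoring",
]

if __name__ == "__main__":
    print(build_query(population, intervention))
```

Keeping the term lists in one place makes it easier to document exactly which string was submitted to each database.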
As per the requirements of this special issue, the authors conducted forward, backward, hand, and ancestral/first author searches. Forward searches included all studies that met criteria that are from primary sources. Backward searches were conducted on studies cited in the primary studies. We conducted a hand search of journals that focus on students with EBD (e.g., Behavioral Disorders; Journal of Emotional and Behavioral Disorders) and journals that have a history of publishing time-out studies as evidenced from our search (e.g., Journal of Applied Behavior Analysis) with no date limiters. A hand search of journals that target EBD populations was selected because the participants in those journals are the primary participants sought for the present review. Lastly, the authors conducted ancestral/first author searches that focused on locating publications referenced in primary sources by first or corresponding authors.
Inclusion Criteria
Studies included in this review and analysis had to meet the following criteria: (a) be written in English, (b) include students identified as ED or EBD as defined by the DSM-5-TR (American Psychiatric Association [APA], 2022), (c) occur in a PreK-12 educational setting (i.e., early education center, public school, private school, alternative school), (d) explicitly state the use of time-out strategies or procedures (e.g., time-out rooms, contingent observation, time-out ribbons), (e) use a quantitative experimental design (i.e., single-case or group design methodology), and (f) include students who were school aged (i.e., 3 years of age to 21 years of age). Students with ED/EBD in group design studies must be disaggregated as a group to allow for analysis. For single-case research studies, at least one student must be identified with ED/EBD based on design (e.g., multiple baseline or multiple probe designs).
Exclusion Criteria
Studies were excluded from the current review if they were: (a) written in a language other than English, (b) not quantitative empirical studies (e.g., qualitative studies, practitioner articles), (c) not strategies or procedures that can be considered as time-out, (d) not focused on students with ED or EBD, or (e) not implemented in educational settings (e.g., home). See Figure 1 for the search and inclusion process.

Figure 1. PRISMA Manuscript Search and Inclusion Process.
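The inclusion and exclusion criteria described above can be expressed as a single screening predicate. The sketch below is illustrative only: the `StudyRecord` fields and the `meets_inclusion` function are hypothetical names, and it does not reproduce the authors' actual screening forms.

```python
from dataclasses import dataclass

# Sketch: screening predicate built from the stated criteria.
# All field and function names are hypothetical.

@dataclass
class StudyRecord:
    in_english: bool
    has_ed_ebd_participants: bool   # at least one participant identified as ED/EBD
    prek12_setting: bool            # early education center, public, private, or alternative school
    uses_time_out: bool             # time-out procedure explicitly stated
    design: str                     # e.g., "single-case", "group", "qualitative"
    min_age: float                  # youngest participant, in years
    max_age: float                  # oldest participant, in years


def meets_inclusion(study: StudyRecord) -> bool:
    """Return True only if every stated inclusion criterion is satisfied."""
    return (
        study.in_english
        and study.has_ed_ebd_participants
        and study.prek12_setting
        and study.uses_time_out
        and study.design in {"single-case", "group"}
        and study.min_age >= 3
        and study.max_age <= 21
    )


if __name__ == "__main__":
    candidate = StudyRecord(True, True, True, True, "single-case", 6, 11)
    print(meets_inclusion(candidate))
```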
Data Extraction and Coding
Data were extracted from the included studies and coded for student-specific and study-specific variables. Student-specific variables included: (a) per participant gender (i.e., female; male), (b) setting (i.e., early education center, public school, private school, alternative school), (c) school grade level (i.e., pre-k; elementary; middle; high), and (d) participant race/ethnicity (i.e., Black; White; Hispanic; Asian; not specified). Study-specific variables included: (a) research design (i.e., group design; single case), (b) specific within-design features (e.g., quasi-experimental; multiple baseline design), and (c) publication type (e.g., research manuscript; dissertation). Variables specific to TO as the intervention of choice were coded as follows: (a) category of time-out (i.e., exclusionary; non-exclusionary), (b) specific type of time-out (e.g., ribbon; partition; room), (c) dependent variable, (d) implementation rationale (e.g., disruptive behavior, hitting, screaming), (e) TO implementer (e.g., teacher, para, other), (f) function of behavior (e.g., escape), (g) behaviors occurring during TO, (h) duration of TO (e.g., 2 min or less), and (i) adverse effects of TO reported (e.g., yes or no).
Quantitative data extraction was based on the type of design methodology used per study. The study results of group design studies were reviewed by the authors. Each study was examined for means, standard deviations, and sample sizes for treatment and comparison groups at pre- and post-test intervals. If this information was not reported, extracted data included any reported statistical outcomes (e.g., chi-square statistics, F-scores, t-scores). WebPlotDigitizer (Version 4.5; Rohatgi, 2015) was used to extract precise data points for single case design studies. WebPlotDigitizer is a web-based software application that extracts XY axis coordinates from single case design graphic images. Once the single case data were extracted, they were compiled per case in an Excel spreadsheet.
Effect Size Data Analysis
Data analysis for both single case and group design studies consisted of calculating Cohen’s d using the standardized mean difference. Analysis was conducted to determine the effectiveness of TO based on individual study results per participant, omnibus study results across all studies, and participant, study, and TO variables from data extraction.
Single Case Designs
The authors used within-case (Busk & Serlin, 1992; Gingerich, 1984) and between-case (Hedges et al., 2013; Pustejovsky et al., 2014) standardized mean differences (WC-SMD and BC-SMD, respectively) to determine effect size outcomes for the effects of TO (independent variable) on the identified dependent variable. All effect sizes were reported with 95% confidence intervals. The online Single Case Effect Size Calculator was used to calculate the WC-SMD effect sizes (Pustejovsky et al., 2024), and BC-SMD was calculated using the Between-Case Standardized Mean Difference Estimator online calculator (Pustejovsky et al., 2023). WC-SMD provides a Cohen’s d statistic; however, due to the nature of the analysis, it can only be used to determine the significance of the effect size, as each study acts as its own control. Thus, WC-SMD cannot be used for cross-design comparisons. BC-SMD yields Cohen’s d effect size estimates that are comparable to those from between-group experimental designs (Hedges et al., 2012; Shadish et al., 2014). Hedges et al. (2013) and Pustejovsky et al. (2014) described the use of BC-SMD specifically for effect size analysis of multiple baseline designs and reversal designs. This statistic for single case design methodology is familiar to researchers of quantitative group design studies (Shadish et al., 2013). WC-SMD effect size interpretations define a small effect as 0 to 1, a medium effect as 1 to 2.5, and a large effect as greater than 2.5 (Harrington & Velicer, 2014; Pustejovsky, 2019; Zimmerman et al., 2018). BC-SMD effect size interpretations define a small effect as 0 to 0.2, a medium effect as 0.3 to 0.7, and a large effect as above 0.8 (Cohen, 1988).
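To make the within-case calculation concrete, the following is a minimal sketch, not the online calculator's implementation. It assumes the simplest form of the within-case SMD (treatment-phase mean minus baseline-phase mean, divided by the baseline-phase standard deviation) with no small-sample correction; the function names are hypothetical, and the handling of the boundary values 1 and 2.5 is an assumption because the stated ranges share endpoints.

```python
from statistics import mean, stdev

# Sketch: within-case standardized mean difference for one case (AB comparison).
# Assumes standardization by the baseline SD and no small-sample correction.

def within_case_smd(baseline, treatment):
    """(mean of treatment phase - mean of baseline phase) / SD of baseline phase."""
    sd_baseline = stdev(baseline)  # sample SD; requires at least two baseline points
    if sd_baseline == 0:
        raise ValueError("Baseline phase has no variability; SMD is undefined.")
    return (mean(treatment) - mean(baseline)) / sd_baseline


def interpret_wc_smd(d):
    """Label the magnitude using the benchmarks stated in the text."""
    size = abs(d)
    if size > 2.5:
        return "large"
    if size >= 1:      # assumption: a value of exactly 1 is treated as medium
        return "medium"
    return "small"


if __name__ == "__main__":
    # Hypothetical counts of a target behavior per session.
    baseline_phase = [10, 12, 11, 13, 9]
    time_out_phase = [4, 5, 3, 4, 4]
    d = within_case_smd(baseline_phase, time_out_phase)
    print(round(d, 2), interpret_wc_smd(d))
```

For a behavior-reduction target, a negative value indicates a decrease relative to baseline, so the benchmark is applied to the absolute value.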
Group Designs
Cohen’s d effect size statistics were derived from standardized mean differences in the group design studies. Following suggestions from Hedges (1981), we reported unbiased standardized mean differences using means, standard deviations, and group sizes obtained at post-test from treatment and comparison groups or other relevant statistical results (e.g., t-scores; F-scores). The authors used the Campbell Collaboration effect size calculator to obtain the Cohen’s d statistic (D. B. Wilson, n.d.).
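The small-sample correction attributed to Hedges (1981) can be sketched as follows. This is an illustration under common textbook formulas (pooled standard deviation, the approximate correction factor J = 1 − 3/(4·df − 1), and a large-sample variance approximation), not the Campbell Collaboration calculator's code; the function name and the example values are hypothetical.

```python
import math

# Sketch: bias-corrected standardized mean difference from post-test summary statistics.
# Formulas are the common approximations; the function name is hypothetical.

def unbiased_smd(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, standard error of g, 95% confidence interval)."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd
    j = 1 - 3 / (4 * df - 1)                                   # small-sample correction factor
    g = j * d
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    se_g = j * math.sqrt(var_d)
    ci = (g - 1.96 * se_g, g + 1.96 * se_g)
    return g, se_g, ci


if __name__ == "__main__":
    # Hypothetical post-test summary statistics for two groups of 20.
    g, se, (low, high) = unbiased_smd(10.0, 2.0, 20, 8.0, 2.0, 20)
    print(f"g = {g:.3f}, SE = {se:.3f}, 95% CI = {low:.3f} to {high:.3f}")
```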
Publication Bias
The authors of the current study analyzed the effect sizes from the literature to determine publication bias. Publication bias analysis was conducted via Egger’s test method (Egger et al., 1997) and visual analysis of a funnel plot of effect size results. As suggested by Egger et al. (1997), the Egger’s test method uses a weighted regression analysis to examine the relationship between effect size estimates and their precision measures. A significant intercept (p < .05) indicates publication bias. A funnel plot of the included studies’ effect sizes against their variance was produced (Kossmeier et al., 2020; Light & Pillemer, 1984; Peters et al., 2008).
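The regression behind Egger’s test can be sketched in a few lines. This is an illustration of the commonly described form of the test (regressing each standardized effect, effect/SE, on its precision, 1/SE, and examining the intercept), not the authors' analysis script; the function name and the example values are hypothetical.

```python
import math

# Sketch: Egger-style regression of standardized effect on precision.
# Uses ordinary least squares with the standard library only.

def egger_regression(effects, standard_errors):
    """Return intercept, its standard error, t statistic, slope, and degrees of freedom."""
    if len(effects) != len(standard_errors) or len(effects) < 3:
        raise ValueError("Need at least three studies with matching effects and SEs.")
    x = [1 / se for se in standard_errors]                      # precision
    y = [es / se for es, se in zip(effects, standard_errors)]   # standardized effect
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_intercept = math.sqrt(sse / df * (1 / n + x_bar**2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "se": se_intercept, "t": t_stat,
            "slope": slope, "df": df}


if __name__ == "__main__":
    # Hypothetical effect sizes and standard errors for three studies.
    result = egger_regression([1.0, 1.0, 0.75], [1.0, 0.5, 0.25])
    print(result)
    # The p value would come from a t distribution with result["df"] degrees
    # of freedom (for example, via a statistics package).
```

An intercept far from zero relative to its standard error suggests funnel plot asymmetry; with few studies the test has little power, so the funnel plot remains a useful complement.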
Quality Indicator Analysis
The current study’s authors used the guidelines introduced by Cook et al. (2014) to determine the quality of each single case design and group design study. Based on eight quality indicators (Cook et al., 2014) and the quality indicator matrix (Royer et al., 2017), we assessed the merit of each indicator for each study (i.e., 1 = met indicator; 0 = indicator not met). Royer et al. (2017) suggest using weighted criteria to determine if a study is methodologically viable (i.e., 80% = sound methodologically). Each study was graded on the soundness of its methodology as suggested by Royer et al. (2017).
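The scoring rule described above can be sketched as follows. This is a simplified illustration: it assumes each of the eight indicators is scored 0 or 1 and weighted equally, which may differ from the weighting in the Royer et al. (2017) matrix, and the function names are hypothetical.

```python
# Sketch: proportion of quality indicators met, compared against the 80% criterion.
# Equal weighting of the eight indicators is an assumption.

def quality_score(indicators):
    """Return the proportion of the eight indicators scored as met (1)."""
    if len(indicators) != 8 or any(value not in (0, 1) for value in indicators):
        raise ValueError("Expected eight indicator scores, each 0 or 1.")
    return sum(indicators) / len(indicators)


def is_methodologically_sound(indicators, threshold=0.80):
    """Apply the 80% criterion to a study's indicator scores."""
    return quality_score(indicators) >= threshold


if __name__ == "__main__":
    study_scores = [1, 1, 1, 1, 1, 1, 1, 0]   # hypothetical study meeting 7 of 8 indicators
    print(quality_score(study_scores), is_methodologically_sound(study_scores))
```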
Interrater Reliability
Two raters were used to compare (a) search results, (b) included studies, (c) extracted data, and (d) quality indicators. Coders were trained for interrater reliability using sample studies of time-out that did not include students with ED/EBD. Once coders reached 90% agreement in practice, evaluation of included studies began. Disagreements were resolved through discussion to increase agreement between the two raters. Separate spreadsheets for each evaluation (data extraction and quality indicators) were used for each of the raters (2 x 2). The primary coder reviewed 100% of the included studies, and the secondary coder randomly reviewed 40%. Each study was independently coded with the goal of 90% agreement between the two raters across all evaluation metrics.
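Point-by-point percent agreement, the metric implied by the 90% criterion, can be computed as in the sketch below. The function name and the example codes are hypothetical, and the text does not specify whether a chance-corrected index was also used.

```python
# Sketch: point-by-point percent agreement between two coders.
# Function name and example data are hypothetical.

def percent_agreement(primary_codes, secondary_codes):
    """Agreements divided by total coded items, times 100."""
    if len(primary_codes) != len(secondary_codes) or not primary_codes:
        raise ValueError("Both coders must rate the same non-empty set of items.")
    agreements = sum(a == b for a, b in zip(primary_codes, secondary_codes))
    return 100 * agreements / len(primary_codes)


if __name__ == "__main__":
    # Hypothetical codes for ten extracted variables from one study.
    primary = ["excl", "room", "teacher", "yes", "escape", "2min", "no", "SCRD", "male", "SS"]
    secondary = ["excl", "room", "teacher", "yes", "attention", "2min", "no", "SCRD", "male", "SS"]
    print(percent_agreement(primary, secondary))
```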
Results
The initial search generated 3,919 results. After duplicates and misidentified studies were removed, 319 remained. After screening titles and abstracts, two authors agreed on 37 potential studies to consider for inclusion. A hand search for any recent publications yielded zero results, as did an ancestral search following the same procedures. Following a discussion, the raters reached 100% agreement that 13 studies met the criteria for inclusion. All included studies were manuscripts published in peer-reviewed journals. Because this was a preregistered study, the extracted data from the included studies and the coding metrics are available via the Open Science Framework (OSF) website (http://doi.org/10.17605/OSF.IO/9D3JN).
Descriptive Results
Within the 13 selected studies, there were a total of 182 participants. Descriptions of participants, study-specific variables, time-out-specific variables, behaviors consequenced with time-out, and length of time-out are documented in Table 1. Ages of participants ranged from three to eighteen years old, and the majority (81%) were male. Only three studies reported student grade levels; however, based on reported ages and settings, it can be determined that students were educated in pre-kindergarten through grade 12 classes. The most common settings for the intervention were a special education school (n=4) and a general education (inclusion) classroom (n=4), followed by a special education classroom housed within a neighborhood (public) school (n=3), a resource room setting (n=1), and a free play setting (recess; n=1). The time-out intervention utilized in each study was coded according to criteria described by Cooper et al. (2020) as exclusionary time-out (e.g., time-out room or removal to another location; n=5) or non-exclusionary time-out (e.g., time-out ribbon, contingent observation; n=5). The remaining studies (n=3) used a combination of non-exclusionary and exclusionary time-out assigned on a least to most restrictive continuum. The dependent variable for most studies (n=7) was the number or rate of specific inappropriate or unwanted behaviors (e.g., cursing, hitting, spitting, self-stimulating behaviors). The remainder of the studies were concerned with rates of student compliance with teacher instructions (n=3) and patterns of staff implementation of time-out procedures (n=3).
Characteristics of Included Studies.
Note. GD = group design, SCRD = single case research design, M = means, SD = standard deviation, NR = not reported, yo. = years old, gr = grade, Fe = female, Ma = male. SS = special school, PK = pre-kindergarten, K = kindergarten, S-C = self-contained classroom, GE = general education classroom, RR = resource room.
a = timeout room. b = contingent observation. c = combined types. d = removal of opportunities for reinforcement.
Effect Size Results
WC-SMD was calculated for each subject-level analysis in all single case design studies. Across the 10 studies that used single case design, 26 cases were analyzed for effect size. The omnibus effect size was strong (WC-SMD = 2.55, SE = 0.54, 95% CI = 0.72 to 3.68) based on the interpretations recommended by Cohen et al. (2017). Individual effect sizes per case ranged from weak (WC-SMD = 0.20, SE = 0.38, 95% CI = −0.54 to 0.95; Twyman et al., 1994) to strong (WC-SMD = 8.78, SE = 1.41, 95% CI = 6.02 to 11.55; Kee et al., 1999). See Table 2 for the effect size analysis on a per case level.
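To illustrate how a within-case effect size of this kind can be computed, the following is a minimal sketch rather than the procedure used in this analysis. It assumes one common formulation in which the difference between phase means is divided by the baseline standard deviation, with no small-sample correction. The function name, the `lower_is_better` flag, and the session counts are hypothetical and are not data from any included study.

```python
from statistics import mean, stdev


def within_case_smd(baseline, intervention, lower_is_better=True):
    """Within-case standardized mean difference for one participant.

    Assumed formulation (not confirmed by the source): the difference
    between phase means divided by the sample SD of the baseline phase.
    """
    if len(baseline) < 2 or len(intervention) < 1:
        raise ValueError("need at least 2 baseline and 1 intervention observations")
    sd_baseline = stdev(baseline)
    if sd_baseline == 0:
        raise ValueError("baseline SD is zero; the SMD is undefined")
    difference = mean(intervention) - mean(baseline)
    # For behavior-reduction targets, flip the sign so improvement is positive.
    if lower_is_better:
        difference = -difference
    return difference / sd_baseline


# Hypothetical counts of a target behavior per session (illustration only).
baseline_sessions = [12, 10, 14, 12]
timeout_sessions = [4, 6, 5, 5]
effect = within_case_smd(baseline_sessions, timeout_sessions)
print(f"WC-SMD = {effect:.2f}")
```

Standardizing by the baseline phase alone keeps the metric anchored to pre-intervention variability. A pooled standard deviation or a small-sample correction would give somewhat different values.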
Case by Case Effect Size Results per Single Case Design Study.
BC-SMD effect size analysis was conducted on each study that contained at least three baseline to intervention comparisons (n = 4). The omnibus effect size was extremely large (BC-SMD = 1.24, SE = 0.53, 95% CI = −0.06 to 1.58). Two studies, Donaldson et al. (2013) and Sachs (1973), showed strong effects, with study effect sizes of BC-SMD = 1.45 and BC-SMD = 2.205, respectively. Grskovic et al. (2004) had a moderate effect of BC-SMD = 0.85. Kee et al. (1999) had an overall modest effect of BC-SMD = 0.46.
Cohen’s d effect size analysis was conducted on the three group design studies across 53 reported comparisons. Effect sizes were calculated per study and as an omnibus measure across all three studies. The omnibus effect size for the studies that used a group design was d = 0.50 (95% CI = −0.68 to 1.19), indicating a modest effect. Pease and Tyler (1979) had an overall effect size of d = 0.98 (95% CI = −0.20 to 1.92), indicating a moderate effect. Both Costenbader and Reading-Brown (1995) and Stage (1997) had modest effects, with d = 0.45 (95% CI = 0.14 to 0.76) and d = 0.48 (95% CI = −0.62 to 0.88), respectively.
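As a brief illustration of the group design metric, the sketch below computes Cohen's d with an approximate 95% confidence interval. It assumes two independent groups, a pooled standard deviation, and the common large-sample variance approximation; the exact computation used for the included studies is not specified in this section. The function name and the input values are hypothetical.

```python
from math import sqrt


def cohens_d_with_ci(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Cohen's d for two independent groups with an approximate 95% CI.

    Uses the pooled SD and the large-sample variance approximation
    var(d) = (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)).
    """
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    variance = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    se = sqrt(variance)
    return d, se, (d - z * se, d + z * se)


# Hypothetical group summaries (illustration only, not from any included study).
d, se, (lower, upper) = cohens_d_with_ci(m1=10, sd1=2, n1=20, m2=8, sd2=2, n2=20)
print(f"d = {d:.2f}, SE = {se:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

A confidence interval that spans zero, as several of the intervals reported above do, indicates that the estimate is imprecise even when the point estimate is positive.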
Publication Bias
The effect size statistic (Cohen’s d) and its variance were used to conduct the publication bias analyses. Results of Egger’s test (intercept = 2.69; p = .00) suggest that publication bias may be present. However, the funnel plot was not markedly asymmetrical. Consequently, the evidence for publication bias is mixed: the intercept test (Egger’s test) showed the possibility of bias, while the funnel plot indicated little bias. See Figure 2 for a visual depiction of the funnel plot.
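For readers unfamiliar with the intercept test, the following minimal sketch shows the usual form of Egger's regression: each standardized effect (effect divided by its standard error) is regressed on precision (one divided by the standard error), and an intercept far from zero signals funnel plot asymmetry. This is an illustrative sketch and not the software used in this analysis; the function name and input values are hypothetical. The p value would come from a t distribution with k − 2 degrees of freedom, which is left to a statistics package.

```python
from math import sqrt


def eggers_regression(effects, standard_errors):
    """Egger's regression: ordinary least squares of effect/SE on 1/SE.

    Returns (intercept, intercept_se, t_statistic, degrees_of_freedom).
    """
    k = len(effects)
    if k != len(standard_errors) or k < 3:
        raise ValueError("need matching inputs from at least 3 studies")
    precision = [1 / se for se in standard_errors]
    standardized = [d / se for d, se in zip(effects, standard_errors)]
    x_bar = sum(precision) / k
    y_bar = sum(standardized) / k
    sxx = sum((x - x_bar) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("all standard errors are equal; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(precision, standardized))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(precision, standardized)
    )
    df = k - 2
    intercept_se = sqrt(residual_ss / df * (1 / k + x_bar**2 / sxx))
    # The t statistic is undefined when the fit is perfect (zero residuals).
    t_stat = intercept / intercept_se if intercept_se > 0 else float("nan")
    return intercept, intercept_se, t_stat, df


# Hypothetical effect sizes and standard errors (illustration only).
effects = [1.55, 0.975, 0.92, 0.70]
standard_errors = [0.5, 0.25, 0.2, 0.1]
intercept, intercept_se, t_stat, df = eggers_regression(effects, standard_errors)
print(f"intercept = {intercept:.2f}, t({df}) = {t_stat:.2f}")
```

Because the test depends on the spread of standard errors, it has low power when few effect sizes are available, which is one reason it can disagree with visual inspection of a funnel plot.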

Funnel Plot Visualization of Publication Bias.
Quality of Included Studies
Of the 13 studies included in this meta-analysis, four met all of CEC’s (2014) evidence-based standards using the absolute coding criteria, while three met the 80% weighted coding criteria threshold (Lane et al., 2009). Each study is deemed methodologically sound if all eight QIs (absolute coding) or 80% of all eight QIs are met (weighted coding; Cook et al., 2014; Royer et al., 2017). Four studies met seven indicators, two met six indicators, two met five indicators, and one met four indicators. The following indicators were met by all studies: 1.1, 1.2, 4.1, 4.2, 6.2, 6.3, 6.5, 6.6, 6.7, 6.8, 6.9, 7.1, 8.1, and 8.3. Commonly missed were descriptions of implementation fidelity related to dosage or exposure using direct reliable measures (5.2 = 64.29% met), clear descriptions of assignment to groups in group studies (6.4 = 66.67%), and evidence of adequate internal, inter-observer, test-retest, or parallel form reliability (7.5 = 64.29%). Also frequently omitted were descriptions of specific training or qualifications required to implement the intervention (3.2 = 71.43%) and reports of implementation fidelity related to adherence using direct reliable measures (5.1 = 71.43%).
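The difference between the two coding schemes can be sketched as follows. This is a hedged illustration, assuming that absolute coding credits a QI only when every one of its components is met and that weighted coding awards each QI partial credit equal to the proportion of its components met, with a study passing at 80% of the maximum score. The function name, the data layout, and the example coding are hypothetical and do not reproduce the coding of any included study.

```python
def score_study(components_by_indicator, threshold=0.80):
    """Score one study under absolute and weighted quality-indicator coding.

    components_by_indicator maps each quality indicator (QI) to a non-empty
    list of booleans, one per component, where True means the component was met.
    """
    n_indicators = len(components_by_indicator)
    # Absolute coding: a QI counts only if every one of its components is met.
    absolute_met = sum(all(parts) for parts in components_by_indicator.values())
    # Weighted coding: each QI earns partial credit equal to the share met.
    weighted_score = sum(
        sum(parts) / len(parts) for parts in components_by_indicator.values()
    )
    return {
        "absolute_qis_met": absolute_met,
        "meets_absolute": absolute_met == n_indicators,
        "weighted_score": weighted_score,
        "meets_weighted": weighted_score / n_indicators >= threshold,
    }


# Hypothetical coding for one study across eight QIs (illustration only).
example_coding = {
    1: [True, True],
    2: [True],
    3: [True, False],
    4: [True, True],
    5: [True, False, True],
    6: [True, True, True],
    7: [True, True],
    8: [True, True, True],
}
result = score_study(example_coding)
print(result)
```

In this example the study fails absolute coding because two QIs each have an unmet component, yet it passes weighted coding because partial credit keeps its total above the 80% threshold.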
Discussion
The purpose of this meta-analysis was to investigate the characteristics, effect sizes, study quality, and publication bias of time-out interventions for students identified with EBD. Overall, we examined the effects of TO by design type (i.e., single case research design vs. group design) and unit of analysis (i.e., case vs. study). When omnibus effects were reviewed by cases (within-case) and across studies (between-case), the effects were interpreted as large or extremely large (Cohen, 1988). However, the overall effect of TO for group design studies was moderate (Cohen, 1988). Under both absolute and weighted coding of the quality indicators, few studies in the current meta-analysis met the CEC indicators. Results are discussed in relation to the research questions, followed by limitations and implications.
Study Characteristics
There are several important things to note within this body of literature.
To begin, previous literature on TO supports varied implementation based on the type of TO (Cooper et al., 2020; Lieneman & McNeil, 2023; Miltenberger, 2016). While TO as a behavior change procedure has a long history of implementation, the use of TO has been decreasing over the decades.
There have been only two TO research studies focused on students with EBD published since 2000 (Donaldson et al., 2013; Grskovic et al., 2004). This decline in the use of TO may be related to the changing zeitgeist in education. TO as an intervention procedure has been questioned due to ethical and legal concerns regarding implementation (Freeman et al., 2023; Zirkel, 2016). Eleven of the thirteen studies used exclusionary (e.g., TO room) or non-exclusionary (e.g., TO ribbon) TO as the means of implementation. Two studies (Earles & Myles, 1994; Grskovic et al., 2004) used a combination of the two. The majority of the TO procedures were used to reduce inappropriate or problem behaviors (e.g., non-compliance, cursing); however, one study (Sachs, 1973) implemented TO to decrease self-stimulating behaviors. The focus on problem behaviors is typical of the use of TO regardless of who is implementing TO (Everett et al., 2010) or disability status (Fabiano et al., 2004; Ryan, Sanders, et al., 2007). Surprisingly, only 4 of the 13 studies (Costenbader & Reading-Brown, 1995; Pease & Tyler, 1979; Sachs, 1973; Stage, 1997) had participants who were teenagers (i.e., 13+ years old). The majority of the studies focused on students in the preK and elementary grades. This would suggest that TO is most effective for younger students who need behavioral support. Lastly, a consistent finding is that the majority of students identified with EBD who receive services are male. This pattern holds in the included TO studies: males participated at a rate almost five times that of females (i.e., 147 males vs. 32 females).
Effects of TO and Publication Bias
The effect size results of the studies included in this analysis were overwhelmingly positive, and most ranged from medium to extremely large. These findings are typical of meta-analytic work and are often connected to publication bias. Previous researchers have cautioned about the conclusions that should be drawn from meta-analytic studies, as they are generally compiled from published studies that are most often positive in their results (Gage et al., 2017; Ropovik et al., 2021). In addition, Ropovik et al. (2021) explain that the studies identified for meta-analyses tend to show large or extremely large effects, thus further compromising the conclusions and implications of the intervention being examined. Each of the published studies included in this analysis was found to have positive results based on the metrics used at the time of original publication. Based on current effect size analysis metrics, some of which did not exist at the time of the original publication, the results were tilted toward medium to large effects, with the exception of Twyman et al. (1994), which had a small effect.
Quality Indicators of Studies
With the establishment of the CEC (2014) quality indicators, standards have been applied to evaluating research studies for elements that should exist in research. Since every study included in this analysis was published before the quality indicators were released, it is unsurprising that few of the studies met all of the indicators. This can be attributed to the fact that, prior to the quality indicators, researchers were mostly bound to the publication requirements of the journal to which they were submitting and not to industry- or field-wide criteria. Thus, studies included any number of combinations of elements, which did not lead to uniformity.
Study Limitations
There are a number of limitations to the current meta-analysis that should be considered when interpreting the findings. First, when analyzing cases, we did not analyze specific TO procedures as interventions and focused only on the broad categorizations of TO (i.e., exclusionary vs. non-exclusionary TO). Therefore, the effect that specific TO interventions had on target behaviors is unclear. Second, while we believe our search methods were extensive and thorough, it is possible we may have missed articles investigating TO for students with EBD in the search process. Relatedly, based on the nature of coding articles (an identified subjective process; see Losinski et al., 2019), it is possible that researchers replicating our process may get similar but slightly different results in their search. Finally, we were not able to examine the studies based on race/ethnicity, as this information was not available for most of the studies.
Conclusion and Implications
The use of TO has been under a microscope for the past few decades. In fact, the ethical and legal ramifications of using TO have been questioned since the 1970s (Gast & Nelson, 1977). Behavior interventions have increasingly focused on positive behaviors, with a corresponding decrease in punishment-based interventions. Together, the questions regarding the appropriateness of TO as an intervention and the focus on positive interventions and supports have resulted in fewer uses of TO. This is evidenced in the lack of research on TO use for students with EBD since 2000. It is particularly evident in the recommendations from the Division for Emotional and Behavioral Health (DEBH; formerly the Council for Children with Behavioral Disorders) that suggest the elimination of seclusion-based interventions in schools (i.e., exclusionary TO; Freeman et al., 2023). While the elimination of exclusionary TO may be warranted, this meta-analysis provides evidence to suggest that non-exclusionary TO can be effective for students with EBD. With that in mind, other authors have provided detailed guidance for using TO procedures ethically and within the legal parameters of students receiving special education services (Ryan, Sanders, et al., 2007; Zirkel, 2016).
The effectiveness of TO has been ignored since the early 2000s. The authors of the current analysis view this as both positive and negative. Positively, the abusive implementation of TO has been theorized and written about for years (Ryan, Sanders, et al., 2007; Siegel & Bryson, 2014). Critiquing TO and its usage is a good thing. Better understanding when, where, how, and with whom to use TO is important when protecting children from harm and trauma (i.e., from exclusionary TO). However, the complete elimination of an established, researched, and effective intervention (i.e., non-exclusionary TO) is foolhardy and is strongly discouraged (Quetsch et al., 2015). Ultimately, there is a need for new research on the use of TO for students with EBD that reflects our updated knowledge of both TO as an intervention procedure and research methodology based on quality indicators.
Footnotes
Authors’ Note
The study data, code, and materials are available to reviewers and editors upon request.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
