Abstract
Recommended practices in early childhood special education include using systematic instructional strategies with fidelity to promote learning and individualizing the frequency, intensity, and duration of support to provide each child with the level of instruction needed to make optimal gains (Division for Early Childhood, 2014). Although children often learn common skills, such as pre-academics, in typical environments with little explicit instruction, some young children with disabilities benefit from frequent learning opportunities incorporating planned, systematic, evidence-based procedures (Greenwood et al., 2018; Ledford et al., 2019).
Response prompting describes a category of interventions used to increase correct responding when learning a new skill (e.g., color naming). These procedures can be broadly defined as providing the assistance needed to ensure correct responding, with eventual transfer of stimulus control to the typically occurring stimulus (Cengher et al., 2018). That is, children initially respond correctly in the presence of the prompt, but eventually learn to respond correctly even when prompts are not provided (Touchette, 1971). Time delay (TD) is a specific response prompting procedure intended to result in near-errorless learning of new skills (Wolery & Gast, 1984). TD is a single-prompt procedure—that is, the same prompt is used across all teaching sessions for a given behavior, and no other prompts are provided. The prompt used during TD is referred to as a controlling prompt—assistance of any form (e.g., verbal, gestural) that consistently evokes the correct response. Compared with controlling prompts, intermediate (or non-controlling) prompts provide the child with a hint and do not consistently result in the correct response; these types of prompts are not used in TD procedures.
When using TD strategies, the controlling prompt is immediately delivered during initial instructional trials—referred to as 0-s delay trials. Following a specified number of 0-s delay trials, the instructor increases the number of seconds between the delivery of the cue and the delivery of the controlling prompt (e.g., a latency of 3 s). This short interval provides the individual with an opportunity to independently respond with the correct answer prior to the delivery of the controlling prompt (cf. Ledford et al., 2019). During this delay, the learner could give an unprompted error response rather than an unprompted correct response (Heckaman et al., 1998); thus, early learners may primarily benefit from TD if they can—or are able to quickly learn to—wait for the controlling prompt (Ledford et al., 2019).
Two primary variations of TD exist: constant TD (CTD) and progressive TD (PTD; Ledford et al., 2019). The difference between CTD and PTD is that implementers gradually increase the delay interval for PTD and hold it constant for CTD. The final delay interval depends on the complexity of the behavior and the typical response latency of the child (e.g., naming a letter takes less time than sounding out a word and a child with apraxia may have a longer latency to respond than a child with no language delay). Although TD procedures (e.g., controlling prompt, wait interval) are well-established, numerous variations in delivery model are possible. For example, TD can be conducted in massed (e.g., 10 trials in a row), distributed (e.g., 10 trials distributed throughout the day), or embedded (e.g., 10 trials interspersed within a specific play activity) trials and it can be used with or without instructive feedback (i.e., extra information in the consequent statement), error correction, and interspersing known stimuli.
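To make the procedural logic concrete, the following minimal sketch (in Python) simulates how the interval between the task cue and the controlling prompt might be scheduled under CTD and PTD. The function name, parameter values (e.g., two 0-s delay sessions, a 3-s terminal delay), and the session-based progression rule are illustrative assumptions for demonstration, not a prescription drawn from any included study.

# Illustrative sketch of CTD vs. PTD delay scheduling. All names and
# values are assumptions for demonstration purposes only.

def delay_for_session(session, procedure, zero_delay_sessions=2,
                      terminal_delay=3, increment=1):
    """Seconds between the task cue and the controlling prompt for a
    given instructional session (1-indexed)."""
    if session <= zero_delay_sessions:
        return 0  # 0-s delay trials: prompt delivered immediately
    if procedure == "CTD":
        # Constant: jump directly to the terminal delay and hold it.
        return terminal_delay
    if procedure == "PTD":
        # Progressive: increase gradually (here, once per session)
        # until the terminal delay is reached.
        return min((session - zero_delay_sessions) * increment, terminal_delay)
    raise ValueError(f"unknown procedure: {procedure}")

for procedure in ("CTD", "PTD"):
    print(procedure, [delay_for_session(s, procedure) for s in range(1, 7)])
# CTD [0, 0, 3, 3, 3, 3]
# PTD [0, 0, 1, 2, 3, 3]

In practice, PTD progressions are often response-guided rather than fixed per session (Ledford et al., 2019), so a real schedule would condition each increase on the child's responding rather than on the session count alone.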
Several previous reviews have evaluated the outcomes of TD interventions (Handen & Zane, 1987; Ledford et al., 2012; Schuster et al., 1998; Walker, 2008; Wolery et al., 1992). The foci of the reviews varied, with some restricted to specific behavior categories (e.g., chained tasks; Schuster et al., 1998), diagnoses (e.g., autism; Walker, 2008), or instructional arrangements (e.g., small group; Ledford et al., 2012). None focused on early childhood participants or included studies published in the last 14 years. Previous reviews have also not reported the extent to which variations are used and shown to be effective. One review published 30 years ago (Wolery et al., 1992) found CTD to be effective in teaching discrete behaviors to a variety of individuals across a number of instructional arrangements and settings. Another review (Walker, 2008), published 15 years ago, found that CTD resulted in more errors to criterion than PTD when teaching autistic individuals.
Given the ever-expanding corpus of studies, this review expands on the previous ones in several ways. First, we included peer-reviewed and non-peer-reviewed literature (i.e., theses and dissertations), allowing us to detect potential publication bias (Chazin et al., 2021; Sham & Smith, 2014). In addition, we excluded studies without sufficient experimental rigor for interpreting outcomes (e.g., without three potential demonstrations of effect). Given that quality can vary even among studies with three potential demonstrations, we considered outcomes in relation to quality and rigor (Q&R; Ledford et al., 2020; Ledford & Gast, 2018; What Works Clearinghouse, 2020). That is, we rated the quality of included studies and considered this rating in relation to outcomes. Finally, we did not exclude studies based on specific participant diagnoses or target behavior criteria, which may allow us to make broader generalizations when compared with previous reviews. Thus, we systematically evaluated the quality/rigor, characteristics, and outcomes of studies evaluating TD for teaching young children. The research questions were as follows:
Method
Coding Team
The first author, a certified teacher and behavior analyst enrolled in a doctoral program in special education with experience coding for systematic reviews and conducting TD, trained five master’s student coders. Prior to conducting coding for the present study, all master’s students were required to reach a minimum 90% agreement with the first author on at least three practice sources. Once trained, master’s students conducted all descriptive coding for included sources. In addition, quality and outcomes were coded via the Single Case Analysis and Review Framework (SCARF; Ledford et al., 2020) by the first author and a PhD-level behavior analyst with expertise in single case design and systematic instruction. Six out of the seven team members identified as non-Hispanic White females, and one identified as an Asian female.
Literature Search
A search was conducted on September 10, 2020 and updated on January 6, 2022, using PsycINFO and ProQuest Dissertations & Theses Global. The following search string was used to identify potentially eligible studies: ab (“time delay” or “prompt delay” or “prompt and test”) and (instruction or teaching or intervention). Eligibility was evaluated by the two authors (described above). A total of 1,239 studies were identified via the database searches. Seven additional sources (Chazin & Ledford, 2021; Ledford & Wolery, 2013, 2015; Schebell et al., 2018; Sweeney et al., 2023; Todt et al., 2023; Wolery et al., 1988) were identified via a secondary search in which we reviewed all works by any first author who appeared more than once during the initial search, resulting in a total of 1,246 studies eligible for initial screening.
Study Screening and Inclusion Criteria
To meet inclusion criteria, studies had to: (a) be written in English, (b) include data for at least one participant described as attending a preschool or who was aged 3 to 5 years, (c) include graphed data from a single case design that met contemporary standards (i.e., three data points per condition with three potential demonstrations of effect; Ledford & Gast, 2018), and (d) use TD instruction as the primary intervention. We specifically excluded variables not taught via TD (i.e., observational learning, incidental learning targets, and DVs without baseline data collection). See Table 1 for additional details regarding TD definitions.
Table 1. Time Delay Inclusion Criteria.
Note. Sources that included time delay as part of a packaged intervention and did not include graphed data of the target behaviors taught only using time delay were excluded. TD = time delay; CTD = constant time delay; PTD = progressive time delay.
A source was determined to be published if it appeared in a peer-reviewed journal. If all data from a single study were available in both unpublished and published forms (e.g., a doctoral dissertation later published in a peer-reviewed journal), the unpublished version of the study was removed.
All abstracts and titles were double-screened using the Rayyan software tool (Ouzzani et al., 2016; www.rayyan.ai/). Duplicates were detected and removed (n = 60). Abstracts and titles were reviewed for the remaining 1,186 studies (100% double-screened). At this stage, 824 studies were excluded, and 362 studies were identified as eligible for full-text screening. These studies were double-screened by the authors and a second doctoral student. Discrepancies were resolved by the first author.
Initially, 189 studies (including both demonstration and comparison designs) were screened and sorted based on participant age. This manuscript includes data from 33 sources that included at least one preschool-aged participant (n = 26 published, n = 7 unpublished). See the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (Moher et al., 2009) diagram in Figure 1 for inclusion results.

Figure 1. PRISMA Flow Diagram.
Coding
Descriptive, quality, and outcomes coding was conducted for eligible sources. Information on study participants, variables, and validity was recorded via dichotomous yes/no codes. Code books, with definitions and coding directions, are available via online Supplemental Materials (https://osf.io/ux2s6/?view_only=bb6b92ac6c7041e38ca07a2cb3a9cece). We did not calculate effect sizes or conduct a meta-analysis, which limits our ability to determine whether magnitude of effect differed based on coded study characteristics. We made this decision because available effect size metrics are not appropriate for analyzing acquisition data; such metrics generally reflect level of overlap or mean differences between conditions, which are not ideal for establishing whether criterion levels are met for previously unknown targets, the context in which TD is used (Ledford & Gast, 2018).
Participants
Data were coded on participant demographics, including gender, age, disability, race, and ethnicity. Disability category was deduced if there was evidence of disability-specific special education services (e.g., provision of speech and language therapy for a child with a speech delay), a reported Individuals with Disabilities Education Act classification (e.g., orthopedic impairment), or a reported medical diagnosis (e.g., autism). Participant race and ethnicity were coded separately (e.g., White, Hispanic or White, Non-Hispanic).
Dependent Variables
Data regarding dependent variable (DV) categorization and topography were coded for each eligible participant. Designs that included graphed data of multiple dependent variables were coded as different designs (e.g., if data from two children in a dyad were shown on a single graph, we analyzed these data as two separate designs). Target behaviors taught were divided into three categories: (a) academic—skills related to school-based scholastic activities, including early learning behaviors (e.g., pre-academics such as matching) and school-aged skills (e.g., answering questions related to a science topic); (b) adaptive behaviors (e.g., self-help and vocational behaviors); and (c) communication/social/joint attention—any skill that could be used during reciprocal interactions between two or more people and play/leisure skills (see “Descriptive Coding” in the online Supplemental Materials for details). Topography included verbal responses (e.g., saying the name of a stimulus) and motor responses (e.g., engaging in a motor behavior to complete a play behavior or selecting a stimulus from an array of three).
Independent Variables
We coded various details about the TD procedure, including TD type and interval details, trial arrangement and mastery criteria, antecedent and consequent components, and intervention context and implementer. The controlling prompt was coded by topography (e.g., verbal model, full physical assistance). Details regarding the TD procedure (including antecedent and consequent procedures) were coded at the participant level (n = 149) and details regarding the instructional context and implementer were coded by source (n = 33).
TD Type and Interval Details
We coded whether authors used progressive or constant delays (CTD or PTD), the controlling prompt topography, terminal delay latency (e.g., 3 s), and the interval for increasing latency when PTD was used (e.g., increased by 1 s at a time). We also coded whether authors reported three types of criteria: mastery criteria, criteria for increasing interval latency (e.g., moving from 0 to 1 s), and criteria for decreasing interval latency (e.g., moving from 1 to 0 s, usually due to unprompted errors).
Antecedent and Consequence Components
We recorded data on whether any antecedent (e.g., attending cues) or consequent (e.g., error correction, instructive feedback, tokens) strategies were reported and whether any modifications were made for non-responders (e.g., modifying delay intervals or changing reinforcement).
Contexts and Implementers
For each source (n = 33), we coded the context of the intervention, including setting (e.g., clinic, school), instructional arrangement (e.g., small or whole group), trial arrangement (e.g., massed, embedded), trials per session, and implementer (e.g., teacher, researcher; including whether endogenous implementer training was described).
Rigor and Outcomes Coding
Demonstration designs (i.e., any baseline to intervention comparison) for which a functional relation identification was possible (i.e., sufficient data per condition with three potential demonstrations of effect) were coded using the SCARF 2.0 (Ledford et al., 2020) to assess the overall Q&R and outcomes. The SCARF evaluates Q&R in several categories: (a) data sufficiency, (b) data reliability, (c) procedural fidelity, (d) DV descriptions, (e) condition/participant descriptions, and (f) ecological/social validity. Q&R coding is automatically converted into a numeric 0 to 4 score, with higher scores representing higher-quality studies (i.e., studies with a thorough methodological description and rigorous controls for threats to internal validity; Chazin et al., 2021; Ledford & Windsor, 2022). We elected not to include comparison designs in this coding because interpretation of an outcome of no difference between two intervention conditions (no functional relation, with both interventions being similarly effective or ineffective) is substantially different from interpretation of an outcome of no difference between a baseline and an intervention condition (no change from baseline, evidence of ineffectiveness).
Outcomes were evaluated via the SCARF using visual analysis, with scores of 0 indicating no positive effects, scores of 1 to 2 indicating inconsistent or minimal effects, and scores of 3 to 4 indicating consistent positive effects (i.e., a positive functional relation between the dependent and independent variable was apparent). These data were plotted to allow for visual analysis of the relationship between Q&R and outcomes, with high Q&R and outcome scores (top right quadrant) indicating high-quality evidence of positive effects. In contrast, high Q&R scores and low outcome scores (bottom right quadrant) indicate high-quality evidence of negative outcomes (inconsistent effects, non-effects, or negative effects; for additional information regarding the SCARF, please see https://ebip.vkcsites.org/scarfv2/).
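As a brief illustration of this quadrant logic (our own Python sketch; the midpoint split of 2.0 on each 0-to-4 axis is an assumption chosen for display, not an official SCARF threshold):

# Illustrative sketch of the SCARF quadrant interpretation described
# above. The 2.0 split on each axis is an assumed midpoint, not an
# official SCARF rule.

def interpret_quadrant(qr_score, outcome_score, split=2.0):
    quality = "high-quality" if qr_score >= split else "lower-quality"
    effects = ("positive effects" if outcome_score >= split
               else "inconsistent, negative, or no effects")
    return f"{quality} evidence of {effects}"

print(interpret_quadrant(3.1, 3.5))  # high-quality evidence of positive effects
print(interpret_quadrant(3.1, 1.0))  # high-quality evidence of inconsistent, negative, or no effects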
Generalization and Maintenance Coding
Generalization and maintenance data were evaluated using the SCARF, with outcomes plotted on the vertical axis and measurement details on the horizontal axis. More distal (maintenance) or rigorous (generalization) measurement is depicted further right on the graph while less distal or rigorous measurement is depicted further to the left of the axis. As described for primary outcomes, generalization and maintenance outcomes were scored on a scale of 0 to 4, with scores of 3 and 4 representing evidence of consistent positive effects.
Interobserver Agreement
Interobserver agreement (IOA) data were calculated using the point-by-point method (agreements / [agreements + disagreements] × 100). Mean IOA was 92.12% (range = 86.22%–97.29%; see “IOA” in the online Supplemental Materials). All sources were double-coded (IOA coding was calculated for all codes for all sources). Disagreements for descriptive coding were resolved by the primary author and disagreements for quality and outcomes coding were resolved via weekly discrepancy discussions between the two authors.
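For illustration, a minimal sketch of this calculation (our own Python example; the coders' dichotomous codes below are hypothetical, not data from the review):

# Point-by-point IOA: agreements / (agreements + disagreements) * 100.
# The paired codes below are hypothetical yes/no descriptive codes.

def point_by_point_ioa(coder_a, coder_b):
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreements / len(coder_a) * 100

primary = ["yes", "no", "yes", "yes", "no"]
reliability = ["yes", "no", "no", "yes", "no"]
print(point_by_point_ioa(primary, reliability))  # 80.0 (4 of 5 items agree)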
Results
We identified and included 33 eligible sources, including 189 (demonstration and comparison) single case designs, with publication (or completion) dates ranging from 1990 to 2021 (Mdn = 2013; no date restrictions were set, but no eligible studies published prior to 1990 were identified).
Participants
Participants (n = 149) included 101 identified as being male or a boy (68%) and 40 as being female or a girl (27%). No participants were identified as being non-binary, and gender was not reported for eight participants (Giunta-Fede et al., 2016; Ledford & Wolery, 2015). Average reported age was 4.54 years (SD = 1.29). Age was not reported for six participants, all of whom came from a single source (Ledford & Wolery, 2015). Most (n = 117, 79%) participants were identified as having a disability, predominantly as autistic (n = 54, 36%) or as having a developmental delay (n = 47, 32%) or a speech language impairment (n = 26, 17%; see Table 2). Race was not reported for 57 (38%) participants. When race was reported (n = 92, 62%), participants were predominately identified as being White (n = 54, 36%) or Black (n = 22, 15%; see Table 2). Ethnicity was only reported for four sources (Lane et al., 2015; Ledford et al., 2017; Olszewski et al., 2017; Schebell et al., 2018), with seven participants identified as Hispanic.
Table 2. Participant Information.
Note. Other disability = unique disability that does not fall under other categories (e.g., Williams syndrome). NR = not reported; ADHD = attention-deficit/hyperactivity disorder; SPED = participant identified as simply receiving special education services.
Dependent Variables
Commonly reported DVs were academic skills (n = 91, 61%; e.g., sight words, math facts, matching), followed by communication/social/joint attention (n = 54, 36%; e.g., social communication, imitation, play) and adaptive behaviors (n = 4; 3%; e.g., hand washing). Although there were some examples of chained behaviors (e.g., swimming; Rogers et al., 2010), DVs were primarily reported to be discrete skills. The most commonly reported target behavior topography was verbal responses (n = 117, 79%), followed by motor performance (n = 39, 26%) and motor identification (n = 2, 1%). One source reported target behaviors including both verbal and motor performance responses (n = 4; Gast et al., 1993). Relatedly, controlling prompt topographies were most often a verbal model (n = 115, 77%) or full physical assistance (n = 27, 18%; see Table 3).
Table 3. Dependent Variable Information.
Note. See Supplemental Materials for definitions. JA = joint attention; Voc = vocational; Comm = communication; V = verbal; NR = not reported.
Independent Variables
TD Type and Interval Details
A similar number of designs were included that evaluated CTD (n = 64, 43%) and PTD (n = 85, 57%). The most common number of 0-s delay sessions conducted for each participant was two (n = 43, 29%), followed by three (n = 35, 23%) and one (n = 32, 21%). The number of instructional sessions at 0-s delay was not reported for 39 (26%) participants. The most commonly reported terminal latency—the largest wait interval provided prior to the delivery of the controlling prompt—was a 3-s delay (n = 63, 42%), followed by a 4-s delay (n = 44, 29%). For sources that evaluated PTD, the most commonly reported interval for increasing latency was 1 s (n = 55, 65%), followed by 2 s (n = 12, 14%) and 3 s (n = 12, 14%). Criteria for increasing the delay latency were reported for 112 (75%) participants (e.g., latency increased after the participant reached 100% unprompted or prompted correct responses; Shepley et al., 2016). Criteria for decreasing the delay latency were reported for 36 (24%) participants (e.g., moving back to a 0-s delay contingent on two or more unprompted errors in a session; Chazin & Ledford, 2021). Mastery criteria were reported for 115 (77%) participants. The most commonly reported mastery criterion was 100% unprompted correct responses for two consecutive sessions (n = 29), followed by 100% unprompted correct responses for three consecutive sessions (n = 20).
Additional Components
Trials occurred in massed (n = 112, 75%), embedded (n = 36, 24%), and distributed formats (n = 7, 5%). One source included both massed and embedded trials (n = 10; Ledford et al., 2017), measuring child preference for each. On average, sessions conducted in a massed trial format consisted of 15 total trials per session (range = 3–30), sessions conducted in an embedded trial format consisted of 10 total trials per session (range = 4–24), and sessions conducted in a distributed trial format consisted of 9 total trials per session (range = 5–12).
The use of an attending cue was reported for 94 (63%) participants. Error correction was used for 87 (58%) participants contingent on an incorrect response (e.g., providing the correct response; Ledford et al., 2019). Instructive feedback (the provision of additional information contingent on correct responding; Alber et al., 1999; Whatley et al., 2009) was provided for 22 (15%) participants. Implementers repeated trials based on errors for 46 (31%) participants and interspersed known or previously mastered targets for 38 (26%) participants.
Most participants (n = 145, 97%) were provided with social consequences indicating approval or correctness, and 37 (25%) participants were provided social consequences indicating acknowledgment of an error contingent on incorrect responding. A total of 67 (45%) participants were provided items without inherent value (e.g., tokens) that were later exchanged for a reward, and 73 (49%) participants were provided access to tangible items (including edibles) or preferred contexts contingent on correct responding. In addition to reinforcement for target responses, reinforcement was provided to 48 (32%) participants for engaging in non-target behavior (e.g., attending, staying seated). No source reported the removal of tokens or preferred items contingent on errors or challenging behavior.
Modifications due to insufficient correct responding were made to TD procedures for 19 participants. Modifications included increasing or altering reinforcement (n = 8; Chazin & Ledford, 2021; Francis et al., 2020; Ledford et al., 2017; Sweeney et al., 2023; Todt et al., 2023), alternating the attending cue (n = 5; Ledford et al., 2017; Ledford & Wolery, 2015) or instructional format (n = 3; Ledford et al., 2017; Sweeney et al., 2023; Todt et al., 2023), and changing the response latency (n = 2; Valk, 2003). Six of the seven studies that used modifications came from a related group of authors, perhaps suggesting that some researchers may be more likely to make modifications when acquisition is slow or delayed.
Contexts and Implementers
Instructional arrangement (i.e., how many children were present) was not reported for three sources (Giunta-Fede et al., 2016; Klaus et al., 2019; Schrandt et al., 2009). When reported, the majority of sessions took place in a one-on-one format (n = 17, 52%), followed by a small group (defined as eight or fewer children; n = 14, 42%). Two sources reported using both one-on-one and small group instruction (Fleming, 1991; Hawkins, 2008).
Sessions commonly occurred in classrooms (n = 30, 91%), including special education (i.e., self-contained or pullout; n = 14), general education (n = 12), and non-specified classrooms (i.e., authors did not report classroom type; n = 4). Other identified settings were a clinic (n = 3, 9%) and a community park (n = 1, 3%). Most TD sessions were implemented by researchers (n = 22, 67%), followed by classroom teachers/instructors (n = 7, 21%) and a classroom teacher enrolled in graduate school at the time of implementation (n = 1, 3%). TD implementer was not reported for three sources (9%; Alig-Cybriwsky et al., 1990; Kinder, 2010; Schrandt et al., 2009). Of sources that reported teacher/instructor implementation, only three provided information about teacher training (e.g., video modeling and role-playing; Gast et al., 1993; Olszewski et al., 2017; Rakap, 2019).
SCARF 2.0 Analysis
The SCARF (Ledford et al., 2020) was used to assess the Q&R and outcomes for all eligible single case demonstration designs (n = 111). Designs included multiple probe/baseline designs across behaviors (n = 66, 59%), adapted alternating treatment designs (n = 36, 32%), and multiple probe/baseline across participant designs (n = 9, 8%).
Quality and Rigor
Designs generally scored high on internal validity measures, with 108 (97%) having sufficient data, 90 (87%) collecting sufficient reliability data and demonstrating high agreement, and 96 (86%) collecting fidelity data and demonstrating high fidelity. Feasibility or acceptability ratings (i.e., social validity) were reported for 29 designs (26%). Typical environments and/or endogenous implementers were used for 92 designs (82%). The average Q&R score, as reported by the SCARF, across all included designs was 2.88 (Mdn = 3.14; possible range = 0–4; actual range = 1.5–3.53). Most (n = 102; 92%) designs had scores above 2.0; single case designs that scored below 2.0 (n = 9, 8%) came from a single peer-reviewed source.
Outcomes
Most outcome scores corresponded with functional relation determination (i.e., SCARF score of 3 or 4; see Figure 2), with no functional relation being identified in 15 (14%) designs. The median outcome score was 4, indicative of a functional relation with no weak effects (range: 0–4). Thus, 96 (86%) designs had data patterns consistent with a functional relation.

Figure 2. SCARF Data for Included Sources.
Outcomes by Publication Status
As shown in Figure 3, published and unpublished studies had mixed outcomes, although unpublished studies had a smaller range of rigor scores, such that all scores under two came from published studies. There were 28 designs in unpublished sources, six of which did not demonstrate a functional relation (21%), and 83 designs in published sources, nine of which did not demonstrate a functional relation (11%).

Figure 3. SCARF Data for Published vs. Unpublished Sources.
Outcomes by Behavior Topography
To explore whether outcomes varied based on the topography of the target behavior, we separately analyzed designs measuring verbal behaviors and designs measuring motor behaviors. As shown in Figure 4, there was variability in both types, with 18 of 22 designs measuring motor behavior demonstrating a functional relation (82%) and 72 of 83 designs measuring verbal behavior demonstrating a functional relation (87%).

Figure 4. SCARF Data for Verbal vs. Motor Responses.
Outcomes by Instructional Arrangement and TD Type
We also explored whether outcomes were different based on instructional arrangement (one-on-one versus small group). As shown in Figure 5, most designs demonstrated a functional relation (58 of 64, 91%) when individual instruction was used. Fewer designs assessing small group instruction demonstrated a functional relation (43 of 53, 81%); however, this difference was partially explained by quality. When only considering designs with a score of at least two, more studies included data indicative of a functional relation (36 of 43, 84%). When analyzing the designs that were of high Q&R but demonstrated a non-effect, 10 out of the 12 designs evaluated the use of CTD rather than PTD.

Figure 5. SCARF Data for TD Implementation in One-on-One vs. Small Group Settings.
Generalization Measurement and Outcomes
The most commonly reported form of generalization measurement was stimulus generalization with novel materials (n = 44, 40%), followed by contexts (n = 40, 36%), and social partners (n = 31, 28%). In 15 designs (14%), a behavior representing a generalized tendency (e.g., play behaviors) was measured. In one design, a specific behavior (or type of behavior) was taught, and a different specific behavior was measured (i.e., response generalization; Schrandt et al., 2009). Outcomes for generalization were mixed, with about half of the designs (n = 30; 48%) coded as having all strong positive effects (SCARF outcome = 4), followed by 15 designs (24%) coded as having at least one weak positive effect and one strong positive effect with no non-effects (SCARF outcome = 3; see Figure 2).
Maintenance Measurement and Outcomes
A total of 89 (80%) designs reported evidence of continued behavior change following the removal of the intervention. Of those single case designs in which maintenance was reported, 54 (49%) were unclear in their reporting of maintenance or only reported maintenance immediately following completion of intervention. Designs that measured maintenance were most often coded as representing strong positive effects (SCARF outcome = 4; n = 60; 54%), followed by those analyzed to have at least one weak positive and one strong positive effect with no non-effects (SCARF outcome = 3; n = 20; 18%; see Figure 3).
To explore whether outcomes varied based on the target behavior domain, we analyzed separately the two most common outcome domains—social/communication/joint attention and academic behaviors. Data were similar for both (see graphical representation of all SCARF data via “All SCARF Data” in the Online Supplemental Materials), but maintenance data were discrepant, as shown in Figure 6. First, social behaviors were almost exclusively measured immediately after intervention was withdrawn (or timing of maintenance measurement was unreported). For these designs, 10 of 12 demonstrated positive maintenance outcomes (83%). Academic outcomes were also commonly measured immediately (n = 42) with primarily positive outcomes (40 of 42 designs, 95%). When measured at a longer duration (e.g., 1 week or longer; n = 31), outcomes were also mostly positive (27 of 31 designs, 87%).

Figure 6. SCARF Maintenance Data for Academic and Social Behaviors.
Discussion
The purpose of this review was to analyze the characteristics of the TD literature and evaluate quality and outcomes in relation to participant and variable characteristics, with consideration for publication bias. A total of 33 eligible sources, including 111 single case demonstration designs, were included in this analysis. Overall, designs were of higher Q&R (M = 2.88) than those in previous reviews of other intervention types (Barton et al., 2020; Chazin et al., 2021; Ledford & Pustejovsky, 2023; Ledford & Windsor, 2022). Ninety designs that were high in Q&R were visually analyzed to be consistent with a positive functional relation, representing substantial, but not universal, evidence for the effectiveness of TD.
For Whom is TD effective?
The majority (79%) of included participants were diagnosed with a disability, with the most common being developmental and intellectual disabilities. It may be unsurprising that these participants comprised the largest group, since they might, in particular, be most likely to receive supplemental direct instruction during the early childhood years. Similar to previous reviews (Robertson et al., 2017; Severini et al., 2018), there was minimal reporting of participant race or ethnicity. Absent race and ethnicity data prevent future researchers from identifying and making appropriate adjustments to mitigate potential over- or underrepresentation of marginalized communities in research (Pritchett et al., 2022).
When analyzing the single case designs that were of high Q&R but demonstrated a non-effect, 10 of the 12 (83%) designs evaluated the use of CTD and not PTD. This is an important distinction to make when considering instruction for young children. Preschool-aged children are developmentally more likely to engage in impulsive behaviors and less likely to wait for a prompt (Ledford et al., 2019) when compared with older children or adults. In addition, the response-guided increase in the delay interval required for PTD (Ledford et al., 2019) may make it difficult for endogenous implementers to achieve fidelity when tasked with remembering various delay intervals across different target behaviors and children. In summary, we are most confident in the positive results of TD for children with autism and developmental delays, with preliminary indications that PTD might be more universally effective for young children than CTD.
For What Behaviors is TD Effective?
Most (83 of 111; 75%) designs that were analyzed using the SCARF evaluated TD for target behaviors that were verbal in topography. In contrast, only 22 (20%) designs evaluated TD for target behaviors that were gross or fine motor in topography. Thus, although both topographies were often associated with generally positive effects (see Figure 4), TD may have primarily been assessed with students who were already proficient at vocal imitation, indicating a need for future research evaluating the implementation of TD for children who are not consistently imitative. In addition, the majority (56%) of the DVs for which a modification was reported were imitative behaviors. This is strikingly different from the 12% of all DVs reported as imitative (see Table 3). This is interesting because TD can also be used to teach imitation (Francis et al., 2020; Valk, 2003; Venn et al., 1993)—additional research may be needed to determine whether children who are not readily imitative can be taught to imitate using TD with physical prompts, which then allows them to be taught a variety of other skills using model prompts (i.e., imitation taught with TD is a pivotal skill that allows for less intrusive prompts in later TD instruction).
Although outcomes in both domains were consistent with positive effects, discrepancies in maintenance data were observed across dependent variables. Maintenance data collected on academic tasks varied in the duration of time that elapsed after intervention was removed, whereas maintenance data collected on social behaviors were predominately collected immediately following the removal of the TD intervention (see Figure 6). Thus, while long- and short-term maintenance of academic skills was largely positive, conclusions about long-term maintenance of social behaviors are not possible. Additional research is needed to determine whether social behaviors taught via TD maintain over time.
Under What Conditions is TD Effective?
Interventions were predominately conducted in a massed trial format. Previous research has shown that child preference for embedded versus massed trials is mixed (Ledford et al., 2017) and that the formats confer different benefits (i.e., massed trials are efficient while embedded trials allow for child-directed interactions). Most instruction occurred in special education settings, limiting conclusions about the extent to which TD can be used in inclusive classrooms and other typical settings. Thus, there is a need for additional research evaluating TD in embedded and distributed trial formats, especially when multiple children are present (e.g., in typical classroom contexts), and in general education settings to increase the ecological validity of the TD procedure.
Recommendations and Future Directions
It is recommended that practitioners consider the necessary prerequisite skills (e.g., the ability to wait for a prompt), the target behavior, and the implementer skill set (i.e., the ability to implement each potential procedure with fidelity) when deciding which TD procedure to use (CTD, PTD). Practitioners may want to assess the child’s ability to wait for a prompt prior to implementation (e.g., determine if the child waits for help when known versus unknown targets are presented). If the child is unable to consistently wait for a few seconds (e.g., 3 s) prior to the delivery of the prompt, the practitioner may want to consider implementing wait training (Ledford et al., 2019) prior to instruction.
The mixed results of this review indicate several areas in need of additional research in the TD literature. First, additional research is needed regarding how to teach children the prerequisite skills required to benefit from TD instruction and to provide guidance to practitioners regarding which TD procedure to choose (CTD, PTD). Second, information about the effectiveness of TD for older children is needed, as the outcomes reported here may not generalize to older children with more instructional experience. Finally, additional research is needed to allow for confident conclusions to be drawn about the use of TD in typical settings (e.g., in a small or whole group classroom format). Moreover, research is needed to determine whether teachers can and will use TD, and the contexts in which it is currently being used.
Limitations
The outcomes and recommendations presented in this review should be interpreted in light of several limitations. Due to time and resource constraints, we did not complete a forward and backward search of the included literature and limited our search to sources found via PsycINFO and ProQuest rather than other electronic databases (e.g., ERIC). In addition, participant experience with TD prior to intervention was not coded during this review, which may affect the interpretation of participant prerequisite skills. Further, due to the small size of some sample groups, some differences may be attributable to unidentified factors. Finally, although we identified and included theses and dissertations in this review, other unpublished sources may exist (e.g., conference presentations) and may therefore have been excluded.
Conclusions
This review provides an updated analysis on the TD literature. Robust data suggest that TD is effective for teaching young children with intellectual and developmental disabilities to engage in novel behaviors, when used in a one-on-one massed trial format. Because there were relatively few comparisons that used different variations of the TD procedure (e.g., massed vs embedded TD instruction), additional research is needed to clarify under what additional conditions (e.g., embedded, distributed) and for whom (e.g., older children) the TD procedure will be most likely to result in optimal outcomes.
Acknowledgements
Special thanks to Sienna A. Windsor, Kelsey H. Smith, Anna E. Cantrell, Payton L. Watton, Lori B. Throne, and Jocelyn Ng for their assistance with data collection.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported, in part, by the Office of Special Education and Rehabilitative Services (H190004).
Supplemental Material
Supplemental material for this article is available online.
