Abstract
A critical tenet of education research is establishing what works. Another is exploring theorized mechanisms of change to help ascertain why academic programs work, for whom, and under which conditions; in other words, unpacking the black boxes of academic programs. This study explored the quality of teachers’ facilitation of (a) scientific investigations and (b) science discourse during and after the implementation of a systematic, explicit second-grade science program (Scientific Explorers-2). Our results demonstrated that relative to comparison classrooms, Scientific Explorers-2 classrooms delivered significantly higher quality scientific investigations. Effects on the quality of science discourse and maintenance effects for both measures were not statistically significant but favored treatment classrooms in each case. Implications for designing science programs that support the delivery of high-quality science instruction that meets the needs of all students, particularly students with or at risk of learning disabilities, are discussed.
Students with or at risk of learning disabilities (LD) are general education students first, receiving nearly all their education services within these classrooms (U.S. Department of Education, 2022). This is particularly true in science education where, unlike in reading and mathematics, there are few instructional supports available beyond core science instruction. Given this standalone role, core science instruction is responsible for promoting science literacy among all learners, including students with or at risk of LD. Therefore, core science instruction must be designed and delivered to address the instructional needs of the full range of learners. Yet, compelling evidence suggests that core science instruction may not adequately address the instructional needs of U.S. students in science, particularly those with LD. On the 2019 National Assessment of Educational Progress (NAEP) science assessment, for example, only 36% of fourth-grade students reached Proficient (National Center for Education Statistics [NCES], n.d.). For students with disabilities, merely 15% of fourth-grade students met proficiency levels (NCES, n.d.). Such data suggest the quality of the enacted science curriculum in U.S. schools is falling short of helping all students become scientifically literate.
Facilitating Science Success for All Through High-Quality Instruction
A primary recommendation of the Next Generation Science Standards (NGSS Lead States, 2013) is to employ a three-dimensional approach to science education, teaching students how to connect and simultaneously work with the three dimensions of science (i.e., disciplinary core ideas, crosscutting concepts, and science and engineering practices). The National Research Council (NRC; 2012) posits that a three-dimensional understanding of science permits students to make sense of the world and tackle problems of high societal importance. Promoting three-dimensional science learning among all learners, including students with or at risk of LD, begins with the quality of science instruction delivered in general education classrooms. In this pursuit, the NGSS (NGSS Lead States, 2013) recommend that students have direct experiences to (a) observe natural phenomena, (b) work with and build physical models, (c) collect and analyze data, and (d) disseminate their findings. While scientific investigations and science discourse are integral tenets of these direct experiences, the quality of their presence in early elementary science classrooms remains unknown. Thus, this study explores the quality of scientific investigations and science discourse facilitated by teachers using a recently tested second-grade science program called Scientific Explorers-2 (Sci2; Doabler et al., 2021).
Scientific investigations
In this study, scientific investigations are operationally defined as hands-on experiences that support students in directly engaging in the practices that scientists use to understand and investigate the world. Investigations in the early elementary grades often employ physical models. For example, a second-grade classroom might use hand-held fans to simulate how the forces of wind can differentially impact the shape of the land (represented in the investigation by a pile of sand). When observing and investigating natural phenomena in real-world settings is impractical due to safety, cost, or access issues, classrooms often turn to simulation-based models as an effective alternative. In this case, students might use a technology-based application to, for example, manipulate different variables that contribute to avalanches and investigate the extent to which these adjustments affect the magnitude and direction of an avalanche.
Science discourse
Science discourse is a verbal interaction among scientists. Experts suggest such discourse often begins with a scientist observing a phenomenon, proposing an explanation or claim, and then subjecting that explanation or claim to professional scrutiny (NRC, 2012; Schwarz et al., 2017). Despite its importance, science discourse is not commonly found in science classrooms (Berland et al., 2017; Therrien et al., 2022). Several factors may account for this deficiency. To illustrate, akin to mathematics, facilitating purposeful science discussions in whole class settings can be difficult for teachers to manage (Borko et al., 2021; Doabler et al., 2015). Research indicates that science instruction often neglects to leverage students’ pre-existing knowledge, potentially restricting students’ chances to verbally contribute their understanding of science and their surrounding world (Bae et al., 2021; Therrien et al., 2022). Relatedly, many students face opportunity gaps in academic language and thus often receive limited opportunities to develop the science vocabulary knowledge necessary to convey their scientific understanding (Morgan et al., 2016). Considering these factors and the significant role that science discourse plays in student learning, researchers have initiated studies to examine the occurrence of science discourse in classroom settings.
As an example, Fishman et al. (2017) developed the Science Discourse Instrument (SDI) to understand which instructional practices support science discourse and argumentation. During the SDI’s development, three teacher (i.e., ask, press, and link) and three student practices (i.e., explain/claim, co-construct, and critique) were identified that constitute what might be considered high-quality science discourse and argumentation. At the teacher level, the SDI aims to unpack whether teachers encourage students to expand on their claims and understand how ideas can build from one another (Fishman et al., 2017). The SDI’s focus on students seeks to understand how well students articulate a claim that is supported by evidence while, at the same time, connecting their ideas to others and critically evaluating the claims of their peers.
Fishman et al. (2017) used the SDI to investigate the effects of intensive (~19 days) and less-intensive (~9 days) versions of a professional development program on the abilities of third- through fifth-grade teachers to use high-quality science discourse as a mechanism to improve the overall quality of their science instruction. To apply the SDI, Fishman et al. used video segments to rate teachers’ and students’ science discourse practices. Each of the 44 participating teachers was videotaped two times during the study. Findings suggested that both versions of the professional development program significantly improved teachers’ facilitation of high-quality science discourse. Interestingly, they found no differences in discourse practices between the intensive and less-intensive versions of the program. In addition, Fishman et al. observed an absence of students critiquing the responses of their peers. The authors conjectured that improving the opportunities that students receive to critique the scientific claims of their peers may require teachers to experience more specialized professional development training.
Systematic, Explicit Science Instruction
Despite the recommendations of the NGSS to promote three-dimensional science learning in today’s science classrooms (NGSS Lead States, 2013), a debate continues in the field on what type of instructional approach teachers should employ to reach this critical outcome. One approach that has gained empirical momentum is systematic, explicit instruction (Therrien et al., 2017; Zhang et al., 2022). Systematic, explicit science instruction is to three-dimensional science learning what direct instruction is to supporting students’ proficiency in reading (Vaughn et al., 2022) and mathematics (Fuchs et al., 2021). It refers to an instructional design and delivery approach that seeks to promote science literacy for all students, including students with or at risk of LD. At its core, systematic, explicit science instruction uses purposefully designed tasks, activities, and investigations to facilitate important instructional interactions between teachers and students, and among students, around natural phenomena (Doabler et al., 2015; Therrien et al., 2017).
While the evidence behind the use of a systematic, explicit teaching approach is still burgeoning in science instruction (Zhang et al., 2022), research has begun to demonstrate its promise for promoting important content-related outcomes. For example, a study focused on middle school chemistry found that students with disabilities in general education science classrooms that used more direct teaching techniques significantly outperformed their peers in classrooms that used more discovery-based materials (Lynch et al., 2007). More recently, Gray et al. (2022) investigated the treatment effects of Zoology One, an integrated literacy and science intervention that uses explicit instructional techniques to teach targeted reading and science content. Results suggested significant treatment effects on students’ reading comprehension (∆
In separate randomized controlled trials (RCTs), Doabler et al. (2021) and Gersib et al. (2025) examined the efficacy of Sci2, a systematic, explicit second-grade science program. Sci2 is a whole-class program that prioritizes three-dimensional learning in the context of earth science (NGSS Lead States, 2013). Combined, the studies involved nearly 500 second-grade students from approximately 30 classrooms. Of the participating students, 15% received special education services. In addition, 14% and 18% of students, respectively, scored at or below the 25th percentile on middle-of-year, standardized reading and mathematics outcome measures. The 25th percentile is a cut point where students are conventionally considered at risk for LD (Fletcher et al., 2019). In the original study, Doabler et al. (2021) tested Sci2 against a business-as-usual (BAU) condition, which included STEMscopes (Accelerate Learning, 2017) and district-developed science materials. In a conceptual replication, Gersib et al. (2025) compared Sci2 to a discovery-based, second-grade science program rooted in NGSS earth science content.
Collectively, findings from Doabler et al. (2021) and Gersib et al. (2025) suggested that students who received Sci2 instruction outperformed their comparison peers on multiple science outcome measures. Specifically, in the original Sci2 study, Doabler et al. (2021) reported statistically significant effects on a proximal assessment of science vocabulary (g = 0.94), a distal measure of the NGSS science and engineering practices (g = 0.48), and a distal measure focused on the NGSS disciplinary core ideas around earth science (g = 0.60). Results also suggested a non-significant effect on a commercially available, distal science assessment (g = 0.02). Differential response analyses involving data collected in the original study also indicated the Sci2 program was efficacious for all students, regardless of students’ initial skill levels in science, reading, and mathematics as well as disability status and proficiency in English (Doabler et al., 2024; Rojo et al., 2024). In the replication study, Gersib et al. (2025) found the Sci2 program demonstrated effects in favor of the Sci2 condition on two distal science outcome measures and a significant treatment effect on the science vocabulary assessment, with effect sizes (Hedges’ g) ranging from 0.19 to 0.40. In sum, while findings from these two RCTs suggest the promise of systematic, explicit science programs in the early elementary grades, little information exists regarding the mechanisms of change that lead to these improved science outcomes for students. One plausible mechanism is the quality of science instruction.
Purpose of the Current Study
A primary aim of education research is to investigate the impact of academic programs on critical student outcomes. Another is to explore why academic programs work, for whom, and under which conditions (Doabler et al., 2017). Against that backdrop, the field seeks to unpack the black boxes of validated academic programs. For example, researchers have begun to investigate direct observation data to gauge whether and to what extent systematic, explicit academic programs impact the quality of instruction. In the context of core reading, Nelson-Walker et al. (2013) found the Enhancing Core Reading Intervention increased the frequency with which first-grade teachers facilitated student practice opportunities. Doabler et al. (2018) reported similar findings when using direct observation data to investigate whether the Early Learning in Mathematics program increased teachers’ quality of core kindergarten mathematics instruction. There is some evidence that systematic, explicit science programs improve the quality of early science instruction. However, those results are limited to pre-kindergarten settings (Whittaker et al., 2020). Thus, it remains unknown whether systematically and explicitly designed science programs influence the quality of science instruction delivered in early elementary classrooms. This study sought to address this void in the literature by exploring the black box of Sci2, a recently tested second-grade science program. Specifically, this study investigated direct observation data collected by Doabler et al. (2021) to understand whether the Sci2 program impacted the quality of science instruction. Given that Sci2 shares a systematic, explicit instructional architecture similar to the programs explored by Nelson-Walker et al. (2013), Doabler et al. (2018), and Whittaker et al. (2020), we anticipated that Sci2 teachers would facilitate higher quality scientific investigations and science discourse opportunities than teachers in comparison classrooms that used less-explicit materials and teaching techniques. We were also interested in exploring whether the effects of Sci2 on instructional quality would be maintained. Three research questions were addressed.
1. To what extent do teachers in the Sci2 condition differ from comparison teachers in their quality of scientific investigations?
2. To what extent do teachers in the Sci2 condition differ from comparison teachers in their quality of science discourse?
3. To what extent are these effects maintained after the Sci2 program is completed?
Method
Research Design
This study undertook a secondary analysis of direct observation data, which were initially gathered during an efficacy trial conducted by Doabler et al. (2021). The original study investigated the treatment effects of Sci2 on the science outcomes of 294 second-grade students across 18 classrooms (see Doabler et al., 2021). Nine of the 18 classrooms were randomly assigned to a treatment condition and nine to a comparison condition. Treatment classrooms implemented the Sci2 program and comparison classrooms implemented BAU science instruction, combining STEMscopes (Accelerate Learning, 2017) and district-developed science materials. Trained research staff conducted four direct observations in each participating classroom. Using the observation data, this study explored the effects of the Sci2 program and its corresponding professional development workshops on the quality of second-grade science instruction. In this study, we define the quality of science instruction as the extent to which second-grade classrooms facilitate engaging scientific investigations and purposeful science discourse opportunities (a) between teachers and students and (b) among students.
Setting and Participants
The study took place in a suburban school district in the south with an enrollment of approximately 50,000 students, including 3,709 second-grade students. Demographic data indicated that 38% of students in the district identified as White, 31% Hispanic, 18% Asian, 9% African American, 4% Two or More Races, and 1% Pacific Islander. In addition, 28% of the students received free or reduced-price lunch, 11% were considered English learners, and 10% of students received special education services.
A total of 18 second-grade teachers across three campuses participated in the study. All classroom teachers were certified and provided science instruction 5 days per week. On average, teachers had 10.92 years of overall teaching experience (SD = 2.50) with a mean of 7.31 years of experience in second-grade classrooms (SD = 4.39). All participating teachers identified as female and their race as White. Two identified their ethnicities as Hispanic. Four participating teachers held master’s degrees in education. The majority of participating teachers conducted academic instruction in English, except one who was part of the district’s Dual Language Spanish program. For this study, however, all science instruction was delivered in English.
Experimental Conditions
Treatment classrooms
Teachers randomly assigned to the treatment condition implemented Sci2, a 10-lesson second-grade science program designed to promote three-dimensional learning of Earth science content. On average, the duration of Sci2 lessons was 41.2 min (SD = 10.1) taught to 17.1 treatment students (SD = 3.2) per classroom. The Sci2 program devotes particular attention to how wind and water change Earth’s landscapes and how these changes can occur very quickly or slowly over time. To illustrate, consider flooding caused by major storms. Hurricanes Katrina and Harvey ravaged New Orleans and Houston metropolitan areas, respectively. The impact of these catastrophic events had no boundaries, affecting learners from all backgrounds. As such, we contend all students deserve to learn and understand the processes that shape our planet and impact living organisms.
Scientific Explorers-2 program
At its core, Sci2 uses purposefully designed tasks, activities, and investigations to promote three-dimensional science learning. These direct experiences allow students to act as scientists do when they investigate the world. For example, the program offers an opportunity for students to investigate the effects of erosion in the mountains after a thunderstorm and how weathering can break rocks down over time. To accommodate students with or at risk of LD, Sci2 is anchored to a systematic, explicit instructional framework. We elected to engineer Sci2 in this way because there is a converging knowledge base that spotlights strong support for using systematic, explicit instruction to increase the academic outcomes of students with or at risk of LD (e.g., Fuchs et al., 2021; Therrien et al., 2017; Vaughn et al., 2022). By embracing a systematic, explicit instructional framework, Sci2 offers guidelines for how teachers can explicitly demonstrate or explain targeted science content as well as facilitate opportunities for students to demonstrate their scientific understanding through science discourse and collaborative hands-on experiences with peers. To extend learning opportunities and address the misconceptions that many students bring to the science classroom, the Sci2 program also offers recommendations for how teachers can provide timely, specific academic feedback.
The Sci2 program centers on five instructional activities. The first activity, Spark Your Thinking, reviews prior content, activates students’ background knowledge, and builds student interest in the lesson’s targeted phenomenon. Vocabulary represents the program’s second activity and aims to pre-teach a select set of key science vocabulary terms. Sci2 employs direct vocabulary routines because research suggests their beneficial impact on the outcomes of students with or at risk of LD (Coyne et al., 2010). The Sci2 vocabulary activities require teachers to introduce vocabulary terms using student-friendly definitions and then facilitate opportunities for students to (a) observe pictures of the terms in relevant scientific contexts and (b) hear examples of the terms in scientifically based sentences. Students then receive opportunities to collaborate with peers to discuss what they know about the new vocabulary terms and then share their understanding with the whole class.
The program’s third activity (Read Aloud) is a shared reading activity that uses researcher-developed, expository science texts that contain captivating characters such as Gina the Geologist. Research suggests read alouds are an effective way to help at-risk learners access and comprehend informational text (Baker et al., 2013). Therefore, we designed the Read Aloud activities to spark whole-class discussions and build students’ knowledge about the program’s targeted phenomena (e.g., how rainfall can alter the landscape around the school).
The fourth activity, called Investigation, uses collaborative, scientific investigations for students to build and work with physical and simulation-based models of targeted phenomena. For example, the program contains a multi-day investigation about the breaking down of rocks, allowing students to model the concept of weathering using nail files on different types of rocks. To support students with or at risk of LD, investigation activities offer guidelines for how teachers can model the objectives of the experiments and show how to properly and safely work with the science models. Each lesson concludes with Share Your Thinking, which is a whole-class discourse activity to promote student verbalizations of lesson findings.
Professional development
Prior to implementing the Sci2 program, treatment teachers participated in two 3-hour professional development workshops. Both workshops were conducted by research staff and focused on building teachers’ content and pedagogical knowledge for delivering early science instruction. Specifically, the workshops sought to develop teachers’ knowledge of Earth’s systems, validated principles of instruction, and implementation of the Sci2 program. During the workshops, teachers practiced implementing the core components of the Sci2 program and received feedback on their instructional delivery from research staff. In addition, the workshops offered treatment teachers practice facilitating scientific investigations and managing science discourse opportunities. Treatment teachers also received in-classroom coaching support to improve their fidelity of implementation with Sci2 and the quality of science instruction. Doabler et al. (2021) reported that teachers implemented Sci2 with strong fidelity when observing 20% of total lessons, demonstrating an overall average fidelity rating of 4.06 on a 5-point rating scale (SD = 0.73).
Comparison classrooms
Teachers in the comparison condition provided BAU science instruction, including commercially available (STEMscopes; Accelerate Learning, 2017) and district-developed science materials. STEMscopes is a core science program that employs an inquiry-based 5E approach (i.e., Engage, Explore, Explain, Elaborate, Evaluate; Bybee et al., 2006) to teach science content identified in the NGSS (NGSS Lead States, 2013). STEMscopes includes hands-on labs, simulated experiences, and science reading activities. On average, comparison classrooms delivered 37.8 min (SD = 7.6) of daily science instruction and enrolled 18.1 (SD = 1.4) second-grade students.
During the Sci2 efficacy trial, comparison teachers were tasked with delivering 45 minutes of daily science instruction focused on Earth’s systems. Classroom observations conducted by research staff noted comparison classrooms used a variety of instructional mediums to teach such content, including hands-on investigations and shared book reading of narrative and expository science texts. Observers documented comparison classrooms typically using contextual vocabulary instruction, such as bringing key terms to the attention of students during a scientific investigation. Pre-teaching of science vocabulary was not observed. Finally, observers noted that comparison teachers incorporated educational technology into their science instruction. However, the technology was used primarily to display science content on large screens. Comparison students had no direct interactions with the educational technology.
Observation Measures
Trained research staff used two observation instruments to rate the quality of science instruction across treatment and comparison classrooms. The first focused on capturing the quality of scientific investigations, while the second targeted the quality of science discourse. Six research staff members were trained in the use of each instrument in a 4-hour training led by the principal investigator. Trained staff observed each treatment and comparison classroom four times. This study used data collected by the two instruments as its targeted dependent variables.
Instructional quality of scientific investigations
The Instructional Quality of Scientific Investigations (IQSI; Doabler et al., 2020) is a researcher-developed, moderate-inference tool designed to assess the quality of scientific investigations implemented in science instruction. For this measure, trained observers rated teachers on a scale of 0 (i.e., none) to 4 (i.e., high) on how well they incorporated six principles of instruction: (a) teachers modeling science content, (b) teachers preparing students for investigations, (c) students carrying out investigations, (d) students using models, (e) teachers scaffolding instruction, and (f) teachers providing independent practice opportunities.
Science discourse instrument
The SDI (Fishman et al., 2017) requires observers to rate how well teachers facilitated opportunities for science discourse during science instruction. The SDI targets three discourse components at the teacher level (i.e., ask, press, and link) and three at the student level (i.e., explain/claim, co-construct, and critique). Our team also rated academic feedback to complement the discourse components at the teacher level. Trained observers rated the quality of these seven components during core science instruction. Each observed lesson was divided in half (~22 min each) and rated across two separate observation periods. For each observation period, observers rated the quality of the seven components using a 5-point Likert-type scale, where zero signified no science discussion and four represented consistent, high-quality discussion. Scores from each component were averaged across the two observation periods.
Within the teacher practices, the ask component focuses on how well teachers include open-ended questions in their instruction. Compared to short-answer questions that seek to assess students’ knowledge of facts, open-ended questions lend themselves to the facilitation of more robust discussion, given the possibility of multiple answers that could be valid (Fishman et al., 2017). These multiple answers, in turn, result in argumentation among students to be able to arrive at an agreed upon “correct” answer (Fishman et al., 2017). Relatedly, the press component documents how well teachers use guiding and follow-up questions to encourage their students to expand on their answers, which furthers student argumentation and collaboration to generate answers with supporting evidence. The SDI also measures how well teachers link students’ answers to those provided by their peers, with a goal for teachers to promote a common understanding of a targeted concept, practice, or phenomenon. Finally, the feedback component measured how well teachers provided affirmative or corrective feedback to students.
At the student level, the SDI documents the quality of three discourse components. The first component, explain/claim, measures how well students elaborate and provide evidentiary support for their responses. In a similar vein, the co-construct component documents students’ ability to build upon a prior answer or ask their peers to elaborate on their responses. The final student component, critique, focuses on how well students analyze their peers’ ideas. This is an important skill within science discourse as it suggests a robust understanding of science content.
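The SDI scoring procedure described above (seven components, each rated from 0 to 4 in two observation periods and then averaged) can be illustrated with a minimal sketch. The function name, dictionary keys, and ratings below are hypothetical and are not drawn from the study data or from the SDI materials.

```python
from statistics import mean

# The seven rated components: three teacher-level and three student-level SDI
# components, plus the academic feedback component added by the study team.
# The key names are illustrative labels, not official SDI identifiers.
COMPONENTS = ["ask", "press", "link", "feedback",
              "explain_claim", "co_construct", "critique"]


def average_periods(first_half, second_half):
    """Average each component's 0-4 rating across the two observation periods."""
    scores = {}
    for name in COMPONENTS:
        ratings = (first_half[name], second_half[name])
        if any(r not in range(5) for r in ratings):
            raise ValueError(f"{name}: ratings must be integers from 0 to 4")
        scores[name] = mean(ratings)
    return scores


# Hypothetical ratings for one observed lesson (not study data).
first = {"ask": 3, "press": 2, "link": 1, "feedback": 3,
         "explain_claim": 2, "co_construct": 1, "critique": 0}
second = {"ask": 4, "press": 2, "link": 2, "feedback": 3,
          "explain_claim": 3, "co_construct": 1, "critique": 0}

lesson_scores = average_periods(first, second)
```

In this hypothetical lesson, the ask component averages 3.5 and the critique component averages 0, the latter corresponding to no observed peer critique in either half of the lesson.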
Observation Procedures
Research staff conducted four observations in each classroom. The first observation (i.e., Time 0) was considered a baseline phase where all teachers (treatment and comparison) implemented their typical classroom science instruction. This initial observation round took place prior to classrooms being randomized to treatment or comparison conditions. Next, two observations occurred during the Sci2 program implementation. For the data analysis, we used the second observation to observe program effects (considered Time 1). We used the second observation because it captured the effects after treatment teachers had received at least one coaching session regarding implementation of the Sci2 program. The second timepoint, therefore, allowed us to capture a more reliable assessment of the program effects once teachers had more familiarity and experience with the Sci2 framework. Finally, observers conducted the final round of observations 1 to 4 weeks after the conclusion of Sci2. At this final observation (Time 2), all teachers implemented their typical classroom science instruction. In all, observers conducted a total of 72 observations. All observations were scheduled in advance with teachers.
Interobserver Reliability
Twenty-six observations (36%) included two observers who rated the lesson simultaneously to evaluate interobserver reliability. This study calculated two forms of interobserver reliability using an item-for-item analysis, where the primary observer’s score was compared directly to the secondary observer’s score for each observation item. First, we calculated exact matches or agreements (i.e., if primary and secondary observers provided the same rating for a given item). Second, because obtaining exact agreement on moderate- to high-inference observation tools can be difficult even when making informed judgments (Gersten et al., 2005; Valentine & Cooper, 2003), we also examined whether primary and secondary observers’ ratings fell within one point of each other. To calculate both forms of interobserver reliability, the number of matching scores was divided by the number of matching scores plus nonmatching scores and then multiplied by 100. In terms of exact agreement, interobserver reliabilities for scientific investigations (i.e., the IQSI measure) and science discourse (i.e., the SDI measure) were 58.33% and 68.13%, respectively. Regarding the 1-point discrepancies, estimates for scientific investigations and science discourse were 74.36% and 94.51%, respectively.
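The item-for-item percentage calculation described above can be sketched as follows. This is a minimal illustration that assumes the one-point criterion counts any pair of ratings differing by no more than one point as a match; the function name and the ratings are hypothetical and are not study data.

```python
def percent_agreement(primary, secondary, tolerance=0):
    """Item-for-item agreement: matches / (matches + nonmatches) * 100.

    tolerance=0 counts only exact matches; tolerance=1 also counts
    ratings that differ by one point (an assumed reading of the
    one-point criterion).
    """
    if len(primary) != len(secondary) or not primary:
        raise ValueError("rating lists must be non-empty and the same length")
    matches = sum(abs(p - s) <= tolerance for p, s in zip(primary, secondary))
    return 100 * matches / len(primary)


# Hypothetical item ratings (0-4) from two observers for one lesson.
primary_ratings = [3, 2, 4, 1, 0, 2, 3, 4]
secondary_ratings = [3, 3, 4, 1, 2, 2, 2, 4]

exact = percent_agreement(primary_ratings, secondary_ratings)
within_one = percent_agreement(primary_ratings, secondary_ratings, tolerance=1)
```

With these hypothetical ratings, five of eight items match exactly (62.5%) and seven of eight fall within one point (87.5%), which shows why the within-one-point estimate is never lower than the exact-agreement estimate.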
Data Analysis
To examine whether teachers in the Sci2 condition differed in their quality of scientific investigations and discourse across intervention time points, we analyzed the data using a piecewise linear growth curve model (Harring et al., 2020). Using piecewise linear growth modeling allowed us to capture the difference in teachers’ science practices between the baseline (i.e., Time 0; 1–4 weeks before the intervention began) and Time 1 (in week 2 during the implementation of Sci2) and the interval from the baseline through Time 2 (2–4 weeks after the intervention ended). Piecewise linear growth curve models offer greater flexibility to model data when the growth process of a variable changes over time and cannot be adequately represented by a single linear model (Ning & Luo, 2017). Because of the relatively limited number of schools (n = 3) in the study sample, we treated schools as fixed rather than random effects and included L – 1 dummy codes (where L is the number of schools) for school membership (McNeish & Stapleton, 2016). The piecewise growth model with L – 1 dummy codes is specified as follows:
where
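For readers who want a concrete reference point, a two-piece growth model with school dummy codes is often written in a form like the one below. This is a generic sketch under assumed notation, not the authors' exact specification: every symbol (the piece variables, the coefficients, and the random intercept) is an assumption introduced for illustration.

```latex
% Generic two-piece growth model with L - 1 school dummy codes (illustrative only).
% Y_{ti}: instructional quality score for teacher i at time t
% Piece1_{ti}, Piece2_{ti}: time codes for the two growth segments (coding assumed)
% Sci2_i: 1 = Sci2 condition, 0 = comparison
% School_{li}: dummy code for membership in school l
\begin{equation*}
\begin{aligned}
Y_{ti} ={}& \beta_0 + \beta_1\,\mathrm{Piece1}_{ti} + \beta_2\,\mathrm{Piece2}_{ti}
          + \beta_3\,\mathrm{Sci2}_i \\
        & + \beta_4\,(\mathrm{Piece1}_{ti} \times \mathrm{Sci2}_i)
          + \beta_5\,(\mathrm{Piece2}_{ti} \times \mathrm{Sci2}_i) \\
        & + \sum_{l=1}^{L-1} \gamma_l\,\mathrm{School}_{li} + u_{0i} + e_{ti}
\end{aligned}
\end{equation*}
```

In a form like this, \beta_3 would reflect the difference between conditions at baseline, \beta_4 and \beta_5 would reflect differences in growth rate for each segment, and the \gamma_l terms would absorb school membership as fixed effects.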
Results
Means, standard deviations, and bivariate correlations between study variables are presented in Table 1. The quality of scientific investigations at baseline was moderately related to the quality of scientific investigations at Time 2 (r = .36) and to the quality of science discourse at baseline (r = .44). Quality of scientific investigations at Time 1 was strongly related to the quality of science discourse at Time 1 (r = .66). The quality of scientific investigations at Time 2 was also strongly related to the quality of science discourse at Time 2 (r = .66).
Table 1. Descriptive Statistics for Measures in the Treatment and Control Groups.
Note. *p < .05, **p < .01, ***p < .001.
Baseline Equivalence
To capture baseline equivalence, we observed all classrooms before teachers were randomly assigned or received Sci2 materials or professional development related to the study. We used piecewise linear growth curve models to capture changes in science instruction quality over time, providing a nuanced understanding of teacher learning over time. Results of the piecewise linear growth curve models for the two outcomes are presented in Table 2. For the quality of scientific investigations, Sci2 and comparison teachers started at a similar level at baseline (
Table 2. Results of Piecewise Linear Growth Curve Model.
Note. SE = standard error.
*p < .05, **p < .01, ***p < .001.
Scientific Investigations
To determine the impact of the Sci2 program on the quality of scientific investigations in each condition, we examined the growth rates across conditions, holding constant baseline effects. Our results demonstrated that the Sci2 teachers had a statistically significantly faster rate of growth between the baseline and Time 1 (
Science Discourse
Following the same statistical procedure as with scientific investigations, we used a piecewise linear growth model to examine the effects of Sci2 on classroom science discourse opportunities. For this measure, the teachers in Sci2 classrooms demonstrated a marginally faster rate of growth between baseline and Time 1 (
Discussion
Scientific investigations and student discourse opportunities are cornerstones of early science learning. When designed and delivered well, they can serve as important backdrops for students to develop a deep and robust understanding of key science practices, concepts, and ideas (NRC, 2012). There is a nascent understanding, however, of the quality of scientific investigations and discourse opportunities delivered in early science education. This work, therefore, examined whether and to what extent Sci2, a second-grade science program with demonstrated promise (Doabler et al., 2021), improved the quality of teachers’ facilitation of scientific investigations and student science discourse opportunities, and whether these impacts were maintained beyond the implementation of the program. Such findings highlight important implications for how early science instruction can be designed and delivered to promote three-dimensional science outcomes.
Quality of Scientific Investigations
The first research question explored to what extent teachers differed across conditions in their overall quality of scientific investigations. Our findings suggest that teachers implementing the Sci2 program produced significantly higher quality scientific investigations than comparison peers when controlling for baseline differences. The IQSI instrument measured the quality of scientific investigations by documenting the frequency with which teachers were observed: (a) overtly demonstrating and explaining scientific content, (b) providing physical models for students to model targeted phenomena, (c) scaffolding instruction for at-risk learners, (d) facilitating opportunities for students to explore key phenomena using the NGSS science and engineering practices, and (e) offering opportunities for students to plan and carry out investigations.
These findings align with the work conducted by Whittaker et al. (2020), who found that providing teachers with a systematic, explicit science program can positively impact teacher outcomes. This alignment may be due to several reasons. Above all, it may be that many commercially available science programs intended for early elementary classrooms lack the instructional supports needed to meet the needs of students who face opportunity gaps in science. A preponderance of evidence suggests that students with or at risk of LD significantly benefit when instructional tasks and activities are systematically designed and explicitly delivered (Fuchs et al., 2021; Vaughn et al., 2022). Yet, despite this, many science programs take a discovery-based approach to science instruction and learning (What Works Clearinghouse [WWC], n.d.). Consequently, many students may have limited access to science instruction that is designed to embrace their strengths and meet their instructional needs.
Our findings could also be attributed to the fact that we complemented a systematic, explicit science program with professional development workshops and in-class coaching support. Our professional development workshops offered hands-on opportunities for treatment teachers to become familiar with the design and delivery features of the Sci2 program. In addition, each treatment teacher received at least three coaching visits during the implementation of the Sci2 program. Such efforts likely exceeded the implementation support that teachers in the control condition received regarding science instruction.
Quality of Science Discourse
The second research question explored differences in the quality of science discourse across conditions during the implementation of the Sci2 program. While total scores on the SDI instrument (Fishman et al., 2017) did not differ significantly between groups, on average, Sci2 teachers delivered higher quality discourse opportunities than their comparison peers (g = 1.20, 95% CI [−0.58, 2.99]). The SDI measured both student and teacher discourse behaviors, including (a) teacher frequency of asking open-ended questions, pressing students for deeper reasoning and explanations, and linking different student responses and (b) student verbalizations to explain science concepts based on evidence, co-construct evidence-backed explanations with peers, and critique claims based on evidence. These findings, while preliminary, are encouraging given the critical role of science discourse in student science learning. Arguably, verbalization opportunities allow students to express their scientific thinking and understanding more efficiently than other response options such as written answers.
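To illustrate why a large effect size can still carry a confidence interval that spans zero in a small sample, the following Python sketch computes Hedges' g and an approximate 95% confidence interval from group summary statistics. It uses a common textbook formula and hypothetical inputs; it is not the procedure used in this study, whose effect sizes may have been derived from model estimates.

```python
import math
from typing import Tuple


def hedges_g(
    mean_t: float, mean_c: float,
    sd_t: float, sd_c: float,
    n_t: int, n_c: int,
) -> Tuple[float, Tuple[float, float]]:
    """Hedges' g with an approximate 95% CI (textbook formula, illustrative)."""
    df = n_t + n_c - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd
    # Small-sample correction factor that turns Cohen's d into Hedges' g.
    correction = 1 - 3 / (4 * df - 1)
    g = correction * d
    # Large-sample approximation to the standard error of g.
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + g**2 / (2 * (n_t + n_c)))
    return g, (g - 1.96 * se, g + 1.96 * se)


# Hypothetical summaries (not study data): same means and SDs, different n.
g_large, (low_large, high_large) = hedges_g(10.0, 8.0, 2.0, 2.0, 10, 10)
g_small, (low_small, high_small) = hedges_g(10.0, 8.0, 2.0, 2.0, 4, 4)

print(f"n = 10 per group: g = {g_large:.2f}, CI [{low_large:.2f}, {high_large:.2f}]")
print(f"n = 4 per group:  g = {g_small:.2f}, CI [{low_small:.2f}, {high_small:.2f}]")
```

With the same raw mean difference, the smaller hypothetical sample yields an interval that includes zero, which mirrors the pattern of a sizable but statistically nonsignificant effect.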
Maintenance Differences Between Conditions
The third research question explored whether Sci2 and comparison teachers differed in the quality of scientific investigations and science discourse opportunities provided after the conclusion of the efficacy trial. As in the original study (Doabler et al., 2021), we collected maintenance data 1 to 4 weeks following the end of the Sci2 program. In this study, results suggested that Sci2 teachers continued to deliver higher quality scientific investigations and discourse opportunities than their comparison peers. While these findings were not statistically significant and remain preliminary, the effect sizes for both scientific investigations (g = 0.78) and science discourse (g = 0.58) during the maintenance phase show promise of persistence of treatment effects. It is plausible that even though Sci2 is of short duration (2–3 weeks), its systematic design helped teachers maintain the capacity to facilitate group-level science talk in their BAU science instruction, such as by posing open-ended questions and providing academic feedback. However, future research with larger samples is needed to obtain a more robust understanding of maintenance effects on teacher outcomes.
Limitations
When interpreting this study’s findings, several limitations should be considered. First, the study included 72 direct observations, which limits its statistical power. Doabler et al. (2021) faced limited resources to conduct additional observations during the treatment period. The onset of the COVID-19 pandemic also cut short the number of maintenance observations. Therefore, to obtain a more robust estimate of instructional quality in science classrooms, future research should consider daily observations of science instruction during the treatment phase and continue to monitor the persistence of treatment effects well beyond program implementation.
Another potential limitation was the use of an observation tool that was originally designed for capturing science discourse in upper elementary classrooms. As such, future refinements to the SDI may be necessary to best capture the quality of science-related verbalizations among early elementary students. For example, in the early elementary grades, students are still learning how to articulate their science thinking and form arguments around evidence. Moreover, early elementary students are just beginning to co-construct their own responses and scrutinize those of their peers. Therefore, it may take a more nuanced approach to document the developmental progression of science discourse among early elementary students.
On a related note, Doabler et al. (2020) developed the IQSI for the purposes of the initial Sci2 efficacy trial. Consequently, this observation tool was not previously applied. Like the availability of student science outcome measures, there is a shortfall of observation tools intended for early elementary science classrooms. This paucity forced our team to develop the IQSI. Continued research is needed, therefore, to further validate this new observation tool.
Finally, research suggests obtaining exact agreement with quality rating systems, such as the SDI and IQSI observation tools, can be difficult. We found this held true in this study, where exact inter-rater reliability for the IQSI and SDI was 58.33% and 68.13%, respectively. Unlike low-inference observation tools, such as those by Nelson-Walker et al. (2013) and Doabler et al. (2018) that employ strict coding protocols, moderate-inference tools depend on an observer’s impressions to rate the quality of a predetermined teaching event or behavior (Gersten et al., 2005). As noted in this study, we found substantive differences when examining for exact agreement with both observation tools. However, Fishman et al. (2017) encountered similar challenges when first implementing the SDI, reporting low inter-rater reliabilities at the individual teacher and student discourse practice level (e.g., whether a teacher connects or links students’ ideas and positional statements). As such, it may take additional training and support with these rating systems to obtain more robust estimates of interobserver reliability. In addition, it may behoove the field to complement rating systems, such as the SDI and the IQSI, with lower-inference observation protocols to obtain a richer understanding of science instruction. In this way, researchers will be able to document instructional quality as well as the frequency of evidence-based practices (Doabler et al., 2015).
Implications for Practice
Overall, findings suggest that systematic, explicit science programs can support teachers in facilitating high-quality scientific investigations and science discourse opportunities. While preliminary, these findings are critical as they offer a practical pathway for how teachers can support three-dimensional science learning among all students, especially students with or at risk of LD. For example, research suggests that at-risk learners benefit when teachers explain and demonstrate new and complex academic content (Fuchs et al., 2021; Vaughn et al., 2022). Therefore, in the context of science instruction, we recommend that teachers overtly share the overall purpose of scientific investigations with students and clearly explain to them how such investigations are linked to NGSS disciplinary core ideas and crosscutting concepts. Helping students make connections between the “what” and the “why” of scientific investigations can promote a deeper understanding of science learning. In addition, we suggest that teachers concretely demonstrate to students how to engage in the practices of science. For example, the Sci2 program offers instructions for how teachers can appropriately demonstrate developing scientific models, such as using a nail file against a rock to simulate the process of weathering. Thus, by offering clear explanations and concrete demonstrations, teachers can meet the diverse range of student needs in early elementary classrooms.
In addition, the SDI observation tool captured to what extent teachers provided robust student discourse opportunities. Research suggests student verbalization opportunities are important for teaching students with or at risk of LD (Doabler et al., 2015; Fuchs et al., 2021). To maximize student verbalizations, we recommend that teachers probe students to (a) explain their thinking in response to well-constructed open-ended questions, (b) situate their responses in evidence, and (c) work with peers to co-construct ideas. Teachers might set up a discourse routine to facilitate rich verbalizations and include structured prompts that students learn and build proficiency with across the school year. Such prompts might include “What evidence supports your claim?” or “How does your idea build on your classmate’s explanation?” Teaching students how to use such routines can not only deepen their understanding of core ideas and concepts but also engage them in key practices of science such as scientific argumentation.
Implications for Research
In terms of future research, one implication is for continued work in the early elementary grades to support three-dimensional science learning among all learners. Research suggests opportunity gaps in science surface as early as kindergarten (Morgan et al., 2016). Therefore, additional studies are needed to understand how to improve the quality of core science instruction in the early grades. Another implication is for the field to continue to design and test the impact of systematic, explicit science programs in early elementary grades. To date, little research has focused on these types of science programs (WWC, n.d.). However, given the strong body of evidence behind systematic, explicit instruction (Fuchs et al., 2021), the field may want to reconsider its stance on incorporating this instructional approach into U.S. science education (see Zhang et al., 2022). It may take the amalgamation of systematic, explicit instruction and more inquiry-based approaches to fully meet the objectives of federal initiatives around providing all students with equitable, high-quality science instruction (see ed.gov/stem).
Finally, it is critical to further spotlight the importance of teachers as a mechanism of change toward promoting student academic outcomes. While most efficacy trials prioritize student responsiveness to academic interventions, investigating variation across classrooms, campuses, and districts is becoming commonplace in educational research (Cowan et al., 2022). To capture some of this variation, emerging research has highlighted the mediating relationship between teacher behaviors and student achievement (Doabler et al., 2019; Whittaker et al., 2020). Such findings, combined with work such as this study that examines teacher outcomes specifically, may help further unpack the black boxes of science programs to ensure student science success, especially for students with or at risk of LD.
Conclusion
Recognizing that systematic, explicit programs impact teachers’ instructional quality in core reading (Nelson-Walker et al., 2013), early mathematics (Doabler et al., 2018), and pre-kindergarten science (Whittaker et al., 2020) classrooms, this study sought to explore whether an early science program that centers on a systematic, explicit platform could improve the quality of science instruction delivered in second-grade classrooms. Our findings suggest the use of such a program significantly improved science teaching practices that teachers employed in second-grade classrooms. Thus, we contend there is an urgent need to conduct similar types of investigations at all grade levels and across all domains of science. If, as a nation, we want to promote science proficiency for all, arguably more needs to be done to raise the bar in science for students with or at risk of LD.
Supplemental Material
Supplemental material, sj-docx-1-rse-10.1177_07419325251314119 for The Role of Instructional Design in the Delivery of Early Science Instruction by Steven A. Maddox, Jenna A. Gersib, Anna-Maria Fall, Maria A. Longhi, William J. Therrien, Greg Roberts, Jason B. Phelps, Shadi Ghafghazi and Christian T. Doabler in Remedial and Special Education
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The research reported here was supported by the National Science Foundation (grant no. 1720958).
References
