Abstract

Instructional leaders require timely and predictive data to guide post-pandemic systemic changes. We investigated the predictability of an observational instrument, the Rigor Appraisal, and its association with achievement and other school effectiveness measures in a sample of 53 schools in Illinois. We found that increases in the Rigor Appraisal had a small to moderate and statistically significant association with achievement, a positive and moderate statistically significant association with attendance, a negative relationship with referrals and suspensions, and a positive and moderate statistically significant association with the 5Essentials. Additionally, the implementation of academic teaming was associated with greater achievement in schools with more low-income students. We also found that leaders who conducted non-evaluative instructional walks experienced a positive and statistically significant impact on achievement. As such, the Rigor Appraisal merits further study in different contexts, as it shows promise as a leading indicator that can be used for the continuous improvement of instructional systems.

Plain language summary

This study investigated the utility of the Rigor Appraisal observation instrument as an aid to school leaders in improving instructional systems in schools. We assessed the relationships between the measures in this instrument and student achievement as well as other indicators of school effectiveness using sample data from 48 to 53 schools in Illinois during the 2021 to 2022 school year. We found that higher Rigor Appraisal scores were associated with higher student achievement and increased weekly attendance, as well as with lower rates of student behavioral referrals and suspensions. Higher scores in one specific measure, Activating Student Teams to Achieve the Standard, were also associated with greater achievement in schools with larger populations of students from low-income backgrounds. These findings indicate that the Rigor Appraisal is a promising tool to help instructional leaders to make timely, proactive decisions that improve systems affecting the quality of teaching and student learning in their schools. This study was limited to data from one school district and its findings may not be applicable to other districts and schools. Further study is warranted.

Keywords

academic achievement, instructional leadership, instructional systems, instructional improvement, observation

Introduction

The pandemic has revealed that public elementary and secondary educational systems in the United States have not addressed the needs of diverse learners (Center on Reinventing Public Education, 2022b). Student performance in reading and mathematics has dropped substantially in the United States since the COVID-19 pandemic. The National Assessment of Educational Progress (NAEP) long-term trend assessments of reading and math for nine-year-old students in 2022 revealed the largest average score decline in reading since 1990 and the first ever decline in math scores (National Assessment of Educational Progress, 2022a). The results of the main NAEP assessments for 2022 also revealed overall lower scores in reading and math for fourth- and eighth-grade students (National Assessment of Educational Progress, 2022b, 2022c, 2022d, 2022e) compared to 2019 scores. Disaggregated by race, ethnicity, and socioeconomic status, most student groups showed a decline. Although current preliminary studies have shown some rebounding of academic performance, it is still significantly lower than pre-pandemic levels (Lewis & Kuhfeld, 2022).

Current estimates of the median learning loss have been a quarter of a year for reading and half a year for math, with high-poverty school districts losing as much as a third of a year in reading and two-thirds of a year in math (Fahle et al., 2022). These declines indicate that it may take many years for student achievement to recover to pre-pandemic levels (Lewis & Kuhfeld, 2022). Declining academic performance and the slow pace of recovery could result in lower rates of high school graduation, students graduating high school without essential skills for college and career success, and lower rates of college completion, all leading to decreases in lifetime earnings and losses to the national gross domestic product (GDP) (Azevedo et al., 2020; Betthäuser et al., 2023; Center on Reinventing Public Education, 2022b; Hanushek, 2022; Kane et al., 2022).

These troubling findings have been magnified for students of color and students from low-income households. The 2022 NAEP scores revealed a continuation of historically persistent gaps between fourth- and eighth-grade White students and American Indian/Alaskan Native, Black, and Hispanic students in both reading and math (National Assessment of Educational Progress, 2022b, 2022c, 2022d, 2022e) and widening gaps among groups who were already performing at the lowest deciles (Center on Reinventing Public Education, 2022a, 2022b; Kuhfeld et al., 2022; Lewis & Kuhfeld, 2022; Lewis et al., 2022). These persistent gaps also affected low-income students most negatively, widening the gap in performance between these students and their more affluent peers (Hammerstein et al., 2021).

Effective instructional leaders are critical for enacting the systemic changes needed to accelerate learning and meet the needs of all students. Although the effect of principals on students is nearly as large as that of teachers (Grissom et al., 2021), there is little specificity regarding the most essential features of instructional leadership (Grissom et al., 2013) and little guidance on how leaders can overcome persistent disparities in student outcomes based on race, ethnicity, home language, ability, and socioeconomic status (Ishimaru & Galloway, 2014). Classroom walkthroughs have been frequently cited as essential to school improvement, but principals may not collect the right data to help them differentiate between high- and low-quality teaching practices and provide actionable feedback to teachers (Boston et al., 2017; Grissom et al., 2013).

Leaders in elementary and secondary schools need better data and tools for data-driven decision-making to improve the quality of instruction, accelerate learning for all students, and eliminate achievement gaps (Hamilton et al., 2009; Mandinach & Schildkamp, 2021; Marsh et al., 2006). Historically, decision-making in education has narrowly focused on assessment data and has not collected information that would help educators develop students to their fullest potential, identify systemic inequities within schools, and highlight opportunity gaps (Dodman et al., 2023; Isaacs, 2021; Mandinach & Schildkamp, 2021; Marsh et al., 2006).

We created the Rigor Appraisal Instrument to help leaders inform their decisions with leading indicators to improve instructional systems, track instructional improvements, and provide meaningful feedback to teachers. As such, we expected the Rigor Appraisal to be predictive of achievement and other measures of school effectiveness. The investigation addressed the following research questions:

Is the Rigor Appraisal predictive of achievement?

Is the Rigor Appraisal predictive of other school effectiveness and wellness measures?

In the following sections, we outline the theoretical and empirical framework of the Rigor Appraisal instrument, describe the methods and data used in the research study, present the results, review the policy implications and limitations of the study, and offer concluding remarks.

Theoretical and Empirical Foundations

The theory on which we based the Rigor Appraisal is that there are leading indicators of educational quality that, when collected and analyzed regularly, provide actionable data that leaders can use to proactively improve the quality of education students receive. An indicator is leading if it provides early signals that allow educators to predict performance and make strategic and timely investments in time, effort, and resources to improve student learning (Foley et al., 2008; National Academies of Sciences, Engineering, and Medicine, 2019). Leading indicators also inform productive inquiries about conditions and organizational systems that affect the quality of education students experience (Supovitz et al., 2012). In contrast, large-scale standardized assessment scores do not meet these criteria and are lagging measures of school effectiveness (May & Sanders, 2013). School leaders in multiple districts throughout the United States have used the Rigor Appraisal to identify and implement measurable system-based pathways for whole-school improvement. This study represents the application of the Rigor Appraisal instrument in a real-world setting to validate its usefulness as a leading indicator of schoolwide academic performance. It represents one of a series of studies to assess the predictability of the tool.

The Rigor Appraisal Instrument has five pillars: (1) Creating Conditions for Learning Rigorous Standards, (2) Using Standards-based Student Evidence, (3) Activating Student Teams to Achieve the Standard, (4) Verifying Learning to Take Action within a Lesson, and (5) Using Data to Track Student Progress toward Standards. In this section, we discuss each pillar, including its theoretical and empirical underpinnings.

Creating Conditions for Learning Rigorous Standards

The Creating Conditions for Learning Rigorous Standards pillar measures observable systems including school and classroom climate, support for student self-regulation, and high-functioning teacher teams. Optimal interactions between these systems at the school level foster a rigorous learning environment.

School and Classroom Climate

High-quality and consistent interactions between school staff and students create a positive school climate (Akman, 2021; Öngel & Tabancalı, 2022; Suldo et al., 2006). School and classroom climates affect students’ well-being, life satisfaction, ethnic and moral identity, self-efficacy, and resilience (Aldridge et al., 2018; Graham, 2022; VanLone et al., 2019). Climate is also positively associated with student achievement (Benbenishty et al., 2016; Berkowitz et al., 2017; Davis & Warner, 2018; Konold et al., 2018; Osher et al., 2020). A positive school climate can narrow the achievement gap among student subgroups (Suldo et al., 2006).

Student Self-Regulation

Teaching strategies that build students’ self-regulation and co-regulation skills support student engagement and sustain the cognition, emotions, and behaviors required to accomplish learning goals (Darling-Hammond & Cook-Harvey, 2018; Hinnant-Crawford et al., 2016; Zimmerman & Kitsantas, 2014). Many positive outcomes are associated with self-regulation, including improved student achievement (Blitz et al., 2020; Marantika, 2021; Zimmerman & Kitsantas, 2014), the development of visuomotor, mathematics, emergent literacy, and vocabulary skills in young children (Becker et al., 2014; DeFlorio et al., 2019; Williams et al., 2016), reading comprehension, vocabulary, and math in elementary school students (Day & Connor, 2017; Day et al., 2015; Lenes et al., 2020; Skibbe et al., 2019), and improved mathematics performance of secondary and post-secondary students (Cleary et al., 2017; Musso et al., 2019). There is also evidence that students in high-achieving schools use more self-regulated learning strategies than those in low-performing schools (Guo et al., 2019).

Collaboration in Teacher Teams

To establish effective teacher teams, school leaders support and encourage professional growth, autonomy, and teacher ownership in shared decision-making processes, setting high expectations for academic performance (Argon & Ekinci, 2016; Conner, 2015; Hirsh & Segolsson, 2019; Katzenbach & Smith, 1993; Pierce et al., 2003; Schaap & De Bruijn, 2018; Seglem, 2017). Effective teacher–team collaboration can positively impact school culture and increase student achievement (Green & Allen, 2015; Kraft & Falken, 2020; LeClerc & Moreau, 2011; Muñoz & Branham, 2016; Ratts et al., 2015; Ronfeldt et al., 2015; Tichnor-Wagner et al., 2016). The higher the quality of collaboration in teacher teams, the more effective teachers are at improving student achievement (Park et al., 2018; Ronfeldt et al., 2015; M. Sun et al., 2017).

Using Standards-Based Student Evidence

Standards-based student evidence indicates the presence of structures that support consistent and systematic teaching and learning tasks with student evidence at the taxonomy level of the standard, student progress toward mastery of the standards, and the quality of tasks for collaboration among students.

Target, Task, and Taxonomy Alignment

All students benefit from rigorous and rich learning experiences. When appropriately designed, even students who have previously performed at low levels can simultaneously attain foundational skills as they engage in “authentic intellectual work” (Newmann et al., 2001: 13) that requires higher levels of cognition (Scardamalia & Bereiter, 2022; Y. Sun et al., 2010). Teachers must be able to accurately judge task complexity and design learning to enact appropriate metacognitive strategies. However, the learning tasks they create often do not match the taxonomic levels of the standards and are at lower levels of cognitive complexity (Anees, 2017; Pieschl, 2009). The alignment of learning tasks with the rigor of the standards requires strategic planning of the curriculum, instruction, and assessment to ensure that students demonstrate learning at the level of performance or taxonomy of the standards (Maye, 2013; Ziebell & Clarke, 2018).

Tasks Designed for Teams

Alignment between targets, tasks, and taxonomies is essential. The task must also be suitable for teamwork in groups, be teamworthy (Lotan, 2003; Pennant, 2018), and produce evidence of learning. Appropriately structured teams and tasks are observable features of student learning communities in autonomy-supportive classrooms (Reeve & Cheon, 2021; Reeve & Su, 2014; Wehmeyer et al., 2002, 2017).

Student Work Produces Evidence of Mastery

Teachers decide on the evidence that they expect students to produce, which reflects their mastery of the standards. Both mastery and appropriate alignment are interdependent, as the level of the learning task predicts the level of student performance (Elmore, 2008). The concept of student mastery of standards is rooted in B. S. Bloom’s (1971) mastery learning, in which there must be instructional alignment between the specific learning standards that students are expected to meet and the lessons, feedback, corrective, and enrichment activities. To achieve mastery learning, instruction must provide feedback, correction for students who have not mastered the learning, and enrichment for those who have attained mastery (Guskey, 1997, 2007).

Activating Student Teams to Achieve the Standard

This pillar measures the degree to which students are in academic learning teams, whether teams develop resilience through productive struggles, and whether they function autonomously. Academic teams demand more from students than typical small groups or cooperative learning strategies. In teams, students must verify the effectiveness of their own learning processes and the quality of their collective work.

Academic Teams

Academic teams are student-led, small, and diverse with clear protocols for engaging in standard-based academic work (Toth & Sousa, 2019). Each member of an academic team has a specific responsibility, such as being a facilitator or tracking group progress toward the attainment of standards. Students develop elevated levels of autonomy and critical thinking skills through collaboration in academic teams (Francisco, 2013). As with team-based learning (Michaelsen & Sweet, 2008), academic teaming is student centered.

Organizing students in small collaborative groups creates opportunities for interaction, which leads to improved cognitive learning for all students (Valls & Kyriakides, 2013). Post-secondary empirical studies on team-based learning have provided evidence of student achievement gains, higher student satisfaction and engagement, and positive effects on team diversity (Kearney et al., 2009; Newmann et al., 1992; Roberge & Van Dick, 2010; Stahl et al., 2010; Stipek, 1996; Swanson et al., 2019).

Little empirical literature exists at the elementary or secondary levels that directly evaluates the hypothesis of increased student achievement from academic teaming. We found only one study that has directly assessed this relationship (Basileo, 2018). This study used propensity score matching to create a comparable control group by matching teaming students with similar students in a school district whose teachers did not receive professional development in teaming. The results of the study revealed statistically significant and positive effects in both reading and math.

Developing Resilience Through Productive Struggle

Instruction that encourages and supports productive struggle conveys the teacher’s belief that all students can learn and provides opportunities for students to persist as they develop resilience and efficacy in the face of challenging tasks (Ewing et al., 2019; Livy et al., 2018; Reeve & Halusic, 2009; Reeve & Su, 2014; Reeve et al., 2020; Warshauer, 2015; Zeybek, 2016). The term productive struggle refers to the effort that students must expend to make sense of what they are learning when their answers are not readily apparent (Edwards & Beattie, 2016; Hiebert & Grouws, 2007). Although the concept of productive struggle is rooted in mathematics education, it is applicable to all academic subjects and is essential to transformative learning processes (Murdoch et al., 2020). It develops students’ conceptual transfer skills among disparate knowledge disciplines (Sinha & Kapur, 2021; Valentine & Bolyard, 2018).

Building Student Autonomy Through Teaming

Teachers develop students’ autonomy for learning by creating proper structures and supportive democratic learning environments for them to work autonomously in teams (Han, 2020; Karademir & Akgul, 2019; Reeve, 2009; Reeve & Cheon, 2021; Reeve & Jang, 2006; Reeve & Su, 2014; Stefanou et al., 2004). Cognitive autonomy leads to the deepest level of student engagement in learning (O’Brien, 2018; Wehmeyer, 1997). Students develop autonomy and self-determination as they acquire skills in choice-making, decision-making, problem-solving, goal-setting and attainment, self-regulation, self-efficacy, self-awareness, and self-knowledge within academic teams (Kosko, 2015; Marshik et al., 2017; Ryan & Deci, 2017).

Verifying Learning to Take Action Within a Lesson

This pillar measures teacher monitoring and the use of instructional adjustments, actionable teacher and peer feedback to improve learning processes, and student self-verification.

Teacher Monitoring and Instructional Adjustments

During lessons, teachers recognize trouble spots and respond with micro interventions (adjustments made during the lesson) to clarify students’ understanding and regain the flow of reasoning (Alibali et al., 2013; Bonne, 2016). Students whose teachers make these adjustments earlier perform at higher levels than those whose teachers delay instructional adjustments (Coyne et al., 2013).

Teachers monitor student progress by examining evidence of learning. When teachers use evidence to adjust instruction and provide feedback to move learning forward and when students use it to adjust their learning strategies, it is formative assessment (Black & Wiliam, 2009; Panadero et al., 2018). A large body of evidence shows that formative assessment practices can improve student outcomes (Black & Wiliam, 1998; Crooks, 1988; Kingston & Nash, 2011; Klute et al., 2017; Wiliam et al., 2004).

Actionable Feedback

Although teachers believe that feedback is important for improving learning and student self-confidence, they find it exceedingly difficult to do well (Dessie & Sewagegn, 2019; Van den Bergh et al., 2014). Effective teacher feedback uses respectful, non-evaluative, flexible, and informational language that encourages student engagement while helping them understand and correct their own behavior or performance (Hattie & Yates, 2014; Reeve & Halusic, 2009). Actionable feedback clarifies expectations and standards during learning, and it provides information about the next steps after the present learning is achieved (Brooks et al., 2019; Fluckiger et al., 2010; Guskey, 1996, 2001, 2007; Hattie, 2023; Hattie & Timperley, 2007; Looney, 2011; Shute, 2007; Stiggins, 2008). Studies have found a positive relationship between effective feedback and student achievement (Hattie & Timperley, 2007; Hattie et al., 2017; Vollmeyer & Rheinberg, 2005); however, feedback is only effective if learners receive it and understand it (Hattie et al., 2017). Peer feedback through ongoing classroom dialog among students supports understanding as it builds students’ metacognitive skills and self-regulation (Braund & DeLuca, 2018; Davies, 2001). It also provides a sense of belonging and self-worth (Shin & Johnson, 2021).

Students Verify Their Learning

Students’ verification of their own learning is a form of self-assessment in which students evaluate the results and processes of their work (Brown & Harris, 2014). Student self-assessment is a learnable competence that is linked to improved motivation, self-efficacy, self-regulation, behavior, and the quality of relationships between students and teachers (De Smedt & Van Keer, 2018; Glaser et al., 2010; Griffiths & Davies, 1993; Munns & Woodward, 2006; Olina & Sullivan, 2002; Panadero et al., 2016; Schunk, 1996). Accurate self-assessment occurs when students use exemplars or other assessment data such as tests and graded work, and participate in establishing criteria for work quality (Brown & Harris, 2014; Harrison et al., 2015; Ross, 2006).

Self-assessment works most effectively when students receive feedback from their peers in a classroom climate that provides psychological safety for self-evaluation (Brown & Harris, 2014; Harrison et al., 2015; Ross, 2006). Furthermore, when peers hold each other accountable, they feel motivated to perform at higher levels and become more self-reliant as learners (Panadero et al., 2016; Stein et al., 2016).

Using Data to Track Student Progress Toward Standards

This pillar measures the effectiveness of school leaders and teachers by systematically incorporating short-, mid-, and long-cycle data to improve the quality of teaching and learning in the school, and the degree to which the school leader uses data to ensure teacher accountability for student learning.

Short-, Mid-, and Long-Cycle Data to Improve Student Learning

Effective school leaders and teachers use short-, mid-, and long-cycle data to continually improve the quality of teaching and learning in schools (Wiliam, 2006). Long-cycle data span marking periods, semesters, or years. Educators can collect mid-cycle data within and between teaching units, and short-cycle data during and between lessons.

To be formative, an assessment must provide evidence that allows teachers to interpret the learning needs of students and use that evidence to adjust instruction to meet these needs (Black & Wiliam, 1998, 2009). Examination of short-cycle formative, mid-cycle interim, and long-cycle summative data provides essential elements for decision making in schools and districts.

Teacher Accountability

The goal of teacher accountability is for teachers to hold themselves accountable and work within their peer teams to ensure that students progress toward mastery of the standard. The level of accountability influences teachers’ assessments of student performance and the cognitive processes they use to make judgements. These judgements are either category- or attribute-based. Teachers with low accountability tend to use category-based processing to determine student performance levels, whereas teachers with high accountability use attribute-based processing. Category-based processing involves social categories or stereotypes, requires simple cognition and can be inaccurate (Fiske, 1993; Fiske & Neuberg, 1990). Attribute-based processing considers numerous different personal attributes, requires complex cognition, and is more accurate and detailed than the former (Fiske & Neuberg, 1990; Krolak-Schwerdt et al., 2013). Given the importance of teacher judgement in assessing student performance, accountability is an important driver of accuracy and the elimination of assessment bias.

For teachers to be accountable, they must have the autonomy to control the conditions that contribute to successful student learning. The degree of professional autonomy that teachers exercise in the performance of their duties influences the quality of teaching and student learning (Ingersoll & Collins, 2017). Teachers’ professional autonomy is positively associated with student autonomy and self-regulation as learners (Ahn et al., 2021; Soenens et al., 2012). However, recent top-down accountability requirements for school improvement in the United States have decreased the autonomy that teachers and school leaders exercise over major decisions, affecting the quality of instruction and school climate (Ryan & Deci, 2020; Ryan & Weinstein, 2009). Ingersoll and Collins (2017) found that schools with the most centralized decision-making and the least teacher autonomy often perform the poorest.

Successful school and teacher accountability depend on leadership. Leaders must establish a clear path to the stated goals, support execution through team interdependence, and measure performance (Jamal et al., 2015). This requires an open and caring climate that allows honest informational feedback about improving instructional practice as it encourages teachers to achieve high levels of performance (Freed et al., 2021; McCarley et al., 2016). Leaders must be able to identify students who need interventions and hold teachers accountable for appropriate actions to ensure learning.

In summary, we designed the Rigor Appraisal to measure five critical components of rigor: Creating Conditions for Learning Rigorous Standards, Using Standards-based Student Evidence, Activating Student Teams to Achieve the Standard, Verifying Learning to Take Action within a Lesson, and Using Data to Track Student Progress toward Standards. Accordingly, we expected to find that the Rigor Appraisal would be predictive of achievement and other measures of school effectiveness. More specifically, we expected that a positive change in the Rigor Appraisal scores over time would predict changes in achievement and other leading indicators of school effectiveness. Therefore, we formulated the following hypotheses:

H₁. The change in the Rigor Appraisal has a positive and statistically significant relationship with achievement and other measures of school effectiveness.

H₁a. The Rigor Appraisal will have a positive and statistically significant relationship with achievement.

H₁b. The Rigor Appraisal will have a positive and statistically significant relationship with other school effectiveness measures.

H₁c. The Rigor Appraisal will have a positive and statistically significant relationship with school wellness measures.

H₁d. Program effectiveness measures of the Rigor Appraisal implementation will have a positive and statistically significant relationship with achievement.

H₂. Specific Rigor Appraisal pillars will impact achievement differently.

The next section discusses the methods and operationalization of measures used in the study.

Methods

The study sample included 53 schools in a district in Illinois that conducted Rigor Appraisals during the school year and recorded student achievement scores. The district served a diverse student population of approximately 35,000, of whom 46.4% were low-income, 39.1% were English Learners, and 18.6% were students with disabilities (SWD) (Illinois State Board of Education, 2023).

School-level Rigor Appraisals were adjudicated by external instructional coaches using a software tool that included a scoring rubric for items in each pillar using a 12-point Likert scale. To ensure rater reliability, the instructional coaches participated in calibration training. This 6-hour training included the content of the Rigor Appraisal instrument and the process of capturing evidence of rigorous instruction and academic teaming structures during classroom observations. Participants practiced scoring the Rigor Appraisal using illustrative videos of classroom instruction. Subsequently, the coaches participated in three calibration sessions at the schools. Upon completion, the coaches were certified, and a master scorer accompanied them on the first Rigor Appraisal data collection cycle to ensure accuracy.

The coaches completed three appraisals per school building during the school year. During each Rigor Appraisal, the coaches met onsite with the principal and school leadership teams. They jointly scored at least 10 randomly selected classrooms to ensure the accuracy of schoolwide trends. Afterwards, they were debriefed on their observations, identified baseline conditions, and determined the next steps for school improvement. They used the aggregated results to guide the school leadership team in identifying system-based root causes and prioritizing actions for change. Teachers remained anonymous, as scores were non-evaluative (not a part of teachers’ annual professional evaluation), and no teacher-identifying information was gathered.

School leadership teams also completed a training course to learn how to conduct nonrandom instructional classroom walks. Unlike the Rigor Appraisals, instructional walks were conducted by school leadership teams to help advance improvement work between Rigor Appraisals and to build internal capacity.

Measures

This section discusses the outcome measures for achievement, school proficiency, and learning rates. It also reviews the independent variables used in the models, including the Rigor Appraisal and pillar scores, school effectiveness measures (Weekly Attendance, Student Referrals, Student Suspension, and Teacher Retention), school wellness measures (5Essentials, Ambitious Instruction, Supportive Environment, Involved Families, Collaborative Teachers, and Effective Leaders), and program effectiveness measures of the Rigor Appraisal implementation (Instructional Walks and Coaching Days).

Achievement

We calculated two school-level achievement measures using spring-to-spring state-assessment data. First, we considered the change in student proficiency in English Language Arts (ELA) and math to assess the correlation levels with the independent variables. We then calculated school-level Learning Rates in ELA and math. These measures are described below.

We used Illinois Assessment of Readiness (IAR) ELA and math scores to gauge changes in student proficiency. The change was calculated as the spring-to-spring difference in the percentage of students who met or exceeded proficiency on the IAR in ELA and in math. Change in proficiency was used as a proxy for student achievement because it is a traditional measure that is easily available.

While the percentage of proficient students is a common measure of achievement, it only measures the number of students below, at, or above a certain scale score level. Proficiency rates do not exist outside the value systems of setting cut-off scores (Cizek, 2011; Jaeger, 1989; McClarty et al., 2013; Shepard, 1979). Interpretations of the percentage of proficient students are unpredictable and can lead to incorrect or incomplete inferences about distributional changes (Ho, 2008). We purposely moved away from proficiency because comparisons based on it carry known statistical and substantive costs, stemming from its limited and unrepresentative depiction of test score trends. Furthermore, in lower-performing schools, percent proficiency was less likely to show variability than growth in scale scores across two points in time. Lower-performing students can grow but still fall below proficiency levels depending on how far they are from the cut-off score. Thus, we include two additional proxies of achievement in this study: school Learning Rates in ELA and math, respectively.
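The limitation described above can be illustrated with a small sketch. The scale scores and the cut score below are hypothetical values chosen for illustration, not data from the study, and the helper `percent_proficient` is an assumed name. The example shows a case in which every student gains ground while the change in percent proficient stays at zero.

```python
from statistics import mean

def percent_proficient(scores, cut_score):
    """Percentage of students scoring at or above the cut score."""
    return 100.0 * sum(s >= cut_score for s in scores) / len(scores)

# Hypothetical scale scores for the same five students in two consecutive springs.
CUT_SCORE = 750  # hypothetical proficiency cut score
spring_prior = [690, 700, 710, 720, 760]
spring_current = [715, 725, 735, 745, 765]

# Change in proficiency: difference in percent proficient, spring to spring.
change_in_proficiency = (
    percent_proficient(spring_current, CUT_SCORE)
    - percent_proficient(spring_prior, CUT_SCORE)
)

# Growth on the scale itself, which proficiency rates can hide.
mean_scale_gain = mean(spring_current) - mean(spring_prior)

print(f"Change in percent proficient: {change_in_proficiency:.1f} points")
print(f"Mean scale score gain: {mean_scale_gain:.1f} points")
```

In this hypothetical case, four of the five students grow substantially yet remain below the cut score, so a proficiency-based measure registers no change.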

Using H. S. Bloom et al. (2008) as a guideline, we calculated the effect sizes for each grade-to-grade pair (third to fourth, fourth to fifth, etc.) for each school in the district. We used this approach because H. S. Bloom et al. (2008) found that regardless of the assessment, student growth in achievement is greatest in elementary grades (K to first grade is about 1.5 standard deviations), and then declines throughout middle and high school (11th to 12th grades is about 0.05 standard deviations). If the number of students in each grade level is not accounted for, schools with a larger number of younger students will appear to have higher achievement. Consequently, learning rates account for the average growth and the number of students in each grade.

We used the Hedges’ g effect size formula for each grade-level pair because it is the most appropriate effect size when accounting for small samples (What Works Clearinghouse, 2022). This effect size metric represents the mean difference between two groups of observations at two points in time on a standard deviation scale. We calculated the effect sizes using all students in the district who were tested from spring to spring for ELA and math. We then aggregated the pairs to the school level using weighted averages that accounted for the number of students within each grade to create an overall school Learning Rate for each subject area. The Hedges’ g formula is shown in equation (1).

Hedges’ g Formula

$$g = \frac{\omega (y_{i} - y_{c})}{\sqrt{\frac{(n_{i} - 1) s_{i}^{2} + (n_{c} - 1) s_{c}^{2}}{n_{i} + n_{c} - 2}}} \tag{1}$$

• $y_{i}$ is the adjusted (or unadjusted) mean post-intervention group

• $y_{c}$ is the adjusted (or unadjusted) mean pre-intervention group

• $n_{i}$ is the sample size of the post-intervention group

• $n_{c}$ is the sample size of the pre-intervention group

• $s_{i}$ is the unadjusted standard deviation for the post-intervention group

• $s_{c}$ is the unadjusted standard deviation for the pre-intervention group

• Omega ($\omega$) is the small sample size correction

○ It is equal to $1 - \frac{3}{4 N - 9}$ where $N$ is the total sample size (or $n_{i} + n_{c}$ )
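Equation (1) and the student-weighted aggregation to a school Learning Rate can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the weighting by student counts per grade pair, and all numbers are assumptions made for the example.

```python
import math


def hedges_g(mean_post, mean_pre, sd_post, sd_pre, n_post, n_pre):
    """Hedges' g for one grade-level pair, following equation (1)."""
    # Pooled variance of the post (i) and pre (c) groups.
    pooled_var = ((n_post - 1) * sd_post ** 2 + (n_pre - 1) * sd_pre ** 2) / (
        n_post + n_pre - 2
    )
    # Small-sample correction omega = 1 - 3 / (4N - 9), with N = n_i + n_c.
    omega = 1 - 3 / (4 * (n_post + n_pre) - 9)
    return omega * (mean_post - mean_pre) / math.sqrt(pooled_var)


def school_learning_rate(pairs):
    """Student-weighted mean of grade-pair effect sizes.

    `pairs` is a list of (g, n_students) tuples. Weighting by the number of
    students in each pair is an assumption about how the averaging was done.
    """
    total_n = sum(n for _, n in pairs)
    return sum(g * n for g, n in pairs) / total_n


# Hypothetical scale-score summaries for two grade pairs (not study data).
g_3_to_4 = hedges_g(745.0, 738.0, 30.0, 28.0, n_post=60, n_pre=58)
g_4_to_5 = hedges_g(741.0, 739.0, 29.0, 31.0, n_post=55, n_pre=57)
rate = school_learning_rate([(g_3_to_4, 60), (g_4_to_5, 55)])
print(round(g_3_to_4, 3), round(g_4_to_5, 3), round(rate, 3))
```

Weighting by student counts keeps a school with many students in the early grades, where typical growth is larger, from appearing to outperform a school with a different grade mix.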

The IAR is not a vertically scaled instrument. Because there were no other available district assessment data, we tested the premise that Hedges’ g could still be calculated using a non-vertically scaled assessment. If we had not tested this, effect sizes based on a non-vertically scaled assessment might have produced an erroneous ordering of schools and distorted the correlation coefficients. Briggs and Domingue (2013) found that non-vertical scaling impacted grade-to-grade comparisons only when there was a large degree of scale shrinkage, which restricted the range of gain scores. Furthermore, they found that correlation levels dropped from .87 to .57 when grade-to-grade standard deviations were −0.30 or less. Consequently, anything equal to or less than −0.30 could have impacted the ordering of schools.

In our study, we assessed the effect sizes of all grade-level pairs prior to calculating the weighted school averages and found that only 3% (4 out of 139) had standard deviations less than or equal to −0.30. Given such a small share, we concluded that this would not have a significant effect on the ordering of schools, and that the effect sizes should represent a more precise measure than proficiency gains. The next step in the investigation was to validate this assumption by examining the correlation between learning rates and student proficiency. We then considered the independent variables used in the investigation. The first set of indicators measured the Rigor Appraisal, followed by school effectiveness measures, school wellness (as measured by the Illinois State Board of Education 5Essentials survey), and program effectiveness measures.

Rigor Appraisal

Each pillar contained three to four items, and the instrument used a 12-point Likert scale (1–3 = strongly disagree, 4–6 = disagree, 7–9 = agree, 10–12 = strongly agree). To calculate the overall score, we averaged the scores for each item within the pillar, and then averaged each of the five pillar scores to determine the overall mean score for the observation. Table 1 lists the alpha coefficients of each pillar over time. A Cronbach’s alpha of .6 and higher was considered a reliable measure, and all alpha coefficients were greater than that threshold (What Works Clearinghouse, 2022).

Table 1.

Cronbach’s Alpha of Rigor Appraisal Pillars.

Pillar	Fall	Interim	Spring
Creating conditions for learning	.734	.851	.794
Using standards-based student evidence	.911	.932	.870
Activating student teams	.915	.946	.854
Verifying learning	.869	.898	.840
Using data to track progress	.953	.956	.909
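The scoring rule and the reliability check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, pillar labels, and item scores are hypothetical, and the alpha formula is the standard one rather than anything confirmed as the authors' implementation.

```python
from statistics import mean, variance


def overall_score(pillars):
    """Mean of pillar means for one observation.

    `pillars` maps a pillar name to its list of item scores (1 to 12).
    """
    return mean(mean(items) for items in pillars.values())


def cronbach_alpha(item_scores):
    """Standard Cronbach's alpha.

    `item_scores` is a list of items; each item is a list of scores given
    across the same set of observations, in the same order.
    """
    k = len(item_scores)
    sum_item_vars = sum(variance(item) for item in item_scores)
    totals = [sum(obs) for obs in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))


# Hypothetical item scores for one observation (two pillars shown for brevity).
observation = {
    "conditions": [6, 8, 7],
    "teams": [9, 9, 12, 10],
}
print(overall_score(observation))

# Hypothetical scores for a three-item pillar across five observations.
pillar_items = [
    [4, 6, 7, 9, 10],
    [5, 6, 8, 9, 11],
    [3, 7, 7, 8, 10],
]
print(round(cronbach_alpha(pillar_items), 3))
```

Averaging within pillars first gives each pillar equal weight in the overall score, even though pillars differ in their number of items.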

We then calculated the raw score changes from fall to spring. In addition to the raw score changes, we used OLS regression to calculate the Beta coefficient for each school. The Beta estimated the magnitude of change over the course of the three Rigor Appraisals and accounted for the number of days on which the Appraisals occurred during the school year. This method reduced the measurement error of the coders, as it smoothed out the trendline over time.
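The per-school trend can be illustrated with a simple regression of appraisal score on the day of the school year. This is a sketch only: the function names and data are hypothetical, and whether the reported Beta was the raw slope or a standardized coefficient is not stated in the text, so both are shown.

```python
from statistics import mean, pstdev


def ols_slope(days, scores):
    """Least-squares slope of score on observation day (points per day)."""
    d_bar, s_bar = mean(days), mean(scores)
    sxy = sum((d - d_bar) * (s - s_bar) for d, s in zip(days, scores))
    sxx = sum((d - d_bar) ** 2 for d in days)
    return sxy / sxx


def standardized_beta(days, scores):
    """Slope rescaled to standard deviation units.

    With one predictor this equals the Pearson correlation. Treating the
    reported Beta as standardized is an assumption.
    """
    return ols_slope(days, scores) * pstdev(days) / pstdev(scores)


# Hypothetical school: fall, interim, and spring appraisals on days 20, 95, 160.
days = [20, 95, 160]
scores = [5.2, 5.9, 6.4]
print(round(ols_slope(days, scores), 4), round(standardized_beta(days, scores), 3))
```

Because the slope uses the actual observation days, two schools with the same fall-to-spring gain but different appraisal timing receive different per-day estimates.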

School Effectiveness

School effectiveness measures incorporated all the traditional leading indicators of achievement, including Weekly Attendance, Referrals, Suspensions, and Teacher Retention. Weekly Attendance was defined as the average percentage of students that attended the school. Referrals and Suspensions were the sum of the total number of behavioral referrals and suspensions for the year. Typically, we would have investigated the changes over time, but because of the pandemic and virtual schooling, the rate of change was unavailable for these indicators. In addition, we used publicly available teacher retention data and calculated the changes for each year.

School Wellness

Schools participate in the Illinois 5Essentials Survey (Illinois State Board of Education, n.d.) annually. The 5Essentials Survey identified five indicators that improve school outcomes: Effective Leaders, Collaborative Teachers, Involved Families, Supportive Environments, and Ambitious Instruction. We included the 5Essentials survey data in our analysis to assess whether the Rigor Appraisal and the state indicators of school wellness were correlated, as the survey has been shown to be predictive of school achievement (Hart et al., 2020). We measured the change over time for each of the 5Essentials in addition to the overall 5Essentials change score by averaging the indicators for each year.

Program Effectiveness

Program effectiveness included four measures: the total number of Instructional Classroom Walks conducted by the principal and school leadership team, the total number of Classroom Walks in ELA and in Math, and the number of third-party Coaching Days focused on the Rigor Appraisal data. We included these as measures in the study because we expected that increases in them would lead to observable changes in classroom instruction, which would impact achievement.

We used Pearson’s correlation coefficients to examine the validity of the Rigor Appraisal and to assess the magnitude and direction of the relationships between the dependent and independent variables. Correlation analyses were performed to assess the predictability, strength, and direction of the observational instruments. Cohen’s (1988) conventions were used to interpret effect sizes of .10, .50, and .80 standard deviations as small, moderate, and large, respectively. As H. S. Bloom et al. (2008) pointed out, these guidelines are not always relevant to intervention effects in education. However, they may be used when there is no better basis for estimating the magnitude of the impact across studies (Cohen, 1988).
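For readers who want to check a reported coefficient, the sketch below computes Pearson's r and the t statistic used to test it. The function names and the sample data are illustrative assumptions; obtaining the p-value additionally requires a t distribution with n − 2 degrees of freedom, which the Python standard library does not provide.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical school-level values (not study data).
appraisal_change = [0.4, 1.1, 0.9, 1.6, 0.2, 1.3]
learning_rate = [0.10, 0.25, 0.05, 0.40, 0.00, 0.30]
print(round(pearson_r(appraisal_change, learning_rate), 3))

# The correlation reported in Table 4 for the Rigor Appraisal and the
# ELA Learning Rate (r = .388, N = 48) corresponds to this t value.
print(round(t_statistic(0.388, 48), 3))
```

With only about 48 schools, a correlation needs to be fairly large before it is distinguishable from zero, which is why several coefficients in the tables are flagged as only approaching significance.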

In addition to correlation analyses, we used OLS regression to investigate whether the Rigor Appraisal score, pillar scores, program effectiveness measures, and school demographic estimates predicted ELA and Math Learning Rates. School demographic measures included the percentages of students with disabilities (SWD) and low-income students. OLS regression provides the best linear unbiased estimates of the coefficients when its assumptions hold, and it was a simple way to estimate the parameters of a linear relationship between variables at the school level.
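A school-level regression with standardized coefficients can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' analysis: the function names and data are hypothetical, and the sketch covers only the coefficient estimates, not standard errors or p-values.

```python
from statistics import mean, pstdev


def zscores(values):
    """Center and scale a list to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardized_betas(predictors, outcome):
    """OLS on z-scored variables, so no intercept term is needed.

    `predictors` maps a variable name to its list of school-level values.
    """
    names = list(predictors)
    zx = [zscores(predictors[name]) for name in names]
    zy = zscores(outcome)
    n = len(zy)
    # Normal equations: (X'X) beta = X'y, here on correlation scale.
    xtx = [[sum(p * q for p, q in zip(zi, zj)) / n for zj in zx] for zi in zx]
    xty = [sum(p * q for p, q in zip(zi, zy)) / n for zi in zx]
    return dict(zip(names, solve(xtx, xty)))


# Hypothetical data: the outcome is built as 2 * walks + 3 * coaching.
walks = [1.0, 2.0, 3.0, 4.0]
coaching = [1.0, -1.0, -1.0, 1.0]
rate = [2 * w + 3 * c for w, c in zip(walks, coaching)]
print(standardized_betas({"walks": walks, "coaching": coaching}, rate))
```

Standardized coefficients put predictors measured in different units, such as counts of walks and percentages of students, on a common scale, which is what allows statements about the largest predictor in a model.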

Results

Table 2 presents the sample’s descriptive statistics. The number of schools ranged from 48 to 53 depending on the measures examined. The change in proficiency ranged from a decrease of 25 percentage points to an increase of 14 percentage points. The ELA Learning Rate ranged from a drop of 0.85 standard deviations to an increase of 0.72. Rigor Appraisal change scores ranged from a 0.45 scale score point decrease to a 3.27 point increase from fall to spring.

Table 2.

Descriptive Statistics.

	N	Min.	Max.	Mean	SD
Achievement
ELA proficiency	48	−0.25	0.14	0.00	0.08
Math proficiency	48	−0.25	0.14	0.01	0.07
ELA learning rate	48	−0.85	0.72	0.22	0.28
Math learning rate	48	−0.29	0.35	0.05	0.15
Rigor appraisal and pillars
Rigor appraisal	53	−0.10	3.27	0.98	0.66
Rigor appraisal beta	53	−0.79	1.00	0.83	0.32
Creating conditions for learning	53	−1.75	2.75	0.49	0.82
Using standards-based student evidence	53	−1.25	4.75	1.31	1.10
Activating student teams	53	−1.75	4.00	1.31	1.11
Verifying learning	53	−2.00	4.00	0.91	1.07
Using data to track progress	53	−1.00	4.25	0.88	0.99
School effectiveness
Weekly attendance	49	0.83	0.94	0.91	0.02
Referrals	49	45.00	7,387.00	580.92	1,098.64
Suspensions	49	0.00	390.00	28.92	69.28
Teacher retention	53	−9.00	5.00	0.28	2.75
School wellness
5Essentials (5E)	49	−20.00	15.00	−4.46	6.58
Ambitious instruction	49	−9.00	27.00	9.49	7.46
Supportive environment	49	−31.00	10.00	−5.31	10.04
Involved families	47	−25.00	3.00	−10.57	6.96
Collaborative teachers	47	−33.00	15.00	−6.96	9.92
Effective leaders	47	−29.00	7.00	−10.96	8.80
Program effectiveness
Instructional classroom walks	53	125.00	1,282.00	356.45	183.95
ELA classroom walk	53	44.00	483.00	118.70	69.93
Math classroom walk	53	38.00	568.00	99.98	78.66
Coaching days	49	4.50	40.00	11.34	9.53
Control variables
Percent students with disabilities	53	10.50	28.70	17.28	4.02
Percent low income students	53	20.80	95.20	58.67	19.77

Next, we investigated the validity of the calculated ELA and Math Learning Rates by comparing them with proficiency. Table 3 shows moderate to large, statistically significant associations between the ELA and Math Learning Rates and Proficiency. Furthermore, after removing the four schools that could possibly have impacted the ordering of schools (standard deviations of grade pairs that were −0.30 or less), the magnitude of the correlations remained unchanged. We also investigated correlations using only students who were tested in both springs. The missing assessment data from the baseline year lowered the correlation levels with proficiency. Consequently, ELA and Math Learning Rates that included all tested students from both years were a more valid predictor of Proficiency in this instance.

Table 3.

Proficiency and Learning Rate Correlation Coefficients.

		ELA proficiency	Math proficiency	ELA learning rate	Math learning rate
ELA proficiency	R	1	.696**	.582**	.311*
	p-value		<.001	<.001	.032
	N	48	48	48	48
Math proficiency	R	.696**	1	.449**	.424**
	p-value	<.001		.001	.003
	N	48	48	48	48
ELA learning rate	R	.582**	.449**	1	.675**
	p-value	<.001	.001		<.001
	N	48	48	48	48
Math learning rate	R	.311*	.424**	.675**	1
	p-value	.032	.003	<.001
	N	48	48	48	48

Note. Correlation is significant at *p < .05 and **p < .01 levels.

Next, we investigated the first hypothesis, that the Rigor Appraisal has a positive and statistically significant relationship with achievement and other effectiveness measures. We found partial support for H_1a, as demonstrated in Table 4: the Rigor Appraisal had a small to moderate and statistically significant association with the ELA Learning Rate (r = .388, p = .006), and its association with the Math Learning Rate approached statistical significance (r = .267, p = .067). The relationships between the Rigor Appraisal and ELA and Math Proficiency were weak and not statistically significant. Furthermore, the Rigor Appraisal Beta had very small, mostly negative relationships with achievement, none of which was statistically significant.

Table 4.

Rigor Appraisal, Beta, and Achievement Correlation Coefficients.

		ELA proficiency	Math proficiency	ELA learning rate	Math learning rate
Rigor appraisal	R	.225	.176	.388**	.267^†
	p-value	.123	.231	.006	.067
	N	48	48	48	48
Rigor appraisal beta	R	−.049	−.013	.009	−.083
	p-value	.74	.928	.954	.573
	N	48	48	48	48

Note. ^†Correlation is approaching statistical significance at the p < .10 level.

Correlation is significant at *p < .05 and **p < .01 levels.

Table 5 shows support for H_1b: the Rigor Appraisal was associated in the expected direction with three of the four school effectiveness measures. Its positive association with Weekly Attendance approached statistical significance (r = .259, p = .073), meaning that attendance tended to be higher as Rigor Appraisal scores increased. Further, the Rigor Appraisal had negative, moderate, and statistically significant relationships with Referrals and Suspensions, meaning that as it increased, Referrals and Suspensions decreased. Its association with Teacher Retention was not statistically significant. Interestingly, while the Rigor Appraisal Beta did not have a relationship with achievement, it had moderate to large and statistically significant relationships with all school effectiveness measures except Teacher Retention.

Table 5.

Rigor Appraisal, Beta, and School Effectiveness.

		Avg. weekly attend	Referrals	Suspensions	Teacher retention
Rigor appraisal	R	.259^†	−.388**	−.431**	.106
	p-value	.073	.006	.002	.451
	N	49	49	49	53
Rigor appraisal beta	R	.453**	−.730**	−.636**	.141
	p-value	.001	<.001	<.001	.314
	N	49	49	49	53

Note. ^†Correlation is approaching statistical significance at the p < .10 level.

Correlation is significant at *p < .05 and **p < .01 levels.

Table 6 reflects support for H_1c, as it provides evidence for the relationship between the Rigor Appraisal and measures of school wellness. The 5Essentials had a moderate and statistically significant association with the Rigor Appraisal and all measures of achievement. In other words, as the 5Essentials increased year over year, so did the Rigor Appraisal and achievement. Three of the five 5Essentials categories (Ambitious Instruction, Collaborative Teachers, and Effective Leaders) had statistically significant associations with the Rigor Appraisal; Supportive Environment approached statistical significance (p = .055), and Involved Families was not significant.

Table 6.

The 5Essentials, Rigor Appraisal, and Achievement.

		Rigor appraisal	ELA proficiency	Math proficiency	ELA learning rate	Math learning rate
5Essentials average change	r	.459**	.415**	.386**	.403**	.331*
	p-value	<.001	.003	.007	.005	.022
	N	51	48	48	48	48
Ambitious instruction	r	.361**	.328*	.067	.298*	.214
	p-value	.009	.023	.653	.040	.144
	N	51	48	48	48	48
Supportive environment	r	.270^†	.526**	.419**	.589**	.457**
	p-value	.055	<.001	.003	<.001	.001
	N	51	48	48	48	48
Involved families	r	.136	.153	.119	.157	.071
	p-value	.351	.309	.431	.298	.638
	N	49	46	46	46	46
Collaborative teachers	r	.425**	.179	.338*	.221	.199
	p-value	.002	.233	.022	.140	.185
	N	49	46	46	46	46
Effective leaders	r	.284*	.115	.254^†	.271^†	.237
	p-value	.048	.449	.088	.068	.113
	N	49	46	46	46	46

Note. ^†Correlation is approaching statistical significance at the p < .10 level.

Correlation is significant at *p < .05 and **p < .01 levels.

Finally, we found evidence to support H_1d when we examined the relationship between program effectiveness and achievement. As shown in Table 7, Instructional Classroom Walks and ELA and Math Classroom Walks had small to moderate, statistically significant associations with ELA and Math Learning Rates; ELA and Math Classroom Walks were also significantly associated with Math Proficiency, and ELA Classroom Walks with ELA Proficiency. In other words, the more Instructional Classroom Walks school leadership teams conducted, the higher the achievement. Additionally, Coaching Days had a small to moderate, statistically significant association with the ELA Learning Rate (r = .356, p = .013), and its association with the Math Learning Rate approached statistical significance (r = .280, p = .054). Thus, schools with more coaching days tended to show greater achievement on the state assessment.

Table 7.

Program Effectiveness Measures and Achievement.

		ELA proficiency	Math proficiency	ELA learning rate	Math learning rate
Instructional classroom walks	R	.188	.277^†	.336*	.432**
	p-value	.201	.057	.019	.002
	N	48	48	48	48
ELA classroom walks	R	.348*	.435**	.435**	.405**
	p-value	.015	.002	.002	.004
	N	48	48	48	48
Math classroom walks	R	.207	.315*	.394**	.376**
	p-value	.158	.029	.006	.009
	N	48	48	48	48
Coaching days	R	.223	.114	.356*	.280^†
	p-value	.128	.439	.013	.054
	N	48	48	48	48

Note. ^†Correlation is approaching statistical significance at the p < .10 level.

Correlation is significant at *p < .05 and **p < .01 levels.

For the last hypothesis, H₂, we explored whether certain pillars of the Rigor Appraisal impacted achievement differently. To this end, we first investigated the Pearson’s correlation coefficients. Table 8 shows that two Rigor Appraisal pillars, Using Standards-based Student Evidence and Activating Student Teams, had small to moderate and statistically significant associations with achievement, with the latter showing the larger correlations.

Table 8.

Rigor Appraisal Pillars and Achievement.

		ELA proficiency	Math proficiency	ELA learning rate	Math learning rate
Creating conditions for learning	R	.187	.157	.253^†	.105
	p-value	.204	.287	.083	.478
	N	48	48	48	48
Using standards-based student evidence	R	.239	.287*	.276^†	.295*
	p-value	.102	.048	.058	.042
	N	48	48	48	48
Activating student teams	R	.439**	.468**	.589**	.433**
	p-value	.002	<.001	<.001	.002
	N	48	48	48	48
Verifying learning	R	.048	−.022	.151	.135
	p-value	.748	.882	.306	.362
	N	48	48	48	48
Using data to track progress	R	−.220	−.347*	−.094	−.182
	p-value	.132	.016	.523	.217
	N	48	48	48	48

Note. ^†Correlation is approaching statistical significance at the p < .10 level.

Correlation is significant at *p < .05 and **p < .01 levels.

Next, we used OLS regression to predict achievement. Three different models are depicted in Table 9. First, we wanted to investigate whether the Rigor Appraisal predicted achievement, controlling for Low Income and SWD. Second, we assessed which pillar was the strongest predictor of achievement by controlling for school characteristics. Third, we created an interaction effect between the Activating Student Teams and Low Income groups, as it has been hypothesized that Academic Teams may improve student learning in low-income schools (Toth & Sousa, 2019).
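The interaction variable in the third model can be illustrated as the product of the two school-level measures. This is a sketch only: the function name and data are hypothetical, and whether the variables were centered before multiplication is not stated in the text, so the plain product is shown.

```python
def interaction_term(teams_change, pct_low_income):
    """Element-wise product of two school-level variables."""
    return [t * p for t, p in zip(teams_change, pct_low_income)]


# Hypothetical schools: equal gains in Activating Student Teams, but
# different percentages of low-income students.
teams_change = [1.5, 1.5, 0.5]
pct_low_income = [30.0, 90.0, 90.0]
print(interaction_term(teams_change, pct_low_income))
```

A positive coefficient on this term means the association between Activating Student Teams and the Learning Rate is larger in schools with a higher percentage of low-income students.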

Table 9.

The Effects of Rigor Appraisal and ELA Achievement.

	Model 1			Model 2			Model 3		
ELA learning rate (N = 48)	β	SE	p-Value	β	SE	p-Value	β	SE	p-Value
Coaching days	.277*	0.004	.036	.221	0.004	.113	.146	0.004	.258
ELA classroom walks	.327*	0.001	.015	.328*	0.001	.018	.210^†	0.000	.096
Rigor appraisal	.283*	0.064	.031	—	—	—	—	—	—
SWD	.050	0.009	.698	.039	0.009	.752	.065	0.008	.583
Low income	.105	0.002	.427	.101	0.002	.446	—	—	—
Creating conditions for learning	—	—	—	.103	0.050	.436	—	—	—
Using standards-based student evidence	—	—	—	−.075	0.042	.616	—	—	—
Activating student teams	—	—	—	.486**	0.045	.003	—	—	—
Verifying learning	—	—	—	−.166	0.040	.235	—	—	—
Using data to track progress	—	—	—	.173	0.039	.216	—	—	—
Activating teams × Low income	—	—	—	—	—	—	.501**	0.001	<.001
Adjusted R²			.305			.378			.406

Note. SE = standard error.

^†Approaching statistical significance at the p < .10 level.

Statistically significant at *p < .05 and **p < .01 levels.

In the first model, we found that, controlling for school demographic measures, ELA Classroom Walks was the largest predictor, and the Adjusted R² indicated that the model explained 30% of the variance in the ELA Learning Rate. The standardized Beta of .327 indicates that a one standard deviation increase in ELA Classroom Walks is associated with a 0.327 standard deviation increase in the ELA Learning Rate. The second model removes the overall Rigor Appraisal measure and includes only the pillars. Interestingly, Coaching Days dropped out of statistical significance while ELA Classroom Walks remained significant, and Activating Student Teams became the largest predictor of the ELA Learning Rate. The third model included the interaction variable, and we found that the association with achievement was greater when Activating Student Teams scores increased in schools with higher percentages of low-income students.

We investigated the same models using the Math Learning Rate as the outcome variable. We found that Low Income was statistically significant in the first model, whereas Activating Student Teams was not. However, the interaction variable was also statistically significant, and the Adjusted R² increased by 6 percentage points from model 1 to model 3 (.25 to .31). Furthermore, Math Classroom Walks approached statistical significance in all three models, suggesting the importance of instructional leaders observing teachers’ classrooms.

Discussion

In summary, we found that the Rigor Appraisal had a small to moderate and statistically significant association with achievement, a positive and moderate statistically significant association with Weekly Attendance, a negative relationship with Referrals and Suspensions, and a positive and moderate statistically significant association with the 5Essentials. In addition, the activation of student teams was associated with greater achievement in schools with more low-income students. We also found that Instructional Classroom Walks had a positive and statistically significant impact on achievement, regardless of the subject area.

Study Limitations and Future Research

The context in which we conducted the study should be noted, as it occurred when there was a national decline in student achievement during the pandemic. However, we did not observe the same kind of regression in our analysis as was observed nationally. The Rigor Appraisal process may have helped slow the decline in achievement that occurred elsewhere across the nation because of its use as part of “measurement for improvement” processes that are essential to system transformation (Takahashi et al., 2020). In this sense, the Rigor Appraisal provided critical data that were motivating factors for collaborative improvement efforts. These data were closely connected to key processes, timely, and easy to analyze regularly. The data collection and analysis processes were also transparent and built stakeholder trust. Regardless of the exact cause, the instrument’s ability to predict achievement appears strong in this context.

Despite this strength, this study has several limitations. First, there were disruptions to in-person instruction, which limited the time during which observations could occur. There were also unanticipated changes in the personnel assigned to teaching teams, which resulted in fewer classrooms being consistently observed throughout the study period. Moreover, there was a large amount of missing data. Approximately 35% to 48% of the assessment scores were missing at the baseline year for grade-level pairs district-wide. In the calculation of Learning Rates, we retained all the tested students in the distribution to reflect the true population of students more accurately, as we did not want to inadvertently create testing bias by removing students who were not tested at baseline.

We investigated this further and used several models to assess whether the students with missing baseline scores were statistically different. We found that those who did not take the baseline assessment were more likely to score 2.31 scale score points lower on the prior year’s ELA assessment and 4.9 scale score points lower in math, controlling for student characteristics and accounting for the clustering of students within schools. These findings are statistically significant. Additionally, Black, Hispanic, and low-income students were significantly more likely to have a missing baseline assessment, and gifted students were significantly more likely to have a score. Further, the findings remained true across subject areas. Consequently, our calculations did not account for missing student data for the baseline year, and changes in proficiency did not account for those missing test scores.

Other limitations include the fact that the results may not be generalizable, as they are based on one school district. Furthermore, the methods used in this study should not be used to make causal inferences. Although the OLS models attempted to go beyond correlational analysis, and the Rigor Appraisal measures were scored prior to the administration of the state assessment, much more investigation is warranted before causality can be inferred. Other confounding variables may have also affected the results.

Future studies should include replicating these findings in different contexts. In this study, the pillar of Activating Student Teams had the largest association with achievement, but would this always hold true or only in certain circumstances? If the other pillars do not directly impact achievement, in what context do they matter? Moreover, the statistically significant negative correlation between the Using Data to Track Progress pillar and Math Proficiency was in the unexpected direction. Further investigations with larger sample sizes are warranted to determine why this association was negative. Another area for future work is a more in-depth assessment of interrater reliability among the third-party-certified coaches. It proved costly to have additional third-party-certified coaches travel to ascertain interrater reliability. Future research should explore other ways of measuring this, such as video-recording walks for coder verification.

Finally, further exploration should try to pinpoint why Instructional Classroom Walks impacted student achievement. Qualitative interviews with leaders and teachers may be useful in determining how the Walks aided the change process. While there are scoring rubrics associated with the Walks, it was outside the scope of this study to isolate the impact of Walk scores. Future studies may be the key to highlighting how Walks help develop school leaders and build the internal capacity of schools.

Evidence-Based Policy Implications

This study has potential policy implications for both research and practice. First, it demonstrates how the Rigor Appraisal can be an important source of data as it provides a more comprehensive view of the conditions and practices that affect the quality and equity of education that students receive. The study shows that school leaders can use the Rigor Appraisal as a leading indicator of the quality of teaching and student learning, as low scores on the pillars provide actionable areas for improvement without resorting to frequent interruptions of instruction for testing. As such, these data could aid in continuous improvement rather than being based on accountability and compliance (Mandinach & Schildkamp, 2021). Accordingly, we offer the following recommendations.

Random school-wide trend data should be systematically collected by external coaches, as in this study they served as a leading indicator of achievement. Replicating this process with a strict focus on classroom instruction could serve as a key area for continuous data-driven improvements. Using a certified coach was fundamental to creating an informative environment in which leaders could learn collaboratively.

Districts and school leaders should implement non-evaluative Instructional Classroom Walks to provide feedback on instruction on a regular basis.

Professional development to foster student academic teams may help increase achievement, particularly in schools with more low-income students.

Overall, the Rigor Appraisal incorporates the essential conditions in schools that support a positive climate for learning, the structures that support the alignment of curricula and instruction with the intent and rigor of academic standards, student mastery of those standards, and the quality of learning tasks to engage all students in learning at higher levels of cognition. Coupled with Instructional Walks, Rigor Appraisal shows promise as a more efficient, consistent, and cost-effective source of data to inform educational improvements and policy.

Conclusion

In summary, our findings indicate that data from the Rigor Appraisal may provide a sound basis for practitioners’ decision making to improve the quality of instruction in their schools. It appears to provide both broad and focused measures with the potential to inform educational improvement and policy development. It shows promise as a valuable tool to aid schools in recovering from the effects of the pandemic, improving educational equity, and reducing academic performance disparities among students from low-income backgrounds. It also has the potential to decrease reliance on interim or benchmark assessments of student progress and reduce interruptions in instructional time for testing. Using it as a source of leading indicators may help educators make better-informed, proactive, and real-time changes to instruction that accelerate learning, improve the quality of teaching, and improve academic outcomes for all students.

Footnotes

Acknowledgements

None.

Ethical Approval

Not Applicable.

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The authors disclose that the research in this study was conducted under the auspices of Instructional Empowerment’s Applied Research Center, a division of Instructional Empowerment, LLC. The Rigor Appraisal instrument was created by the Applied Research Center. Instructional Empowerment’s consultancy uses the Rigor Appraisal in school districts to measure systems change.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Merewyn Elizabeth Lyons

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

References

Ahn, Chiu, M. M., & Patrick (2021). Connecting teacher and student motivation: Student-perceived teacher need-supportive practices and student need satisfaction. Contemporary Educational Psychology, 64, 101950. https://doi.org/10.1016/j.cedpsych.2021.101950

Akman (2021). The relationship between school climate and students’ aggressive behaviors. International Journal of Progressive Education, 17(2), 430–448. https://doi.org/10.29329/ijpe.2021.332.26

Aldridge, Fraser, B. J., Fozdar, Ala’i, Earnest, & Afari (2018). Students’ perceptions of school climate as determinants of wellbeing, resilience and identity. Improving Schools, 19(1), 5–26. https://doi.org/10.1177/1365480215612616

Alibali, M. W., Nathan, M. J., Church, R. B., Wolfgram, M. S., Kim, & Knuth, E. J. (2013). Teachers’ gestures and speech in mathematics lessons: Forging common ground by resolving trouble spots. ZDM, 45(3), 425–440. https://doi.org/10.1007/s11858-012-0476-0

Anees (2017). Analysis of assessment levels of students’ learning according to cognitive domain of Bloom’s taxonomy (Online Submission ED586762). https://eric.ed.gov/?q=Analysis+of+Assessment+Levels+of+Students%e2%80%99+Learning+according+to+Cognitive+Domain+of+Bloom%e2%80%99s+Taxonomy&id=ED586762

Argon & Ekinci (2016). Teachers’ views on organizational deviance, psychological ownership and social innovation. Universal Journal of Educational Research, 4(12A), 133–139. https://doi.org/10.13189/ujer.2016.041317

Azevedo, J. P., Hasan, Goldemberg, Iqbal, S. A., & Geven (2020). Simulating the potential impacts of COVID-19 school closures on schooling and learning outcomes: A set of global estimates. The World Bank. https://thedocs.worldbank.org/en/doc/798061592482682799-0090022020/original/covidandeducationJune17r6.pdf

Basileo, L. D. (2018). How a great city school district is improving student achievement, increasing equity and closing achievement gaps. Instructional Empowerment. https://vb8e8a.p3cdn1.secureserver.net/wp-content/uploads/2023/03/IE01-121-Research-Report-Des-Moines-03-23.pdf

Becker, D. R., Miao, Duncan, & McClelland, M. M. (2014). Behavioral self-regulation and executive function both predict visuomotor skills and early academic achievement. Early Childhood Research Quarterly, 29(4), 411–424. http://doi.org/10.1016/j.ecresq.2014.04.014

Benbenishty, Astor, R. A., Roziner, & Wrabel, S. L. (2016). Testing the causal links between school climate, school violence, and school academic performance: A cross-lagged panel autoregressive model. Educational Researcher, 45(3), 197–206. https://doi.org/10.3102/0013189X16644603

Berkowitz, Moore, Astor, R. A., & Benbenishty (2017). A research synthesis of the associations between socioeconomic background, inequality, school climate, and academic achievement. Review of Educational Research, 87(2), 425–469. https://doi.org/10.3102/0034654316669821

Betthäuser, B. A., Bach-Mortensen, A. M., & Engzell (2023). A systematic review and meta-analysis of the evidence on learning during the COVID-19 pandemic. Nature Human Behaviour, 7(3), 375–385. https://doi.org/10.1038/s41562-022-01506-4

Black & Wiliam (1998). Inside the Black Box: Raising standards through classroom assessment. King’s College London School of Education.

Black & Wiliam (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21(1), 5–31. https://doi.org/10.1007/s11092-008-9068-5

Blitz, L. V., Yull, & Clauhs (2020). Bringing sanctuary to school: Assessing school climate as a foundation for culturally responsive trauma-informed approaches for urban schools. Urban Education, 55(1), 95–124. https://doi.org/10.1177/0042085916651323

Bloom, B. S. (1971). Mastery learning. In J. H. Block (Ed.), Mastery learning: Theory and practice (pp. 47–63). Holt, Rinehart & Winston.

17.

Bloom

H. S.

Hill

Black

A. R.

Lipsey

M. W.

(2008). Performance trajectories and performance gaps as achievement effect-size benchmarks for educational interventions. MDRC. https://www.mdrc.org/sites/default/files/full_473.pdf

18.

Bonne

(2016). New Zealand students’ mathematics-related beliefs and attitudes: Recent evidence. New Zealand Journal of Educational Studies, 51(1), 69–82. https://doi.org/10.1007/s40841-016-0035-2

19.

Boston

M. D.

Henrick

E. C.

Gibbons

L. K.

Berebitsky

Colby

G. T.

(2017). Investigating how to support principals as instructional leaders in mathematics. Journal of Research on Leadership Education, 12(3), 183–214. https://doi.org/10.1177/1942775116640254

20.

Braund

DeLuca

(2018). Elementary students as active agents in their learning: An empirical study of the connections between assessment practices and student metacognition. Australian Educational Researcher, 45(1), 65–85. https://doi.org/10.1007/s13384-018-0265-z

21.

Briggs

D. C.

Domingue

(2013). The gains from vertical scaling. Journal of Educational and Behavioral Statistics, 38(6), 551–576. https://doi.org/10.3102/1076998613508317

22.

Brooks

Carroll

Gillies

R. M.

Hattie

(2019). A matrix of feedback. Australian Journal of Teacher Education, 44(4), 14–32. https://doi.org/10.14221/ajte.2018v44n4.2

23.

Brown

G. T. L.

Harris

L. R.

(2014). The future of self-assessment in classroom practice: Reframing self-assessment as a core competency. Frontline Learning Research, 3, 22–30. https://doi.org/10.14786/flr.v2i1.24

24.

Center on Reinventing Public Education. (2022a). Student achievement gaps and the pandemic: A new review of evidence from 2021–2022. Center on Reinventing Public Education. https://crpe.org/student-achievement-gaps-and-the-pandemic-a-new-review-of-evidence-from-2021-2022/

25.

Center on Reinventing Public Education. (2022b). The state of the American student: Fall 2022. Center on Reinventing Public Education. https://crpe.org/the-state-of-the-american-student/

26.

Cizek

G. J.

(2011). Setting performance standards: Foundations, methods, and innovations (2nd ed.). Routledge.

27.

Cleary

T. J.

Velardi

Schnaidman

(2017). Effects of the Self-Regulation Empowerment Program, (SREP) on middle school students’ strategic skills, self-efficacy, and mathematics achievement. Journal of School Psychology, 64, 28–42. https://doi.org/10.1016/j.jsp.2017.04.004

28.

Cohen

(1988). Statistical power analysis for the behavioral sciences (2nd ed.). Erlbaum.

29.

Conner

(2015). Relationships and authentic collaboration: Perceptions of a building leadership team. Leadership and research in education: The Journal of the Ohio Council of Professors of Educational Administration (OCPEA), 2(1), 12–24. https://files.eric.ed.gov/fulltext/EJ1088557.pdf

30.

Coyne

M. D.

Simmons

D. C.

Hagan-Burke

Simmons

L. E.

Kwok

Kim

Fogarty

Oslund

E. L.

Taylor

A. B.

Capozzoli-Oldham

Ware

Little

M. E.

Rawlinson

D. M.

(2013). Adjusting beginning reading intervention based on student performance: An experimental evaluation. Exceptional Children, 80(1), 25–44. https://doi.org/10.1177/001440291308000101

31.

Crooks

T. J.

(1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58(4), 438–481. https://doi.org/10.3102/00346543058004438

32.

Darling-Hammond

Cook-Harvey

C. M.

(2018). Educating the whole child: Improving school climate to support student success. Learning Policy Institute. https://doi.org/10.54300/145.655

33.

Davies

(2001). Involving students in communicating about their learning. NASSP Bulletin, 85(621), 47–52. https://doi.org/10.1177/019263650108562106

34.

Davis

J. R.

Warner

(2018). Schools matter: The positive relationship between New York City high schools’ student academic progress and school climate. Urban Education, 53(8), 959–980. https://doi.org/10.1177/0042085915613544

35.

Day

S. L.

Connor

C. M.

(2017). Examining the relations between self-regulation and achievement in third-grade students. Assessment for Effective Intervention, 42(2), 97–109. https://doi.org/10.1177/1534508416670367

36.

Day

S. L.

Connor

C. M.

McClelland

M. M.

(2015). Children’s behavioral regulation and literacy: The impact of the first grade classroom environment. Journal of School Psychology, 53(5), 409–428. https://doi.org/10.1016/j.jsp.2015.07.004

37.

DeFlorio

Klein

Starkey

Swank

P. R.

Taylor

H. B.

Halliday

S. E.

Beliakoff

Mulcahy

(2019). A study of the developing relations between self-regulation and mathematical knowledge in the context of an early math intervention. Early Childhood Research Quarterly, 46, 33–48. https://doi.org/10.1016/j.ecresq.2018.06.008

38.

De Smedt

Van Keer

(2018). Fostering writing in upper primary grades: A study into the distinct and combined impact of explicit instruction and peer assistance. Reading and Writing, 31(2), 325–354. https://psycnet.apa.org/doi/10.1007/s11145-017-9787-4.https://doi.org/10.1007/s11145-017-9787-4

39.

Dessie

A. A.

Sewagegn

A. A.

(2019). Moving beyond a sign of judgment: Primary school teachers’ perception and practice of feedback. International Journal of Instruction, 12(2), 51–66. https://doi.org/10.29333/iji.2019.1224a

40.

Dodman

S. L.

DeMulder

E. K.

View

J. L.

Stribling

S. M.

Brusseau

(2023). “I knew it was a problem before, but did I really?”: Engaging teachers in data use for equity. Journal of Educational Change, 24(4), 995–1023. https://doi.org/10.1007/s10833-022-09477-z

41.

Edwards

A. R.

Beattie

R. L.

(2016). Promoting student learning and productive persistence in developmental mathematics: Research frameworks informing the Carnegie Pathways. NADE Digest, 9(1), 30–39. http://files.eric.ed.gov/fulltext/EJ1097458.pdf

42.

Elmore

R. F.

(2008). Improving the instructional core. Achieve the Core. https://achievethecore.org/content/upload/Improving%20The%20Instructional%20Core_Elmore%20Article.pdf.

43.

Ewing

Gresham

G. J.

Dickey

(2019). Pre-service teachers learning to engage all students, including English language learners, in productive struggle. Issues in the Undergraduate Mathematics Preparation of School Teachers, 2, 1–11. http://files.eric.ed.gov/fulltext/EJ1206251.pdf

44.

Fahle

Kane

T. J.

Patterson

Reardon

S. F.

Staiger

D. O.

(2022). Local achievement impacts of the pandemic. https://educationrecoveryscorecard.org/wp-content/uploads/2022/10/Education-Recovery-Scorecard_Key-Findings_102822.pdf

45.

Fiske

S. T.

(1993). Social cognition and social perception. Annual Review of Psychology, 44, 155–194. https://psycnet.apa.org/doi/10.1146/annurev.ps.44.020193.001103. https://doi.org/10.1146/annurev.ps.44.020193.001103

46.

Fiske

S. T.

Neuberg

S. L.

(1990). A continuum of impression formation, from category-based to individuating processes: Influences of information and motivation on attention and interpretation. Advances in Experimental Social Psychology, 23, 1–74. https://doi.org/10.1016/S0065-2601(08)60317-2

47.

Fluckiger

Vigil

Y. T.

Pasco

Danielson

(2010). Formative feedback: Involving students as partners in assessment to enhance learning. College Teaching, 58(4), 136–140. https://doi.org/10.1080/87567555.2010.484031

48.

Foley

Mishook

Thompson

Kubiak

Supovitz

Rhude-Faust

M. K.

(2008). Beyond the test scores: Leading indicators for Education. Annenberg Institute for School Reform at Brown University. http://files.eric.ed.gov/fulltext/ED533117.pdf

49.

Francisco

J. M.

(2013). Learning in collaborative settings: Students building on each other’s ideas to promote their mathematical understanding. Educational Studies in Mathematics, 82(3), 417–438. https://doi.org/10.1007/s10649-012-9437-3

50.

Freed

Sims

Tagaris

Hornberger

Safer

(2021). International school principals’ insights and experiences with teacher motivation. International Journal of Educational Leadership Preparation, 16(1), 60–73. http://files.eric.ed.gov/fulltext/EJ1313056.pdf

51.

Glaser

Keßler

Palm

Brunstein

J. C.

(2010). Improving fourth graders’ self-regulated writing skills: Specialized and shared effects of process-oriented and outcome-related self-regulation procedures on students’ task performance, strategy use, and self-evaluation. Zeitschrift für Padagogische Psychologie, 24(3−4), 177–190. https://doi.org/10.1024/1010-0652/a000015

52.

Graham

(2022). Explaining the racial school climate gap: Evidence from Georgia. AERA Open, 8(1), 1–19. https://doi.org/10.1177/23328584221131529

53.

Green

Allen

(2015). Professional development in urban schools: What do teachers say? Journal of Inquiry & Action in Education, 6(2), 53–79. http://files.eric.ed.gov/fulltext/EJ1133585.pdf

54.

Griffiths

Davies

(1993). Learning to learn: Action research from an equal opportunities perspective in a junior school. British Educational Research Journal, 19(1), 43–58. https://doi.org/10.1080/0141192930190104

55.

Grissom

J. A.

Egalite

A. J.

Lindsay

C. A.

(2021). How principals affect students and schools: A systematic synthesis of two decades of research. The Wallace Foundation. http://www.wallacefoundation.org/principalsynthesis

56.

Grissom

J. A.

Loeb

Master

(2013). Effective instructional time use for school leaders: Longitudinal evidence from observations of principals. Educational Researcher, 42(8), 433–444. https://doi.org/10.3102/0013189X13510020

57.

Guo

Lau

K. L.

Wei

(2019). Teacher feedback and students’ self-regulated learning in mathematics: A comparison between a high achieving and a low-achieving secondary school. Studies in Educational Evaluation, 63, 48–58. https://doi.org/10.1016/j.stueduc.2019.07.001

58.

Guskey

T. R.

(1996). Reporting on student learning: Lessons from the past—Prescriptions for the future. In Guskey

T. R.

(Ed.), Communicating student learning: 1996 yearbook of the Association for Supervision and Curriculum Development (pp. 13–24). Association of Supervision and Curriculum and Development.

59.

Guskey

T. R.

(1997). Implementing mastery learning (2nd ed.). Wadsworth Publishing.

60.

Guskey

T. R.

(2001). Helping standards make the grade. Educational Leadership, 59(1), 20–27. https://www.ascd.org/el/articles/helping-standards-make-the-grade

61.

Guskey

T. R.

(2007). Closing achievement gaps: Revisiting Benjamin S. Bloom’s “learning for mastery.” Journal of Advanced Academics, 19(1), 8–31. https://doi.org/10.4219/jaa-2007-704

62.

Hamilton

L. S.

Stecher

B. M.

Yuan

(2009). Standards-based reform in the United States: History, research, and future directions. https://www.rand.org/pubs/reprints/RP1384.html

63.

Hammerstein

König

Dreisörner

Frey

(2021). Effects of COVID-19-related school closures on student achievement—A systematic review. Frontiers in Psychology, 12, 746289. https://doi.org/10.3389/fpsyg.2021.746289

64.

Han

(2020). On the relationship between teacher autonomy and learner autonomy. International Education Studies (International ed.), 13(6), 153–162. https://doi.org/10.5539/ies.v13n6p153

65.

Hanushek

E. A.

(2022). The economic cost of the pandemic: State by state. Hoover Education Success Initiative. http://hanushek.stanford.edu/sites/default/files/publications/Hanushek%202022%20HESI%20EconomicCost.pdf

66.

Harrison

Ohara

McNamara

(2015). Re-thinking assessment: Self- and peer-assessment as drivers of self-direction in learning. Eurasian Journal of Educational Research, 15(60), 75–88. http://files.eric.ed.gov/fulltext/EJ1076698.pdf.https://doi.org/10.14689/ejer.2015.60.5

67.

Hart

Young

Chen

Zou

Allensworth

E. M.

(2020). Supporting school improvement: Early findings from reexamination of the 5Essentials survey. University of Chicago Consortium on School Research. https://consortium.uchicago.edu/sites/default/files/2020-08/Supporting%20School%20Improvement%205Essentials%20Survey%20ES-Aug2020-Consortium.pdf

68.

Hattie

(2023). Visible learning, the sequel: A synthesis of over 2,100 meta-analyses relating to achievement. Routledge.

69.

Hattie

Gan

Brooks

(2017). Instruction based on feedback. In Mayer

R. E.

Alexander

P. A.

(Eds.), Handbook of research on learning and instruction (2nd ed., pp. 290–324). Routledge.

70.

Hattie

Timperley

(2007). The power of feedback. Review of Educational Research, 77(1), 81–112. https://doi.org/10.3102/003465430298487

71.

Hattie

Yates

G. C. R.

(2014). Visible learning and the science of how we learn. Routledge.

72.

Hiebert

Grouws

D. A.

(2007). The effects of classroom mathematics teaching on students’ learning. In Lester

F. K.

(Ed.), Second handbook of research on mathematics teaching and learning (pp. 371–404). Information Age.

73.

Hinnant-Crawford

B. N.

Faison

M. Z.

Chang

(2016). Culture as mediator. Co-regulation, self-regulation, and middle school mathematics achievement. Journal for Multicultural Education, 10(3), 274–293. https://doi.org/10.1108/JME-05-2016-0032

74.

Hirsh

Å.

Segolsson

(2019). Enabling teacher-driven school-development and collaborative learning: An activity theory-based study of leadership as an overarching practice. Educational Management Administration and Leadership, 47(3), 400–420. https://doi.org/10.1177/1741143217739363

75.

A. D.

(2008). The problem with “proficiency”: Limitations of statistics and policy under No Child Left Behind. Educational Researcher, 37(6), 351–360. https://doi.org/10.3102/0013189X08323842

76.

Illinois State Board of Education. (2023). Illinois report card. https://www.illinoisreportcard.com/

77.

Illinois State Board of Education. (n.d.). 5Essentials survey. https://www.isbe.net/Pages/5Essentials-Survey.aspx

78.

Ingersoll

R. M.

Collins

G. J.

(2017). Accountability and control in American schools. Journal of Curriculum Studies, 49(1), 75–95. https://doi.org/10.1080/00220272.2016.1205142

79.

Isaacs

(2021). The problem with data-driven decision making in education. Journal of Educational Thought, 54(1), 77–98.

80.

Ishimaru

A. M.

Galloway

M. K.

(2014). Beyond individual effectiveness: Conceptualizing organizational leadership for equity. Leadership and Policy in Schools, 13(1), 93–146. https://doi.org/10.1080/15700763.2014.890733

81.

Jaeger

R. M.

(1989). Certification of student competence. In Linn

(Ed.). Educational measurement (3rd ed., pp. 485–514). American Council on Education.

82.

Jamal

Tilchin

Essawi

(2015). A teacher accountability model for overcoming self-exclusion of pupils. International Education Studies (International ed.), 8(9), 58–64. https://www.ccsenet.org/journal/index.php/ies/article/view/46288. https://doi.org/10.5539/ies.v8n9p58

83.

Kane

Doty

Patterson

Staiger

(2022). What do changes in state test scores imply for later life outcomes?Center for Education Policy Research, Harvard University. https://cepr.harvard.edu/files/cepr/files/long_term_outcomes.pdf?m=1666806941

84.

Karademir

C. A.

Akgul

(2019). Students’ social studies-oriented academic risk-taking behaviours and autonomous learning skills. Cypriot Journal of Educational Sciences, 14(1), 56–68. https://files.eric.ed.gov/fulltext/EJ1211747.pdf. https://doi.org/10.18844/cjes.v14i1.4038

85.

Katzenbach

J. R.

Smith

D. K.

(1993). The discipline of teams. Harvard Business Review, 71(2), 111–120.

86.

Kearney

Gebert

Voelpel

S. C.

(2009). When and how diversity benefits teams: The importance of team members’ need for cognition. Academy of Management Journal, 52(3), 581–598. https://doi.org/10.5465/amj.2009.41331431

87.

Kingston

Nash

(2011). Formative Assessment: A meta-analysis and a call for research. Educational Measurement: Issues and Practice, 30(4), 28–37. https://doi.org/10.1111/j.1745-3992.2011.00220.x

88.

Klute

Apthorp

Harlacher

Reale

(2017). Formative assessment and elementary school student academic achievement: A review of the evidence. Central Regional Educational Laboratory (REL. 2017, 259). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance. http://ies.ed.gov/ncee/edlabs

89.

Konold

Cornell

Jia

Malone

(2018). School climate, student engagement, and academic achievement: A latent variable, multilevel multi-informant examination. AERA Open, 4(4), 1–17. https://doi.org/10.1177/2332858418815661

90.

Kosko

K. W.

(2015). Geometry students’ self-determination and their engagement in mathematical whole class discussion. Investigations in Mathematics Learning, 8(2), 17–36. https://doi.org/10.1080/24727466.2015.11790349

91.

Kraft

M. A.

Falken

G. T.

(2020). Why school climate matters for teachers and students. State Education Standard, 20(2), 33–35. https://nasbe.nyc.3.digitaloceanspaces.com/2020/05/Kraft-Falken_May-2020-Standard.pdf

92.

Krolak-Schwerdt

Böhmer

Gräsel

(2013). The impact of accountability on teachers’ assessments of student performance: A social cognitive analysis. Social Psychology of Education, 16(2), 215–239. https://doi.org/10.1007/s11218-013-9215-9

93.

Kuhfeld

Soland

Lewis

(2022). Test score patterns across three COVID-19-impacted school years (Ed Working Paper: 22–521). Annenberg Institute at Brown University. https://doi.org/10.26300/ga82-6v47

94.

LeClerc

Moreau

A. C.

(2011). Communautés d’apprentissage professionnelles dans huit écoles inclusives de l’ontario. Education et Francophonie, 39(2), 189–206. https://doi.org/10.7202/1007734ar

95.

Lenes

McClelland

M. M.

ten Braak

Idsøe

Størksen

(2020). Direct and indirect pathways from children’s early self-regulation to academic achievement in fifth grade in Norway. Early Childhood Research Quarterly, 53, 612–624. https://doi.org/10.1016/j.ecresq.2020.07.005

96.

Lewis

Kuhfeld

(2022). Progress toward pandemic recovery: Continued signs of rebounding achievement at the start of the 2022–23 school year. Center for School and Student Progress. https://www.nwea.org/research/publication/progress-towards-pandemic-recovery-continued-signs-of-rebounding-achievement-at-the-start-of-the-2022-23-school-year/

97.

Lewis

Kuhfeld

Langi

Peters

Fahle

(2022). The widening achievement divide during COVID-19. Center for School and Student Progress. https://www.nwea.org/research/publication/the-widening-achievement-divide-during-covid-19/

98.

Livy

Muir

Sullivan

(2018). Challenging tasks lead to productive struggle. Australian Primary Mathematics Classroom, 23(1), 19–24. https://eprints.utas.edu.au/27642/2/125774%20-%20Challenging%20tasks%20lead%20to%20productive%20struggle.pdf

99.

Looney

J. W.

(2011). Integrating formative and summative assessment: Progress toward a seamless system? (OECD Education Working Papers, No. 58). OECD Publishing. https://doi.org/10.1787/5kghx3kbl734-en

100.

Lotan

(2003). Group-worthy tasks. Educational Leadership, 60(6), 72–75. https://www.ascd.org/el/articles/group-worthy-tasks

101.

Mandinach

E. B.

Schildkamp

(2021). Misconceptions about data-based decision making in education: An exploration of the literature. Studies in Educational Evaluation, 69, 100842. https://doi.org/10.1016/j.stueduc.2020.100842

102.

Marantika

J. E. R.

(2021). Metacognitive ability and autonomous learning strategy in improving learning outcomes. Journal of Education and Learning, 15(1), 88–96. https://doi.org/10.11591/edulearn.v15i1.17392

103.

Marsh

J. A.

Pane

J. F.

Hamilton

L. S.

(2006). Making sense of data-driven decision making in education: Evidence from recent RAND research. https://doi.org/10.7249/OP170

104.

Marshik

Ashton

P. T.

Algina

(2017). Teachers’ and students’ needs for autonomy, competence, and relatedness as predictors of students’ achievement. Social Psychology of Education, 20(1), 39–67. https://doi.org/10.1007/s11218-016-9360-z

105.

May

J. J.

Sanders

E. T. W.

(2013). Beyond standardized test scores: An examination of leadership and climate as leading indicators of future success in the transformation of turnaround schools. Journal of Urban Learning, Teaching, and Research, 9, 42–54. http://files.eric.ed.gov/fulltext/EJ1028857.pdf

106.

Maye

(2013). Hitting the mark: Strategic planning for academic rigor. Delta Kappa Gamma Bulletin, 79(4), 29–36.

107.

McCarley

T. A.

Peters

M. L.

Decman

J. M.

(2016). Transformational leadership related to school climate: A multi-level analysis. Educational Management Administration and Leadership, 44(2), 322–342. https://doi.org/10.1177/1741143214549966

108.

McClarty

K. L.

Way

W. D.

Porter

A. C.

Beimers

J. N.

Miles

J. A.

(2013). Evidence-based standard setting: Establishing a validity framework for cut scores. Educational Researcher, 42(2), 78–88. https://doi.org/10.3102/0013189X12470855

109.

Michaelsen

L. K.

Sweet

(2008). The essential elements of team-based learning. New Directions for Teaching and Learning, 116, 7–27. https://doi.org/10.1002/tl.330

110.

Munns

Woodward

(2006). Student engagement and student self-assessment: The REAL framework. Assessment in Education: Principles, Policy and Practice, 13(2), 193–213. https://doi.org/10.1080/09695940600703969

111.

Muñoz

M. A.

Branham

K. E.

(2016). Professional learning communities focusing on results and data-use to improve learning: The right implementation matters. Planning and Changing, 47(1/2), 37–46. https://education.illinoisstate.edu/planning/articles/vol47.php

112.

Murdoch

English

A. R.

Hintz

Tyson

(2020). Feeling heard: Inclusive education, transformative learning, and productive struggle. Educational Theory, 70(5), 653–679. https://doi.org/10.1111/edth.12449

113.

Musso

M. F.

Boekaerts

Segers

Cascallar

E. C.

(2019). Individual differences in basic cognitive processes and self-regulated learning: Their interaction effects on math performance. Learning and Individual Differences, 71, 58–70. https://doi.org/10.1016/j.lindif.2019.03.003

114.

National Academies of Sciences, Engineering, and Medicine. (2019). Monitoring educational equity. The National Academies Press. https://doi.org/10.17226/25389

115.

National Assessment of Educational Progress. (2022a). NAEP long-term trend assessment results: Reading and mathematics. https://www.nationsreportcard.gov/highlights/ltt/2022/

116.

National Assessment of Educational Progress. (2022b). NAEP report card. Reading. https://www.nationsreportcard.gov/reading/nation/groups/?grade=4

117.

National Assessment of Educational Progress. (2022c). NAEP report card. Reading. https://www.nationsreportcard.gov/reading/nation/groups/?grade=8

118.

National Assessment of Educational Progress. (2022d). NAEP report card: Math. https://www.nationsreportcard.gov/mathematics/nation/groups/?grade=4

119.

National Assessment of Educational Progress. (2022e). NAEP report card: Math. https://www.nationsreportcard.gov/mathematics/nation/groups/?grade=8

120.

Newmann

F. M.

Bryk

A. S.

Nagaoka

J. K.

(2001). Authentic intellectual work and standardized tests: Conflict or coexistence?Consortium on Chicago School Reform. https://consortium.uchicago.edu/sites/default/files/2018-10/p0a02.pdf

121.

Newmann

F. M.

Wehlage

G. G.

Lamborn

S. D.

(1992). The significance and sources of student achievement. In Newmann

F. M.

(Ed.), Student engagement and achievement in American secondary schools. Teachers College Press.

122.

O’Brien

(2018). Self-determination for primary school children: Theory and practice. Reach Journal of Special Needs Education in Ireland, 31(2), 155–168. https://www.reachjournal.ie/index.php/reach/article/view/21

123.

Olina

Sullivan

H. J.

(2002). Effects of classroom evaluation strategies on student achievement and attitudes. Educational Technology Research and Development, 50(3), 61–75. https://doi.org/10.1007/BF02505025

124.

Öngel

Tabancalı

(2022). Teacher enthusiasm and collaborative school climate. Education Quarterly Reviews, 5(2), 347–356. https://doi.org/10.31014/aior.1993.05.02.494

125.

Osher

Neiman

Williamson

(2020). School climate and measurement. State Education Standard, 20(2), 23–27. https://nasbe.nyc.3.digitaloceanspaces.com/2020/05/Osher-Neiman-Williamson_May-2020-Standard.pdf

126.

Panadero

Andrade

Brookhart

(2018). Fusing self-regulated learning and formative assessment: A roadmap of where we are, how we got here, and where we are going. Australian Educational Researcher, 45(1), 13–31. https://doi.org/10.1007/s13384-018-0258-y

127.

Panadero

Brown

G. T. L.

Strijbos

(2016). The future of student self-assessment: A review of known unknowns and potential directions. Educational Psychology Review, 28(4), 803–830. https://doi.org/10.1007/s10648-015-9350-2

128.

Park

Lee

I. H.

Cooc

(2018). The role of school-level mechanisms: How principal support, professional learning communities, collective responsibility, and group-level teacher expectations affect student achievement. Educational Administration Quarterly, 55(5), 742-780. https://doi.org/10.1177/0013161X18821355

129.

Pennant

(2018). Group-worthy tasks and their potential to support children to develop independent problem-solving skills. University of Cambridge NRICH. https://nrich.maths.org/9935

130.

Pierce

J. L.

Kostova

Dirks

K. T.

(2003). The state of psychological ownership: Integrating and extending a century of research. Review of General Psychology, 7(1), 84–107. https://psycnet.apa.org/doi/10.1037/1089-2680.7.1.84. https://doi.org/10.1037/1089-2680.7.1.84

131.

Pieschl

(2009). Metacognitive calibration—An extended conceptualization and potential applications. Metacognition and Learning, 4(1), 3–31. https://doi.org/10.1007/s11409-008-9030-4

132.

Ratts

R. F.

Pate

J. L.

Archibald

J. G.

Andrews

S. P.

Ballard

C. C.

Lowney

K. S.

(2015). The influence of professional learning communities on student achievement in elementary schools. Journal of Education & Social Policy, 2(4), 57–61. https://www.jespnet.com/journals/Vol_2_No_4_October_2015/5.pdf

133.

Reeve

(2009). Why teachers adopt a controlling motivating style toward students and how they can become more autonomy supportive. Educational Psychologist, 44(3), 159–175. https://doi.org/10.1080/00461520903028990

134.

Reeve

Cheon

S. H.

(2021). Autonomy-supportive teaching: Its malleability, benefits, and potential to improve educational practice. Educational Psychologist, 56(1), 54–77. https://doi.org/10.1080/00461520.2020.1862657

135.

Reeve

Cheon

S. H.

T. H.

(2020). An autonomy-supportive intervention to develop students’ resilience by boosting agentic engagement. International Journal of Behavioral Development, 44(4), 325–338. https://doi.org/10.1177/0165025420911103

136.

Reeve

Halusic

(2009). How K-12 teachers can put self-determination theory principles into practice. Theory and Research in Education, 7(2), 145–154. https://doi.org/10.1177/1477878509104319

137.

Reeve

Jang

(2006). What teachers say and do to support students’ autonomy during a learning activity. Journal of Educational Psychology, 98(1), 209–218. https://psycnet.apa.org/doi/10.1037/0022-0663.98.1.209. https://doi.org/10.1037/0022-0663.98.1.209

138.

Reeve

(2014). Teacher motivation. In Gagné

(Ed.), The Oxford handbook of work engagement, motivation, and self-determination theory (pp. 349–362). Oxford University Press.

139.

Roberge

M. E.

Van Dick

(2010). Recognizing the benefits of diversity: When and how does diversity increase group performance? Human Resource Management Review, 20(4), 295–308. https://doi.org/10.1016/j.hrmr.2009.09.002

140.

Ronfeldt

Farmer

S. O.

McQueen

Grissom

J. A.

(2015). Teacher collaboration in instructional teams and student achievement. American Educational Research Journal, 52(3), 475–514. https://doi.org/10.3102/0002831215585562

141.

Ross

J. A.

(2006). The reliability, validity, and utility of self-assessment. Practical Assessment, Research and Evaluation, 11(10), 1–13. http://pareonline.net/getvn.asp?v=11&n=10

142.

Ryan

R. M.

Deci

E. L.

(2017). Self-determination theory: Basic psychological needs in motivation, development, and wellness. Guilford Press.

143.

Ryan

R. M.

Deci

E. L.

(2020). Intrinsic and extrinsic motivation from a self-determination theory perspective: Definitions, theory, practices, and future directions. Contemporary Educational Psychology, 61, 101860. https://doi.org/10.1016/j.cedpsych.2020.101860

144.

Ryan

R. M.

Weinstein

(2009). Undermining quality teaching and learning: A self-determination theory perspective on high-stakes testing. Theory and Research in Education, 7(2), 224–233. https://doi.org/10.1177/1477878509104327

145.

Scardamalia

Bereiter

(2022). Knowledge building and knowledge creation. In Sawyer

R. K.

(Ed.), The Cambridge handbook of the learning sciences (3rd ed., pp. 385−405). Cambridge University Press. https://doi.org/10.1017/9781108888295

146.

Schaap

De Bruijn

(2018). Elements affecting the development of professional learning communities in schools. Learning Environments Research, 21(1), 109–134. https://doi.org/10.1007/s10984-017-9244-y

147.

Schunk

D. H.

(1996). Goal and self-evaluative influences during children’s cognitive skill learning. American Educational Research Journal, 33(2), 359–382. https://doi.org/10.3102/00028312033002359

148.

Seglem

(2017). Creating a circle of learning: Teachers taking ownership through professional communities (Vol. 16, No. 4, May 2009). Voices from the Middle, 25(1), 56–60. https://library.ncte.org/journals/vm/issues/v16-4/7156. https://doi.org/10.58680/vm201729279

149.

Shepard

L. A.

(1979). Setting standards. In Bunda

M. A.

Sanders

J. R.

(Eds.), Practices and problems in competency-based measurement (pp. 72–88). National Council on Measurement on Education.

150.

Shin

Johnson

Z. D.

(2021). From student-to-student confirmation to students’ self-determination: An integrated peer-centered model of self-determination theory in the classroom. Communication Education, 70(4), 365–383. https://doi.org/10.1080/03634523.2021.1912372

151. Shute, V. J. (2007). Focus on formative feedback (Research report). Educational Testing Service. https://files.eric.ed.gov/fulltext/EJ1111586.pdf

152. Sinha & Kapur (2021). When problem solving followed by instruction works: Evidence for productive failure. Review of Educational Research, 91(5), 761–798. https://doi.org/10.3102/00346543211019105

153. Skibbe, L. E., Montroy, J. J., Bowles, R. P., & Morrison, F. J. (2019). Self-regulation and the development of literacy and language achievement from preschool through second grade. Early Childhood Research Quarterly, 46, 240–251. https://doi.org/10.1016/j.ecresq.2018.02.005

154. Soenens, Sierens, Vansteenkiste, Dochy, & Goossens (2012). Psychologically controlling teaching: Examining outcomes, antecedents, and mediators. Journal of Educational Psychology, 104(1), 108–120. https://doi.org/10.1037/a0025742

155. Stahl, G. K., Maznevski, M. L., Voigt, & Jonsen (2010). Unraveling the effects of cultural diversity in teams: A meta-analysis of research on multicultural work groups. Journal of International Business Studies, 41(4), 690–709. https://doi.org/10.1057/s41267-020-00389-9

156. Stefanou, C. R., Perencevich, K. C., DiCintio, & Turner, J. C. (2004). Supporting autonomy in the classroom: Ways teachers encourage student decision making and ownership. Educational Psychologist, 39(2), 97–110. https://doi.org/10.1207/s15326985ep3902_2

157. Stein, R. E., Colyer, C. J., & Manning (2016). Student accountability in team-based learning classes. Teaching Sociology, 44(1), 28–38. https://doi.org/10.1177/0092055X15603429

158. Stiggins, R. J. (2008). Student-involved assessment for learning. Merrill Prentice Hall.

159. Stipek (1996). Motivation and instruction. In Berliner & Calfee (Eds.), Handbook of educational psychology. Macmillan.

160. Suldo, S. M., Riley, K. N., & Shaffer, E. J. (2006). Academic correlates of children and adolescents’ life satisfaction. School Psychology International, 27(5), 567–582. https://doi.org/10.1177/0143034306073411

161. Sun, Loeb, & Grissom, J. A. (2017). Building teacher teams: Evidence of positive spillovers from more effective colleagues. Educational Evaluation and Policy Analysis, 39(1), 104–125. https://doi.org/10.3102/0162373716665698

162. Sun, Zhang, & Scardamalia (2010). Developing deep understanding and literacy while addressing a gender-based literacy gap. La Revue Canadienne de l’Apprentissage et de la Technologie [Canadian Journal of Learning and Technology], 36(1), 1–20. https://doi.org/10.21432/T20P4D

163. Supovitz, J. A., Foley, & Mishook (2012). In search of leading indicators in education. Education Policy Analysis Archives, 20(19), 1–27. http://doi.org/10.14507/epaa.v20n19.2012

164. Swanson, McCulley, L. V., Osman, D. J., Scammacca Lewis, N. S., & Solis (2019). The effect of team-based learning on content knowledge: A meta-analysis. Active Learning in Higher Education, 20(1), 39–50. https://doi.org/10.1177/1469787417731201

165. Takahashi, Norman, Jackson, Ing, & Chinen (2020). Measurement for improvement in education. In S. C. Faircloth (Ed.), Education. Oxford University Press. https://doi.org/10.1093/obo/9780199756810-0247

166. Tichnor-Wagner, Harrison, & Cohen-Vogel (2016). Cultures of learning in effective high schools. Educational Administration Quarterly, 52(4), 602–642. https://doi.org/10.1177/0013161X16644957

167. Toth & Sousa (2019). The power of student teams: Achieving the social, emotional, and cognitive learning in every classroom through academic teams. Learning Sciences International.

168. Valentine & Bolyard (2018). Creating a classroom culture that supports productive struggle: Pre-service teachers’ reflections on teaching mathematics [Conference session]. Annual Conference of the American Educational Research Association, New York, NY, United States.

169. Valls & Kyriakides (2013). The power of interactive groups: How diversity of adults volunteering in classroom groups can promote inclusion and success for children of vulnerable minority ethnic populations. Cambridge Journal of Education, 43(1), 17–33. https://doi.org/10.1080/0305764X.2012.749213

170. Van den Bergh, Ros, & Beijaard (2014). Improving teacher feedback during active learning: Effects of a professional development program. American Educational Research Journal, 51(4), 772–809. https://doi.org/10.3102/0002831214531322

171. VanLone, Freeman, LaSalle, Gordon, Polk, & Rocha Neves (2019). A practical guide to improving school climate in high schools. Intervention in School and Clinic, 55(1), 39–45. https://doi.org/10.1177/1053451219832988

172. Vollmeyer & Rheinberg (2005). A surprising effect of feedback on learning. Learning and Instruction, 15(6), 589–602. https://doi.org/10.1016/j.learninstruc.2005.08.001

173. Warshauer, H. K. (2015). Strategies to support productive struggle. Mathematics Teaching in the Middle School, 20(7), 390–393. https://doi.org/10.5951/mathteacmiddscho.20.7.0390

174. Wehmeyer, M. L. (1997). Self-determination as an educational outcome. A definitional framework and implications for intervention. Journal of Developmental and Physical Disabilities, 9(3), 175–209. https://doi.org/10.1023/A:1024981820074

175. Wehmeyer, M. L., Sands, D. J., Knowlton, H. E., & Kozleski, E. B. (2002). Teaching students with mental retardation: Providing access to the general curriculum. Paul H. Brookes.

176. Wehmeyer, M. L., Shogren, K. A., Toste, J. R., & Mahal (2017). Self-determined learning to motivate struggling learners in reading and writing. Intervention in School and Clinic, 52(5), 295–303. https://doi.org/10.1177/1053451216676800

177. What Works Clearinghouse. (2022). What Works Clearinghouse procedures and standards handbook version 5.0. United States Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance (NCEE). https://ies.ed.gov/ncee/wwc/Handbooks

178. Wiliam (2006). Formative assessment: Getting the focus right. Educational Assessment, 11(3–4), 283–289. https://doi.org/10.1080/10627197.2006.9652993

179. Wiliam, Lee, Harrison, & Black (2004). Teachers developing assessment for learning: Impact on student achievement. Assessment in Education: Principles, Policy and Practice, 11(1), 49–65. https://doi.org/10.1080/0969594042000208994

180. Williams, K. E., White, S. L. J., & MacDonald (2016). Early mathematics achievement of boys and girls: Do differences in early self-regulation pathways explain later achievement? Learning and Individual Differences, 51, 199–209. https://doi.org/10.1016/j.lindif.2016.09.006

181. Zeybek (2016). Productive struggle in a geometry class. International Journal of Research in Education and Science, 2(2), 396–415. https://files.eric.ed.gov/fulltext/EJ1110272.pdf. https://doi.org/10.21890/ijres.86961

182. Ziebell & Clarke (2018). Curriculum alignment: Performance types in the intended, enacted, and assessed curriculum in primary mathematics and science classrooms. Studia Paedagogica, 23(2), 175–203. https://doi.org/10.5817/SP2018-2-10

183. Zimmerman, B. J., & Kitsantas (2014). Comparing students’ self-discipline and self-regulation measures and their prediction of academic achievement. Contemporary Educational Psychology, 39(2), 145–155. https://doi.org/10.1016/j.cedpsych.2014.03.004

Leading Indicators of Academic Achievement: Investigating the Predictive Validity of an Observation Instrument in a Large District
