
Abstract

The importance of improving teachers’ use of culturally responsive practice (CRP) in the classroom setting has been widely recognized. Although quantitative data on teachers’ use of CRP has potential to be a helpful decision-making tool in advancing that goal, little is known about the psychometrics of classroom-based CRP measures, their utility in evaluating the impact of interventions designed to improve teacher CRP, or their use to inform teacher professional development in CRP. The current study reports findings from a systematic review of the research on the quantitative measurement of CRP using the 2020 Preferred Reporting Items for Systematic reviews and Meta-Analyses standards to document how CRP is operationalized and measured in prekindergarten–12th-grade classrooms in the United States (U.S.). Searching across six databases, 27 measures were identified for inclusion. The vast majority of measures were teacher self-report surveys, and relatively few were student-report or external observer assessments. We examined the availability of classroom-based observational and survey instruments and critically analyzed each measure through an argument-based approach to validation. We concluded that although some CRP measures hold promise, the validity of their interpretation and use is not adequately supported by evidence, with some exceptions. This lack of empirical evidence is exacerbated by the limitations of single-informant measurement of CRP. More multi-informant assessment approaches are needed.

Keywords

culturally responsive teaching, equity, multicultural, measurement

With increasingly diverse student enrollments in United States (U.S.) public schools, there is growing recognition of the importance of improving teachers’ use of culturally responsive practice (CRP; Gay, 2018). CRP is an asset-based approach intended to support students’ academic and social-emotional well-being at school by incorporating students’ personal strengths, cultural knowledge, lived experiences, and home communication styles into daily classroom interactions (Gay, 2018). Importantly, CRP goes beyond curricular adaptions. CRP theory recognizes that teachers can enrich students’ classroom experience through their instructional content (what they teach) and practices (how they teach) to make learning encounters more engaging and, in turn, increase students’ access to learning and opportunity for academic achievement. In this article, we use the term CRP to encompass three closely related terms: culturally responsive teaching (Gay, 2018), culturally relevant pedagogy (Ladson-Billings, 2021), and culturally sustaining pedagogy (Paris, 2012).

The important nuances between the foundational theories and subsequent iterations have been discussed extensively by the authors themselves (see Gay, 2018, 2023; Ladson-Billings, 2014, 2021; Paris & Alim, 2014). For example, critical consciousness was conceptually present in Ladson-Billings’s culturally relevant pedagogy, but not necessarily in Gay’s culturally responsive teaching. However, our choice of CRP in the current article is not intended to indicate a preference for Ladson-Billings’s, Gay’s, or Paris’s definitions but, rather, to reflect the nomenclature commonly recognized among educators in the field and to be inclusive of both instructional and relational aspects of classroom practices. The term CRP is intended to capture the constellation of critical, culturally-focused educational ideologies that came both before and after the identification of CRP in the academic literature nearly three decades ago (Gay, 2000; Ladson-Billings, 1995). Since then, there have been renewed calls for CRP as a promising approach for addressing academic and discipline disparities stemming from cultural and racial bias in schools (e.g., Holcomb-McCoy, 2021).

A persistent challenge for the field is the definitional and conceptual issues surrounding CRP. There is a consensus that CRP represents a paradigm shift—rejecting the notion that when students’ cultural identities do not conform to the dominant school culture (i.e., White, middle-class norms; Carter, 2005) it is a barrier to their academic success. Instead, CRP embraces the perspective that students’ cultural differences are strengths to be leveraged for the advancement of student learning. However, variations among the conceptualizations of CRP are many, particularly since the amplified calls for equitable, racially just education over the last several years (Ladson-Billings, 2021; National Education Association, 2021).

The inclusion of CRP as a possible vehicle for racial justice in education reflects the field’s evolving understanding of CRP in both concept and practice. Historically, conversations about how to provide high-quality education to racially minoritized students have circumvented direct references to race, racism, and White supremacy in the U.S. education system (NEA Center for Social Justice, 2020). Instead, classroom racial-ethnic diversity often has been addressed using a harmful color-evasive orientation (i.e., belief that race does not matter and should not be considered in interactions or decision making; Apfelbaum et al., 2012; Hazelbaker & Mistry, 2021) or a celebratory but often superficial emphasis on multiculturalism (i.e., the recognition and appreciation of distinct ethnic and cultural groups, similar to the Contributions Approach in multicultural education; see Banks, 1993). However, these approaches, which may be more comfortable for White teachers to adopt than speaking about race explicitly (e.g., Luther, 2009), have not been shown to improve student well-being and can even perpetuate racism and discrimination in the classroom setting (Au, 2017; Ngo, 2010; Plaut et al., 2018). For example, when considering multicultural education, it is not that the original theory was designed to discuss racial, ethnic, and cultural differences at a perfunctory level but that the approach was often misrepresented when adopted by school practitioners and omitted the crucially important critical perspective (Banks, 1993; Nieto, 2017). This superficial, or whitewashed, lens is a common pitfall in the implementation of classroom-based frameworks designed to support racially minoritized students (see Compton-Lilly et al., 2021; Sandoval et al., 2016), and CRP is not immune to this problem (Ladson-Billings & Dixson, 2021; Sleeter, 2012).

Ladson-Billings (2021) recently acknowledged the tension between encouraging the adoption of CRP as a focus of school improvement initiatives and ensuring that the theoretical foundations of CRP do not get overlooked. Efforts have been made to restate the critical race theory perspective that underlies CRP (e.g., Dixson, 2021) and point to CRP as a viable tool to promote racial justice in education (e.g., Curenton et al., 2022; Holcomb-McCoy, 2021). However, there are significant feasibility concerns that make educators and researchers hesitant to draw a direct connection between CRP and racial justice. For example, when working with school districts to introduce schoolwide initiatives or classroom-based interventions that address racism directly, there is a risk of resistance to discussions regarding race and equity, particularly in a polarized sociopolitical climate (Parkhouse et al., 2019). Fortunately, since hitting a racial justice tipping point in 2020, the field has demonstrated a greater willingness to grapple with racism in the education system (e.g., Metro, 2020), which has also resulted in more scholars weighing in about the critical components of CRP. However, educators’ and researchers’ increased interest in CRP has been met with pushback among some conservative groups regarding “critical race theory”–related topics, leading some researchers to adjust the language they use to maintain their working relationships with school partners (Kaplan & Owings, 2021).

Operationalizing Culturally Responsive Practice

Related to these broader definitional challenges, the field of CRP measurement has been constrained by a “jangle” fallacy (i.e., the use of many different labels for the same or overlapping, similar constructs; Kelley, 1927), whereby terms such as multicultural education, culturally responsive teaching, culturally relevant pedagogy, and culturally sustaining pedagogy have evolved over time through expanded interpretations while also often continuing to be used interchangeably (Ladson-Billings, 2021). Moreover, the following approaches could all arguably fall under the CRP umbrella: remedying home-school dissonance by improving the alignment between students’ home and school cultural ways-of-knowing and learning (Arunkumar et al., 1999; Tyler et al., 2018); enhancing teachers’ understanding of students’ cultural histories and experience as “funds of identity” (Esteban-Guitart & Moll, 2014); exploring teaching and learning as cultural practice (Gutiérrez & Rogoff, 2003; Nasir et al., 2006); and employing critical race theory and critical literacy (Aronson et al., 2020). The fluid and expansive terminology used in relation to CRP constructs, though vital for developing the theory and key concepts, has impeded their operationalization in the form of quantitative measures.

Resolving differences among conceptual and terminological CRP approaches is foundational to creating robust measurement tools. For example, CRP may comprise both generic and culturally specific classroom interactions, making CRP-specific measures difficult for educational leadership to identify and select (Jensen, Grajeda, et al., 2018). Some educators draw connections between CRP and differentiated instructional approaches (e.g., Universal Design for Learning; Kieran & Anderson, 2019). This likely stems in part from the conflation of cultural differences (i.e., group-membership-related identity differences) with individual learning differences, which could result in the inaccurate measurement of CRP (Dixson, 2021). Furthermore, CRP has also been considered by some scholars and practitioners to be “just good teaching,” which could be interpreted in various ways. Ladson-Billings (1995) posited that CRP is arguably an approach for all students, yet it is not actualized in practice for racially minoritized youth because U.S. mainstream education is largely responsive to hegemonic whiteness (Tevis et al., 2022). Alternatively, in a different context, reducing CRP to “just good teaching” could reflect a tendency to rationalize color-evasive perspectives and racial redirects (i.e., minimize the importance of CRP by implying it is an instructional coincidence; Liggett et al., 2017), underestimating the need to measure CRP more directly.

Morrison and colleagues (2008) sought to operationalize CRP in prekindergarten (PK) through 12th-grade education through an international review of the literature by analyzing 45 classroom-based studies. Their review identified 12 descriptors of what CRP “looks like” in a classroom. Yet none of the 45 studies reviewed included a definition that encompassed all 12 indicators and, most importantly, the quantitative measurement of each study’s framework (i.e., units of observation; DeCarlo, 2018) was not discussed. More recently, Aronson and Laughter (2016) presented a helpful comparison of Gloria Ladson-Billings’ culturally relevant pedagogy (1995, 2009, 2021) and Geneva Gay’s culturally responsive practices (2000, 2018) and subsumed both under the label culturally relevant education, inspired by Dover’s (2013) teaching for social justice conceptual map. A necessary next step to move the field forward is to build upon the existing literature syntheses through comparative and critical analysis of CRP measurement constructs.

Utility of Quantitative CRP Measures

Quantitative CRP measures are needed for formative (e.g., coaching, professional development) and research (e.g., intervention effectiveness) purposes. When used toward formative goals, such measurement tools can inform teacher professional development and growth guided by relevant and timely data. Indeed, school districts are showing increased interest in promoting CRP at the classroom and school level (Dixson, 2021), and quantitative tools could help advance these districts’ initiatives and goals. Regarding utility for research, tools that validly and reliably measure CRP could help to empirically test the theory of change that improving teachers’ mindsets and beliefs will influence their practices in the classroom, which, in turn, will increase students’ academic achievement. In addition, our own interest in this study was motivated in part by the difficulty we had finding adequate quantitative CRP measures to detect effects of a CRP-focused teacher intervention that we were developing. This demonstrates the need for CRP-focused outcome measures for use in intervention effectiveness–focused studies if we are to learn what interventions or intervention elements are effective at improving teachers’ CRP. Nonetheless, there are potential pitfalls of quantifying the rich, dynamic CRP scholarship, including reductionism and essentializing minoritized groups. For example, instructional materials used to prepare preservice teachers for working with PK–12 students from backgrounds different than their own have, in the past, essentialized group-specific cultural norms and customs in ways that oversimplified the complexity of culturally responsive and sustaining practices (see Gorski, 2008). For this reason, we were in search of a measure that did not focus on a sole demographic group but, instead, captured teachers’ responsiveness and critical consciousness of students’ multiple, complex intersectional identities.

We recognize that the extent and complexity of racialized inequality cannot be quantified given the legacy of enslavement, Jim Crow, and anti-Black brutality that is deeply rooted in our nation’s history and origin story (Gillborn et al., 2018). As such, it is important to note that quantitative measures can only provide a partial picture and that mixed methods, incorporating both qualitative and quantitative approaches in complementary ways, are necessary. In sum, there is utility in quantitative measurement of culturally responsive classroom practices for the purposes of accountability and to inform teacher feedback and coaching to guide practice changes. Quantitative measures are also critical to help us determine whether intervention approaches are effective at improving educators’ CRP. However, in research contexts, these measures should be utilized as companions to qualitative data collection.

Knowledge Gaps and Barriers to Implementation of CRP Measurement

Differences in the operational definition of CRP, as well as a lack of consensus on the terminology used to describe this multifaceted practice, have likely led to the limited body of research on valid and reliable measures of CRP in PK–12 classrooms. Of the available CRP measures, it is believed that the majority are teacher self-report, meaning that teachers rate their own culturally responsive attitudes, values, beliefs, and/or proficiency in working with students in a culturally responsive manner (Larson & Bradshaw, 2017; National Council for Accreditation of Teacher Education [NCATE], 2008). These self-ratings pose various validity threats such as the introduction of social desirability bias, as raters tend to provide overly positive self-assessments (Grimm, 2010), and implicit racial biases (Gilliam et al., 2016; Worrell, 2022), of which educators may be unaware but nonetheless may impact their teaching. Efforts to guard against these biases by embedding social desirability scales in self-report surveys, or working to address implicit racial biases with raters, respectively, have had variable success (Perinelli & Gremigni, 2016; Vitriol & Moskowitz, 2021). Alternate methods of evaluation, such as student-report or observational measures of CRP, would allow researchers and administrators to assess the effectiveness of professional development efforts to increase classroom teachers’ cultural responsivity without the interference of biases.

Despite school districts’ growing interest in promoting CRP at the classroom and school level (Dixson, 2021), a challenge has been how to advise school partners and researchers about the best CRP measures to utilize. Before making practice and policy recommendations, it is essential to have a complete understanding of the current CRP measurement landscape. As such, to improve the accurate measurement of CRP and understand the quality of existing CRP measures, a systematic review of the literature is warranted. Specifically, a systematic review can document the availability and evidence of validity among classroom CRP measures, while also determining the core elements of CRP through data extraction and inductive analysis. Although there have been several broad reviews of the qualitative CRP literature (e.g., Aronson & Laughter, 2016), and reviews that were specific to content areas such as mathematics (e.g., Thomas & Berry, 2019) and English language arts (e.g., Wetzel et al., 2019), these reviews did not focus specifically on quantitative measures that could be leveraged for intervention research or be used as tools to inform changes in teacher practice. CRP also has evolved definitionally over time and will continue to do so (e.g., as it is subject to external political pressure). Therefore, a systematic review of the literature on CRP operationalization in quantitative measurement is timely and may also help build consensus regarding its definition.

Overview of the Current Study

The purpose of the current study was to identify existing CRP measures and provide information to help researchers and practitioners select specific measures that are most appropriate for achieving their measurement goals (e.g., evaluate intervention effectiveness, assess CRP among in-service and preservice teachers). We conducted a systematic review of the literature to examine the state of the science as it relates to the measurement of CRP. We intentionally searched for measures that could be used across different U.S. student populations, in lieu of group-specific measures. Our focus on measures with nationwide dissemination potential aligned with our aspiration to discourage the predominantly White U.S. teacher workforce from developing reductionist, stereotypical beliefs about cultural group membership and to encourage, instead, a continuously developing understanding of and proficiency in CRP as a complex practice. We sought to address a specific gap in the literature with regard to consequential validity, that is, the cumulative validity evidence in support of the proposed score interpretations and uses (American Educational Research Association [AERA] et al., 2014; Messick, 1998). The four stages of establishing a validity argument (Kane, 2006), also known as an interpretation and use argument (Haertel, 2018; Kane, 2013), informed the development of two guiding research aims.

Our first aim was to summarize the state of the science of CRP measurement, as it relates to the reliability and validity of existing instruments. Specifically, we report on evidence supporting the first two stages of interpretation and use arguments: (1) scoring and (2) generalization, including reports on classical test theory metrics (e.g., reliability, content-, convergent-, discriminant-validity) and evidence of measurement invariance. Measurement invariance (MI) testing, which helps to establish whether the measure works the same in different samples or across settings, is particularly important in CRP measurement research (relative to generally effective teaching) because CRP is highly sensitive to context, and measures of CRP in one setting may not translate well to others. MI testing can provide critical information other researchers need to know to make decisions about the valid use of a given CRP measure in their sample and setting of interest. In Aim 2, we shifted our focus to the latter stages of interpretation and use argumentation: (3) extrapolation and (4) use or interpretation. We characterized the construct of CRP as it is conceptualized and operationalized in classrooms. It is our hope that the findings will provide greater clarity and useful information for end-users aiming to utilize CRP measures in their work and for multiple purposes (e.g., research, practice).

Our study was confined to the definition of CRP as it has been operationalized in the quantitative literature thus far, leading us to highlight gaps within the CRP measurement field. We used the term CRP to include culturally responsive teaching, culturally relevant pedagogy, and culturally sustaining pedagogy, given their similarities as asset-based pedagogies, as has been acknowledged by both Gay (2018) and Ladson-Billings (2021). We recognize and include the more recent term culturally sustaining pedagogy (Paris, 2012) in our review, which has been embraced by Ladson-Billings (2014); but we generally used CRP as the umbrella term, given its broad familiarity and recognition among practitioners and researchers. Because this study is intended to provide guidance to school practitioners and researchers on how to best evaluate and possibly increase use of CRP in the classroom, it was important to assess and report on the measures’ intended use among specific samples and contexts and not merely classical reliability and validity metrics. This comprehensive approach allowed us to better understand the utility and limitations of using high-inference CRP measures for complex, real-world applications.

Method

The current study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 recommendations, whereby we also provided a rationale for any notable restrictions to study eligibility (Page et al., 2021). After the initial identification of studies and removal of duplicate records, we followed a multistep process that included an initial screening focused on relevance and study setting, and three consecutive eligibility phases specific to study design and reporting. Figure 1 provides a visual representation of the studies identified using the PRISMA-compliant flow diagram.

Figure 1.

PRISMA 2020 flow diagram.

Search Strategy

The extant literature was obtained for systematic review via six research databases. For efficiency, we used the combined search function via the EBSCOhost research platform, which included five education databases: Education Resources Information Center (ERIC); Education Research Complete; Education Full Text; Academic Search Complete; and the Psychology and Behavioral Sciences Collection. We also ran the same Boolean search terms in the PsycINFO database given the inclusion of CRP research in the educational psychology field. Notably, we did not set any publication year criteria, in an attempt to capture the CRP measurement literature in its entirety. Records were last sourced via database search in January 2021 and via expert and peer review in July 2022.

The search strategy was carefully crafted using the guidance outlined in the Peer Review of Electronic Search Strategies (PRESS) for systematic reviews (McGowan et al., 2016). We consulted three research librarians with expertise in the systematic review process who ultimately suggested a stringent use of Boolean search terms. Due to the expansiveness of CRP-related terminology, we used truncated search terms, in which an asterisk matched any letters at the end of a term, and the connectors “AND” and “OR” to combine concepts. We provide a full list of the search term combinations related to CRP, education, school, and measurement in Table 1. Given the importance of teacher-student relationships in the classroom milieu, we limited the review to those studies designed to evaluate teacher practices in the classroom environment. For example, studies that examined CRP among school counselors or other related service professionals were not included. Moreover, only studies that were published in English and conducted in the U.S. were included in this review. Although the international literature provides an important perspective about cultivating multicultural school climates amidst global tensions between nationalism and globalization, this area of research was beyond the scope of our study, which focused on CRP in U.S. schools. Including studies conducted only within the U.S. allowed us to evaluate the quality of CRP measurement through the perspective of the United States’ geographic, racial, and sociocultural landscape.

Table 1

Search term combinations

Concept	Search terms
CRP	cultural* responsiv* culturally relevant cultural* competen* multicultural* multicultural competen* warm* demand* Afrocentri* culturally sustain* cultural dimension cultural awareness sociopolitical conscious* sociocultural conscious* sociocultural competen* sociocultural equity racial competenc*
Education	teach* instruction practic pedagog climate educat
School	classroom school* elementary school* primary school* middle school* high school* secondary school* preschool* K-12
Measurement	Measures (Individuals) evaluat* evaluat criteria* evaluat methods* assess* measur* self-report measure teacher-report measure observ* observation measure* observation tool observation protocol assessment protocol evaluation protocol checklist survey scale psychometric propert* psychometric* exploratory factor analy* confirmatory factor analy* rating scale rubric scoring rubric domain dimension inventory indicator instrument* questionnaire operationaliz* reliability validat* analytic tool efficacy empirical valid* method*

Note. The search terms within each column were separated with the “OR” Boolean term. The terms across columns were connected by the “AND” Boolean term.
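
As an illustration of the Boolean logic described in the note, the following minimal sketch assembles a query string from a few of the Table 1 terms. The function name `build_query`, the abbreviated term lists, and the choice to wrap multiword phrases in double quotation marks are assumptions made for illustration only; the exact syntax entered into each database is not reported here and may differ by platform.

```python
def build_query(concepts):
    """Join terms within each concept with OR, then join concepts with AND."""
    groups = []
    for terms in concepts:
        # Assumption: multiword phrases are quoted so they are searched as a unit.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Abbreviated, illustrative subset of the Table 1 terms (one list per concept).
concepts = [
    ["cultural* responsiv*", "culturally relevant"],  # CRP
    ["teach*"],                                       # Education
    ["classroom", "preschool*"],                      # School
    ["assess*", "rating scale"],                      # Measurement
]

query = build_query(concepts)
print(query)
```

Keeping each concept as its own list mirrors the column structure of Table 1, so adding a term to a column widens the search and adding a column narrows it.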

Additionally, only studies that underwent a rigorous peer-review process, including peer-reviewed academic journals or chapters in edited volumes, were included. Other reports, such as unpublished manuscripts, online reports, doctoral dissertations, conference abstracts, and other gray literature, were excluded. Any study identified as retracted was also excluded. Finally, only studies that examined CRP in PK–12 were included in the review given the substantial variability of higher education settings. However, studies that specifically surveyed undergraduates about their work as preservice teachers were included.

Following the identification of studies, we utilized Zotero reference management software to remove duplicate records. Whereas some of the eligibility criteria were automatically screened through search filters (e.g., English, peer-reviewed), we also reviewed all records (title/abstract) during this initial screening. Two reviewers independently worked through a subsample comprising 10% of the eligible records; they then compared their inclusion/exclusion decisions and established 84% reliability before proceeding with the remaining records. Throughout the selection and data collection processes, reviewers utilized organizational tools from Zotero, including color-coded tags denoting study inclusion or exclusion, folders, subfolders, and shared libraries. If uncertain about whether a study met the inclusion criteria, the primary reviewer (first author, MF) would confer with her coauthors, and a broader research team of senior and mid-career researchers with expertise in systematic reviews, until the coauthors reached a consensus using the preset eligibility criteria. The first author (MF) and second author (JB) also held biweekly meetings, during which they monitored for rater drift by independently applying criteria and cross-checking with studies randomly pulled from the dataset.
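
To make the reliability check on the double-screened subsample concrete, the sketch below computes two common indices of agreement between two reviewers' inclusion/exclusion decisions: raw percent agreement and Cohen's kappa, which corrects for chance agreement. The text above does not state which index the 84% figure refers to, and the decision lists here are hypothetical data invented for illustration.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of records on which two reviewers made the same decision."""
    if len(a) != len(b) or not a:
        raise ValueError("Both reviewers must rate the same non-empty set of records.")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected agreement if each reviewer assigned categories independently.
    p_e = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    if p_e == 1:
        return 1.0  # both reviewers used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical screening decisions for ten records (not data from the review).
reviewer_1 = ["include", "include", "exclude", "exclude", "exclude",
              "include", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "include", "exclude", "include", "include", "exclude"]

agreement = percent_agreement(reviewer_1, reviewer_2)
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Because kappa discounts agreement expected by chance, it is typically lower than raw percent agreement for the same decisions.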

Eligibility Criteria

The first phase of eligibility screening focused on study design, following the successful retrieval of study records. Specifically, we determined studies to be ineligible if the authors (a) did not utilize a quantitative CRP instrument; (b) included a measure, but it was unrelated to CRP; or (c) utilized a measure of CRP that was originally published and validated by another author. If a measure was relevant but originally published and validated through another study, the dataset was cross-referenced to ensure that only the original study was included; if it was not captured in the extant records, the original study was added. The same procedures were applied when measures of potential relevance were reviewed and listed in the articles’ reference list.

For the second phase of eligibility screening, we needed access to each measure’s items to address our second aim of characterizing how CRP is conceptualized and interpreted. If the items of a measure were not published or accessible, the study was excluded (see Figure 1). When coding whether a study included the published items, we first reviewed the full text to determine if the items were included in the narrative, tables, or figures. Then, we referred to the appendices; and if no measure items were reported, the title of the measure would be searched via Google to find whether the items were published elsewhere by the study investigators. To ensure that the items of measures we included in our search were fully accessible to readers, we limited our internet search time to 10 minutes per measure, exploring the first page of search results. We tested this approach with roughly 25% of the measures and found that searching past 10 minutes and searching the second page of results yielded no additional information.

The third phase of eligibility screening was focused on the evidence available to support the use of each measure in racially and ethnically heterogeneous school and classroom contexts in the United States. Thus, to be included, the methods section of each study was required to report (a) psychometric properties (i.e., at least some report of reliability or validity, which is further discussed below) and (b) a PK–12 sample that did not focus solely on the experience of one demographic group. We stipulated the latter requirement to identify CRP measures applicable to settings where there is diversity within educational settings (i.e., racial and ethnic heterogeneity within schools and classrooms), rather than across settings. To be culturally responsive in a diverse classroom is a different set of skills than to be culturally responsive to one cultural and linguistic group only. The purpose of this review was to identify measures of cultural responsiveness in the context of classroom racial-ethnic heterogeneity, whereby measures identified would be generalizable for teachers with within-classroom racial and ethnic heterogeneity. To be included in this review, a study had to report on either reliability or validity. Although evidence of both reliability and validity is preferable in measurement studies, we required at least one to reduce the risk of reporting bias. Specifically, for reliability, we looked for studies to report metrics of internal consistency, test-retest, or interrater reliability. When examining a scale or subscale, internal consistency reliability could be captured through Cronbach’s alpha (Cronbach & Meehl, 1955), McDonald’s omega (Hayes & Coutts, 2020; McDonald, 2013), or another empirically supported metric of internal consistency reliability (e.g., Phi coefficient, Theta coefficient, Spearman’s rho). Test-retest reliability was measured using Pearson’s correlation coefficient (Pearson, 1900; Rodgers & Nicewander, 2012). For observation tools, Cohen’s kappa (Cohen, 1960) or intraclass correlations (ICC) were used to report interrater reliability. For validity, we focused on metrics indicative of internal structure as captured through factor analytic techniques (e.g., exploratory and confirmatory factor analyses [EFA, CFA]). Recognizing the importance of holistic measurement development and the power of generalizability theory (Brennan, 2010), if there was no report on internal structure, we also considered content validity (e.g., expert review), criterion validity (e.g., associations with other validated measures), and consequential validity (e.g., evidence supporting interpretation/use).
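
For readers less familiar with the internal consistency metrics named above, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula, alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores). The function name and the survey responses are hypothetical and are not drawn from any measure in this review.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("Need at least two items and two respondents.")
    if any(len(row) != k for row in scores):
        raise ValueError("Every respondent must answer every item.")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    sum_item_var = sum(variance(item) for item in items)  # sample variances
    total_var = variance([sum(row) for row in scores])    # variance of sum scores
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical responses: five teachers rating three items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 4, 5],
    [2, 2, 3],
    [4, 4, 4],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is read as evidence that the items of a scale or subscale hang together.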

Data Extraction Process

We decided upon a combination of instrument characteristics and quality indicators that aligned with an argument-based approach to evaluating evidence of measures’ validity (Kane, 2013). Specifically, we used the four stages of establishing an interpretation and use argument outlined by Haertel (2018), originally conceptualized by Kane (2006), to guide our critical analysis: (1) scoring, (2) generalization, (3) extrapolation, and (4) use or interpretation. During the data extraction phase, the following data were documented, as given in Table 2: (1a) response options and test score; (1b) traditional metrics of validity, such as factor analyses of measures’ internal structure and expert review; (2a) test reliability; (2b) measurement invariance; (3a) unit of observation (i.e., what is being observed, measured, or collected) and unit of analysis (i.e., who or what inferences are being made about; DeCarlo, 2018); (3b) measurement constructs; (3c) measurement domains; (4a) intended use (coded as Research, Formative, and Summative and listed in alphabetical order, whereby Research = to inform intervention effectiveness, gather evidence for theories, and/or contribute to developing knowledge in a field of study; Formative = to monitor learning, provide ongoing feedback, and/or identify relative strengths and skill areas that need work; and Summative = to evaluate learning at the culmination of instruction by comparing it against a standard or benchmark); (4b) test applications; and (4c) evidence supporting proposed interpretation and use (coded as No, Partial, Full, or N/A whereby “No” Evidence = reported study results were not applicable to the stated uses; “Partial” Evidence = reported study results were applicable to some, but not all, of the stated uses; “Full” Evidence = reported study results were applicable to all of the stated uses; and N/A = no measure use was reported and, therefore, evidence could not be evaluated). 
We report on measurement invariance (MI) to provide evidence on the use of these measures across specified populations. We looked for MI to be examined, at a minimum, based on the reporter’s race and ethnicity (in the case of the student- and teacher-report measures) and the classroom racial composition (in the case of observer-report measures), though we expected MI to be examined for other aspects of identity and culture as well (e.g., language, gender). In addition, we assessed whether a measure was developed or validated using qualitative reports of validity (Creswell & Miller, 2000), including cognitive interviewing or pretesting (Drennan, 2003), which is an important step for establishing validity (i.e., measure items are interpreted by users as the developers intended).

Table 2

Interpretation and use arguments for CRP measures

First author (year)	Stage 1: Scoring		Stage 2: Generalization		Stage 3: Extrapolation			Stage 4: Use or interpretation
First author (year)	Response options / test score	Validity	Reliability	MI	Unit of obs. / Unit of analysis	Construct	Domains	Use	Application	Ev.
Brabeck (2000)	3-pt rubric of interview responses / Sum	Not convergent w/ D’Andrea’s (2003) measure; Content validity via syllabi and literature review	Domains: α = .27–.74; Test-retest: .65; IRA: 70%–72%	N/R	Observer report / Teacher sensitivity	Ethical sensitivity to racial and gender intolerance	1. Professional competence; 2. Integrity; 3. Professional and scientific responsibility; 4. Respect for others’ rights and dignity; 5. Concern for others’ welfare; 6. Social responsibility	Research; Summative	Evaluate: • Intervention effectiveness • Impact of coursework • Adherence to professional ethical code	No
Byrd (2017)	5-pt Likert / Avg.	CFA: CFI = .95; TLI = .95; RMSEA = .038	CS: α = .81–.84; CCS: α = .72–.73	N/R	Student report / Teacher socialization	School racial climate	CRP subscales: 1. Cultural socialization (CS); 2. Critical consciousness socialization (CCS)	Formative; Research	Inform intervention selection	No
Cherng (2019)	6-pt Likert / Binary coded 1/0; coded 1 if score is in top 1/5th of values	CFA: Model fit statistics N/R	α = .83	N/R	Teacher report / Teacher awareness	Beliefs, attitudes, and dispositions toward teaching in multicultural settings	CRP subscale: 1. Multicultural awareness	Research	Contribute to the knowledge base on teacher preparation for diversity	Full
Curenton (2019)	5-pt Likert; % of pupils affected / Avg.	EFA: 60% of the item variance explained; Divergent w/ CLASS	α = .74–.90; IRA: 65.3%–72.6%	N/R	Observer report / Teacher interactions	Equitable sociocultural interactions	1. Challenging status quo knowledge; 2. Equitable learning opportunities; 3. Equitable discipline; 4. Connections to home life; 5. Personalized learning opportunities	Research; Summative	Utilize as outcome measure in: • Future research • Preservice fieldwork • In-service professional development	No
D’Andrea (2003)	4-pt Likert / N/R	EFA: 62% of the item variance explained	Domains: α = .73–.93; ICC: .50–.62	N/R	Teacher report / Teacher competence	Multicultural competence	1. Multicultural awareness; 2. Multicultural knowledge; 3. Multicultural skills	Summative	Evaluate preservice teachers; inform in-service teacher consultation	No
Debnam (2015)	5-pt Likert / Avg.	CFA: CFI = .960; TLI = .956; RMSEA = .037; Convergent w/ Siwatu’s (2011) measure	α = .56; IRA: 89%–92%	N/R	Observer report / Teacher proficiency	Cultural proficiency	N/A	Formative; Research; Summative	Evaluate intervention effectiveness; inform professional development and teacher coaching	No
Dickson (2016)	5-pt Likert / Avg.	EFA: % variance N/R; CFA: CFI = .913; RMSEA = .056; Content validity via pretesting	Overall: α = .90; Domains: α = .71–.89	= ♀/♂; = L/-L	Student report / Teacher practices	Culturally responsive teaching practices	1. Diverse teaching practice; 2. Cultural engagement; 3. Diverse language affirmation	Formative; Research; Summative	• Advance the research of culturally responsive practice and related outcomes • Assess the extent to which students are experiencing teachers’ practices as culturally responsive • Involve students in the planning and assessment of their educational experiences; Evaluate: • Teacher education programs • Preservice teachers’ performance	Par.
Flores (2009)	5-pt Likert; 3 open-ended / Avg.	EFA (prior version): 66% of the item variance explained; Content validity via focus group	α = .96; IRA: Rater drift evaluated	N/R	Observation report / Classroom ecology	Culturally responsive classroom ecology	1. Sociophysical; 2. Socioemotional; 3. Sociolinguistic/cognitive; 4. Sociocultural	Formative	Engage teacher candidates in critical reflection and dialogue	Par.
Flores (2011)	5-pt Likert / N/R	EFA: 54% of the item variance explained; Face validity, content validity; expert review	α = .90; Test-retest: r = .96, p < .001	N/R	Teacher report / Classroom ecology	Culturally responsive classroom ecology	1. Sociocognitive; 2. Sociocultural; 3. Sociolinguistic; 4. Socioemotional; 5. Sociophysical; 6. Sociocomfort	Formative; Summative	• Used by teacher candidates and teachers for self-reflection • Screening tool prior to preservice field experiences • Monitor teacher candidate development across time	No
Guyton (2005)	4-pt Likert; 5-pt Likert / % Correct, Avg.	CFA: Model fit statistics N/R	Overall: α = .89; Domains: α = .72–.93	N/R	Teacher report / Teacher efficacy	Multicultural efficacy	1. Experience; 2. Attitude; 3. Efficacy	Research	• Evaluate multicultural teacher education programs • Measure changes in preservice teachers • Diagnose levels of multicultural efficacy to indicate needed education • Predict teacher success in teaching diverse learners	No
Jensen, Grajeda (2018)	7-pt Likert / Avg.	CFA: CFI = .936; RMSEA = .110; Purposive sampling w/ videos high avg. in CLASS – “Regard for Student Perspectives”	Domains: φ = .76–.77, κ = .38–.63; Dimensions: φ = .35–.95, κ = .34–.75; IRA: N/R	N/R	Observer report / Classroom interactions	Cultural dimensions of classroom interactions	1. Life Applications: Language use, Difference appreciation, Equity, Content personalization; 2. Self in Group: Competition, Cooperation, Social motivation; 3. Agency: Autonomy, Role flexibility, Equitable expectations	Formative; Research	• Develop and test classroom interventions to enhance equitable learning opportunities • Pre-service and in-service teachers interpret scores with colleagues, coaches, and mentors to reflect on and revise their practices to be more equitable	No
Jensen, Whiting (2018)	9-pt Likert / N/R	EFA: 46% of the item variance explained	α = .79	N/R	Teacher report / Teacher dispositions	Multicultural dispositions	1. Meekness; 2. Advocacy; 3. Social Awareness	Formative; Research	Inform: • Design of professional development • Theories of teacher development and teaching	Par.
Kumar (2019)	3-pt Likert / Avg.	CFA: CFI = .988; TLI = .985; RMSEA = .033; Content validity via focus group	Overall: α = .75; Domains: α = .66–.73	= ArA / Cha / AfA / EuA	Student report / Learning environment	Culturally responsive and inclusive learning environment	1. Promoting cultural openness and positive intergroup relationships; 2. Providing culturally inclusive and responsive curriculum; 3. Establishing culturally responsive school practices and policies	N/R	Assess school cultures, policies, and practices	No
Love (2005)	5-pt Likert / N/R	N/R	Overall: α = .75^a; Domains: α = .72–.85	N/R	Teacher report / Teacher beliefs	Culturally relevant beliefs	1. Knowledge; 2. Student’s race, ethnicity, and culture; 3. Social relations in and beyond the classroom; 4. Teaching as a profession; 5. Teaching practice; 6. Students’ needs and strengths	N/R	N/R	N/A
Marx (2012)	5-pt Likert; 1 open-ended / Avg.	EFA: % variance N/R	Overall: α = .94; Domains: α = .81–.91	N/R	Student report / School climate	Multicultural school climate	1. Liking school; 2. Educator-student relationships; 3. Cultural relevancy; 4. School success	Research; Summative	• Enable school leaders, teachers, and scholars to assess existing school climate for evidence of multicultural practices • Examine effectiveness of changes initiated by the school	No
McKoy (2013)	5-pt Likert / Avg.	N/R	Overall: α = .86; Domains: α = .73–.75	N/R	Teacher report / Teacher competence	Cross-cultural competence	1. Factors fostering readiness to teach culturally diverse students; 2. Factors constraining readiness to teach culturally diverse students; 3. Teacher preparation for multicultural music education	Summative	Assess the attainment of cross-cultural competence among teacher candidates	No
Pohan (2001)	5-pt Likert / Sum	Content validity via expert review	α = .71–.81 / α = .78–.90	N/R	Teacher report / Teacher beliefs	Beliefs about diversity	1. Race/ethnicity; 2. Gender; 3. Social class; 4. Sexual orientation; 5. Disabilities; 6. Language; 7. Religion	Formative; Research; Summative	Used to: • Identify staff development needs • Assess the impact of interventions (e.g., workshops, seminars, course work, practica) through pre/posttest • Guide curriculum revisions for teachers, counselors, and administrators’ training • Inform a comprehensive staff development plan to address specific areas of ignorance, resistance, or closedness to diversity	No
Ponterotto (1998)	5-pt Likert / N/R	EFA: % variance N/R; Convergent validity; Content validity via pretesting	α = .86; θ = .89	N/R	Teacher reports / Teacher awareness	Multicultural awareness and sensitivity	N/A	Research	• Distinguish between subjects who had and had not completed focused multicultural workshop training • Discriminate between teachers high and low in multicultural awareness	Full
Powell (2016)	4-pt Likert informed by field notes / N/R	CRIOP Post-Observation and Family Collaboration Teacher Interview Protocols	α = .78; κ = .84; IRA = 80%	N/R	Observer report / Teacher instruction	Culturally responsive instruction	1. Classroom relationships; 2. Family collaboration; 3. Assessment; 4. Curriculum/planned experiences; 5. Instruction/pedagogy; 6. Discourse/instructional conversation; 7. Sociopolitical consciousness/diverse perspectives	Formative; Research	• Inform individual classroom coaching, on-site professional development, and instructional planning support • Assess effectiveness of a classroom-based coaching and mentoring intervention	Par.
Saldana (1997)	Binary; % of time observed / Avg.	Face validity check by 5 profs. & 5 in-service urban teachers	IRA: 80%	N/R	Observer report / Teacher instruction	Multicultural education	1. Teacher support of students; 2. Classroom equity; 3. Integration of students’ culture	N/R	N/R	N/A
Scott (2001)	5-pt Likert / Avg.	EFA: 39% of the item variance explained	Domains: α = .50–.90	N/R	Teacher report / Teacher perceptions	Perceptions and attitudes toward multicultural education	1. Benefits; 2. Interest/motivation; 3. Awareness	Summative	Determine teachers’ willingness to engage in multicultural education trainings	No
Siwatu (2017)	Degree of confidence: No confidence at all (0) – Completely confident (100) / Sum; Avg.	EFA: 53% of the item variance explained; Internal structure aligned w/ Siwatu’s (2011) measure and the Teacher Sense of Efficacy scale; Content validity via pretesting	α = .97	N/R	Teacher report / Teacher efficacy	Culturally responsive classroom management self-efficacy	N/A	Research	Assessing preservice and in-service teachers’ beliefs and designing interventions to increase their culturally responsive classroom management knowledge, skills, disposition, and self-efficacy beliefs	No
Siwatu (2011)	Degree of confidence: No confidence at all (0) – Completely confident (100) / Avg.	N/R	α = .94–.96	N/R	Teacher report / Teacher efficacy	Culturally responsive teaching self-efficacy	N/A	N/R	N/R	N/A
Sparks (1996)	Yes/No/Unsure; 5-pt Likert / Avg.	Face validity via expert review and experienced teachers	α = .85	N/R	Teacher report / Teacher attitudes	Multicultural knowledge, attitudes, and experiences	N/A	N/R	N/R	N/A
Stanley (1996)	6-pt Likert / N/R	EFA: % variance N/R; Content validity via expert and literature review	Overall: α = .91; Domain: α = .72–.85; Test-retest: .84	N/R	Teacher report / Teacher attitudes	Appreciations of cultural pluralism and diversity attitudes	1. Appreciate cultural pluralism; 2. Value cultural pluralism; 3. Implement cultural pluralism; 4. Uncomfortable with cultural diversity	N/R	N/R	N/A
Thompson (2009)	9-pt Likert / Avg.	EFA: 56% of the item variance explained	Domains: α = .73–.86	N/R	Teacher report / Teacher dispositions	Multicultural dispositions	1. Cross-cultural competence; 2. Multicultural world view; 3. Personal knowledge of self; 4. Professional skills and commitment	Summative	• Assess preservice teacher personal and professional tools for work w/ diverse learners • Determine if teaching is a good professional match	No
Waddell (2014)	4-pt Likert / N/R	N/R	κ = .18–.36; IRA = 42% full, 84% partial	N/R	Observer report / Teacher practices	Culturally ambitious teaching practices	1. Cultural competence; 2. Critical consciousness; 3. Academic achievement	Formative	• Establish a baseline of practices that reflect cultural relevance in math classrooms • Provide opportunities for critical reflection by teachers on their practices, beliefs, and cultural vision	Par.

Note. The four stages are based on Haertel (2018) and Kane (2006). The studies are ordered alphabetically by first author. The “Use” column is ordered alphabetically, with Research = to inform intervention effectiveness, gather evidence for theories, and/or contribute to developing knowledge in a field of study; Formative = to monitor learning, provide ongoing feedback, and/or identify relative strengths and skill areas that need work; and Summative = to evaluate learning at the culmination of instruction by comparing it against a standard or benchmark. The “Ev.” (evidence) column was coded as No = reported study results were not applicable to the stated uses; Partial (abbreviated as Par.) = reported study results were applicable to some, but not all, of the stated uses; Full = reported study results were applicable to all the stated uses; and N/A = measure uses not stated, so evidence could not be evaluated. MI = measurement invariance; Obs. = observation; Avg. = average score; N/R = not reported; CRP = culturally responsive practice; IRA = interrater agreement; CFA = confirmatory factor analysis; CFI = comparative fit index; TLI = Tucker Lewis index; RMSEA = root mean square error of approximation; EFA = exploratory factor analysis; CLASS = Classroom Assessment Scoring System; ICC = intraclass correlation coefficient; “=” = invariant across groups; ♀ = female; ♂ = male; L = Latine; -L = Not Latine; ArA = Arab/Arab-American; Cha = Chaldean; AfA = African American; EuA = European American.

^a Reliability statistics calculated using a larger sample of 244 teachers, paraprofessionals, counselors, principals, instructional specialists, and media specialists.

These data points were extracted to critically assess evidence of high-quality, interpretable data, an initial step toward establishing a measure’s consequential validity. However, because this study was designed to be a useful reference for practitioners, researchers, and policymakers, we also included descriptive information about each study’s sample and context in separate tables (see Tables 3, 4, and 5; refer to Supplemental Appendix A, in the online version of the journal, for further detail). Sample demographics were reported as percentages for ease of interpretation across studies; if race and gender descriptive statistics were reported as counts in the original study, we converted them to percentages. Missing summary statistics were denoted as Not Reported (N/R). These sample summaries are critical for transparency about the normative sample with which each measure was tested and the implications for the generalizability of measurement-related findings, and they may be of interest to those selecting a measure for use in practice or research settings.

Table 3

Eight observation measures of culturally responsive classroom practice

Measure: Subscale	Authors	N ^a	Sample characteristics			Data	# of Items	Time	# of Observations
Measure: Subscale	Authors	N ^a	Teachers	Students	Region	Data	# of Items	Time	# of Observations
Assessing Classroom Sociocultural Equity Scale (ACSES)	Curenton et al. (2019)	52	♀: 95%; W: 67%; B: 27%; L: 2%; A: 2%; O: 2%; Subj.: Math, Science; In-service teachers	Grade: PreK; W: 45%; B: 33%; L: 8%; A: 4%; O: 10%	Urban Midwest/Southeast	2013–2014 video obs.	33	15 min.	1–4
Assessing School Settings: Interactions of Students and Teachers (ASSIST): Culturally responsive teaching strategies scale	Debnam et al. (2015)	148	♀: 84%; W: 85%; B: 7%; L: 0.7%; A: 2%; O: 2%; Subj.: Core Academic; In-service teachers	Grade: K–8; Across schools 58% ethnic minority	Mid-Atlantic	Live obs.	4	15 min.	1
Classroom Assessment of Sociocultural Interactions (CASI)	Jensen, Grajeda, et al. (2018)	40	♀: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R; Subj.: ELA; In-service teachers	Grade: 4–5; Classes ≥50% non-W students	Urban, region N/R	2009–2011 MET video obs.	35	15 min.	6
Culturally Ambitious Teaching Practices (CATP) in Mathematics	Waddell (2014)	6	♀: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R; Subj.: Math; In-service teachers	Grade: 5–8; Majority B+L students (68%–82%)	Southeast midsize city	Video obs.	20	N/R	2
Culturally Responsive Instruction Observation Protocol (CRIOP)	Powell et al. (2016)	27	♀: 100%; W: 96%; B: 4%; L: 0%; A: 0%; O: 0%; Subj.: N/R; In-service teachers	Grade: K–3; Student race N/R; Indirect mention of Latine students	Midwest rural areas + midsize city	Live obs.	N/A	>2.5 hrs.	2
Early Childhood Ecology Scale (ECES) – Observation Form	Flores & Riojas-Cortez (2009)	98	♀: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R; Subj.: N/R; Preservice teachers	Grade: PreK; Student race N/R; Indirect mention of Latine students	Southwest	Live obs.	28	4 hrs. total	2–3
Multicultural Teaching Observation Instrument (MTOI)	Saldana & Waxman (1997)	76	♀: N/R; W: 34%; B: 59%; L: 7%; A: 0%; O: 0%; Subj.: Reading, Math, SS	Grade: 4–5; W: 10%; B: 40%; L: 50%; A: 0%; O: 0%	Urban South Central	Live obs.	18	45–50 min.	2
Racial Ethical Sensitivity Test (REST)	Brabeck et al. (2000)	42	♀: 76%; W: 81%; B: 12%; L: 2%; A: 5%; O: 0%; Subj.: N/R; Preservice teachers	N/A	Urban Northeast	Teachers view then comment on video scenarios	37	N/R	2

Note. Multi. = multidimensional; Uni. = unidimensional; ♀ = gender is female; Subj. = subject taught by teachers; W = White; B = Black/African American; L = Latine; A = Asian/Pacific Islander; O = other and/or multiracial; ELA = English/language arts; Obs. = observations; MET = Measures of Effective Teaching Project, Bill & Melinda Gates Foundation; SS = social studies.

^a For observational measures, N signifies the number of teachers observed.

Table 4

Four student-report measures of culturally responsive classroom practice

Measure:Subscale	Authors	N ^a	Sample characteristics			# of items
Measure:Subscale	Authors	N ^a	Teachers	Students	Region	# of items
Culturally Inclusive and Responsive Curricular Learning Environments (CIRCLE) Scale	Kumar et al. (2019)	2,894	♀: N/R; District A: W: 77%; B: <1%; L: N/R; A: N/R; O: 23%^*; District B: W: 96%; B: 2%; L: N/R; A: N/R; O: 2%^*; Subj.: N/R	Grade: 6–8; District A+B: W: 49%; B: 9%; L: N/R; A: N/R; O: 41%^* (^*Arab/Arab-American; Chaldean)	N/R	15
Multicultural School Climate Inventory (MSCI)	Marx & Byrnes (2012)	1,511	♀: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R; Subj.: N/R	Grade: 8–9; W: 94%; B: 0%; L: 6%; A: 0%; O: 0%	Semirural Intermountain West	23
School Climate for Diversity – Secondary (SCD-S) Scale: Cultural Socialization (CS) subscale and Critical Consciousness Socialization (CCS) subscale	Byrd (2017)	Study 1: 315; Study 2: 504	♀: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R; Subj.: N/R	Grade: 6–12; Study 1+2: W: 25%; B: 25%; L: 25%; A: 25%; O: 0%	N/R, online recruitment	CS: 3; CCS: 4
Student Measure of Culturally Responsive Teaching (SMCRT)	Dickson et al. (2016)	748	♀: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R; Subj.: N/R	Grade: 7; W: 17.1%; B: 2%; L: 63.9%; A: 2%; O: 11.9%	Southwest	21

Note. Multi. = multidimensional; Uni. = unidimensional; ♀ = gender is female; W = White; B = Black/African American; L = Latine; A = Asian/Pacific Islander; O = other or multiracial; Subj. = subject taught by teachers; N/R = not reported.

^a For student-report measures, N signifies the number of students who completed the form.

Table 5

Fifteen teacher-report measures of culturally responsive classroom practice

Measure:Subscale	Authors	N ^a	Sample characteristics			# of items
Measure:Subscale	Authors	N ^a	Teachers	Students	Region	# of items
Cross-Cultural Competence Survey (CCCS)	McKoy (2013)	337	♀: 47%; W: 83%; B: 4%; L: 6%; A: 2%; O: 4%; Subj.: Music; Preservice teachers	Grade: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	South, East, Southwest, West, North Central, Northwest	31
Culturally Responsive Classroom Management Self-Efficacy (CRCMSE) Scale	Siwatu et al. (2017)	380	♀: 62%; W: 83%; B: 6%; L: 7%; A: N/R; O: N/R; Subj.: N/R; Preservice and in-service teachers	Grade: N/R* (*Most aspiring ES teachers); W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	South, Southeast	35
Culturally Responsive Teaching Self-Efficacy (CRTSE) Scale	Siwatu (2011)	34	♀: 62%; W: 62%; B: 14%; L: 12%; A: N/R; O: N/R; Subj.: N/R; Preservice teachers	Grade: N/R* (*ES, MS, HS preservice teachers); W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	Southwest	31
Early Childhood Ecology Scale-Revised (ECES-R) - Self Assessment Form	Flores et al. (2011)	389	♀: 97%; W: 33%; B: 3%; L: 55%; A: 2%; O: 4%; Subj.: N/R; Preservice teachers	Grade: N/R* (*74% of respondents pursuing early childhood through fourth grade [EC-4] certification); W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	Southwest	35
Educational Beliefs and Multicultural Attitudes Survey (EBMAS): Multicultural Awareness subscale	Cherng & Davis (2019)	2,357	♀: 83%; W: 53%; B: 6%; L: 9%; A: 27%; O: 5%; Subj.: EC/ES: 35%, ELA: 11%, Math: 8%, Sci.: 5%, SS: 4%, Arts: 19%, FL: 5%, TESOL: 10%; Preservice teachers	Grade: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R; *55% of respondents had taught minority students before	Northeast	8
Multicultural Awareness-Knowledge-Skills Survey – Teachers Form (MAKSS-Form T)	D’Andrea et al. (2003)	171	♀: 73%; W: 19%; B: 0%; L: 1%; A: 57%; O: 23%; Subj.: N/R; Preservice teachers	Grade: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R; 57% respondents pursuing special education certification	West	41
Multicultural Dispositions Index (MDI)	Thompson (2009)	1,091	♀: 77%; W: 89%; B: 6%; L: 3%; A: 2%; O: 0%; Subj.: N/R; Preservice teachers and school counselors	Grade: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	Urban Midwest	22
Multicultural Efficacy Scale (MES)	Guyton & Wesche (2005)	626	♀: 81%; W: 82%; B: 11%; L: 3%; A: 2%; O: 2%; Subj.: N/R; Preservice teachers	Grade: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	Southeast, South, North, West	35
Multicultural Physical Education Instrument (MPEI)	Sparks et al. (1996)	348	♀: 57%; W: 72%; B: 11%; L: 7%; A: 6%; O: 1%; Subj.: Physical Education; In-service teachers	Grade: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	Suburban Northeast, Midwest	41
Multicultural Staff Development Teacher (MSDTS) Survey	Scott & Pinto (2001)	88	♀: 88%; W: 57%; B: 36%; L: N/R; A: N/R; O: 2%; Subj.: N/R; In-service teachers	Grade: K–6; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	Urban, Mid-Atlantic	29
Multicultural Teacher Dispositions Scale (MTDS)	Jensen, Whiting, et al. (2018)	191	♀: 95%; W: 88%; O: 12%; Subj.: N/R; Preservice teachers	Grade: N/R* (*ES preservice teachers); W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	Intermountain Region	15
Pluralism and Diversity Attitude Assessment (PADAA)	Stanley (1996)	215	♀: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R; Subj.: Physical Education; Preservice teachers	Grade: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	Southeast, Northeast, West, Midwest	19
Personal / Professional Beliefs About Diversity Scale (PBADS)	Pohan & Aguilar (2001)	756	♀: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R; Subj.: N/R; Preservice and in-service teachers	Grade: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	Southeast, Midwest, West	15 / 25
Teacher Multicultural Attitude Survey (TMAS)	Ponterotto et al. (1998)	227	♀: 81%; W: 56%; B: 13%; L: 21%; A: 3%; O: 7%; Subj.: N/R; Preservice and in-service teachers	Grade: N/R; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	Northeast	20
[Unnamed Survey] Inspired by Ladson-Billings (1994)	Love & Kruger (2005)	50	♀: 85%; W: 28%; B: 70%; L: 0%; A: 2%; O: 0%; Subj.: N/R; In-service teachers	Grade: K–5; W: N/R; B: N/R; L: N/R; A: N/R; O: N/R	Southeast	48

Note. Multi. = multidimensional; Uni. = unidimensional; ♀ = teacher gender is female; W = White; B = Black/African American; L = Latine; A = Asian/Pacific Islander; O = other or multiracial; N/R = not reported; Subj. = subject taught by teachers; EFA = exploratory factor analysis; CFA = confirmatory factor analysis; EC = early childhood; ES = elementary school; MS = middle school; HS = high school; ELA = English/language arts; SS = social studies; FL = foreign language; TESOL = teaching English as a second language.

^a For teacher-report measures, N signifies the number of teachers who completed the form.

Reliability statistics calculated using a larger sample of 244 teachers, paraprofessionals, counselors, principals, instructional specialists, and media specialists.

Quality Analysis

To address our first research aim, to summarize the state of the science of quantitative CRP measurement pertaining to the reliability and validity of existing instruments, we conducted a quality analysis. For ease of review and interpretation, the results of this quality analysis are presented visually in Figure 2. We designated reliability and validity as low, moderate, or high based on benchmarks from the education literature, which are outlined below. We advise readers to keep in mind that these criteria are high for measures of teaching in general and that other methods of assessing reliability have been used, particularly in observational research (e.g., Bell et al., 2012). For example, within the rigorously designed Measures of Effective Teaching Project, some of the most commonly used and extensively studied general teaching observation measures (e.g., Classroom Assessment Scoring System [CLASS]; Framework for Teaching [FfT]; Protocol for Language Arts Teaching Observations [PLATO]; and Mathematical Quality of Instruction [MQI]) would achieve low to moderate reliability and low evidence of validity based on the standards detailed in the following paragraphs (Kane et al., 2014; Kane & Staiger, 2012). We provide this caveat to strengthen readers’ interpretation of the psychometric findings presented in this study.

Figure 2.

Visual representation of quality analysis.

Measures were designated as having low reliability if the internal consistency coefficients (e.g., alpha, omega, phi, theta) were < .60, moderate (minimally acceptable) reliability if they were between .60 and .74, and high reliability if they were ≥ .75 (Taber, 2018). For test-retest reliability, a Pearson’s correlation coefficient < .70 was considered low, .70–.79 was considered moderate, and ≥ .80 was considered high (Rodgers & Nicewander, 2012). The kappa and ICC statistics, which measure interrater reliability, were rated as low reliability at ≤ .40, moderate reliability at .41–.60, and high reliability at ≥ .61 (McHugh, 2012). If reliability metrics were reported for both the overall measure and its subscales, we assessed the overall (total) metric. However, if only a range was reported (e.g., three domain-level alphas, no overall alpha), we considered the upper and lower limits of the range. If the upper and lower limits of the range fell in the same quality level, the measure was given that designation (e.g., α = .76–.90 are both ≥ .75, indicating high reliability). If the upper and lower limits of the range fell in different but adjacent quality levels, we gave the measure the benefit of the doubt and defaulted to the higher designation (e.g., α = .72–.90 are moderate and high reliability, respectively, resulting in a designation of high reliability). Finally, if the upper and lower limits of the range fell in nonadjacent quality levels, we assigned the middle quality level (e.g., α = .45–.90 are low and high reliability, respectively, resulting in a designation of moderate reliability).
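The reliability decision rules described above can be expressed compactly in code. The following Python sketch is illustrative only and is not the procedure or software used in this review. The names `CUTOFFS`, `level`, and `rate_reliability` are hypothetical, and treating each stated lower bound as an inclusive cut point (so that, for instance, a kappa of .405 would be rated low) is a simplifying assumption.

```python
# Illustrative sketch of the reliability benchmarks described in the text.
# All names here are hypothetical; cut points are taken from the paragraph above.

LEVELS = ["low", "moderate", "high"]

# (floor for "moderate", floor for "high") for each family of reliability metric
CUTOFFS = {
    "internal_consistency": (0.60, 0.75),  # alpha, omega, phi, theta
    "test_retest": (0.70, 0.80),           # Pearson's r
    "interrater": (0.41, 0.61),            # kappa, ICC
}


def level(value, metric):
    """Return 0 (low), 1 (moderate), or 2 (high) for a single coefficient."""
    moderate_floor, high_floor = CUTOFFS[metric]
    if value >= high_floor:
        return 2
    if value >= moderate_floor:
        return 1
    return 0


def rate_reliability(metric, overall=None, lower=None, upper=None):
    """Rate a measure from its overall coefficient or, failing that, a range."""
    # An overall (total) coefficient takes precedence over subscale ranges.
    if overall is not None:
        return LEVELS[level(overall, metric)]
    low_end, high_end = level(lower, metric), level(upper, metric)
    # Same level: keep it. Adjacent levels: default to the higher one.
    if high_end - low_end <= 1:
        return LEVELS[high_end]
    # Nonadjacent levels (low and high): assign the middle level.
    return LEVELS[1]


if __name__ == "__main__":
    # The three worked examples given in the text
    print(rate_reliability("internal_consistency", lower=0.76, upper=0.90))
    print(rate_reliability("internal_consistency", lower=0.72, upper=0.90))
    print(rate_reliability("internal_consistency", lower=0.45, upper=0.90))
```

Run as a script, the three calls reproduce the worked examples in the text (high, high, and moderate, respectively).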

In assessing the quality of each measure’s internal structure, we used four CFA model fit criteria: factor loadings above .40 (Brown, 2015); comparative fit index (CFI) ≥ .95; Tucker Lewis index (TLI) ≥ .95; and root mean square error of approximation (RMSEA) ≤ .06 (Hu & Bentler, 1999). Measures that did not utilize confirmatory factor analytic techniques to examine internal structure were considered to have low evidence of validity; this included those measures that only reported EFA metrics. EFA is primarily intended to uncover complex patterns by exploring a dataset to parsimoniously find common factors (Watkins, 2018), whereas CFA can be used to establish internal structure by reducing measurement error and assessing the fit of the data to an a priori hypothesized measurement model (DiStefano & Hess, 2005). Measures were designated as having low validity if a CFA was conducted but fewer than two model fit criteria were met. Measures were designated as having moderate validity if they met two or three CFA model fit criteria; if all four criteria were met, the measure was designated as having high validity. We indicated where MI was assessed in Figure 2.
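The validity designations described above amount to counting how many of the four CFA criteria a measure meets. A minimal Python sketch of that logic follows; it is an illustration rather than the tool used in this review. The function name `rate_validity` and its parameters are hypothetical, and counting an unreported statistic as an unmet criterion is an assumption made here, because the text does not specify how missing fit statistics were handled.

```python
# Illustrative sketch of the internal-structure (validity) designations.
# Names are hypothetical; thresholds are those stated in the text.


def rate_validity(cfa_conducted, min_loading=None, cfi=None, tli=None, rmsea=None):
    """Return "low", "moderate", or "high" evidence of validity.

    Assumption: a statistic that was not reported (None) counts as not met.
    """
    # Measures without a CFA (including EFA-only measures) are rated low.
    if not cfa_conducted:
        return "low"
    criteria_met = sum([
        min_loading is not None and min_loading > 0.40,  # factor loadings above .40
        cfi is not None and cfi >= 0.95,                 # comparative fit index
        tli is not None and tli >= 0.95,                 # Tucker Lewis index
        rmsea is not None and rmsea <= 0.06,             # RMSEA
    ])
    if criteria_met == 4:
        return "high"
    if criteria_met >= 2:
        return "moderate"
    return "low"


if __name__ == "__main__":
    # Hypothetical inputs, not drawn from any reviewed study
    print(rate_validity(False))
    print(rate_validity(True, cfi=0.96, tli=0.95, rmsea=0.04))
    print(rate_validity(True, min_loading=0.55, cfi=0.97, tli=0.96, rmsea=0.03))
```

Under these assumptions, a measure reporting only CFI and RMSEA could reach at most a moderate designation, which shows how sensitive the rating is to incomplete reporting of fit statistics.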

To address Aim 2, to synthesize measurement constructs and uses (i.e., pertaining to extrapolation and use or interpretation stages of Kane’s (2006) model), we critically analyzed data extracted and reported in Table 2. Specifically, we characterized the construct of CRP as it was conceptualized, as well as how the scores of classroom CRP measures were proposed to be interpreted and used. Our analysis is reported in the Results section.

Author Reflexivity

We, the authors, are White researchers who conduct primarily quantitative research with a postpositivist epistemological lens at a predominantly White institution (PWI; a university where more than 50% of its student body is White). We are currently evaluating the effects of a coaching intervention designed to improve teachers’ CRP as part of a randomized controlled trial. We recognize that being White and conducting quantitative research in this space, which is defined largely by the interpretivist epistemological foundations of CRP-related constructs and theories, necessarily means we bring gaps in understanding as well as biases to this work. We are aware of our positionality as White quantitative researchers in this space and, as such, are not seeking to alter or extend, but only to mirror and translate the rich tradition of interpretivist CRP scholarship to the fields of education and prevention sciences, where it may be used to support formative professional development and intervention goals to improve teachers’ CRP. We bring our own experience as applied psychologists and developmental scientists committed to understanding the effects of equity initiatives and CRP for students and staff in schools. As previously mentioned, our team also has experience developing CRP measures, providing insight into the challenges of this work. Recognizing the strengths and limitations of our positionality in relation to the present research, we conducted the reporting bias and certainty assessment below to mitigate potential biases.

Reporting Bias and Certainty Assessment

Following the initial completion of the quality analysis, we utilized expert review via school-based CRP scholars. In addition to scholars within our own professional networks, we reached out to the 24 authors whose measures were included in the final synthesis and inquired as to whether they were aware of any other relevant CRP measures besides their own. Furthermore, reviewers of the manuscript as part of the peer-reviewed publication process noted two additional measures for consideration through the inclusion criteria. If these experts recommended that we investigate a measure that was not already included in the literature review, the team located the original study via research database and screened the measure through all eligibility criteria. Eight of the 12 recommended measures had already been captured in the original search and were screened out due to various eligibility criteria (e.g., non-U.S. sample, not peer-reviewed, not classroom or teaching). However, there were four recommended measures that were eligible to be screened through the inclusion criteria, two of which were ultimately included in the final review. Of the other two that were excluded, the first was screened out because we were unable to locate a peer-reviewed published article validating the measure. The unpublished manual of the Climate of Healthy Interactions for Learning and Development measure (Gilliam & Reyes, 2017) did not meet the peer-review criterion; nor was it specifically related to CRP despite its potential to capture the equitable treatment of children in early childhood settings. The second excluded measure, Reinholz and Shah’s (2018) Equity Quantified In Participation observation tool, was screened but not included. The measure is a tool that teachers can use formatively to assess student behaviors in their classrooms indicative of student engagement disaggregated by student identity characteristics. Although there is some assessment of teacher behavior (e.g., whether a student responded as a result of being called on by the teacher and whether the teacher elevated the students’ voice when they spoke), this is not substantively a measure of CRP and was therefore screened out in Eligibility Phase III.

The first addition was the Assessing Classroom Sociocultural Equity Scale (Curenton et al., 2019), as it was determined that this observation measure was missed in the initial search because of a search term error. The original search terms “sociocultural conscious*” OR “sociocultural competen*” missed the Curenton et al. (2019) measure, referred to by the authors as a “sociocultural equity scale.” We reran the search adding the term “sociocultural equity” and screened all new records; the Curenton et al. (2019) study was the only record to meet inclusion criteria. The remaining screened records were incorporated into the PRISMA flowchart in Figure 1. The second measure added was the Multicultural Teacher Dispositions Scale (Jensen, Whiting, et al., 2018); it did not appear in our original search because it was missing one of the nine school-related terms (e.g., school, K–12, classroom) in the article title, abstract, and keywords (see Table 1). To our knowledge, this was the only instance of this type of exclusion.

Results

The search retrieved 4,080 records across six databases, of which 1,980 were duplicates. The reporting bias and certainty assessment added three new records, resulting in 2,103 records to be screened. Following the initial screening phase, 997 studies were sought for retrieval. However, 118 of these studies were not retrievable even after we attempted to use various institutional library resources. During Eligibility Phase I, the methods sections of 879 studies were reviewed to ascertain whether they examined a specific classroom CRP measurement tool. The vast majority of Phase I studies (n = 698) did not examine a specific measure. Of the 181 studies that did examine specific measures, 52 were unrelated to CRP, and 36 reported on measures originally published elsewhere, so only the original study was included. Of the remaining 93 studies that included measures of CRP, approximately a third (n = 33) were excluded during Eligibility Phase II because the items were not accessible following a due diligence search. In Eligibility Phase III, the methods of the remaining 60 studies were reviewed to determine whether psychometric properties were reported and more than one demographic group was the focus. Ultimately, 27 studies were included in the review and assessed for evidence of validity and reliability: 8 observation measures (Table 3), 4 student-report measures (Table 4), and 15 teacher-report measures (Table 5).
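For readers who wish to trace the record flow reported above, the arithmetic can be re-derived with a short Python sketch. The counts are taken directly from the paragraph; the function name and dictionary keys are hypothetical labels introduced only for this illustration and do not correspond to labels in Figure 1.

```python
def prisma_flow():
    """Re-derive the stage totals from the counts reported in the text."""
    retrieved, duplicates, added_by_expert_review = 4080, 1980, 3
    screened = retrieved - duplicates + added_by_expert_review

    sought_for_retrieval, not_retrievable = 997, 118
    phase_1_reviewed = sought_for_retrieval - not_retrievable

    no_specific_measure = 698
    specific_measure = phase_1_reviewed - no_specific_measure

    unrelated_to_crp, published_elsewhere = 52, 36
    crp_measure_studies = specific_measure - unrelated_to_crp - published_elsewhere

    items_not_accessible = 33
    phase_3_reviewed = crp_measure_studies - items_not_accessible

    # Final counts by informant type, as reported for Tables 3 to 5.
    included = 8 + 4 + 15

    return {
        "screened": screened,
        "phase_1_reviewed": phase_1_reviewed,
        "specific_measure": specific_measure,
        "crp_measure_studies": crp_measure_studies,
        "phase_2_excluded_share": items_not_accessible / crp_measure_studies,
        "phase_3_reviewed": phase_3_reviewed,
        "included": included,
    }


if __name__ == "__main__":
    for stage, value in prisma_flow().items():
        print(stage, value)
```

Each derived total matches the figure stated in the text (2,103; 879; 181; 93; 60; 27), and the Phase II exclusions amount to roughly 35% of the 93 studies, consistent with "approximately a third."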

To gain a better understanding of the state of CRP measurement, our research was focused on addressing two primary research aims: to understand (a) whether CRP measurements produce reliable and valid data sources and (b) how scholars operationalize CRP and whether there is evidence supporting their intended use and interpretation. The results of our critical analysis are presented in Figure 2 (Aim 1) and Table 2 (Aim 2) with special attention to the content and quality of each measure.

Aim 1: Scoring and Generalization

Addressing our first research aim, we found that measures ranged widely in terms of reliability and internal structure, and very few measures (i.e., only two student-report measures) assessed MI. The Kumar (2019) student-report measure was the sole measure to meet the standards for validity and reliability described in the method and shown in Figure 2. See Table 2 for a summary of each measure’s response options, scoring procedures, validity, reliability, and MI reporting; and Figure 2 for a visual representation of the psychometric quality analysis.

Response Options/Test Score

Observer-report measures had the most variability in response options, including field notes (i.e., Powell [2016] measure), scored responses to a semistructured interview (i.e., Brabeck [2000] measure), estimates of percentage of students affected by the teacher practice (i.e., Curenton [2019] measure), duration of behavior observed (i.e., Saldana [1997] measure), and, most commonly, Likert scales via observer report. Scoring procedures were not consistently reported for observation measures; of those published, an average score was the common metric. Similarly, all four student-report measures were scored using an average score. All response options for students were via Likert scale, with the Marx (2012) measure including an additional open-ended question for students. Thirteen teacher-report measures utilized Likert-style response scales, and both Siwatu measures (2011, 2017) utilized a 0 to 100 confidence rating system. Scoring procedures were not reported for 6 of the 15 teacher-report measures.

Validity

The studies of the measures varied in terms of the indicator of validity reported (see Table 2 and Figure 2). Eighteen of the 27 measures utilized factor analytic methods to describe the variability among observed CRP variables. Whereas some studies conducted both an EFA and CFA (e.g., Dickson [2016] measure), others conducted CFAs only (e.g., Debnam [2015], Kumar [2019], and Byrd [2017] measures), or EFAs only (e.g., Marx [2012] measure). Due to the lack of evidence for internal structure, which cannot be proven using EFA techniques, those measures that only conducted EFA techniques, such as principal components factor analysis, were designated as having low evidence of validity.

Other types of validity (e.g., face and content validity) were established through qualitative reports of validity, such as expert review (e.g., Flores [2011] and Pohan [2001] measures), or pretesting (e.g., Dickson [2016] and Siwatu [2017] measures). There were also studies that used a mixed methods approach to collect qualitative data via focus groups prior to developing item content (e.g., Kumar et al., 2019), which adds to the content validity of those measures; however, post-item development pretesting or cognitive interviewing is needed to better support item interpretability. In some studies, measure developers tested convergent validity with other nonobservational measures of CRP (e.g., Brabeck [2000] and Debnam [2015] measures). Finally, in the one study validating the Curenton (2019) measure, divergent validity was established with the Classroom Assessment Scoring System, a culturally nonspecific measure of classroom climate.

Reliability

The classroom observational measures of CRP varied widely in reliability. As reported via Cronbach’s alpha, reliability at the subdomain level ranged from relatively low reliability (e.g., Brabeck [2000] measure, α = .27–.74) to high reliability (e.g., Curenton [2019] measure, α = .74–.90). Cohen’s kappa statistics of interrater reliability were also highly variable; across the eight observation measures, kappas ranged from low agreement (e.g., Waddell [2014] measure, κ = .18–.36) to high agreement (e.g., Powell [2016] measure, κ = .84; McHugh, 2012). All four student-report measures reported reliability, and those metrics ranged from moderate to high reliability. Cronbach’s alphas ranged from .71 to .91 for subscales and .66 to .94 for full scales (overall metric). Due to the nature of student self-report, there was no interrater agreement or reliability reported. Test-retest reliability was also not reported for any of the student measures. Among the student-report measures, the Marx (2012) measure and the Dickson (2016) measure demonstrated the highest overall reliability with α = .94 and .90, respectively. Fourteen out of 15 teacher-report measures reported alpha values of .70 or greater, which is an acceptable level of internal consistency reliability, particularly for measures in development (Taber, 2018). Both alpha and theta statistics were reported in the study validating the Ponterotto (1998) measure; the theta coefficient is an index of internal consistency for a composite score (Ercan et al., 2007). Test-retest reliability coefficients were reported for the Stanley (1996) measure and Flores (2011) measure, though the test-retest assessment for the Stanley (1996) measure was done with a small subsample of 35 teachers. None of the teacher-report measure studies included students’ racial demographics.
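To make the internal consistency statistic discussed above concrete, the following minimal Python sketch computes Cronbach’s alpha from a respondents-by-items score matrix using the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the example responses are hypothetical and are not drawn from any of the reviewed studies; the sketch also assumes complete data with no missing responses.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Assumes every respondent answered every item (no missing data) and that
    total scores are not all identical (nonzero total-score variance).
    """
    k = len(rows[0])  # number of items
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")

    # Sample variance of each item across respondents.
    item_variances = [float(variance([row[i] for row in rows])) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = float(variance([sum(row) for row in rows]))

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical 5-point Likert responses: 4 respondents x 3 items.
    responses = [
        [4, 5, 4],
        [2, 3, 2],
        [5, 5, 4],
        [3, 3, 3],
    ]
    print(round(cronbach_alpha(responses), 2))
```

Because alpha depends on the number of items as well as their intercorrelations, subscale and full-scale values for the same measure can differ, as seen in the ranges reported above.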

Measurement Invariance

MI was not assessed in any of the observation or teacher-report measurement studies. Two of the student-report measures (i.e., the Kumar [2019] and Dickson [2016] measures) were assessed for MI. The Kumar (2019) measure demonstrated configural, metric, scalar, and first- and second-order latent factor intercept MI across the following four student racial-ethnic groups: White, Black/African American, Arab/Arab American, and Chaldean. The Dickson (2016) measure was assessed for configural, metric, and scalar MI using its second-order factor model. Dickson and colleagues (2016) reported that the sources of noninvariance in their second-order factor solution involved less than 20% of parameters, suggesting that the measurement of the CRP construct did not differ significantly between female and male students, Latine and non-Latine students, and immigrant and nonimmigrant students.

Aim 2: Extrapolation and Interpretation/Use

Units of Observation and Analysis

In assessing units of observation and units of analysis, some concordance was found. The purpose of seeking concordance between units of analysis and observation is to ensure that what is being measured aptly reflects the larger entity or construct being studied. Upon review, observer-report measures were aptly designed to capture objective, observable behaviors, such as student-teacher interactions (e.g., Curenton [2019] and Jensen, Grajeda, et al. [2018] measures), teacher behaviors (e.g., Powell [2016] and Waddell [2014] measures), and physical features of classrooms (e.g., Flores [2009, 2011] measures). However, the units of analysis among teacher self-report measures revealed a different purpose: to collect data on teachers’ dispositions (e.g., Jensen, Whiting, et al. [2018] and Thompson [2009] measures), self-efficacy (e.g., Guyton [2005] and Siwatu [2011, 2017] measures), and awareness (e.g., Cherng [2019] measure). Student-report measures were the least prevalent type of measure; of those identified, several were designed to capture a classroom’s overall climate or milieu (e.g., Byrd [2017], Kumar [2019], and Marx [2012] measures).

Construct and Domains

Across the 27 measures of CRP, less than a third explicitly used the phrase “culturally responsive” to describe the primary measurement construct (i.e., Dickson [2016], Flores [2009, 2011], Kumar [2019], Powell [2016], and Siwatu [2011, 2017] measures). There was noticeable terminological variation; each of the 24 authors used different terms to describe CRP-related constructs and domains (see Table 2). Temporal trends across the literature reflected the field’s evolving understanding of CRP as a complex, ongoing pedagogical approach rather than a finite competency. For example, the oldest measures included language regarding teachers’ comfort (e.g., Stanley [1996] measure), sensitivity (e.g., Ponterotto [1998] measure), and intolerance (e.g., Brabeck [2000] measure) to diversity in the classroom. In contrast, measures developed within the past 15 years tended to reflect advances in the field’s understanding of teacher dispositions, including their self-awareness and consciousness of how their beliefs, attitudes, and life experiences influence their world views and interactions with students and families (e.g., Jensen, Whiting, et al. [2018] and Thompson [2009] measures).

Interpretation and Use

To further examine the consequential validity (Messick, 1998) of the measures, we evaluated the supporting evidence for each measure’s intended use and proposed score interpretation (see Table 2; AERA et al., 2014; Kane, 2013). While nearly all 24 measure developers mentioned the intended use of their measure (i.e., formative, research, and/or summative), they were less clear about the proposed interpretation and application of scores. Over three quarters of the studies (i.e., Brabeck [2000], Byrd [2017], Cherng [2019], Curenton [2019], D’Andrea [2003], Debnam [2015], Dickson [2016], Flores [2009, 2011], Guyton [2005], Jensen, Grajeda, et al. [2018], Jensen, Whiting, et al. [2018], Kumar [2019], Marx [2012], McKoy [2013], Pohan [2001], Ponterotto [1998], Powell [2016], Scott [2001], Siwatu [2017], Thompson [2009], and Waddell [2014] measures) included some preliminary commentary about the measures’ possible applications, but very few measures provided any evidence to support these proposed real-life uses. Of the 27 measures, the validity of 5 measures (i.e., Dickson [2016], Flores [2009], Jensen, Whiting, et al. [2018], Powell [2016], and Waddell [2014] measures) was supported by partial evidence, and just 2 measures (i.e., Cherng [2019] and Ponterotto [1998] measures) included enough evidence to meet the benchmark for full support of consequential validity.

Discussion

In this systematic literature review, we took an argument-based approach to documenting evidence of validity (Kane, 2006, 2013), which involved mapping a four-stage process (Haertel, 2018) onto our two primary research aims. Taken together, the four stages of analysis provided important insights regarding the current state of the science in CRP measurement.

Scoring and Generalization

Overall, the extant CRP measurement literature underreports on strategies utilized to initially develop the measure, as well as its intended applications; greater transparency in future reporting is needed. In addition, more consistent reporting of measure metrics (i.e., scores, reliability, validity, and MI), as well as clear guidance on the proposed interpretation of scores, would support the field’s efforts to translate CRP theory into quantifiable practice. In this light, two measures stand out among others, both student-report. The first was a measure by Kumar and colleagues (2019), which proved, through a rigorous mixed methods approach, to be the most psychometrically sound against stringent classical test theory criteria. By taking a phased, exploratory approach to CRP research, Kumar and colleagues produced a measure that met traditional reliability and validity criteria. Though classical test theory criteria are just one element of an overall interpretation and use argument for a measure, the multimethod, asset-based approach that Kumar and colleagues adopted was exemplary in that it produced empirically valid results and centered student voice via focus group interviews. Although judgements about validity metrics are subjective (e.g., convergent validity vs. divergent validity vs. face validity vs. content validity), we believe that pretesting and cognitive interviewing are important strategies that can support QuantCrit efforts to humanize minoritized communities in CRP measurement research (Garcia et al., 2018). By involving key informants, particularly students themselves, in CRP operationalization and measure development, the field can center student voice to increase the quality and appropriateness of future measures. It should be noted, however, that the intended use of this student-report measure was not explicitly stated, leaving the question of consequential validity unanswered.

The second student-report measure with strong evidence of reliability and validity was the Dickson (2016) measure, which was developed using factor analyses, measure pretesting, and, like the measure by Kumar and colleagues (2019), an examination of MI. Synthesized under the Generalization stage of Table 2, evidence of MI was assessed and found only in these two studies of student-report measures (i.e., Dickson [2016] and Kumar [2019] measures). MI testing is critical because it helps to establish whether the measure works the same in different samples or across settings, which is especially important in CRP measurement research because CRP is highly sensitive to context. Overall, the sampling for many studies included in this review was relatively homogenous; and the rigor of sampling procedures, including whether invariance procedures were reported, was inconsistent across the final included studies, suggesting a need for greater attention to MI in future CRP measurement research.

Aligned with our focus on assessing argument-based validity, it is important for readers to understand that the test administration and scoring procedures extracted from the 27 original measurement studies may not reflect the authors’ recommendations for use by others in nonresearch settings. For example, in research, scores are often transformed for statistical analyses and may be of little to no use to school-based practitioners. In our review, we found that a measure’s readiness for use was rarely stated explicitly, but it should be: If measures are ready for public use, clear guidelines on test administration and scoring procedures should be provided; otherwise, studies should state that the measures are not yet ready for public use. Similarly, if a measure is only intended to be used for self-reflection and personal growth (e.g., Flores [2009, 2011] and Waddell [2014] measures), researchers should advise against using the tool for summative evaluation or research purposes. Once the field begins to consistently report intended use arguments, we will be better positioned to discuss appropriate levels of reliability and validity in the context of measure use.

Extrapolation

Our comparative analysis revealed that the field is referencing CRP in very different ways and, in turn, making inferences based on very different indicators. Remarkably, all the studies included in our synthesis (apart from those by the same first author) used different terms to describe their CRP-related constructs. This level of variation in terminology is more common in educational research when describing topics related to inequities in school (see Berkowitz et al., 2017). We reviewed the content of the 27 CRP measures for commonalities across the ways in which CRP was operationalized in the quantitative literature. Below we note select constructs that emerged frequently in the extant CRP measurement literature, though these themes do not encompass all constructs reported in Table 2; nor are they intended to comprehensively represent all CRP concepts in their full depth and complexity.

Making sociocultural connections in the classroom during both instructional and noninstructional time was a core tenet across student-report (e.g., Byrd [2017] and Dickson [2016] measures) and observer-report measures (e.g., Saldana [1997] and Powell [2016] measures), with noticeably less focus among teacher-report measures (except for Siwatu’s [2011, 2017] measures). The presence of this theme across extant CRP measures suggests that the measurement field has placed value on applying CRP principles to curricular content (i.e., what is taught), which is consistent with school-based efforts to recenter the narratives of historically marginalized groups through revised curricula (e.g., Griffin & James, 2018; Utt, 2018). A focus on instruction and curriculum may be indicative of a greater trend that prioritizes measuring teacher practice over teacher disposition, given the field’s increased understanding of unconscious biases and other threats to data validity.

The topic of increased intergroup contact (Tropp & Saxena, 2018) was evident in half of the observation measures (i.e., Curenton [2019], Jensen, Grajeda, et al. [2018], Flores [2009], and Saldana [1997] measures), where observers were instructed to capture intergroup contact with regard to race, gender, or another phenotypic trait between students within the observed classroom. Students were also asked about intergroup contact in schools, but with less of a focus on friendship-making and more of a focus on students’ feelings of belonging (i.e., Marx [2012] measure) and teachers’ role in conflict resolution between students of different racial-ethnic backgrounds (i.e., Kumar [2019] measure). This line of questioning in the student-report measures may have been driven by developmental perspectives given that these measures are designed for administration to secondary-aged students.

Interestingly, intergroup contact also appeared in teacher-report measures, with more dated measures asking teachers about their own formative experiences with people from different racial and ethnic backgrounds, including cross-racial childhood friendships and exposure via travel (e.g., Guyton [2005] and Sparks [1996] measures). In contrast, the teacher-report measures developed within the past decade asked about intergroup contact in reference to teachers’ own confidence in their ability to connect and foster relationships between diverse students (i.e., Flores [2011] and Siwatu [2017] measures). The conceptual differences within this theme raise interesting questions about the theory of change behind CRP’s effectiveness, including which of the constructs extrapolated from our review are reflective of CRP itself versus precipitators (e.g., teachers’ childhood experiences) or outcomes (e.g., students’ feelings of belonging) of CRP.

The critical appraisal of power, including hegemonic whiteness (Tevis et al., 2022), within CRP is a defining feature that sets the construct apart from other types of differentiated instruction and was captured in the Byrd (2017) and Curenton (2019) measures. Teachers’ critical consciousness is a two-part process of critical reflection and critical action (Jemal, 2017). As student-report and observation measures, respectively, the Byrd (2017) and Curenton (2019) measures represent the complexity of CRP measurement and the need for multiple informants to weigh in on teachers’ critical reflection (e.g., internal beliefs) and critical action (e.g., behaviors, interactions, instructional strategies). We refer to this CRP theme as critical consciousness; however, it is also related to antiracism, which we discuss further in the future directions section due to the notable absence of mentions of race across the extant CRP measurement literature.

Numerous teacher-report measures, spanning nearly three decades of research, attempted to capture whether teachers approached student diversity with humility (e.g., Cherng [2019], McKoy [2013], and Thompson [2009] measures). This theme also appeared in observational CRP measures (e.g., Powell [2016] and Jensen, Grajeda, et al. [2018] measures), though unsurprisingly student-report measures did not focus on this area. In another form of redressing power dynamics in the classroom, and the action-based complement to a culturally humble disposition, support for student agency (e.g., distributing classroom roles and instructional leadership opportunities) was evident in the Jensen, Grajeda, et al. (2018) observational measure.

Across measures, we observed that, similar to general educational measurement tools, extant CRP measures are designed to distinguish who teachers are and their worldviews (e.g., attitudes, beliefs, dispositions) from what they do (e.g., practice, interactions, instruction). However, the assumption that changing teachers’ mindsets and beliefs will influence their behaviors and CRP enactment has been theorized (e.g., Warren, 2018) but not yet demonstrated in the extant literature, in part due to the lack of validated CRP measures to capture intervention effects. In addition, some student-report measures may be conflating CRP with students’ perceptions of the learning environment or students’ own cultural competence (i.e., outcomes of CRP). Overall, when considering the reviewed measures in this light, it appeared that measures may be confounding the precursors, instantiation, and outcomes of CRP. Future research should seek to clarify the causal and temporal associations between various CRP-related constructs.

Use and Interpretation

While the traditional validation of CRP measures through statistical metrics of reliability (e.g., Cronbach’s alpha) and validity (e.g., CFA) is helpful in establishing some criteria for psychometric soundness, often even state-of-the-art measures of teaching in general fail to meet strict classical test theory psychometric reliability and validity criteria. For example, the Gates Foundation spent over $500 million on the Measures of Effective Teaching study, and their measures of general teaching quality only reached low to moderate reliability and low evidence of validity based on the rigorous standards used in our quality analysis (Kane & Staiger, 2012; Kane et al., 2014; Stecher et al., 2018). To be clear, the failure to meet validity and reliability standards is not unique to CRP measures; nearly all K–12 teaching measures share this weakness.

In addition, careful consideration of the evidence in support of the measures’ intended use and score interpretation is an important and often overlooked step in assessing the validity of a measure (Kane, 2013). When reviewing the results of this stage in our analyses, many scholars envisioned their CRP measures as potentially serving multiple purposes (e.g., to measure intervention effectiveness, preservice teacher performance) with aspirations for applications in practice and/or research. The potential for such applications should be tested through rigorous outcome evaluation or effectiveness trials in future research. High-inference hypotheses have the potential to be societally impactful, but substantiating these findings comes with a high burden of evidence (Bell, 2012; Kane, 2013). We recommend that researchers (re)state the measure’s intended use at each stage of development and provide the relevant information on reliability and validity for that use. We also recommend scholars take a focused approach, as was done by Cherng and Davis (2019), who identified their measure as appropriate for research use (e.g., to contribute to the knowledge base) at the time of publication. In fact, it was those measures that reported more narrow uses that tended to include at least partial evidence of validity in the study findings. We recommend a mixed-methods approach to document how, or if, a given CRP measure is being used and interpreted in practice. Both qualitative data (e.g., focus groups, participant quotes, teacher testimonials) and quantitative data (e.g., improved grades, fewer office discipline referrals) would provide users with evidence of a measure’s practical applicability and effect on theorized CRP-related outcomes.

Future Directions

Taken together, our findings indicated that there were major differences in CRP-related terminology and measurement constructs across the studies reviewed. Promisingly, publications continue to emerge introducing new CRP measures to the field. We are encouraged, for example, by Jensen and colleagues’ (2023) update to the Jensen, Whiting, et al. (2018) teacher-report measure, including improvements in their reporting of validity evidence. To better understand the processes that underlie CRP-driven changes in students, teachers, and classrooms, it is necessary to produce empirical evidence over multiple years, across different educational settings, and with diverse student and teacher populations. The field would benefit from future research delineating a clear theory of change, including which of these constructs encompasses the act of CRP itself versus what may instead be the outcomes of CRP. Below, we consider several additional recommendations for future research to deepen our understanding and measurement of CRP.

Absence of Race, Power, and Identity Foci

Many have postulated that CRP-related constructs may precede more advanced forms of critical action, such as demonstration of antiracist behavior (e.g., Cherng & Davis, 2019; Curenton et al., 2022); however, in the absence of a theoretically sound process model, the field remains limited in its ability to accurately propose and test hypothesized ways to increase the use of CRP in U.S. PK–12 classrooms. For example, the field would benefit from future research on how teachers’ own racial ethnic identity development, much of which may be formed in childhood and adolescence (Umaña-Taylor & Rivas-Drake, 2021), affects their self-assessment of CRP. Moreover, only a handful of measures sought to explicitly capture information about teachers’ racial biases and the impacts of institutional racism in the classroom. Whereas items about race have been in circulation from early measures (e.g., “The discussion of racial and ethnic subjects is inappropriate at the elementary level”—a reverse-scored item from the Scott [2001] measure), very few measures center students of color and the injustices they may face in the classroom. The exception to this is the Curenton (2019) measure, which is specifically focused on the experience of “racially minoritized learners.”

To some, in the current sociopolitical climate, it may seem counterintuitive to encourage measures of CRP that explicitly reference racial identity/socialization, racial biases, and systemic racialized inequality. However, explicit inclusion of race in measures of CRP may provide meaningful data to move the conversation forward regarding critical race theory–related topics in schools. Specifically, psychometrically sound measurement has the potential to produce data that is reflective of the possible benefits of CRP for students of color and for White students. Such empirical support has the long-term potential to debunk the disinformation that “critical race theory” harms White students’ psychological safety in school. In future work, the field would benefit from studies that approach CRP measurement development through a critical, postpositivist lens.

Need to Triangulate Multi-Informants

Interestingly, woven throughout these themes is a mix of observable teacher behaviors, internalized teacher dispositions, and teachers’ beliefs about their own efficacy. Unsurprisingly, the results revealed that teacher self-report surveys typically ask about a teacher’s beliefs, attitudes, awareness, and self-efficacy related to CRP. However, within teacher disposition measures, there are also some items that ask teachers to report on their actions (e.g., “I frequently invite extended family members to parent teacher conferences,” an item from Ponterotto’s [1998] measure). This raises an interesting question: What would the family members of this teacher’s students report? How about the students themselves? Or an outside observer (e.g., family social worker) or school record (e.g., family-school communication log)? Although there are certainly logistical barriers to obtaining multiformat reports, there have been calls for a multi-informant approach from professional organizations (e.g., AERA et al., 2014) as well as from scholars who recognize that their single measure provides just one perspective (e.g., Cherng & Davis, 2019; Debnam et al., 2015; Guyton & Wesche, 2005). Some of the studies included in this review attempt to utilize a multi-informant approach but are working with teacher- and family-report measures that are also still in development (e.g., Powell et al., 2016).

Multi-informant approaches also allow for validity evidence to be provided at multiple levels of the individual, classroom, and school context (e.g., Jensen, Grajeda, et al., 2018). If employing (or developing) a multiformat test battery, it is important to consider whether the measures capture the construct of CRP, other related outcomes, or both (e.g., an observational measure of CRP and a student-report on school belonging). The conflation of practice versus outcome appears particularly prevalent when considering measures across reporters, suggesting that teachers may be well positioned to assess CRP precursors, observers may be best positioned to assess CRP instantiation in the classroom, and students may be optimally positioned to assess proximal outcomes of CRP.

Limitations

When developing our search strategy, we included CRP-related terminology from scholarly and practice-based sources and vetted these terms as a research team. Nonetheless, even though these terms were sourced within the past year, we likely excluded studies due to missed search terms. For future reviews, we encourage others to consider the terms ethnic studies, asset-based, and Latin*. The rapidly evolving terminology related to CRP between 2020 and 2023, before which CRP nomenclature was already quite expansive, imposed challenges that we sought to address in this study but could not fully eliminate.

It is likely that there is a publication bias against publishing measures with null findings, as the measures with strongest psychometric properties are most likely to be published. The strict adherence to our inclusion/exclusion criteria (e.g., peer-reviewed, psychometric reporting) may have prevented us from capturing practice-based measures that are not prevalent in the extant literature but that are regularly used in the field. For example, measurement research can be difficult to conduct given that it requires both theoretical and methodological expertise, which can be difficult to come by in practice-based psychology and education programs (Fried & Flake, 2018), so measures currently used in the field may not be documented in the peer-reviewed literature. The implications of this “file drawer problem” are particularly complex for the CRP measurement field given the importance of psychometrically strong measures to rigorously assess and link CRP to positive academic outcomes for racially minoritized students (Lishner, 2021; Sleeter, 2012).

We also relied on commonly accepted structural equation modeling (SEM) model fit criteria (Hu & Bentler, 1999), though we recognize that this standard does not always apply depending on estimation methods (Xia & Yang, 2019) and depending on assessment type norms (e.g., observer report norms given high-inference nature of observation). As such, we aimed to balance these considerations by allowing flexibility in establishing a “moderate” characterization by only requiring that a subset of these criteria be met. That is, an RMSEA of .06 or less was not required to establish moderate validity. Some studies met many, but not all inclusion criteria (i.e., “near misses”). For example, both the Culturally Relevant and Responsive Education Observation Coding Scheme for Professional Development Sessions (Patton, 2011) and the Alaska Department of Education and Early Childhood Development Revised Rubric (Sigman et al., 2014) were excluded because they are observational measures that assess the content of professional development sessions rather than real classroom-based interactions. In a similar near miss, the School Interracial Climate Scale (Green et al., 1988) was excluded due to lack of specificity to the classroom setting; only 10 out of 61 questions on the scale referenced teachers. Because these 10 questions are not clustered into a single teacher-focused subscale, the measure was excluded, as it focused heavily on the actions of the school’s principal, assistant principal, and student body.
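As an illustration of the flexible fit-criteria rule described at the start of the preceding paragraph, the following Python sketch checks reported fit indices against the conventional Hu and Bentler (1999) cutoffs (CFI and TLI of about .95 or higher, RMSEA of about .06 or lower, SRMR of about .08 or lower) and assigns a “moderate” label when only a subset of the criteria is met. The function names, the labels other than “moderate,” and the `min_for_moderate` threshold are hypothetical choices made for this sketch; they are not the coding rules used in this review.

```python
# Illustrative sketch only: thresholds follow the conventional Hu & Bentler (1999)
# cutoffs; the labels and the min_for_moderate rule are assumptions, not the
# review's actual coding scheme.

# index name -> (cutoff, direction); "min" means the value should be at least the cutoff
CUTOFFS = {
    "cfi": (0.95, "min"),
    "tli": (0.95, "min"),
    "rmsea": (0.06, "max"),
    "srmr": (0.08, "max"),
}


def criteria_met(fit):
    """Map each reported fit index to True/False depending on whether it meets its cutoff."""
    met = {}
    for name, value in fit.items():
        cutoff, direction = CUTOFFS[name]
        met[name] = value >= cutoff if direction == "min" else value <= cutoff
    return met


def characterize(fit, min_for_moderate=2):
    """Label model fit evidence from the indices a study reports."""
    met = criteria_met(fit)
    n_met = sum(met.values())
    if met and n_met == len(met):
        return "strong"  # hypothetical label: every reported index meets its cutoff
    if n_met >= min_for_moderate:
        return "moderate"  # only a subset of the criteria is met
    return "weak"  # hypothetical label


# A study that misses the RMSEA cutoff can still be characterized as moderate.
example = {"cfi": 0.96, "tli": 0.95, "rmsea": 0.08, "srmr": 0.05}
print(characterize(example))
```

Under these assumptions, a measure whose original study reported an RMSEA above .06 but acceptable CFI, TLI, and SRMR values would still be labeled “moderate,” which mirrors the flexibility described in the text.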

The primary exclusion criterion in the third eligibility phase was absence of psychometric reporting. In the review process, the Cultural Diversity Awareness Inventory (CDAI; Henry, 1986) was utilized by eight secondary studies; however, no psychometrics were reported in the CDAI’s original publication. To remain systematic and objective in our inclusion/exclusion criteria, we thus excluded the CDAI due to its lack of initial psychometrics; the alternative of searching all other measures for secondary reporting of psychometrics was not feasible. The extraction of data exclusively from original studies limited our ability to report updated psychometric findings; for example, Goldberg and colleagues (2023) recently replicated the validation of the Curenton et al. (2019) measure using classical test theory metrics, but these findings were not included because they are from a secondary study. Other measures were also excluded due to non-reported psychometrics, such as the Local Systemic Change Classroom Observation Protocol (Johnson, 2011) and the Diversity Orientations Survey (Taylor et al., 2016).

The greatest risk of bias in the selected studies stemmed from the lacks-relevance criterion, which excludes measures intended to measure the culturally responsive classroom experience of a specific racial, ethnic, or cultural subpopulation. Our intention was to find measures with potential usability in racially and ethnically heterogeneous classrooms. However, the exclusion of subgroup-specific measures admittedly narrows the scope of the synthesized measures given the prevalence of racial segregation between schools in the United States today (Frankenberg et al., 2019). Because we recognize the importance of these specialized measures for educators looking to support specific local groups in their community, particularly given this country’s predominantly White teaching force, we have included these unique measures in Supplemental Table SI (available online). The online supplemental table features measures that are designed to capture the classroom experience of minoritized linguistic groups (e.g., Native Languages and Cultures; Van Ryzin et al., 2016), racial groups (e.g., Scale of Teacher Empathy for African American Males; Warren, 2015), ethnic groups (e.g., Concerns Teaching Latino Students Survey; Anhalt & Rodríguez Pérez, 2013), and ability groups (e.g., Multicultural and Special Education Survey; Utley, 2011).

CRP measures often are used for two purposes: (a) to motivate, inform, and scaffold teacher professional development in the area of teacher culturally responsive practices; and (b) to assess the impact of interventions designed to improve teachers’ use of culturally responsive practices. Regarding the former, the AERA et al. (2014) standards for educational and psychological testing describe standards for Workplace Testing and Credentialing. To our knowledge, this is not a current use of CRP measures, and so we did not hold CRP measures in this review against these standards. In the future, it may be relevant to assess CRP measures in the context of credentialing standards; however, in its current state, the field requires more integrative scholarship that assesses both the empirical and critical theoretical grounding of practice-based measures.

Conclusions and Implications

Taken together, the results of this systematic review indicated that there are 27 extant measures of CRP in the research literature with some evidence of validity and reliability, but most do not provide evidence to support their interpretation and use. In addition, only a few tap student perspectives of teacher practices. There were nearly twice as many teacher-report measures as observer-report measures and almost four times as many teacher-report measures as student-report measures, suggesting an overreliance on teacher report to measure a construct that is particularly prone to impression management in contemporary educational settings. Moreover, across measure types, the quality analysis demonstrated that measures capture some but not all aspects of CRP, and that this differed based on the reporter, suggesting that more work is needed to establish which informants are best suited to report on which aspects of CRP. A key implication of this systematic review is that the field would benefit greatly from coming to a consensus on how we should measure CRP in our research studies and school-based practice.

Most of the records on CRP retrieved for review in the initial eligibility phase did not include a measure of CRP; the excluded studies were either qualitative or quantitative studies with non-CRP outcome measures. These findings are partially reflective of the foundational qualitative research that led to the identification of CRP theory. However, the lack of existing CRP measures is suggestive of an oversaturation of theoretical work within the field relative to quantitative research measuring PK–12 CRP; the latter is critical to evaluate teacher-focused intervention effectiveness (e.g., Bottiani et al., 2018). Once the field can come to a greater consensus on how to measure CRP in the classroom setting, we may be better able to determine if teachers’ classroom CRP has a positive impact on students’ social-emotional and academic outcomes (Sleeter, 2012). Moreover, gaining additional clarity on best practices for CRP measurement will allow the field to assess how well measures work across demographically diverse student bodies (e.g., heterogeneous vs. homogeneous class compositions).

A challenge in the field continues to be that the terminology related to CRP is expansive and changes often, which can prevent researchers and practitioners from identifying the instruments that would be most useful or theoretically aligned. CRP measures should be used with consideration of informant limitations; however, they may prove useful in combination with one another to capture complexities across the CRP-related constructs and domains highlighted in our synthesis. We recommend additional research to strengthen the existing measures in the field. Specifically, observational research could benefit from more attention to generalizability theory, as too few of the reviewed studies reported G-study data to formulate meaningful conclusions; this is clearly an area of CRP measurement requiring additional investigation (see Hill et al., 2012). Moreover, there should be more attention to student-report measures, and teacher-report studies could benefit from triangulating with student and observer reports to gain a more complete picture of the CRP that is happening in the classroom. Another key finding of this study was the value of using an asset-based, multimethod approach for high-quality measurement development. By integrating both qualitative and quantitative phases into CRP measurement development, researchers can align themselves with an equity-focused perspective on educational measurement.

Supplemental Material

Supplemental material, sj-docx-1-rer-10.3102_00346543231208720 for Assessing Teachers’ Culturally Responsive Classroom Practice in PK–12 Schools: A Systematic Review of Teacher-, Student-, and Observer-Report Measures by Meredith P. Franco, Jessika H. Bottiani and Catherine P. Bradshaw in Review of Educational Research

Supplemental material, sj-docx-2-rer-10.3102_00346543231208720 for Assessing Teachers’ Culturally Responsive Classroom Practice in PK–12 Schools: A Systematic Review of Teacher-, Student-, and Observer-Report Measures by Meredith P. Franco, Jessika H. Bottiani and Catherine P. Bradshaw in Review of Educational Research

Authors

MEREDITH P. FRANCO is a postdoctoral research associate at the University of Virginia’s School of Education and Human Development, PO Box 400281, 405 Emmet Street South, Charlottesville, VA 22904; e-mail: Meredith.Franco@virginia.edu. Informed by lived experience as a public school educator and clinical-school psychologist, her research focuses on how school-based professionals can champion education and health equity. She has specific interests in the well-being and safety of historically marginalized children and adolescents, who are often systemically blocked from accessing high-quality education and healthcare in the United States.

JESSIKA H. BOTTIANI is a research associate professor of education at the University of Virginia’s School of Education and Human Development, Ridley Hall 228, 405 Emmet Street South, Charlottesville, VA 22904; e-mail: Jessika.Bottiani@virginia.edu. Her research focuses on teachers’ efforts to foster students’ emotional safety and agentic engagement in learning, with an emphasis on supporting relationships across lines of difference through effective use of strengths-based, culturally sustaining, restorative, and critically conscious practices.

CATHERINE P. BRADSHAW is a university professor and the senior associate dean for research at the School of Education and Human Development at the University of Virginia, Bavaro Hall 112-D, 417 Emmet Street South, Charlottesville, VA 22904; e-mail: Catherine.Bradshaw@virginia.edu. Her primary research interests focus on the development of aggressive behavior and school-based prevention of behavioral and mental health problems in schools, with particular interest in positive behavior supports, social and emotional learning, and culturally responsive practices.

References

1. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). The standards for educational and psychological testing (3rd ed.). Washington, DC: American Educational Research Association. https://www.testingstandards.net/uploads/7/6/6/4/76643089/standards_2014edition.pdf

2. Anhalt, C. O., & Rodríguez Pérez, M. E. (2013). K-8 teachers’ concerns about teaching Latino/a students. Journal of Urban Mathematics Education, 6(2), 42–61. https://eric.ed.gov/?id=EJ1085786

3. Apfelbaum, E. P., Norton, M. I., & Sommers, S. R. (2012). Racial color blindness: Emergence, practice, and implications. Current Directions in Psychological Science, 21(3), 205–209. https://doi.org/10.1177/0963721411434980

4. Aronson & Laughter (2016). The theory and practice of culturally relevant education: A synthesis of research across content areas. Review of Educational Research, 86(1), 163–206. https://doi.org/10.3102/0034654315582066

5. Aronson, Meyers, & Winn (2020). “Lies my teacher [educator] still tells”: Using critical race counternarratives to disrupt whiteness in teacher education. Teacher Educator, 55(3), 300–322. https://doi.org/10.1080/08878730.2020.1759743

6. Arunkumar, Midgley, & Urdan (1999). Perceiving high or low home-school dissonance: Longitudinal effects on adolescent emotional and academic well-being. Journal of Research on Adolescence. https://doi.org/10.1207/s15327795jra0904_4

7. (2017). When multicultural education is not enough. Multicultural Perspectives, 19(3), 147–150. https://doi.org/10.1080/15210960.2017.1331741

8. Banks, J. A. (1993). Multicultural education: Historical development, dimensions, and practice. Review of Research in Education, 19, 3–49. https://doi.org/10.2307/1167339

9. Bell, C. A., Gitomer, D. H., McCaffrey, D. F., Hamre, B. K., & Pianta, R. C. (2012). An argument approach to observation protocol validity. Educational Assessment, 17(2–3), 62–87. https://doi.org/10.1080/10627197.2012.715014

10. Berkowitz, Moore, Astor, R. A., & Benbenishty (2017). A research synthesis of the associations between socioeconomic background, inequality, school climate, and academic achievement. Review of Educational Research, 87(2), 425–469. https://doi.org/10.3102/0034654316669821

11. Bottiani, J. H., Larson, K. E., Debnam, K. J., Bischoff, C. M., & Bradshaw, C. P. (2018). Promoting educators’ use of culturally responsive practices: A systematic review of in-service interventions. Journal of Teacher Education, 69(4), 367–385. https://doi.org/10.1177/0022487117722553

12. *Brabeck, M. M., Rogers, L. A., Sirin, Henderson, Benvenuto, Weaver, & Ting (2000). Increasing ethical sensitivity to racial and gender intolerance in schools: Development of the racial ethical sensitivity test. Ethics & Behavior, 10(2), 119–137. https://doi.org/10.1207/S15327019EB1002_02

13. Brennan, R. L. (2010). Generalizability theory and classical test theory. Applied Measurement in Education, 24(1), 1–21. https://doi.org/10.1080/08957347.2011.532417

14. Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.). Guilford Publications. https://psycnet.apa.org/record/2015-10560-000

15. *Byrd, C. M. (2017). The complexity of school racial climate: Reliability and validity of a new measure for secondary students. British Journal of Educational Psychology, 87(4), 700–721. https://doi.org/10.1111/bjep.12179

16. Carter, D. J. (2005). “In a sea of White people”: An analysis of the experiences and behaviors of high-achieving Black students in a predominantly white high school [EdD, Harvard University]. ProQuest Dissertations & Theses Global. https://www.proquest.com/docview/305002024/abstract/404A92F527C44BE0PQ/1

17. *Cherng, H.-Y. S., & Davis, L. A. (2019). Multicultural matters: An investigation of key assumptions of multicultural education reform in teacher education. Journal of Teacher Education, 70(3), 219–236. https://doi.org/10.1177/0022487117742884

18. Cohen (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. https://doi.org/10.1177/001316446002000104

19. Compton-Lilly, Ellison, T. L., Perry, K. H., & Smagorinsky (2021). Whitewashed critical perspectives: Restoring the edge to edgy ideas. Routledge.

20. Creswell, J. W., & Miller, D. L. (2000). Determining validity in qualitative inquiry. Theory Into Practice, 39(3), 124–130. https://doi.org/10.1207/s15430421tip3903_2

21. Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957

22. *Curenton, S. M., Iruka, I. U., Humphries, Jensen, Durden, Rochester, S. E., Sims, Whittaker, J. V., & Kinzie, M. B. (2019). Validity for the Assessing Classroom Sociocultural Equity Scale (ACSES) in early childhood classrooms. Early Education and Development. https://doi.org/10.1080/10409289.2019.1611331

23. Curenton, S. M., Rochester, S. E., Sims, Ibekwe-Okafor, Iruka, I. U., García-Miranda, A. G., & Whittaker (2022). Antiracism defined as equitable sociocultural interactions in prekindergarten: Classroom racial composition makes a difference. Child Development, 93(3), 681–698. https://doi.org/10.1111/cdev.13779

24. *D’Andrea, Daniels, & Noonan, M. J. (2003). New developments in the assessment of multicultural competence: The Multicultural Awareness-Knowledge-Skills Survey–Teachers Form. In Handbook of multicultural competencies: In counseling & psychology (pp. 154–167). Sage Publications. https://doi.org/10.4135/9781452231693.n10

25. *Debnam, K. J., Pas, E. T., Bottiani, Cash, A. H., & Bradshaw, C. P. (2015). An examination of the association between observed and self-reported culturally proficient teaching practices. Psychology in the Schools, 52(6), 533–548. https://doi.org/10.1002/pits.21845

26. DeCarlo (2018). 7.3 Unit of analysis and unit of observation. In Scientific inquiry in social work. Open Social Work Education. https://scientificinquiryinsocialwork.pressbooks.com/chapter/7-3-unit-of-analysis-and-unit-of-observation/

27. *Dickson, G. L., Chun, & Fernandez, I. T. (2016). The development and initial validation of the student measure of culturally responsive teaching. Assessment for Effective Intervention, 41(3), 141–154. http://dx.doi.org/10.1177/1534508415604879

28. DiStefano & Hess (2005). Using confirmatory factor analysis for construct validation: An empirical review. Journal of Psychoeducational Assessment, 23(3), 225–241. https://doi.org/10.1177/073428290502300303

29. Dixson, A. D. (2021). But be ye doers of the word: Moving beyond performative professional development on culturally relevant pedagogy. Educational Forum, 85(4), 355–363. https://doi.org/10.1080/00131725.2021.1957633

30. Dover, A. G. (2013). Teaching for social justice: From conceptual frameworks to classroom practices. Multicultural Perspectives, 15(1), 3–11. https://doi.org/10.1080/15210960.2013.754285

31. Drennan (2003). Cognitive interviewing: Verbal data in the design and pretesting of questionnaires. Journal of Advanced Nursing, 42(1), 57–63. https://doi.org/10.1046/j.1365-2648.2003.02579.x

32. Ercan, Yazici, Sigirli, Ediz, & Kan (2007). Examining Cronbach alpha, theta, omega reliability coefficients according to sample size. Journal of Modern Applied Statistical Methods, 6(1), 291–303. https://doi.org/10.22237/jmasm/1177993560

33. Esteban-Guitart & Moll, L. C. (2014). Funds of identity: A new concept based on the Funds of Knowledge approach. Culture & Psychology, 20(1), 31–48. https://doi.org/10.1177/1354067X13515934

34. *Flores, B. B., Casebeer, C. M., & Riojas-Cortez (2011). Validation of the Early Childhood Ecology Scale–Revised: A reflective tool for teacher candidates. Journal of Early Childhood Teacher Education, 32(3), 266–286. https://doi.org/10.1080/10901027.2011.594487

35. *Flores, B. B., & Riojas-Cortez (2009). Measuring early childhood teacher candidates’ conceptualizations of a culturally responsive classroom ecology. Journal of Classroom Interaction, 44(2), 4–13. https://doi.org/10.1080/02615479.2017.1423049

36. Frankenberg, Ayscue, J. B., & Orfield (2019). Harming our common future: America’s segregated schools 65 years after Brown. The Civil Rights Project. https://escholarship.org/uc/item/23j1b9nv

37. Fried, E. I., & Flake, J. K. (2018). Measurement matters. APS Observer, 31(3). https://www.psychologicalscience.org/observer/measurement-matters

38. Garcia, N. M., López, & Vélez, V. N. (2018). QuantCrit: Rectifying quantitative methods through critical race theory. Race Ethnicity and Education, 21(2), 149–157. https://doi.org/10.1080/13613324.2017.1377675

39. Gay (2000). Culturally responsive teaching: Theory, research, and practice (Multicultural Education Series). Teachers College Press. https://eric.ed.gov/?id=ED441932

40. Gay (2018). Culturally responsive teaching: Theory, research, and practice (3rd ed.). (Banks, J. A., Ed.). Teachers College Press. https://eric.ed.gov/?id=ED581130

41. Gay (2023). Educating for equity and excellence: Enacting culturally responsive teaching. Teachers College Press. https://www.tcpress.com/educating-for-equity-and-excellence-9780807768624

42. Gillborn, Dixson, Ladson-Billings, Parker, Rollock, & Warmington (2018). Critical race theory in education. Routledge. https://www.routledge.com/Critical-Race-Theory-in-Education-4-vol-set/Gillborn-Dixson-Ladson-Billings-Parker-Rollock-Warmington/p/book/9781138848276

43. Gilliam, W. S., Maupin, A. N., Reyes, C. R., Accavitti, & Shic (2016). Do early educators’ implicit biases regarding sex and race relate to behavior expectations and recommendations of preschool expulsions and suspensions? Yale University Child Study Center, 9(28), 1–16. http://www.fixschooldiscipline.org/wp-content/uploads/2020/09/9.Early_Educators_Implicit_Bias_Sex_and_Gender-2016.pdf

44. Gilliam, W. S., & Reyes, C. R. (2017). Climate of healthy interactions for learning & development, draft manual [Unpublished manuscript]. Edward Zigler Center in Child Development & Social Policy, Yale Child Study Center.

45. Goldberg, M. J., Lloyd, D. D., Syed, Welch, G. W., & Curenton, S. M. (2022). A validation study of the Assessing Classroom Sociocultural Equity Scale (ACSES) in pre-kindergarten to third grade classrooms. Early Education and Development, 1–24. https://doi.org/10.1080/10409289.2022.2146392

46. Gorski, P. C. (2008). Peddling poverty for profit: Elements of oppression in Ruby Payne’s framework. Equity & Excellence in Education, 41(1), 130–148. https://doi.org/10.1080/10665680701761854

47. Green, C. W., Adams, A. M., & Turner, C. W. (1988). Development and validation of the school interracial climate scale. American Journal of Community Psychology, 16(2), 241–259. https://doi.org/10.1007/BF00912525

48. Griffin & James (2018). Humanities curricula as White property: Toward a reclamation of Black creative thought in social studies & literary curricula. Multicultural Education, 25, 10–17. https://eric.ed.gov/?id=EJ1198278

49. Grimm (2010). Social desirability bias. In Wiley international encyclopedia of marketing. John Wiley and Sons. https://doi.org/10.1002/9781444316568.wiem02057

50. Gutiérrez, K. D., & Rogoff (2003). Cultural ways of learning: Individual traits or repertoires of practice. Educational Researcher, 32(5), 19–25. https://doi.org/10.3102/0013189X032005019

51. *Guyton, E. M., & Wesche, M. V. (2005). The Multicultural Efficacy Scale: Development, item selection, and reliability. Multicultural Perspectives, 7(4), 21–29. https://doi.org/10.1207/s15327892mcp0704_4

52. Haertel, E. H. (2018). Tests, test scores, and constructs. Educational Psychologist, 53(3), 203–216. https://doi.org/10.1080/00461520.2018.1476868

53. Hayes, A. F., & Coutts, J. J. (2020). Use omega rather than Cronbach’s alpha for estimating reliability. But…. Communication Methods and Measures, 14(1), 1–24. https://doi.org/10.1080/19312458.2020.1718629

54. Hazelbaker & Mistry, R. S. (2021). “Being colorblind is one of the worst things”: White teachers’ attitudes and ethnic-racial socialization in a rural elementary school. Journal of Social Issues, 77(4), 1126–1148. https://doi.org/10.1111/josi.12489

55. Henry, G. B. (1986). Cultural Diversity Awareness Inventory = Inventario Sobre el Reconocimiento de Diversas Culturas. https://eric.ed.gov/?id=ED282657

56. Hill, H. C., Charalambous, C. Y., & Kraft, M. A. (2012). When rater reliability is not enough: Teacher observation systems and a case for the generalizability study. Educational Researcher, 41(2), 56–64. https://doi.org/10.3102/0013189X12437203

57. Holcomb-McCoy (2021, August 7). The “other CRT”—culturally responsive teaching—can truly make a difference. The Hill. https://thehill.com/opinion/education/566022-the-other-crt-culturally-responsive-teaching-can-truly-make-a-difference

58. Hu & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

59. Jemal (2017). Critical consciousness: A critique and critical analysis of the literature. Urban Review, 49(4), 602–626. https://doi.org/10.1007/s11256-017-0411-3

60. *Jensen, Grajeda, & Haertel (2018). Measuring cultural dimensions of classroom interactions. Educational Assessment, 23(4), 250–276. https://doi.org/10.1080/10627197.2018.1515010

61. *Jensen, Whiting, E. F., & Chapman (2018). Measuring the multicultural dispositions of preservice teachers. Journal of Psychoeducational Assessment, 36(2), 120–135. https://doi.org/10.1177/0734282916662426

62. Jensen, Whiting, E. F., Hernández, Zhang, Pliego, & Sudweeks (2023). Becoming equitable educators: Practical measures to support teachers’ dispositional growth. Journal of Teacher Education, 74(4), 299–314. https://doi.org/10.1177/00224871231183090

63.

Johnson

C. C.

(2011). The road to culturally relevant science: Exploring how teachers navigate change in pedagogy. Journal of Research in Science Teaching, 48(2), 170–198. https://doi.org/10.1002/tea.20405

64.

Kane

M. T.

(2006). Validation. In Brennan

(Ed.), Educational measurement, issues and practice (4th ed., pp. 17–64). Westport, CT: American Council on Education/Praeger.

65.

Kane

M. T.

(2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000

66.

Kane

T. J.

Kerr

K. A.

Pianta

R. C.

(Eds.). (2014). Designing teacher evaluation systems: New guidance from the Measures of Effective Teaching Project. Jossey-Bass. https://doi.org/10.1002/9781119210856

67.

Kane

T. J.

Staiger

(2012). Gathering feedback for teaching: Combining high quality observations with student surveys and achievement gains [Research paper]. Bill & Melinda Gates Foundation. http://files.eric.ed.gov/fulltext/ED540960.pdf

68.

Kaplan

L. S.

Owings

W. A.

(2021). Countering the furor around critical race theory. NASSP Bulletin. https://doi.org/10.1177/01926365211045457

69.

Kelley

T. L.

1927. Interpretation of educational measurements. World Book Company. https://psycnet.apa.org/record/1928-00533-000

70.

Kieran

Anderson

(2019). Connecting universal design for learning with culturally responsive teaching. Education and Urban Society, 51(9), 1202–1216. https://doi.org/10.1177/0013124518785012

71.

*Kumar

Karabenick

S. A.

Warnke

J. H.

Hany

Seay

(2019). Culturally Inclusive and Responsive Curricular Learning Environments (CIRCLEs): An exploratory sequential mixed-methods approach. Contemporary Educational Psychology, 57, 87–105. https://doi.org/10.1016/j.cedpsych.2018.10.005

72.

Ladson-Billings

(1995). Toward a theory of culturally relevant pedagogy. American Educational Research Journal, 32(3), 465–491. https://doi.org/10.2307/1163320

73.

Ladson-Billings

(2009). The Dreamkeepers: Successful teachers of African American children. John Wiley & Sons.

74.

Ladson-Billings

(2014). Culturally relevant pedagogy 2.0: A.k.a. the remix. Harvard Educational Review, 84(1), 74–84. https://doi.org/10.17763/haer.84.1.p2rj131485484751

75.

Ladson-Billings

(2021). Three decades of culturally relevant, responsive, & sustaining pedagogy: What lies ahead? Educational Forum, 85(4), 351–354. https://doi.org/10.1080/00131725.2021.1957632

76.

Ladson-Billings

Dixson

(2021). Put some respect on the theory: Confronting distortions of culturally relevant pedagogy. In Compton-Lilly

Ellison

T. L.

Perry

Smagorinsky

(Eds.), Whitewashed critical perspectives: Restoring the edge to edgy ideas (pp. 122–137). Routledge. https://doi.org/10.4324/9781003087632-7

77.

Larson

K. E.

Bradshaw

C. P.

(2017). Cultural competence and social desirability among practitioners: A systematic review of the literature. Children and Youth Services Review, 76, 100–111. https://doi.org/10.1016/j.childyouth.2017.02.034

78.

Liggett

Watson

Griffin

(2017). Language use and racial redirect in the educational landscape of “just good teaching.” Teaching Education, 28(4), 393–405. https://doi.org/10.1080/10476210.2017.1306506

79.

Lishner

D. A.

(2021). Sorting the file drawer: A typology for describing unpublished studies. Perspectives on Psychological Science. https://doi.org/10.1177/1745691620979831

80.

*Love

Kruger

A. C.

(2005). Teacher beliefs and student achievement in urban schools serving African American students. Journal of Educational Research, 99(2), 87–98. https://doi.org/10.3200/JOER.99.2.87-98

81.

Luther

(2009). Celebration and separation: A troublesome approach to multicultural education. Multicultural Perspectives, 11(4), 211–216. https://doi.org/10.1080/15210960903446036

82.

*Marx

Byrnes

(2012). Multicultural school climate inventory. Current Issues in Education, 15(3), 1–14. https://cie.asu.edu/ojs/index.php/cieatasu/article/view/960/796

83.

McDonald

R. P.

(2013). Test theory: A unified treatment. Routledge. https://doi.org/10.4324/9781410601087

84. McGowan, Sampson, Salzwedel, D. M., Cogo, Foerster, & Lefebvre (2016). PRESS peer review of electronic search strategies: 2015 guideline statement. Journal of Clinical Epidemiology, 75, 40–46. https://doi.org/10.1016/j.jclinepi.2016.01.021

85. McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. https://doi.org/10.11613/BM.2012.031

86. *McKoy, C. L. (2013). Effects of selected demographic variables on music student teachers’ self-reported cross-cultural competence. Journal of Research in Music Education, 60(4), 375–394. https://doi.org/10.1177/0022429412463398

87. Messick (1998). Test validity: A matter of consequence. Social Indicators Research, 45(1), 35. https://doi.org/10.1023/A:1006964925094

88. Metro (2020). Teaching world history during an uprising for racial justice. Social Education, 84(6), 369–371.

89. Morrison, K. A., Robbins, H. H., & Rose, D. G. (2008). Operationalizing culturally relevant pedagogy: A synthesis of classroom-based research. Equity & Excellence in Education, 41(4), 433–452. https://doi.org/10.1080/10665680802400006

90. Nasir, N. S., Rosebery, A. S., Warren, & Lee, C. D. (2006). Learning as a cultural process: Achieving equity through diversity. In The Cambridge handbook of the learning sciences (pp. 489–504). Cambridge University Press. https://psycnet.apa.org/record/2006-07157-029

91. National Council for Accreditation of Teacher Education (NCATE). (2008). Professional standards for the accreditation of teacher preparation institutions. Author.

92. National Education Association (NEA). (2021). Racial justice. NEA EdJustice. https://neaedjustice.org/social-justice-issues/racial-justice/

93. National Education Association (NEA) Center for Social Justice. (2020). White supremacy culture resources. Author. https://www.nea.org/resource-library/white-supremacy-culture-resources

94. Ngo (2010). Doing “diversity” at dynamic high: Problems and possibilities of multicultural education in practice. Education and Urban Society, 42(4), 473–495. https://doi.org/10.1177/0013124509356648

95. Nieto (2017). Re-imagining multicultural education: New visions, new possibilities. Multicultural Education Review, 9(1), 1–10. https://doi.org/10.1080/2005615X.2016.1276671

96. Page, M. J., Moher, Bossuyt, P. M., Boutron, Hoffmann, T. C., Mulrow, C. D., Shamseer, Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, Glanville, Grimshaw, J. M., Hróbjartsson, Lalu, M. M., Loder, E. W., Mayo-Wilson, McDonald, McGuinness, L. A., et al. (2021). PRISMA 2020 explanation and elaboration: Updated guidance and exemplars for reporting systematic reviews. BMJ, 372, n160. https://doi.org/10.1136/bmj.n160

97. Paris (2012). Culturally sustaining pedagogy: A needed change in stance, terminology, and practice. Educational Researcher, 41(3), 93–97. https://doi.org/10.3102/0013189X12441244

98. Paris, & Alim, H. S. (2014). What are we seeking to sustain through culturally sustaining pedagogy? A loving critique forward. Harvard Educational Review, 84(1), 85–100. https://doi.org/10.17763/haer.84.1.982l873k2ht16m77

99. Parkhouse, Lu, C. Y., & Massaro, V. R. (2019). Multicultural education professional development: A review of the literature. Review of Educational Research, 89(3), 416–458. https://doi.org/10.3102/0034654319840359

100. Patton, D. C. (2011). Evaluating the culturally relevant and responsive education professional development program at the elementary school level in the Los Angeles Unified School District. Learning Disabilities: A Contemporary Journal, 9(1), 71–107. https://files.eric.ed.gov/fulltext/EJ925536.pdf

101. Pearson (1900). I. Mathematical contributions to the theory of evolution. —VII. On the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 195(262–273), 1–47. https://doi.org/10.1098/rsta.1900.0022

102. Perinelli & Gremigni (2016). Use of social desirability scales in clinical psychology: A systematic review. Journal of Clinical Psychology, 72(6), 534–551. https://doi.org/10.1002/jclp.22284

103. Plaut, V. C., Thomas, K. M., Hurd, & Romano, C. A. (2018). Do color blindness and multiculturalism remedy or foster discrimination and racism? Current Directions in Psychological Science, 27(3), 200–206. https://doi.org/10.1177/0963721418766068

104. *Pohan, C. A., & Aguilar, T. E. (2001). Measuring educators’ beliefs about diversity in personal and professional contexts. American Educational Research Journal, 38(1), 159–182. https://www.jstor.org/stable/3202517

105. *Ponterotto, J. G., Baluch, Greig, & Rivera (1998). Development and initial score validation of the teacher multicultural attitude survey. Educational and Psychological Measurement, 58(6), 1002–1016. https://doi.org/10.1177/0013164498058006009

106. *Powell, Cantrell, S. C., Malo-Juvera, & Correll (2016). Operationalizing culturally responsive instruction: Preliminary findings of CRIOP research. Teachers College Record, 118(1), 1–46. https://www.tcrecord.org/Content.asp?ContentId=18224

107. Reinholz, D. L., & Shah (2018). Equity analytics: A methodological approach for quantifying participation patterns in mathematics classroom discourse. Journal for Research in Mathematics Education, 49(2), 140–177. https://doi.org/10.5951/jresematheduc.49.2.0140

108. Rodgers, J. L., & Nicewander, W. A. (2012). Thirteen ways to look at the correlation coefficient. The American Statistician. https://doi.org/10.1080/00031305.1988.10475524

109. *Saldana, D. C., & Waxman, H. C. (1997). An observational study of multicultural education in urban elementary schools. Equity & Excellence in Education, 30(1), 40. https://doi.org/10.1080/1066568970300106

110. Sandoval, D. M., Ratcliff, A. J., Buenavista, T. L., & Marín, J. R. (2016). “White” washing American education: The new culture wars in ethnic studies. Praeger, ABC-CLIO, LLC. https://lccn.loc.gov/2016005000

111. *Scott, K. A., & Pinto (2001). Revolutionizing multicultural education staff development: Factor structure of a teacher survey. Equity & Excellence in Education, 34(1), 32–42. https://doi.org/10.1080/1066568010340105

112. Sigman, Dublin, Anderson, Deans, Warburton, Matsumoto, G. I., Dugan, & Harcharek (2014). Using large marine ecosystems and cultural responsiveness as the context for professional development of teachers and scientists in ocean sciences. Journal of Geoscience Education, 62(1), 25–40. https://doi.org/10.5408/12-403.1

113. *Siwatu, K. O. (2011). Preservice teachers’ sense of preparedness and self-efficacy to teach in America’s urban and suburban schools: Does context matter? Teaching and Teacher Education, 27(2), 357–365. https://doi.org/10.1016/j.tate.2010.09.004

114. *Siwatu, K. O., Putman, S. M., Starker-Glass, T. V., & Lewis, C. W. (2017). The Culturally Responsive Classroom Management Self-Efficacy Scale: Development and initial validation. Urban Education, 52(7), 862–888. https://doi.org/10.1177/0042085915602534

115. Sleeter, C. E. (2012). Confronting the marginalization of culturally responsive pedagogy. Urban Education, 47(3), 562–584. https://doi.org/10.1177/0042085911431472

116. *Sparks, W. G., Butt, K. L., & Pahnos (1996). Multicultural education in physical education: A study of knowledges, attitudes and experiences. Physical Educator, 53, 73–86. https://eric.ed.gov/?id=EJ531676

117. *Stanley, L. S. (1996). The development and validation of an instrument to assess attitudes toward cultural diversity and pluralism among preservice physical educators. Educational and Psychological Measurement, 56(5), 891–897. https://doi.org/10.1177/0013164496056005017

118. Stecher, Holtzman, Garet, Hamilton, Engberg, Steiner, Robyn, Baird, Gutierrez, Peet, Brodziak, Los Reyes, Fronberg, Weinberger, Hunter, & Chambers (2018). Improving teaching effectiveness: Final report: The Intensive Partnerships for Effective Teaching through 2015-2016. RAND Corporation. https://doi.org/10.7249/RR2242

119. Taber, K. S. (2018). The use of Cronbach’s alpha when developing and reporting research instruments in science education. Research in Science Education, 48(6), 1273–1296. https://doi.org/10.1007/s11165-016-9602-2

120. Taylor, Kumi-Yeboah, & Ringlaben, R. P. (2016). Pre-service teachers’ perceptions towards multicultural education & teaching of culturally & linguistically diverse learners. Multicultural Education, 23, 42–48. https://eric.ed.gov/?id=EJ1119400

121. Tevis, T. L., Martinez, J. G. L., & Lozano, Y. E. (2022). Disrupting White hegemony: A necessary shift toward adopting critical approaches within the teaching and learning environment. International Journal of Qualitative Studies in Education, 35(4), 341–355. https://doi.org/10.1080/09518398.2022.2035453

122. Thomas, C. A., & Berry, R. Q., III. (2019). A qualitative metasynthesis of culturally relevant pedagogy & culturally responsive teaching: Unpacking mathematics teaching practices. Journal of Mathematics Education at Teachers College, 10(1), 21–30. https://doi.org/10.7916/jmetc.v10i1.1668

123. *Thompson (2009). The development and validation of the multicultural dispositions index. Multicultural Perspectives, 11(2), 94–100. https://doi.org/10.1080/15210960903028776

124. Tropp, L. R., & Saxena (2018). Re-weaving the social fabric through integrated schools: How intergroup contact prepares youth to thrive in a multiracial society (Research Brief No. 13). National Coalition on School Diversity. https://eric.ed.gov/?id=ED603699

125. Tyler, K. M., Burris, J. L., & Coleman, S. T. (2018). Investigating the association between home-school dissonance and disruptive classroom behaviors for urban middle school students. Journal of Early Adolescence, 38(4), 530–553. https://doi.org/10.1177/0272431616678987

126. Umaña-Taylor, A. J., & Rivas-Drake (2021). Ethnic-racial identity and adolescents’ positive development in the context of ethnic-racial marginalization: Unpacking risk and resilience. Human Development, 65(5–6), 293–310. https://doi.org/10.1159/000519631

127. Utley, C. A. (2011). A psychometric investigation of the multicultural and special education survey: An exploratory factor analysis. Learning Disabilities: A Contemporary Journal, 9(1), 47–70. https://eric.ed.gov/?id=EJ925534

128. Utt (2018). A case for decentering whiteness in education: How Eurocentric social studies curriculum acts as a form of White/Western studies. Ethnic Studies Review, 41(1–2), 19–34. https://doi.org/10.1525/esr.2018.411205

129. Van Ryzin, Vincent, & Hoover (2016). Initial exploration of a construct representing native language and culture (NLC) in elementary and middle school instruction. Journal of American Indian Education, 55(1), 74–101. https://doi.org/10.5749/jamerindieduc.55.1.0074

130. Vitriol, J. A., & Moskowitz, G. B. (2021). Reducing defensive responding to implicit bias feedback: On the role of perceived moral threat and efficacy to change. Journal of Experimental Social Psychology, 96, 104165. https://doi.org/10.1016/j.jesp.2021.104165

131. *Waddell, L. R. (2014). Using culturally ambitious teaching practices to support urban mathematics teaching and learning. Journal of Praxis in Multicultural Education, 8(2), 1–21. https://doi.org/10.9741/2161-2978.1069

132. Warren, C. A. (2015). Scale of Teacher Empathy for African American Males (S-TEAAM): Measuring teacher conceptions and the application of empathy in multicultural classroom settings. Journal of Negro Education, 84(2), 154–174. https://doi.org/10.7709/jnegroeducation.84.2.0154

133. Warren, C. A. (2018). Empathy, teacher dispositions, and preparation for culturally responsive pedagogy. Journal of Teacher Education, 69(2), 169–183. https://doi.org/10.1177/0022487117712487

134. Watkins, M. W. (2018). Exploratory factor analysis: A guide to best practice. Journal of Black Psychology, 44(3), 219–246. https://doi.org/10.1177/0095798418771807

135. Wetzel, M. M., Vlach, S. K., Svrcek, N. S., Steinitz, Omogun, Salmerón, Batista Morales, Taylor, L. A., & Villarreal (2019). Preparing teachers with sociocultural knowledge in literacy: A literature review. Journal of Literacy Research, 51(2), 138–157. https://doi.org/10.1177/1086296X19833575

136. Worrell, F. C. (2022). Who will teach the teachers? Examining implicit bias in the educator workforce. Learning and Instruction, 101518. https://doi.org/10.1016/j.learninstruc.2021.101518

137. Xia & Yang (2019). RMSEA, CFI, and TLI in structural equation modeling with ordered categorical data: The story they tell depends on the estimation methods. Behavior Research Methods, 51(1), 409–428. https://doi.org/10.3758/s13428-018-1055-2

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

Assessing Teachers’ Culturally Responsive Classroom Practice in PK–12 Schools: A Systematic Review of Teacher-, Student-, and Observer-Report Measures

Supplemental Material

sj-docx-1-rer-10.3102_00346543231208720 – Supplemental material for Assessing Teachers’ Culturally Responsive Classroom Practice in PK–12 Schools: A Systematic Review of Teacher-, Student-, and Observer-Report Measures

Supplemental Material

sj-docx-2-rer-10.3102_00346543231208720 – Supplemental material for Assessing Teachers’ Culturally Responsive Classroom Practice in PK–12 Schools: A Systematic Review of Teacher-, Student-, and Observer-Report Measures
