Abstract
This systematic review sheds light on the scientific landscape of critical thinking (CT) in teacher education (TED) by synthesizing how CT in TED has been investigated in terms of selected features of international scholarship on the topic and how CT has been conceptualized in this scholarship. Based on 208 included studies, our results show heterogeneity in targeted educational levels, school subjects, geography and adopted methodologies. A sharp increase in publication rates is also documented. We show further that conceptualizations nested in argumentation theory predominate. Often, scholars offer implicit definitions and use varied, related terminology, blurring CT as a concept. Lastly, explicit clarifications on the role of disciplinary knowledge in CT are often absent. We argue for an emphasis on conceptual clarity in future scholarship on CT in TED, consider the implications of our findings and offer actionable proposals for theory and practice.
In recent years, critical thinking (CT) has attracted much attention in education policy, practice, and research. At the policy level, it often features as an indispensable ingredient in effective communication and social responsibility that aids the modern citizen to make good judgments and find responsible solutions to problems (Australian Curriculum, Assessment and Reporting Authority, 2023; Government of British Columbia, 2023; Norwegian Directorate of Education and Training, 2020). In terms of practice, education can be seen as a key platform for nurturing CT, providing children with opportunities to reflect on and respond to the world and, as such, experience what it takes to participate in democracy. Teachers are the key agents who can guide children on this journey. This, in turn, requires that they are themselves critical thinkers who not only understand what CT is and how it can be practiced but also embrace it in their own teaching. By extension, teacher education (TED), understood broadly as the educational and practical training pre-service teachers (PSTs) receive before they enter the profession, has a particular responsibility to support their CT and equip them with the necessary knowledge and practice of how CT can be fostered in their future classrooms.
Parallel to CT’s increasing centrality in educational policy and practice, scholarship on CT has proliferated in recent years (e.g., Abrami et al., 2008, 2015; Puig et al., 2019; Tiruneh et al., 2014). As witnessed by much of the available scholastic literature, the concept is particularly multifaceted. By extension, it has been defined and approached in multiple ways over the years. Coming from a philosophical tradition on CT, Johnson and Hamby (2015) note that given “the sheer quantity of definitions and their obvious differences, an onlooker might be tempted to conclude that there is no inherent meaning to the term: that each author seems to consider that he or she is free to offer a definition that suits them” (p. 417). An attempt at an interdisciplinary consensus view was offered in 1990 in the so-called Delphi Report (Facione, 1990). CT is here defined as “the purposeful, self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based” (Facione, 1990, p. 3). Yet this view, too, has been critiqued for its apparent shortcomings (Alexander, 2023; Davies, 2015; Johnson & Hamby, 2015; P. K. Murphy et al., 2023). For example, opting for the label “valued” rather than “critical” thinking, Alexander (2023) notes that while listing key skills and subskills may suffice theoretically, the Facione definition fails to clarify whether such thinking is predicated on their partial or collective manifestation and, hence, blurs the line for “what ultimately qualifies as valued thinking and what does not” (p. 42).
The “problem of defining critical thinking” and “the overabundance of problematic definitions,” as Johnson and Hamby (2015, p. 417) put it, have obvious practical consequences for in-service and pre-service educational practitioners in charge of fostering CT in their current and future students, but also for teacher educators. For, while the term enjoys wide currency and prominence in educational policy, different definitions may lead to widely different pedagogical approaches as well as assessments. Although definitional and pedagogical plurality may not, at face value, seem problematic, Johnson and Hamby (2015) warn that such a view may blur deep conceptual and also practical incompatibilities. One may indeed ask: Have CT theorists and, arguably, other scholars concerned with empirical research on CT “not been thinking critically enough about the task of defining ‘critical thinking’” (Johnson & Hamby, 2015, p. 419)?
Within higher education alone, a number of reviews exist that synthesize empirical evidence on different aspects of CT, including the effectiveness of CT instruction (Abrami et al., 2008, 2015; Tiruneh et al., 2014), interventional practices across different study fields (Niu et al., 2013; Puig et al., 2019), educational practices more broadly (Dumitru et al., 2018), and the promotion of CT through e-learning (Puig et al., 2020). Some metastudies have specifically reviewed research on CT within TED and addressed different aspects of pre-service teachers’ training in teaching CT (Huang & Sang, 2023; Lorencová et al., 2019; Mpofu & Maphalala, 2017). These different scholastic contributions to the meta-debate on CT coalesce in systematizing primarily different dimensions of CT instruction and practice. To date, only one metastudy (Huang & Sang, 2023) has attempted to untangle some of the conceptual issues by looking at different approaches to CT in TED scholarship and by providing an overview of the different methodologies employed therein. Based on a sample of 43 studies, this included a synthesis of the stated study rationales, research aims, and different types of interventions implemented to date. The authors identified a bifurcate, discipline-conditioned distinction between logical and value-based definitions of CT. Unlike Huang and Sang (2023), our study looks at CT as a concept that crosses disciplinary boundaries and addresses in more depth specific aspects, crucial to rigorous conceptual work, including CT’s theoretical and epistemological underpinnings as well as conceptual clarity. In further contrast to Huang and Sang (2023), and given our specific interests, we did not impose a priori conceptual clarity requirements as this would exclude potentially relevant studies. This approach was inspired by other metastudies interrogating similarly complex concepts such as metacognition and self-regulation (Dinsmore et al., 2008).
Seeing the need for a further exploration of the rocky grounds of CT (Alexander, 2023) and, also, given the key importance of TED in promoting CT in PSTs, as argued above, the primary objective of our systematic review was to shed light on the existing conceptual conundrum by addressing two research questions: 1) how has CT in TED been investigated in terms of key reported features of international scientific literature on this topic? and 2) how has CT been conceptualized in this literature? The first research question aimed at facilitating an understanding of the broader landscape in which scholarship on CT in TED is nested. Taking into account the feasibility of relevant and reliable information extraction, the key reported features were operationalized as 1) educational levels, 2) school subjects PSTs in the included studies were training to teach, 3) geographical locations of studies, 4) adopted methodologies, and 5) publication rates. This concern with a comprehensive foregrounding of a complex field of study was also inspired by other metastudies (e.g., Bubikova-Moan & Sandvik, 2022; Dinsmore et al., 2008; Huang & Sang, 2023; Rapanta & Felton, 2022). The second question targeted an understanding of the ways in which the concept itself has been used in existing empirical research. Building on relevant strands of educational scholarship, as laid out in more detail below, our focal interest was in exploring CT through the following lenses: 1) its theoretical nesting in individual and sociocultural perspectives (Davies, 2015), 2) its conceptual clarity, and 3) its epistemological underpinnings in domain-general and domain-specific views of knowledge. Our secondary objective was that these clarifications may be instrumental in informing future educational scholarship and practice in TED and, arguably, beyond.
Theoretical Grounding
Conceptualizing Critical Thinking: A Meta-Theoretical View
There is a long history of theoretical and, more recently, also empirical interest in CT. While commonly traced to Socrates as the first proponent of CT, different definitions and perspectives on CT abound. In recent years, several attempts have been made to offer a metaview by synthesizing the available conceptualizations in terms of their roots, influences, core elements, and distinguishing features, as well as concerns and issues they may raise (e.g., Davies, 2015; Fisher, 2021; Lai, 2011). Lai’s (2011) typology, for example, distinguishes between major disciplinary traditions in approaching CT as either a philosophical, psychological, or sociological construct. Fisher’s (2021) review, on the other hand, focuses on the modern critical thinking movement, as developed by leading CT thinkers within, primarily, the North American philosophical scholarship. Broadly speaking, Davies’s (2015) model comprises individual aspects of CT, such as argumentation, and sociocultural aspects of CT, such as social conditions and creativity. The model is visualized as a set of concentric circles, where the inner parts, represented by the individual dimensions, gradually expand into the broader, outer rims, represented by the sociocultural aspects. As the model suggests, rather than being mutually exclusive dichotomies, the individual aspects are included as building blocks in the broader sociocultural view. As Davies points out, it is the former rather than the latter that has traditionally received most scholastic attention. By encompassing the sociocultural dimension in his model, Davies explicitly advocates for a broadened conceptualization of CT that attempts to bridge disciplinary distinctions, undergirding, for example, Lai’s (2011) model and other metastudies on CT in TED such as that of Huang and Sang (2023). Given its comprehensiveness, the model is used as a structural and conceptual vantage point in this study and is, therefore, reviewed in more detail later.
However, we wish to underscore that Davies’s (2015) model builds on and maps first and foremost relevant aspects of the theoretical landscape of CT, including the work of its prominent scholars on CT (e.g., Barnett, 1997; Burbules & Berk, 1999; Dewey, 1910; Ennis, 1993; R. W. Paul, 1989; Perkins et al., 1992; Siegel, 1985) and a selection of key scholarly debates of the past five decades. Therefore, we complement it with insights from other existing metaconceptual syntheses, including those of Fisher (2021) and Lai (2011), as well as more recent scholarship on CT, not specifically mentioned by Davies (2015) and coming from varied disciplinary directions, including philosophy and educational psychology. While furnishing the review with an extra layer of cross-disciplinary nuance, this also aims at making it explicit how Davies’s individual–sociocultural continuum as our key theoretical lens comes into dialogue with other scholarly discourses through a rigorous critical interrogation. It is, however, of note that, given the sheer quantity of recent CT scholarship, our selection is necessarily restricted to contributions made primarily at a theoretical metalevel and considered key for the conceptual debate on CT.
The Individual Dimension
At the very heart of his model, Davies (2015) places individual cognitive skills and abilities, most importantly argumentation and its subskills, including analyzing, making inferences, and evaluating evidence. He argues that the skills-based core represents “critical thinking in its purest form” (p. 51), a view that concurs with much other scholarship (Bailin & Battersby, 2016; Blair, 2021; Fisher, 2021; Lai, 2011). In fact, equating the “giving, evaluating and caring about reasons” (Fisher, 2021, p. 17) with the very core of CT can be traced back to Dewey’s (1910) early focus on skillful reasoning in reflective thinking, which continues to send echoes through subsequent conceptualizations. On this view, arguments can be seen as fulfilling a double function in CT—namely as its tools and its very objects (Blair, 2021).
Beyond the core, the model expands into other conceptually related yet cognitively more complex dimensions. Firstly, Davies considers the role of judgments as they feature in major philosophical conceptions of CT, drawing specifically on Ennis’s (1993) much-quoted definition of CT as “reasonable reflective thinking focused on deciding what to believe or do” (p. 180). The attention is on decision-making with the aid of thorough reflection that encompasses and builds on the argumentative core. Yet, the primary focus is the reflective rather than argumentative aspect of CT. To distinguish it from the core, this dimension is termed “the skills-and-judgments view” (Davies, 2015, p. 51), corresponding to Fisher’s (2021) other key definitional layer, namely, “deciding what to do” (p. 17). As both syntheses underscore, the consensus expert view on CT, proposed in the Delphi Report (Facione, 1990), can be seen as rooted in this view.
The emphasis on skillful reasoning as a basis for decisions on action echoes outside of the philosophical scholarship too. For example, working within educational psychology, P. K. Murphy et al. (2014) have argued that the common theme in many definitional attempts is to see CT as pertaining to “reflective judgments regarding beliefs and actions” (p. 563). Noting that the process of arriving at such judgments often remains elusive in these definitions, they argue for connecting critical thinking to analytic reasoning as a way of conceptualizing more clearly the mechanisms that undergird this process. Termed critical-analytical thinking (CAT), they define it as “effortful, cognitive processing through which an individual or group of individuals comes to an examined understanding of something known or believed” (P. K. Murphy et al., 2014, p. 563). As such, this definition propels to the fore a view of reasoned argument as the basic unit of analysis on which judgments rest and that can also aid in building robust models of CT instruction (Felton, 2005).
With reference to key scholarship within educational psychology, Davies (2015) foregrounds metacognition as an additional aspect of the skills-and-judgment view. Fisher (2021) traces the metacognitive emphasis back to R. W. Paul’s (1989) conceptualization of CT, which is predicated on a distinction between a weak and strong view of CT. While the former remains steeped in an uncritical approach to one's own thinking, the latter also puts one's own thinking about thinking under scrutiny. The metacognitive aspect is clearly embedded in the Delphi consensus view (Facione, 1990) and other subsequent scholarship on CT as well (Lau, 2015).
Lastly, Davies (2015) reviews the so-called propensity side of CT, concerned with the notion of CT dispositions to which he refers as “the skills-plus-dispositions view” (p. 55). Conceptualized primarily as “affective states” or “psychological readiness” to think critically, these consist of the various “attitudes, intellectual virtues and habits of mind” that dispose an individual to be a critical thinker (p. 55). Building on other key scholarship, such as that of Perkins et al. (1992), these dispositions are presented in terms of a four-pronged taxonomy as either 1) “arising in relation to self” (e.g., desire to be well-informed, have intellectual courage or humility); 2) “arising in relation to others” (e.g., open-mindedness, fair-mindedness); 3) “arising in relation to world” (e.g., interest, inquisitiveness); and 4) other (mindfulness, critical spiritedness) (p. 58). Davies also includes in the propensity dimension emotions that may encourage or create conditions for CT, such as by inducing the feeling of uncertainty about something that the critical thinker may need to resolve. As he notes, many of the dispositions mentioned in this taxonomy may in fact be seen as emotion-based. While not going into the same degree of detail and without thematizing emotions specifically as part of the propensity dimension, Lai’s (2011) synthesis also singles out the skills and dispositions distinction as mostly well-established in research, as does Fisher (2021), who, like Davies, maps it explicitly onto the Delphi consensus view (Facione, 1990). Nonetheless, the dispositional aspect of CT has been criticized for implying a static view of what it takes to be a critical thinker and for ignoring its situatedness, be it in terms of the issue(s) one is supposed to think critically about, thus touching upon CT’s epistemological aspect, but also the types of support one receives to become such a thinker, hence calling upon CT’s educational aspect (Alexander, 2023).
The Sociocultural Dimension
Davies’s (2015) model expands further to include what he terms a criticality dimension. Essentially, the emphasis shifts now from critical thinking and reflecting per se (i.e., the individual dimension) to critical acting with specifically moral and ethical connotations, or, in other words, what CT should do rather than what CT should be. Davies labels this “critical thinking as action” (p. 60). Vis-à-vis the individual dimension of CT, Davies sees criticality as broader in scope as well as in its social and educational ambition.
In this expanded sense, CT comes close to the idea of critical citizenship, with questions of social and power relations as lived at its fore. As Davies suggests, educating for CT in this sense also resonates closely with the conception of critical pedagogy. With its roots in continental critical theory, the central concern here is the issue of social justice and change, with Freire (2000, 2004) as one of the main proponents and also one of Davies’s references. Emphasizing critical consciousness in education, Freire distinguished between the so-called transformative teaching that is student-active, oriented toward issues of social and political significance, and aimed at facilitating in-depth understanding as well as personal empowerment, and the so-called banking model of education, where the focus is on traditional, teacher-led approaches, often associated with rote learning and standardized content. Here, then, CT can be seen as having a distinctly political and transformative dimension in that students, through employing their critical consciousness, may come to recognize the oppressive nature of social structures and ways of liberating themselves from them.
This notion of CT resonates with the term “valued thinking,” recently conceptualized by P. K. Murphy et al. (2023) as an extension of CAT. Aiming to liberate CAT from its hegemonic understanding as a purely cognitive notion, they specifically call on research advocating for a sociological understanding of CT, including, among others, Freire’s philosophy and the related notion of critical literacy. They argue that this is “a conception of CAT (that) requires “reading between lines” to unearth relevant evidence of (such) biases, assumptions, worldviews, or implicit or explicit ideologies” (2023, p. 26). As such, it also maps onto Lai’s (2011) sociological research tradition on CT.
Lastly, the outer rim of Davies’s model also comprises creativity as a distinct, additional dimension. Creativity is here premised on openness to move beyond accepted or preconceived notions about any idea in the recognition that, ultimately, all ideas have an embedded ideological aspect. However, rather than suggesting an untenable version of epistemological relativism, Davies argues that this represents a willingness to appreciate what is un-reconciled or even irreconcilable. His own attempt to include creativity as openness in his model may be seen as representing this stance. In the vast literature on CT, other scholars have also considered creativity as part of CT. Fisher (2021), for example, considers a conceptualization of CT as a type of creative and imaginative thinking. Labeled in his account “critic-creative thinking,” it underscores the inseparable nature of the two concepts. This view was advocated several decades earlier by Bailin (1987), who argued that distinguishing between critical and creative thinking as dichotomies rests on a problematic view of disciplinary knowledge frameworks. Yet, while the extension of CT to critic-creative thinking may propel the creative aspect of CT to the fore, Fisher (2021) notes that the resulting expression has nonetheless “not caught on” in research (p. 23).
The “Individual–Sociocultural Continuum”: Challenges and Tensions
Importantly, Davies’s (2015) concentric model suggests that the sociocultural dimension does not preclude the individual elements but rather subsumes them. Nonetheless, he recognizes that a marriage between the individual and the sociocultural view is not without its challenges and tensions, noting further that “the critical pedagogy movement is largely disinterested in the concerns of the critical thinking movement” (p. 77). With reference to the work of Burbules and Berk (1999), Davies lists a number of potential differences between them, including nuances in their aims, scope, purpose, agenda, and attitude, but also, most importantly on our view, understanding of the wider political and social context as well as the issue of (im)partiality toward this broader context as relevant a priori conditions for CT.
By extension, the individual and sociocultural distinction touches also upon the key question of CT’s epistemological underpinnings as either a domain-general or domain-specific concept in that the former can more readily lend itself to a context-independent understanding of CT as a generic skill while the latter both presupposes and necessitates a view of CT as deeply embedded in the immediate but also broader sociocultural context where such thinking takes place. This issue, too, points, on our view, to an unresolved epistemological tension.
Referring to the philosophical debate between leading theorists of CT, such as Ennis, McPeck, and others, on the domain-generality versus domain-specificity of CT, Siegel (1985) argued early that the plausibility of the distinction was, in fact, beside the point since “reasons can be both subject-specific and general” (p. 75). Offering a comprehensive review of the debate and the available empirical evidence on the issue, Lai (2011) supports this view and argues that the controversy is far from conclusive, with evidence pointing in both directions. Others have advocated a more clearly delineated view. Bailin et al. (1999), for example, addressed this issue in their discussion of CT as skills, processes, and procedures. While they consider the use of the term “skill” in the sense of indicating proficiency as largely unproblematic, they warn against the generic skills view as faulty, given the importance of background knowledge in the area under critical scrutiny. They also hold that CT is neither a mental process nor a set of general procedures or steps to be learned in more or less ordered ways. They critique this view for being oblivious to the key role of standards, relevant for the particular issue at hand, and also the role of context in making sound judgments. By implication, the general skill, process, and procedure view is deemed reductionist about the complex and diverse nature of problems requiring CT. Bailin et al. (1999) insist that it is the sustained practice of making critical judgment according to relevant standards of performance in a particular knowledge area, as well as a willingness to improve in line with feedback on one’s quality of thinking, that is at the core of developing CT. The view of CT as domain-specific is also propounded in more recent scholarship within educational psychology (Alexander, 2023; Alexander et al., 2011; P. K. Murphy et al., 2023).
In sum, although the outer parts of Davies’s model, particularly creativity, are, as he warns himself, presented as mostly theoretical potentialities that have not been tested empirically, the model represents an attempt at bridging different and seemingly incompatible traditions. On our reading, it is also the most comprehensive attempt at conceptualizing CT in higher education and, arguably, also beyond, available to date. Wary of the above critique, in this paper, we adopt Davies’s individual–sociocultural continuum as a theoretical vantage point and an analytical lens through which the available body of identified studies will be examined and on which we intend to build further. As Burbules and Berk (1999) argue, and as also communicated in Davies’s creativity aspect of CT, this ties in well with the idea of “openness to, and comfort with, thinking in the midst of deeply challenging alternatives” whereby difference becomes a “condition of criticality” and tensions arising out of that difference a valuable, deeply dialogic way of looking ahead (p. 63).
Method
To address our two research questions, as specified above, we conducted a qualitative systematic review, following the meta-synthesis design. A meta-synthesis is a qualitative, integrative type of systematic review that aims at extracting, comparing, and contrasting main concepts or themes in a body of research in order to arrive at their comprehensive synthesis (Saini & Shlonsky, 2012; Thorne et al., 2004). Like other meta-reviews in the broader systematic review family (Sandelowski & Barroso, 2007), a meta-synthesis includes a number of specific steps that are conducted in a systematic, transparent, and rigorous manner, most importantly: 1) a comprehensive search to locate relevant studies, 2) screening of the identified studies according to a set of predefined inclusion criteria, and 3) analysis of the pool of included studies in line with the review aims. In the following sections, we offer a detailed account of the entire process.
Study Identification
Our initial step included a broad, systematic search for relevant studies in two comprehensive international research databases within education and psychology: ERIC and PsycINFO. Our search string included relevant keywords, truncated (*) and linked together through the Boolean operators AND and OR as follows: (Preservice teacher* OR Student teacher* OR Education student* OR Teacher education) AND (Critical thinking). While Davies’s (2015) model, as our main theoretical vantage point, encompasses other terms, such as argumentation or creativity, as dimensions of CT, these were not included in our search string. Our specific interest was in identifying studies that used CT as a term per se and in investigating it in its various configurations in TED. To avoid too broad or too narrow and, hence, potentially irrelevant searches, our search string was therefore set to include the overarching term CT only.
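To make the structure of the search string explicit, the following minimal Python sketch assembles it from two term groups: population terms joined by OR, and the concept term, with the two groups joined by AND. The term lists are taken from the string reported above; the function name `build_query` and the variable names are hypothetical, and the sketch does not reproduce any database-specific syntax that ERIC or PsycINFO may require.

```python
# Illustrative sketch only: build_query and the variable names are
# hypothetical; the terms themselves come from the reported search string.

def build_query(*term_groups):
    """Join terms within a group by OR and the groups themselves by AND."""
    grouped = ["(" + " OR ".join(terms) + ")" for terms in term_groups]
    return " AND ".join(grouped)

# Population terms, truncated with * as in the reported string.
POPULATION_TERMS = [
    "Preservice teacher*",
    "Student teacher*",
    "Education student*",
    "Teacher education",
]

# Only the overarching term was searched; related terms such as
# argumentation or creativity were deliberately left out.
CONCEPT_TERMS = ["Critical thinking"]

query = build_query(POPULATION_TERMS, CONCEPT_TERMS)
print(query)
```

Keeping the concept group as a separate list shows where related terms would have entered the query had the search not been restricted to CT as a term per se.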
Wary of the danger of inadequate bibliographic indexing and in line with recommendations for conducting qualitative systematic reviews (see Petticrew & Roberts, 2006), we supplemented our database search with a hand-search in the reference lists of thematically relevant reviews on critical thinking in higher education, published in the last 10 years at the time of the search (Dumitru et al., 2018; Lorencová et al., 2019; Mpofu & Maphalala, 2017; Puig et al., 2019, 2020; Tiruneh et al., 2014). These two steps gave together a total of 1,060 potentially relevant studies.
Screening Procedures
In our next step, all study abstracts were screened according to the following five inclusion criteria:
1) population: pre-service teachers (PSTs) enrolled in TED programs worldwide; given that different countries have different ways of organizing TED, and our interest was in mapping the broad landscape of CT in TED, we did not exclude or impose restrictions on TED models in the identified publications, such as by specifying educational levels or study duration
2) publication period: 2000 and onward, given the increase in policy attention to CT since the turn of the millennium; this period was also considered sufficient to capture trends over time
3) language of reporting: restricted to English, for pragmatic reasons, and given the research team’s joint linguistic command
4) geography: worldwide (no restrictions)
5) study design: peer-reviewed qualitative and quantitative empirical studies published in digitally available scientific journals only; book chapters, books, proceedings publications, and scientific reports were excluded due to uncertainty regarding their quality assurance measures
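The five criteria above can be read as a conjunction of checks applied to each abstract. The sketch below is one possible way of expressing that logic in Python, not the procedure the research team actually used, which was a manual screening. The record fields and the function name `failed_criteria` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass

# Hypothetical record of what a screener notes for one abstract.
@dataclass
class AbstractRecord:
    is_pst_in_ted: bool      # criterion 1: pre-service teachers in TED
    year: int                # criterion 2: publication year
    language: str            # criterion 3: language of reporting
    publication_type: str    # criterion 5: e.g. "journal_article"
    peer_reviewed: bool      # criterion 5
    is_empirical: bool       # criterion 5

def failed_criteria(record):
    """Return the names of the inclusion criteria the record does not meet."""
    failed = []
    if not record.is_pst_in_ted:
        failed.append("population")
    if record.year < 2000:
        failed.append("publication period")
    if record.language.lower() != "english":
        failed.append("language of reporting")
    # Criterion 4 (geography) imposes no restriction, so nothing is checked.
    if not (record.publication_type == "journal_article"
            and record.peer_reviewed
            and record.is_empirical):
        failed.append("study design")
    return failed

example = AbstractRecord(True, 1998, "English", "book_chapter", True, True)
print(failed_criteria(example))
```

Returning the list of unmet criteria, rather than a single yes/no value, mirrors the need to document the reason for each exclusion in a review of this kind.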
The initial screening procedure and the removal of duplicates gave 495 studies that were included in the full-text review. This again contained several methodological steps. Firstly, we conducted an extended data extraction that enabled us to gain a broader understanding of the sample. This consisted of the following items: 1) study author(s); 2) publication outlet/journal; 3) year of publication; 4) purpose of study/research questions (if clearly stated); 5) thematic focus; 6) educational level PSTs were trained to teach (if clearly stated); 7) subject(s) PSTs were trained to teach (if clearly stated); 8) geographical context/setting, operationalized as the country in which the research was conducted (if clearly stated); 9) time of study (if clearly stated); 10) methodology adopted, including details on specific methods used; 11) the presence of CT among the stated keywords; and 12) conceptual and theoretical underpinnings of CT, which included providing a short synopsis of how CT was conceptualized in each study, including the presence of specific definitions and referenced work, where applicable.
Secondly, the scientific soundness of each study was appraised. Given the critique levelled at existing study quality appraisal checklists and to ensure a sufficiently robust procedure (Atkins et al., 2008; Saini & Shlonsky, 2012), our appraisal consisted of two steps: 1) the clarity and basic coherence between the study research question(s), theoretical grounding, methods, and outcomes were assessed manually by the research team and 2) all studies were additionally checked against the Norwegian Register for Scientific Journals, Series and Publishers (NRSJSP), which provides updated information on scientific quality of existing publication channels worldwide. For these steps, our inclusion criteria were as follows:
1) Manual quality assessment: only studies that delivered on basic criteria such as a sufficient description of applied methods, theoretical grounding, and results were included. Studies that offered conceptually irrelevant, missing, or poorly described theoretical and methodological grounding or provided unclear reporting of outcomes were excluded. It is of note that, given our interest in conceptual clarity or the lack of it, we did not exclude studies that did not provide explicit definitions of CT or provided multiple definitions without committing to a specific one, as this would have led to the exclusion of potentially relevant studies for our purposes.
2) NRSJSP: only studies published in journals approved as being of sufficient scientific quality (levels 1 and 2) were included.
Following these two steps, 338 studies were excluded on the grounds of insufficient conceptualizations of CT (e.g., Senocak et al., 2007; Tal, 2005) and inadequate study soundness (e.g., Tessier, 2010; Sriraman & Knott, 2009) or because the journal was not approved as being of sufficient scientific quality (e.g., Taspınar, 2007; Theiss et al., 2009) (Table 1).
Table 1. Examples of excluded studies
Given the time lapse between the first search (April 2021) and the finalization of the screening, data extraction, and quality assurance procedure on all 495 full texts (August 2023), it was deemed necessary to reapply the database search in exactly the same manner as described previously, with the exception of the publication period that was this time round set to cover April 2021 onward. In addition, we identified a new relevant metastudy on CT in TED (Huang & Sang, 2023) that was subjected to a reference hand-search. These two steps together resulted in the identification of 183 potentially relevant studies. All abstracts were screened according to the same procedure and by applying the same inclusion criteria. As a result, 82 studies were deemed relevant for inclusion in the full-text review. Again, in line with the procedure described previously, all studies identified in the additional screening round were further reviewed for inclusion, and they were also quality-appraised. This resulted in the exclusion of 31 studies. Together, the first and second screening and inclusion assessment rounds resulted in a final sample consisting of 208 studies. 4 See Figure 1 for the visualization of the entire process.

Figure 1. Adapted PRISMA 2020 flowchart (Page et al., 2021): review process.
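The study flow described above can be checked with a short tally. The following is a minimal sketch in Python; the variable names are illustrative assumptions and do not come from the review, while the counts are those reported in the text.

```python
# Counts reported in the text for the two screening rounds.
round1_full_text = 495   # full texts reviewed after the initial screening
round1_excluded = 338    # excluded after the two-step quality appraisal
round2_full_text = 82    # full texts reviewed after the updated search
round2_excluded = 31     # excluded in the updated round

round1_included = round1_full_text - round1_excluded
round2_included = round2_full_text - round2_excluded
final_sample = round1_included + round2_included

print(round1_included, round2_included, final_sample)  # 157 51 208
```

The two rounds contribute 157 and 51 studies, respectively, which matches the reported final sample of 208.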
Analytical Approach—Sample of Included Studies
In the next stage, the final sample was subjected to a two-pronged analysis in line with our research questions. With regard to RQ1, we took as our vantage point the 12 items of information extracted during the full-text review (see above). Given our aim of facilitating an understanding of the broader landscape of CT scholarship on TED, the following items were considered relevant and jointly comprehensive enough to report on:
1) educational level—as an index of PSTs’ preparedness to teach CT across different age levels, given that CT and its components may manifest differently at different developmental levels (Bubikova-Moan & Sandvik, 2022; Kuhn, 1999)
2) school subjects—as an index of PSTs’ preparedness to teach CT across different school subjects and, hence, different fields of knowledge with their specific standards of performance and evaluation, as underscored particularly in domain-specific views of CT (cf. Bailin et al., 1999)
3) geographical location—as an index of where knowledge on CT in TED is generated and, potentially also, disseminated from
4) methodology—as an index of what scientific methods have been employed to investigate CT in TED
5) year of publication—as an index of potential publication trends on CT in TED
With regard to RQ2, we conducted an abductive content analysis (Mason, 2018), driven in part by the data (here, the study corpus) and in part by the theoretical framework outlined previously, most importantly Davies’s theoretical distinction between individual and sociocultural perspectives on CT. In addition, we applied selected analytical categories proposed by Dinsmore et al. (2008) in their metastudy on the theoretical and empirical boundaries of metacognition, self-regulation, and self-regulated learning in various learning contexts. While thematically different, this review, much like ours, aimed at clarifying the murky waters of similarly complex concepts. Moreover, the authors’ arguments for clarity resonated with our research aims regarding critical thinking and overlapping terms, and the study was carried out in a scientifically robust and transparent way. While we also considered alternative methodologies such as expert agreement (Alexander, 2014; Facione, 1990), we sought to examine the existing empirical research to uncover recent researchers’ actual conceptualizations. As such, Dinsmore et al.’s analytical approach proved inspirational for our purposes.
More specifically, at a broad conceptual level, our main analytical interest was threefold. Firstly, we classified the identified studies in terms of their conceptual nesting in either individual or sociocultural perspectives on CT, as previously detailed, or, alternatively, as a combination of both, where this was clearly the case. Secondly, at a more procedural level, our interest was in examining the offered conceptualizations in terms of clarity. This dimension was directly inspired by Dinsmore et al.’s (2008) review and facilitated through a coding scheme comprising the subcategories “explicit” and “implicit.” Similar to Dinsmore et al. (2008), we assumed explicit definitions and, hence, conceptual disambiguation as the preferred option. Dinsmore et al. (2008) further divided the less desirable, implicit dimension into three subcategories: 1) conceptual, meaning that authors drew on terminology that implied CT; 2) referential, suggesting that authors made a specific reference to relevant scholarship on CT as a form of definitional proxy; and 3) methodological, where CT was operationalized through adopted methods, such as specific standardized or author-designed assessment instruments. In contrast to Dinsmore et al. (2008), our review also included a mixture of the three implicit subcategories as a fourth analytical code. Additionally, as in Dinsmore et al. (2008), we looked at the authorial choice of including the term “critical thinking” specifically as a keyword vis-à-vis conceptual clarity concerns. Thirdly, given the continued scholarly debate on CT’s domain-specificity versus domain-generality and, hence, the issue of CT’s epistemological underpinnings, all studies were coded as either domain-specific (DS) or domain-general (DG). Studies where this issue was neither thematized explicitly nor determinable from the methodological information provided by the authors (implicit concern) were coded as unclear (U).
We present the main patterns with the aid of basic descriptive statistics where deemed appropriate.
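The three coding dimensions described above (theoretical perspective, conceptual clarity, and domain orientation) can be represented as a small data structure. The sketch below is one possible representation in Python, not the authors' tooling, which was an MS Excel worksheet. All class names, field names, and the example records are hypothetical and serve only to illustrate the scheme.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Perspective(Enum):
    INDIVIDUAL = "individual"
    SOCIOCULTURAL = "sociocultural"
    COMBINED = "combined"


class Clarity(Enum):
    EXPLICIT = "explicit"
    IMPLICIT_CONCEPTUAL = "implicit: conceptual"
    IMPLICIT_REFERENTIAL = "implicit: referential"
    IMPLICIT_METHODOLOGICAL = "implicit: methodological"
    IMPLICIT_MIXED = "implicit: mixed"  # the review's fourth implicit code


class Domain(Enum):
    SPECIFIC = "DS"
    GENERAL = "DG"
    UNCLEAR = "U"


@dataclass(frozen=True)
class CodedStudy:
    citation: str
    perspective: Perspective
    clarity: Clarity
    domain: Domain
    ct_keyword: Optional[bool]  # None when a study lists no keywords at all


# Hypothetical records for illustration only; not drawn from the corpus.
studies = [
    CodedStudy("Study A", Perspective.INDIVIDUAL, Clarity.EXPLICIT,
               Domain.GENERAL, True),
    CodedStudy("Study B", Perspective.SOCIOCULTURAL,
               Clarity.IMPLICIT_REFERENTIAL, Domain.UNCLEAR, False),
    CodedStudy("Study C", Perspective.INDIVIDUAL,
               Clarity.IMPLICIT_METHODOLOGICAL, Domain.GENERAL, None),
]

# Basic descriptive statistics: number of studies per perspective.
perspective_counts = Counter(s.perspective for s in studies)
for perspective, n in perspective_counts.items():
    print(f"{perspective.value}: n = {n}")
```

Restricting each dimension to a closed set of codes makes undefined categories impossible to enter, which is one way to support the coding consistency the review aims for.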
Review Quality Assurance Measures
To ensure methodological rigor and transparency, all extracted study information and details of the study appraisal procedure were recorded in a comprehensive MS Excel worksheet shared among the research team throughout the research process. Our choice of software was predicated on its easy search and cross-analysis functionality.
Given the size of the sample and the a priori diffuse and broad nature of CT as a concept, as indicated previously, the research team engaged in a continuous dialogue on coding procedures to facilitate coding consistency and kept a digital record of decisions on gray areas arising in the review process. We conducted two rounds of interrater reliability checks. Firstly, after the full-text review, a pool of 41 studies in a larger set of 242 was coded as “include/exclude but check with interrater,” while the remaining 201 studies were all coded as “include.” The interrater agreement was 86%. The six studies where our codes differed (14%) were examined in detail by all three authors, and individual decisions on their final inclusion/exclusion were reached. Secondly, we conducted another interrater reliability check on 20% of the final sample, totaling 42 studies. The agreement was 85%.
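Percent agreement of the kind reported above is the share of items on which two coders assign the same code. The following Python sketch illustrates the calculation under stated assumptions: the function name and the ten inclusion decisions are hypothetical and are not the review's data.

```python
def percent_agreement(coder_a, coder_b):
    """Return the percentage of items on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same items")
    if not coder_a:
        raise ValueError("No items to compare")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)


# Hypothetical inclusion decisions for ten studies; not the review's data.
coder_a = ["include", "include", "exclude", "include", "exclude",
           "include", "include", "exclude", "include", "include"]
coder_b = ["include", "include", "exclude", "exclude", "exclude",
           "include", "include", "exclude", "include", "include"]

agreement = percent_agreement(coder_a, coder_b)
# Indices of studies the coders disagreed on, to be resolved jointly.
disputed = [i for i, (a, b) in enumerate(zip(coder_a, coder_b)) if a != b]
print(f"{agreement:.0f}% agreement; items to resolve jointly: {disputed}")
```

Raw percent agreement does not correct for agreement expected by chance. Chance-corrected statistics such as Cohen's kappa are a common complement, although the review reports raw agreement only.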
Findings
Investigating CT in TED—Features of the Sample
Our first research question aimed to assess the lay of the land by identifying features of international scientific literature on CT in TED. As reported in Table 2, we focused on the included populations (educational levels and school subjects PSTs were training to teach) and reported geographical locations. Moreover, we focused on adopted methodologies and publication rates.
Table 2. Features of scholastic literature on CT in TED: Educational level and teaching subject of PSTs’ future students, and geographical location of reported study
Firstly, the PST populations included in our study sample were preparing to teach students of varying ages and across a range of school subjects. While a large proportion of studies did not report the educational level of the PSTs’ future students (54%), 61 studies (30%) included PSTs who were qualifying to teach in preschools and elementary schools. The PSTs were also being educated to teach mixed groups of students, consisting of both early childhood (kindergarten) and early school classes combined (13%), as well as a mix of primary and secondary (8%). Due to unclear reporting, it was not possible for us to identify whether this was due to mixing different TED classes in one and the same study or whether this is how teacher education is organized in the country or place of study.
Secondly, future teaching subjects of the participants, reported in around 59% of the included studies, were also varied. Natural and life sciences accounted for nearly 23% of the studies, and the arts (e.g., language, history, art, and music) accounted for around 16%. The remaining studies covered social sciences, physical education, special needs education, and a mix of different school subjects within studies, also suggesting that CT is studied across varied subjects.
Thirdly, geographic location was reported in 96% of the sample. North America (27%) was closely followed by the Near and Middle East region of Asia (25.5%), while the Far East region of Asia and Europe also featured prominently, accounting for 15.9% and 13% of the total, respectively. There were also contributions from African countries as well as Australia and Oceania, covering 6.7% and 3.7% of the sample, respectively, while studies from Central and South America were notably scarce (1.4%).
Further, concerning methods used to study CT in TED, we observed an almost equal distribution of studies adopting quantitative (n = 78) and qualitative approaches (n = 80), while studies that used mixed-methods approaches were also well-represented (n = 50).
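To make the "almost equal distribution" concrete, the reported counts can be converted into shares of the final sample. This is a minimal Python sketch; the dictionary keys are illustrative labels, and the counts are those given in the text.

```python
# Study counts per methodological approach, as reported in the text.
method_counts = {"quantitative": 78, "qualitative": 80, "mixed methods": 50}
total = sum(method_counts.values())

# Share of the final sample per approach, in percent, rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in method_counts.items()}
print(total, shares)
```

The counts sum to the final sample of 208, with quantitative and qualitative studies each accounting for a little under two fifths and mixed-methods studies for roughly a quarter.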
Lastly, in terms of year of publication, the first 11 years of the focal time (2000–2011) accounted for only 18.7% of studies, while the last 11 years (2012–2023) showed a fourfold increase in publications, as shown in Figure 2.

Figure 2. Number of publications per year.
Conceptualizing CT in TED
The Individual–Sociocultural Continuum: Broad Trends
Our second research question asked how CT in TED has been conceptualized in international scientific literature. Our analysis distinguished between studies that adopted an individual lens on CT, zooming in on core aspects, including argumentation and skillful judgment, those that adopted sociocultural perspectives and focused on aspects of CT such as social conditions and criticality, and studies that combined both these perspectives. A rather clear trend in the data was that authors more often adopted an individual (n = 150) than sociocultural (n = 29) or mixed (n = 29) perspectives. Figure 3 offers a visual approximation of the main results as mapped onto Davies’s (2015) theoretical model.

Figure 3. Visual approximation of main results in terms of theoretical perspectives and focus of inquiry in identified studies.
In terms of the distribution of subject areas PSTs were training to teach in relation to the theoretical point of departure, our findings show that most studies within the individual dimension that specified subject areas were conducted within natural and life sciences (n = 43), while arts and social science subjects were jointly most dominant in the sociocultural perspective (n = 9), as they were in studies that adopted both individual and sociocultural perspectives (n = 12). Only one study within natural and life sciences was coded under the sociocultural perspective. Table 3 offers further details on subject areas in relation to the theoretical perspective.
Table 3. Distribution of subject area of study in relation to theoretical point of departure
As shown in Table 4, studies adopting an individual approach used different methodological approaches, with quantitative methods predominating (n = 72), although qualitative (n = 39) and mixed-method studies (n = 39) were also common. Among the quantitative studies, surveys, questionnaires, and ability tests were the dominant data collection measures. Sociocultural approaches (n = 29) were most often investigated using qualitative methods (n = 24). Qualitative data collection methods included multiple data types as part of case study approaches and action research, analysis of written artifacts, and interviews. Studies that combined individual and sociocultural approaches tended to favor qualitative research methods. Across conditions (individual, sociocultural, and combined), studies adopting mixed-method approaches most often combined interviews and questionnaires.
Table 4. Theoretical and Methodological Approaches
The Individual Dimension: An In-Depth Insight
Within the individual perspective, we identified a body of studies that focused on the skill-based argumentation core of CT, including analyzing, evaluating evidence, and making inferences (Belda-Medina, 2022; Braund et al., 2013). The works of Ennis et al. were a frequent point of reference (e.g., Kaya, 2022; Ugwuozor, 2021). We also identified a cluster of studies that combined a focus on core skills and CT processes ending in decisions about what to believe (i.e., judgments; e.g., Angeli et al., 2003). The focus on reaching a decision and drawing conclusions, as well as problem-solving and higher-order thinking of hierarchically ordered levels of cognitively related operations (e.g., in relation to Bloom’s [1956] taxonomy) also featured in studies adopting an individual perspective (e.g., Garcia & Hooper, 2011; Zain et al., 2022).
Also common within the individual perspective, and not unexpectedly so, was a sizeable body of studies that relied on the consensus view of Facione’s (1990) Delphi report, notably its six dimensions of interpretation, explanation, analysis, inference, evaluation, and self-regulation (e.g., Arsih et al., 2021; Barahona et al., 2023). Some studies with an individual perspective also focused on dispositions underlying the propensity to engage in CT (e.g., Ekici, 2017) or combined “skills and dispositions” perspectives (e.g., Celik & Ozdemir, 2020; Han & Brown, 2013; Toy & Ok, 2012).
Interestingly, studies with the individual perspective on CT also included references to reflection, reflective thinking, and critical reflection, including a cluster of studies referring to Dewey and his focus on reflective thinking (e.g., Han & Brown, 2013; Lombard, 2008). Conceptualizations of reflection leading to action and changes in teacher behavior were also included in this perspective (e.g., Taddei & Budhai, 2016).
The Sociocultural Dimension: An In-Depth Insight
Among the socioculturally oriented studies, we identified a disparate body of work building on a range of theoretical perspectives, where CT was often used interchangeably with, or replaced by, other terms relating to criticality. This included critical literacy, for example, where the deconstruction of texts and awareness of books as artifacts was foregrounded (e.g., Balikçi & Daloglu, 2016; Colwell et al., 2021). Other terms, such as critical (transformative) dialogue (Matloob Haghanikar, 2019), culturally relevant pedagogy (Jones & Donaldson, 2022), and critical social theory (Carrington & Selva, 2010) also featured as commensurate with the concept of critical thinking. Moreover, terms such as reflective thinking, reflective analysis (e.g., Guichon, 2009), as well as critical reflection and reflective teaching were also drawn on in some studies that adopted a sociocultural perspective (e.g., Shin, 2021). Moving further to the outer rims of Davies’s model, the included studies evidenced a merging of concepts such as creativity with aspects of argumentation. For instance, Silva et al. (2022) posited that creative and critical thinking “are two sides of the same coin” (p. 766) and applied The Critical and Creative Thinking Test (CCTT) to assess critical and creative thinking skills at pre- and post-test.
It is also noteworthy that, among studies adopting a sociocultural approach, there was not always a clear theorization of potential points of contact with or boundaries to CT, and several studies provided definitions only implicitly, via references to other theorists (see also next section). Despite this, we noted a similarity in the conceptualization of the included terminology, either explicitly, in terms of a definition, or implicitly, through measurement or references to relevant and related literature (e.g., Busher et al., 2012).
Studies Combining Both Perspectives: An In-Depth Insight
Our sample also comprised studies that embraced both perspectives (e.g., Meierdirk, 2018; Sultan et al., 2017). It is of note that these mixed-perspective studies represented a specific approach encompassing both individual and sociocultural foundations. That said, we found that the teasing apart of individual and sociocultural perspectives was not complicated in most cases. For example, Sultan et al. (2017) were concerned with critical reading as a “manifestation” of CT (p. 160) that involved not only mental processes and higher-order thinking, but also more social processes of reading, including engaging with texts to identify strengths and weaknesses and assessing the value of reading. Similarly, we noted studies that combined perspectives drawing on reflection as presupposing logic and rational thinking, but, at the same time, were concerned with the situated nature of knowledge as well as issues of social justice and equity. For example, Gahlsdorf Terrell and Sherman (2022) described what they refer to as the “developmental and contextual” (p. 244) nature of critical reflection in TED and studied the influence of variable contexts on individual thinking. Also, Carlson (2019) drew on Dewey’s conceptualization of reflective thought that not only required careful cognitive considerations but also reflection on actions with social and political implications.
Conceptual Clarity
In terms of further conceptualizing CT in TED, we note that, broadly speaking, the included studies appear to populate a continuum stretching from theoretically more specific and elaborated positions, rooted most often in educational psychology and philosophical literature (e.g., Ekici, 2017; Forawi, 2016; E. Murphy, 2004; Wake & Modla, 2012) to positions where CT remained a diffuse (under-theorized) concept (e.g., Minott, 2012; Newton & Newton, 2010).
Many studies in our corpus (n = 116) defined CT in an implicit manner, through reference to key aspects of CT, key theorists, or measurement approaches that captured CT, rather than stating an explicit definition. Moreover, the matter of unclear definition was further complicated by authors who provided potpourris of references but failed to specify their own preference for theorists or theoretical foundation.
To explore this issue, we conducted an analysis of the clarity of definitions in terms of authors’ choice of keywords, assuming that studies nominating CT as a keyword would be more likely to offer an explicit conceptualization. This analysis showed that many studies (n = 87) included CT as a keyword, although a relatively large number (n = 72) did not, with the remaining articles (n = 45) not providing any keywords at all. Overall, we saw greater explicitness in those studies that did include CT as a keyword than those that did not, as shown in Table 5 by the higher number of studies coded under “Yes” for including CT as a keyword and providing an explicit definition of CT.
Table 5. Conceptual clarity in relation to whether CT was included as a keyword
The Epistemological Underpinnings of CT: Domain-Specificity and Domain-Generality
Aside from the adopted theoretical position and conceptual clarity, we were also interested in emerging patterns regarding CT researchers’ positioning vis-à-vis the generality and/or specificity of CT, given perennial scholarly discussions (e.g., Bailin et al., 1999; Lai, 2011; Siegel, 1985). Our investigations highlighted differing approaches on this point, which can be placed in three categories.
The first category (n = 88) consisted of those studies that positioned themselves explicitly in the discussion of CT as domain-specific (e.g., Daniel, 2001; Erixon & Erixon Arreman, 2019; Kacerja & Julie, 2023) or domain-general (e.g., Braund et al., 2013; Çarkit & Kurnaz, 2022; Özelçi & Saracaloglu, 2017). Coded as the latter, Çarkit and Kurnaz (2022), for example, conceptualized CT as a multidisciplinary thinking process. The very issue of domain-specificity or generality in CT was also of focal interest in several studies (e.g., Lloyd & Bahr, 2010).
The second category (n = 99) comprised those studies that used instruments that were either domain-specific or domain-general and hence showed an implicit concern for the issue. In many cases, domain-general assessment tools were readily employed, either in their original form or translated, such as the Critical Thinking Standards Scale (CTSS) (e.g., Hursen, 2021; Karaoglan Yilmaz & Yilmaz, 2020; Kizilhan & Demir, 2022), the California Critical Thinking Dispositions Inventory (CCTDI) (e.g., Arsal, 2015, 2017; Bilen et al., 2013; Unlu & Domke, 2017; Toy & Ok, 2012), or the Watson-Glaser Critical Thinking Appraisal (e.g., Gadzella & Baloğlu, 2003; Zascavage et al., 2007). More specific views of CT, such as within mathematical thinking (e.g., Celik & Ozdemir, 2020) or science (e.g., Sonmez et al., 2021), were also investigated. We noted that several domain-specific measures were represented, including research-developed ones (e.g., Aidoo et al., 2022), as well as other innovative and more qualitative methods of investigation, including interviews, document analyses, observation, and multiple combined approaches as part of case studies (e.g., Gooch et al., 2008; Kurniati et al., 2019).
The third and last category (n = 21) consisted of those studies that did not explicitly thematize the issue of domain-generality or specificity of CT or where the information on methods or instruments employed did not provide further clues that could aid in disambiguating the issue (e.g., ElSayary et al., 2022).
Discussion
This systematic review aimed to shed light on how CT in TED has been investigated in terms of key reported features of international scholastic literature on the topic and how CT has been conceptualized in this literature. To this end, we conducted a systematic search in relevant international research databases and identified 208 studies that met our inclusion criteria. While necessarily conditioned by the latter, this in itself represents a sizeable corpus that, in our view, offers ample opportunities to detect tendencies across both dimensions. To facilitate this process, the corpus was subjected to an abductive content analysis, with Davies’s (2015) model, nested in individual and sociocultural perspectives of CT, as our theoretical vantage point and Dinsmore et al.’s (2008) review as an additional methodological inspiration. Next, we revisit and comment on our main findings along methodological, conceptual, theoretical, and practical-educational lines.
The Broad Landscape of Research on CT in TED
In terms of our first research question, we note several emerging patterns. Firstly, the heterogeneity in educational levels and school subjects that the PSTs in our sample were training to teach points to a trend that we interpret as positive. If future generations are to think critically about pressing issues of both local and global relevance, it is imperative that PSTs are equipped with the necessary knowledge of what CT as a concept represents and, relatedly, how it may be translated to actual classroom practice across different grades and subjects. Further, given that educational scholarship on the development of CT and its subdimensions, such as argumentation and reasoning, has tended to privilege older age groups (see on this, e.g., Bubikova-Moan & Sandvik, 2022), the fairly high representation of studies placed in early childhood settings is unexpected. Notwithstanding some methodological reservations, we suggest that, at the very least, these studies are a timely empirical addition to how PSTs are being prepared to foster CT from early on.
Given the complexity embedded in CT as a concept, we see it as commendable that rather than adhering to strict methodological orthodoxies, researchers employ different methodologies and methods, as well as combinations of methods, that may, in concert, aid in unpacking some of this complexity.
Likewise, the fairly broad geographical representation in our sample suggests that CT is on a worldwide teaching and research agenda in TED. Further, it signals that no specific country or continent may claim hegemony in terms of the scientific construction and dissemination of knowledge in the field. While most likely due to methodological and conceptual differences, including selected search parameters and final sample sizes, this diverges from the far less geographically even distribution patterns evidenced in earlier metastudies on CT in TED, such as that of Lorencová et al. (2019) and Huang and Sang (2023). We nonetheless see the near absence of studies conducted in Central and South America as disconcerting, if not alarming. While this may simply be due to a preference to publish in national languages rather than English and, hence, a reflection as well as a practical limitation of our methodological choices and linguistic possibilities, it nonetheless suggests that these countries’ voices may be much less audible in the broader, English-speaking scientific debate on CT in TED. In a time where the issue of equitable representation and epistemic (in)justice looms large at a social and political level, this not only calls for vigilance but also brings to the fore the need for productive cross-linguistic and cross-national research collaboration that could potentially aid in mitigating shortcomings of this kind when synthesizing across complex research areas of key international significance, such as CT in TED.
Lastly, we also documented an almost fourfold increase in publication rates from 2011 to 2023. Interpreting this increase is, however, far from straightforward. On the one hand, it may be seen as a sign of an increased scientific interest in CT in TED that parallels an increase in attention given to CT as a concept in both national and transnational educational policy (Australian Curriculum, Assessment and Reporting Authority, 2023; Norwegian Directorate of Education and Training, 2020; OECD, 2018, 2022). On the other hand, it may be the result of the worldwide growth in academic publications in general, higher education included (Seeber, 2023). Alternatively, it may be seen as a reflection of both these trends. Our data do not allow us to draw conclusions on this point.
Conceptualizing CT in TED: The Individual–Sociocultural Continuum
In terms of our second research question, we documented that CT has received the most scholarly attention as a concept studied through the individual lens. This often entailed an interrogation of argumentation-related aspects of CT, including the analysis and evaluation of claims, evidence advanced to support these claims, or inferences made in the process of argumentation. In line with existing scholarship (Davies, 2015; Felton, 2005; Fisher, 2021; P. K. Murphy et al., 2014), this seems to suggest that the individual dimension of CT, nested in argumentation theory, is indeed the most basic scholastic understanding of CT in TED.
We also evidenced that studies coded within this dimension often drew on prominent CT theorists, associated with the North American critical thinking movement, such as Dewey, Ennis, R. Paul, and Elder. This suggests that scholars broadly connect the core skills aspects of CT to the skills-and-judgments view as well as the skills-plus-dispositions view of CT, as suggested by Davies (2015). However, we note that untangling these perspectives from each other may present an analytical challenge. For example, distinguishing between CT and related concepts, such as problem-solving and other 21st-century skills (Ananiadou & Claro, 2009), as well as CT and higher-order thinking skills (HOTs), can be a difficult, if not impossible, task since these can often be distinctions of nuance rather than kind (see Daniel, 2001). It is, therefore, imperative that, if these related concepts are drawn on alongside CT, researchers offer disambiguation. Among other things, conceptual vagueness on this point may limit the practical translation of empirical findings to actual TED classrooms. Further, lack of conceptual vigilance may complicate teacher educators’ task of furnishing PSTs with a clear understanding of what CT does and does not entail, how it relates to other relevant concepts, and hence, how it may be applied and taught in specific classroom settings from early on, including in terms of instructional content, functions, and goals, as also underscored in other metastudies on CT in TED (Huang & Sang, 2023; Lorencová et al., 2019). It is also of note that while conceptualizing CT as either skills, judgments, dispositions, or a combination of these dimensions, many studies within the individual dimension also underscore action. Yet rather than social action of a specifically social-transformative nature, as suggested by Davies (2015), it was often practical action in the classroom that was foregrounded. 
One may argue that such action may have a deeply transformative character, albeit of a somewhat different kind and scale than that suggested in the model. This, in turn, nuances the category of action as a concept within the context of TED.
Taking an aerial view, we have also identified a lesser pool of studies that were specifically nested in the sociocultural understanding of CT or, alternatively, a combination integrating both the individual and sociocultural dimensions. Wary that this apparent underrepresentation may have to do with the procedural aspects of our study, one may argue that it indicates a distinct need for a greater emphasis on exploring this dimension and its ethical undercurrents in empirical research on CT in TED. Referring to this dimension as the value-based sense of CT or, alternatively, as its sociological understanding, a similar recommendation has been put forward by Huang and Sang (2023) and P. K. Murphy et al. (2023), which underscores the urgency of this point. Pre-service teachers are entrusted with the responsibility to educate tomorrow’s citizens who are ready to participate in democratic processes and, hence, able to analyze and make judgments on deeply complex sociopolitical issues such as climate change, racism, sexism, or the current rise of ultra-nationalist sentiment in Western democracies. Nurturing prospective teachers’ own criticality toward these issues while in TED, including a sensitization toward aspects of power and voice embedded therein and an ability to think beyond established paradigms, seems paramount, not least so that they understand it well, embrace it in their own practice, and, as such, nourish the critical spirit in their future students.
Alternatively, thinking of TED more broadly, one may see the apparent lack of studies within this dimension as a reflection of the fact that the sociocultural conceptualization of CT, with its concern for exposing and critiquing the ideological fabric of our social life, is more readily at home in the arts and the social sciences. Our findings provide further support to this assumption in that most studies in this condition were conducted within these disciplines, while only one came from natural and life sciences. Nonetheless, this does not imply that the sociocultural understanding of the term “critical” is without relevance in natural science education more broadly (see, e.g., Furness et al., 2017) and in the context of TED. On the contrary, given current advances within fields such as generative artificial intelligence (AI) or biotechnology, critical reflection and judgment on the possibilities but also ethical responsibility these fields carry in the broader sociopolitical context should be high on both scientific and teaching agendas. In fact, as Wegerif and Casebourne (2025, p. 1) argue, generative AI profoundly challenges “the existing structures and purposes of education.” They urge that rather than seeing it as a threat to our capacity for critical thinking, generative AI, when appropriately integrated in our pedagogical design, can aid in enhancing our collective intelligence. Teachers’ criticality is paramount here. Therefore, we propose that future empirical studies explore how TED prepares PSTs to work with this aspect of CT more specifically. In more practical terms, we also propose that teacher educators and their students must be encouraged to further explore CT in specific domains and contexts.
The Question of Knowledge in Conceptualizing CT
More broadly, this very issue also calls forth the question of knowledge and, relatedly, domain-specificity or domain-generality of CT. Our findings showed that researchers do not necessarily thematize this issue explicitly in their work, often providing only implicit clues via employed methods or instruments, or no clues that could reduce ambiguity on this point. Whether this is simply because it is not seen as a relevant dimension to foreground or because discipline-specificity is assumed at a broader TED level is difficult for us to determine. Further, judging from the employed methodological instruments, a number of studies showed concern for CT as a discipline-general rather than discipline-specific concept. As such, these findings may signal a certain lack of concern for the role of disciplinary knowledge in CT. This can be seen as disconcerting. Today’s world presents citizens with increasingly complex issues that require socially and ethically responsible action vis-à-vis issues coming from the arts and social sciences but also natural sciences, as argued above. In line with Bailin et al. (1999), we would argue that, in order to take such action, it is paramount to possess knowledge and, by extension, an understanding of the standards through which such knowledge can be evaluated, as well as training in actually applying these standards as the very bedrock of critical judgment. Lack of attention paid to these issues may imply a simplistic and static view of CT as a procedural skill, transferable across contexts once acquired, with repercussions for TED as well as PSTs’ future classroom practice of CT. We contend that TED has a double responsibility: to equip PSTs with necessary disciplinary knowledge, but also ways of knowing how to adjudicate and apply this very knowledge critically.
By extension, this necessitates the development of appropriate didactic approaches that make it more transparent for PSTs how to nourish the critical spirit in their future classrooms in a broad range of subjects from early on (see, e.g., Jegstad et al., 2025). It also calls for the inclusion of such developments in current TED curricula. We suggest further that TED programs need to address CT from both individual and sociocultural perspectives. Teacher educators may do this by modeling critical analysis of specific teaching situations, having PSTs reflect on the implications of this exercise for their future students’ learning environment, as well as engaging PSTs in critical discussions of scientific literature and research-based interventions they encounter in TED. As other metastudies on CT in TED have argued, there is much to gain in making teacher educators’ modeling efforts on this point explicit (Mpofu & Maphalala, 2017; Lorencová et al., 2019). This may extend to explicit peer or computer-supported modeling as well. Further, teacher educators should use research-based approaches to assessing CT and have PSTs reflect critically on the issue of CT assessment.
The Multifaceted Nature of CT: Issues of Conceptual Clarity
Our findings also lend further empirical support to the claim that not only is CT an a priori multifaceted concept, as also underscored in existing scholarship on CT in TED and beyond (Huang & Sang, 2023; Lorencová et al., 2019; P. K. Murphy et al., 2023), but its variable conceptualizations in research on TED may themselves add to this complexity.
Firstly, many studies in our corpus defined CT in an implicit manner rather than stating an explicit definition. Much like Dinsmore et al. (2008) in their study on metacognition and self-regulation, we see this as unfortunate and call for definitional disambiguation. Secondly, as our findings also indicate, studies draw on a particularly varied terminology, used in parallel to or interchangeably with CT, such as critical literacy, critical pedagogy, or critical dialogue, among others. Perspectives denoted through the qualifier “reflective,” such as reflective thinking or reflective analysis, were also common, and HOTs, argumentation, and creativity were among closely related concepts that were sometimes used synonymously with CT. While this resonates well with Davies’ (2015) term “criticality,” where the “critical” is understood in a broad sense, we nonetheless question to what extent such a broad, interchangeable use of CT and related terms muddies the water of CT further.
Such unclear boundaries may be problematic not only conceptually but also practically, for instance in terms of the teaching and assessment of CT. The mixing of terminologies presents difficulties for researchers communicating with one another, but, more importantly, it can lead to further confusion for teacher educators and for in-service and pre-service teachers alike. Teachers may already be skeptical of educational research or struggle to see its relevance (Ferguson, 2021; Hendriks et al., 2021). Scholarly inconsistency may also be understood as uncertainty, and thus further contribute to erosion of trust in educational research(ers) (cf. Oreskes & Conway, 2010). Further, the cacophony of research findings complicates the design of interventions and the comparison of research findings for teacher educators. There may be good reasons for different terminologies, but then these reasons also need to be communicated. One helpful contribution might be more subject-specific approaches and guides to CT that underline common conceptualizations, teaching approaches and moves, and interventions, as well as assessments within subjects. This should be balanced with interdisciplinary conversations that sow the seeds for cross-fertilization.
On a more abstract level, our findings necessitate reflection on the potential epistemological divides between the individual and sociocultural dimensions of Davies’ (2015) model that draw on different intellectual traditions with their corresponding terminological legacies and preferences. Could it be that attempts at reconciling the differences in one single theoretical model unnecessarily complicate the conceptual and terminological landscape further? One may argue that aligning with disciplinary orthodoxies and, by extension, rejecting attempts at theoretical synthesis may itself betray an uncritical spirit, at odds with the idea of dialogic openness, as Davies (2015) and Burbules and Berk (1999) seem to suggest. Nonetheless, if CT really does entail all of the aspects, as put forward by Davies (2015) and as explored in this review, then it can be charged with potentially encapsulating so much that it not only starts to be meaningless (cf. Alexander, 2014) but may also complicate or even hinder in-depth discussions on the productive theorization and practical application of CT in TED and, arguably, beyond. As recently suggested by P. K. Murphy et al. (2023), integrating critical pragmatism in conceptions of CT and uniting perspectives under the more generic term “valued thinking” may be an alternative way forward theoretically and conceptually. Yet, while we see that this elegantly bypasses the terminological conundrum, it remains to be seen whether a new term will catch on and, if so, how it may be developed theoretically while steering away from the pitfalls encountered in conceptualizing CT.
Acknowledging these multiple, potentially enduring challenges, we propose that the individual–sociocultural continuum, as put forward by Davies (2015), does have both theoretical and practical merit. Firstly, unlike other available conceptualizations such as those based on disciplinary perspectives, as applied by Lai (2011) or Huang and Sang (2023), the model is conceived concentrically and, as such, assumes the two dimensions as related rather than mutually exclusive, with the individual cognitive core being subsumed in and, hence, always in dialogue with the sociocultural outer rim. We see this as a theoretical strength that is particularly suited for TED. Studies specifically combining both perspectives, identified in this metastudy, illustrate a productive empirical harnessing and feasibility of this very point. We would further argue that the distinction is conceptually simple and clear enough to merit practical appeal.
Conclusion
The nature of current pressing problems requires citizens to think across subject boundaries, in innovative, creative, and critical ways. Teachers throughout the world are faced with preparing young students to think critically. To embrace the challenge well, pre-service teachers need to engage in CT while in TED. We contend that this is again predicated on their understanding of what CT is as a concept. It is neither a context-independent procedural skill nor something that PSTs can learn implicitly or simply transfer from their own studies without instruction and practice of considering what CT looks like in their own (future) practice.
This study systematized scholarship on how CT has been investigated and conceptualized in international research literature on TED. It has brought to the fore both methodological and conceptual trends, most importantly: 1) the global concern for CT in TED, as evidenced by the disciplinary, educational level–related and geographical spread of studies as well as the rise in publication rates in the last decades; 2) the centrality of argumentation-based conceptualizations of CT in TED scholarship at a broad level but, simultaneously, 3) the lack of sufficient terminological and conceptual clarity at a more concrete level; and 4) the lack of an explicit preoccupation with the issue of knowledge in CT.
Our findings point toward several key implications and actionable proposals that underscore needs and opportunities for research and practice of CT in TED:
Conceptual clarity in future research, which may not only leverage productive discussions of CT among TED scholars but also make the concept more readily understandable, translatable, and, hence, also applicable in practice for teacher educators and educational practitioners alike, both in-service and pre-service
Providing and integrating discipline-specific and contextually sensitive conceptualizations of CT in TED research and curricula, so that pre-service teachers not only learn what CT is and entails, but also how they can capitalize on it in both their own thinking and decisions on (and in) action, and in their future practice of teaching in the school subject(s) of their choice
Further research on the sociocultural dimension of CT in TED, particularly how TED supports PSTs’ own criticality not only across arts and social science subjects but also natural science, alongside further research on the individual dimension of CT in TED, including the subdimensions of skills, judgments, dispositions, and action, understood also as transformative classroom action, in order to deepen our understanding of both dimensions but also their interplay
Engagement in cross-linguistic and cross-national research collaboration on CT in TED to propel to the fore the issue of equitable representation and epistemic justice in research and practice
Employment of varied research methodologies, including mixed-methods designs, to unpack and deepen our understanding of the multilayered and complex nature of CT as a concept and practice in TED
Engagement in research on CT in TED across subjects and age groups that PSTs are being prepared to teach, not least in initial education, to deepen our understanding of the developmental differences in CT within and across subjects that PSTs can capitalize on in their (future) practice
Further development of appropriate didactic approaches to CT across subjects and, relatedly, integration of these approaches in TED curricula, facilitated by explicit modeling by teacher educators but also by peer and computer-supported modeling, including exploration of the affordances of generative AI in this regard
Further development and employment of research-based assessments of CT, together with PSTs’ critical reflection on such assessments, during their training and beyond
However, we acknowledge that our review is not exhaustive and, as such, does not offer an ultimate answer. Like most systematic reviews, it is constrained by the very parameters we have set for our searches, including our specific database and other procedural choices, as well as the research team’s linguistic possibilities. Likewise, as a particularly time- and labor-intensive scientific method, it is conditioned by practical and pragmatic constraints. As such, it systematically interrogates a corpus of data, published and accessible at a given time and in a given context.
Thus, rather than aiming to settle the issue once and for all, we see the task of how TED researchers conceptualize and operationalize CT, as well as whether and how the identified contributions to this debate align with current and past understandings, as meriting continued attention, given the keen political and educational interest in the concept. As Alexander (2023) argues, the continued fluidity of notoriously complex educational concepts is only to be expected. Ultimately, in line with Burbules and Berk (1999) and Davies (2015), it is paramount that we embrace dialogic openness and, to paraphrase Johnson and Hamby (2015), continue to think critically of critical thinking and its teaching in TED as well as classrooms beyond TED.
Supplemental Material
sj-docx-1-rer-10.3102_00346543261438479 – Supplemental material for The Concept of Critical Thinking in Teacher Education: A Systematic Review
Supplemental material, sj-docx-1-rer-10.3102_00346543261438479 for The Concept of Critical Thinking in Teacher Education: A Systematic Review by Jarmila Bubikova-Moan, Leila E. Ferguson and Anette Andresen in Review of Educational Research
Authors
JARMILA BUBIKOVA-MOAN is an associate professor at the Department of Primary and Secondary Teacher Education, Oslo Metropolitan University, P.O. Box 4, St. Olavs plass, NO-0130 Oslo, Norway; email:
LEILA E. FERGUSON is a professor of teaching and learning in higher education at the Department of Education, University of Oslo, P.O. Box 1092, Blindern, NO-0317 Oslo, Norway, and adjunct professor of education at the Department of Psychology, Pedagogy and Law at Kristiania University of Applied Sciences, P.O. Box 1190 Sentrum, NO-0107 Oslo, Norway; email:
ANETTE ANDRESEN is an associate professor at the Department of Psychology, Pedagogy and Law at Kristiania University of Applied Sciences, P.O. Box 1190 Sentrum, NO-0107 Oslo, Norway; email:
