Abstract
Lesson study (LS), a teacher-oriented, student-focused professional development approach that originated in Asia, has spread globally. However, findings on the effect of LS on teacher learning and growth have varied widely, which suggests a need to review current research on LS in mathematics education. This paper examines mathematics teachers’ learning and professional growth through LS using Clarke and Hollingsworth’s interconnected model of professional growth (IMPG), which contains three components: domains of change, mediating mechanisms, and the change environment. With regard to domains of change, we found that although previous LS studies (often small-scale, qualitative studies) attended to all domains of change, they more frequently reported changes in the personal (or collective) domain (e.g., teacher knowledge) and less frequently reported changes in other domains. Moreover, research on systematic changes across domains in the IMPG is lacking. Regarding the mediating mechanisms, research has explored both enactment (planning and teaching) and reflection (debriefing) extensively. For enactment, the planning process receives comparatively little attention, and the role of knowledgeable others is understood inconsistently. For reflection, there is a lack of focus on student thinking and of constructive feedback during debriefing. Finally, research suggests that LS has received support from the change environment but that tensions (e.g., time constraints) remain. These challenges call for systemic support and cross-cultural collaboration to develop sustainable, large-scale LS. We suggest directions for continued research and practice.
Keywords
Introduction
Lesson study (LS) as a path to teacher learning and professional growth has drawn increasing global attention in the field of education. For instance, the annual conference of the World Association of Lesson Studies started in 2007, and the International Journal for Lesson and Learning Studies (IJLLS) was established in 2012. However, findings on the effect of LS on teacher learning and growth, including that of mathematics teachers, have varied widely, particularly across education systems. While prior reviews (e.g., Cheung & Wong, 2014; Willems & Van den Bossche, 2019) reported overall positive effects of LS, they also found that studies used inconsistent measures and that many lacked rigorous designs. In addition, reviews (e.g., Seleznyov, 2019) found little evidence that LS outside of Japan (and China) affects teaching practice, student learning, or school culture and community. This suggests a critical need to review recent research on LS in mathematics education. Given that teacher learning and growth is a complex enterprise that demands systematic changes in different domains through different mediating mechanisms (Barber, 2018; Clarke & Hollingsworth, 2002; da Ponte et al., 2022), we aim to conduct a scoping review of recent international research on LS in mathematics education, identify progress and challenges, and recommend directions for future research and development.
What is lesson study?
Stages and key features of lesson study
LS is a form of teacher-driven, collaborative professional development (PD) whose core goals are promoting teacher learning and improving teaching competency (Lewis et al., 2009; Lewis & Perry, 2017; Murata, 2011). Most recent literature describes a four-stage LS cycle: study, plan, do, and reflect (e.g., Aas, 2021; Akiba et al., 2019; Confrey & Shah, 2021; Huang & Shimizu, 2016; Lewis & Perry, 2017; Widjaja et al., 2021). In the first stage, teachers study the curriculum and relevant documents to set a research goal for the LS. Second, the teachers collaboratively plan a research lesson that addresses this goal. Third, one teacher teaches the lesson while the other participating teachers observe and collect data. Fourth, all participating teachers collaboratively reflect on and discuss the effectiveness of the observed lesson and suggest improvements.
Effective LS has several distinctive features. First, LS is a teacher-oriented research inquiry. Teachers should approach LS with a desire to address a practical issue of concern (Takahashi & McDougal, 2016). Thus, LS should be guided by a research theme and question rooted in teachers’ understanding of the subject and of student learning (Fauskanger et al., 2019; Fujii, 2016; Guner & Akyuz, 2020; Sekao & Engelbrecht, 2021). Second, LS requires sustained collaboration among teachers (Vrikki et al., 2017). Considerable time, typically several weeks or even months, must be allotted for the different LS phases (Suh et al., 2021; Takahashi & McDougal, 2016). Third, effective LS is often supported by “knowledgeable others” who have expertise in LS and the relevant content area and can offer deep insights (Seleznyov, 2019). Fourth, LS can be further supported by frequent or even iterative cycles, although some experts disagree about whether the same research lesson should be retaught to different groups of students, given that every group of students is unique (Takahashi & McDougal, 2016).
International implementation of lesson study
LS originated and has been practiced in Japan for more than a century (Akiba et al., 2019; Bieda et al., 2015; Lewis, 2016) and is woven into the fabric of Japan's educational culture (Robutti et al., 2016). In Japan, teachers—especially elementary school teachers—participate in multiple LS cycles and observe numerous teaching demonstrations each year (Lewis & Perry, 2017). Chinese LS, which developed in China in the early 20th century (Huang & Shimizu, 2016), is distinct from Japanese LS in its twin focus on developing a product (i.e., an exemplary lesson plan) and honing instruction through a process of repeated teaching and ongoing involvement of knowledgeable others (Huang, Fang, et al., 2017; Huang et al., 2021; Huang, Zhang, et al., 2019). In addition, Chinese teachers usually engage in different types of LS depending on their level of experience, for example, progress-report lessons by novice teachers and exemplar-lesson demonstrations by expert teachers (Huang, Barlow, et al., 2017).
Stigler and Hiebert's (2009) seminal book systematically introduced LS to the West, and LS has since been adapted in countries around the world (Akiba et al., 2019; Fauskanger et al., 2019; Huang & Shimizu, 2016; Joubert et al., 2020). During this international adaptation, several variations of LS have emerged (Huang & Shimizu, 2016). For example, the United Kingdom has employed a version of Japanese LS that adds close observation of case pupils (students who represent particular learner groups in the classroom) across three iterations of the research lesson (Dudley, 2013; Vermunt et al., 2019). In the US, a collaborative lesson research model that emphasizes a research purpose or proposal and the inclusion of knowledgeable others has been developed for conducting LS at a large, system-wide scale (Takahashi & McDougal, 2016). In addition, scholars in Hong Kong and Sweden developed learning study, a combination of Japanese LS and design-based research that is guided by variation theory (Huang & Shimizu, 2016; Pang & Marton, 2017; Pang & Runesson, 2019). Recently, the COVID-19 pandemic has also spurred interest in technology-enabled adaptations of LS (Calleja & Camilleri, 2021; Huang et al., 2023).
Benefits of lesson study
LS offers potential benefits to individuals, teaching communities, and educational systems. It fosters teachers’ growth by helping them develop pedagogical skills (Bocala, 2015) and content mastery (Amador & Carter, 2018). Teacher competencies enhanced by LS are expected to contribute to improved student learning outcomes (Lewanowski-Breen et al., 2021). Within schools, LS builds teacher learning communities by encouraging teachers to collaborate, learn from each other, and develop discipline-specific teaching practices (Suh et al., 2021). LS can also foster shared learning between schools or across districts through communication and publications (Warwick et al., 2016), a benefit that is sometimes overlooked (Widjaja et al., 2021).
LS can support education reform efforts and promote teaching innovation. Gero (2015) noted that the goals of LS (teaching quality, learning outcomes, self-assessment, continuous improvement, and collaboration) align with goals for education reform. Furthermore, as teachers attend to teacher and student learning simultaneously (Lee & Tan, 2020), they become more aware of tacit assumptions that may impede critical reflection (Kager et al., 2022; Lee & Tan, 2020). Thus, LS can foster connections between theory and practice (Huang & Shimizu, 2016; Huang et al., 2021; Lewis, 2016) and encourage new approaches to curriculum design and teaching (Huang et al., 2021). Consequently, LS is widely viewed as a robust PD approach for supporting mathematics educators’ development (Huang, Barlow, et al., 2017).
Theoretical framework of teachers’ professional growth
In this review, we use Clarke and Hollingsworth's (2002) interconnected model of professional growth (IMPG) as a lens to examine recent research on LS in mathematics education. Although IMPG was developed in a Western context and LS originated in an Eastern one, the IMPG model has been widely used to capture teacher learning in general (Goldsmith et al., 2014) and has been used effectively to frame teacher learning through LS in particular (e.g., da Ponte et al., 2022). We also anticipate that findings from LS practice may offer insights that enrich the IMPG model.
Interconnected model of professional growth (IMPG)
IMPG has an interconnected, non-linear structure through which we can identify the sequence and network of teachers’ changes (or growth). Figure 1 shows Clarke and Hollingsworth's (2002) IMPG model, which contains three key components: the domains of change, the mediating processes, and the change environment. Among the four domains of change (external, practice, consequence, and personal), the external domain is distinguished from the other three in that it is located outside the teacher's personal world (Clarke & Hollingsworth, 2002). Change in one domain often triggers changes in the other domains. The mechanisms behind these changes are two mediating processes: enactment and reflection. Enactment refers to translating a new belief, idea, or pedagogical model into action, while reflection refers to “active, persistent, and careful consideration” (p. 954) of one domain that leads to changes in another. These processes are also affected by the surrounding change environment (e.g., school and district support).

Figure 1. The interconnected model of professional growth (adapted from Clarke & Hollingsworth, 2002).
Note that the model is iterative, as indicated by the various arrows that show different paths of teachers’ professional growth. For instance, change usually starts in the external domain, where external sources of information or stimulus (e.g., new standards or a new teaching theory) become accessible to teachers (e.g., through a PD workshop). When teachers try to implement these new ideas, their teaching practices are likely to change (e.g., their classroom teaching may shift from teacher-dominated to student-centered). Consequently, teachers’ perceptions of salient outcomes related to classroom practice may change (e.g., they may view student–student talk as a positive outcome of a new teaching strategy rather than as a sign of lost control), which may further contribute to personal changes in their knowledge, beliefs, and attitudes. Alternatively, changes in the external domain may first affect teachers’ personal domain, which may then trigger changes in the practice and consequence domains.
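Because the model is described as a network of paths, it can help to see those paths enumerated. The following is a minimal, hypothetical sketch: `GROWTH_EDGES` and `growth_paths` are illustrative names, and the graph includes only the edges implied by the two example paths in the paragraph above, not every arrow in Figure 1, and it omits the enactment and reflection labels on the arrows.

```python
from typing import Dict, List

# Hypothetical, partial encoding of the IMPG: only the edges implied by the
# two example growth paths in the text are included. The full model
# (Clarke & Hollingsworth, 2002) has more arrows, each labeled as enactment
# or reflection; those labels are omitted here.
GROWTH_EDGES: Dict[str, List[str]] = {
    "external": ["practice", "personal"],
    "practice": ["consequence"],
    "consequence": ["personal"],
    "personal": ["practice"],
}


def growth_paths(graph: Dict[str, List[str]], start: str, end: str) -> List[List[str]]:
    """Return every simple path (no domain visited twice) from start to end."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == end:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            # Skip domains already on this path; the model is iterative,
            # so without this check the search could cycle indefinitely.
            if nxt not in path:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


if __name__ == "__main__":
    for end in ("consequence", "personal"):
        for p in growth_paths(GROWTH_EDGES, "external", end):
            print(" -> ".join(p))
```

Both paths described in the text (external, practice, consequence, personal; and external, personal, practice, consequence) appear among the printed results, alongside the shorter paths that share their edges.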
The IMPG model has been used by researchers to guide studies on LS with slight variations (e.g., Barber, 2018; Goldsmith et al., 2014; da Ponte et al., 2022, 2023; Widjaja et al., 2017; Zhao et al., 2022). While these studies have all located teaching within the practice domain, lesson planning and debriefing processes were often classified differently. Table 1 summarizes the discrepancies.
Table 1. Discrepancies in classifying “lesson planning” and “debriefing” in IMPG.
Our reconceptualization of IMPG builds on and modifies the model used in prior research. Like Goldsmith et al. (2014) and da Ponte et al. (2022, 2023), we view the “plan, teach, and debrief” stages as components of a single unit that plays a dual role, serving both as a domain of practice and as mediating mechanisms that lead to change. As a domain, these stages provide contexts in which to locate changes in teaching practice. As mediating processes, the relevant actions link and facilitate changes between two domains. This view is supported by Schipper et al. (2017), who pointed out that LS activities in the domain of practice are interchangeable and difficult to differentiate. Additionally, we renamed the original “personal domain” the “personal/collective domain.” This is because the original IMPG model developed by Clarke and Hollingsworth (2002) was used to characterize individual teachers’ knowledge change. When researchers applied the model to teachers’ knowledge change in group settings such as LS, the personal domain was extended to a collective or group domain (e.g., da Ponte et al., 2022; Prediger, 2020; see Table 1). In this review, we include studies that view teachers as individuals or as a group. Below, we elaborate on our re-conceptualized IMPG, which serves as the lens for our review.
Domains of change. In many countries, LS is a response to changes in the external domain (e.g., new curriculum standards) or a desire for teaching improvement (Akiba et al., 2019; Hadfield & Jopling, 2016; Huang & Shimizu, 2016; Takahashi & McDougal, 2016; Vermunt et al., 2019; Vrikki et al., 2017). Changes in the external domain prompt the first stage of LS: the study of curriculum materials and goal-setting. As LS proceeds from the external domain to the domain of practice (plan, teach, debrief), changes may occur in teachers’ lesson plans, research lessons, and even their post-lesson debriefs (Goldsmith et al., 2014). Changes in the domain of practice may result in changes in students’ learning, which may lead to changes in teachers’ perception of the salient consequences. The above experiences may further impact teachers’ knowledge, beliefs, and attitudes toward LS, bringing changes to teachers’ personal or collective domain.
Mediating mechanisms. In LS, planning and teaching may serve as mediating processes (enactment). Note that Clarke and Hollingsworth (2002) stressed that enactment is not simply acting (the latter occurs in the domain of practice). In LS, both planning and teaching demand careful design, which puts new ideas into action. It should be noted that some actions in the domain of practice may not enact new ideas or lead to changes, which may explain why some LS activities are not effective. In the same vein, debriefing could involve active, persistent, and careful consideration processes (reflection) that mediate between the domain of practice and other domains. Overall, the LS process appears to make the enactment and reflection mechanisms in the IMPG model explicit as a collective activity.
The change environment. Like any other teacher learning model, LS needs support from the surrounding environment, such as school administrators and district leadership (Aas, 2021; Groves et al., 2016; Gu & Gu, 2016); in turn, successful LS can contribute to school culture and even to a nation's broader teaching culture (Hadfield & Jopling, 2016; Hiebert & Stigler, 2017; Warwick et al., 2016). Given that LS is a cultural activity that originated in Japan and spread to other countries, cultural factors play an important role in the change environment and deserve attention. In Japan, LS is so integral to teachers’ professional work and so embedded in school culture that it does not even warrant a particular research focus (Fujii, 2016; Robutti et al., 2016). This is clearly not the case in many other countries.
With our re-conceptualized IMPG in the context of LS, we review recent research on LS in mathematics education to answer the following research question: What are the successes and challenges in teacher learning and professional growth categorized in terms of the domains of change, the mediating mechanisms, and the change environment, respectively?
Search and screening process
Although the current study is not an exhaustive review, we focused on recent research and searched relevant peer-reviewed journals (n = 28) for articles published since 2015 using the keyword “lesson study.” We used only this keyword because LS is the focus of this review and “lesson study” is a term used broadly worldwide. We intentionally excluded the “learning study” literature to sharpen the review's focus and avoid potential debate about whether learning study is a type of LS. For instance, the title of the International Journal for Lesson and Learning Studies (IJLLS) itself distinguishes LS from learning study. Although learning study is derived from LS, it differs in being explicitly guided by variation theory (Pang & Marton, 2017) and in adopting design research methodology. Because LS is an emerging field with relevant publications spread across different journals, we used an approach similar to that of a recent review of STEM education (Li et al., 2020). Specifically, we identified and searched five types of research journals: general education journals (e.g., Teaching and Teacher Education, Journal of Teacher Education); mathematics education journals (e.g., Journal for Research in Mathematics Education, Journal of Mathematics Teacher Education, ZDM – Mathematics Education); cognitive science and educational psychology journals (e.g., Learning and Instruction); international journals that focus on LS (e.g., International Journal for Lesson and Learning Studies); and international journals that contain relevant studies (e.g., International Journal of STEM Education). The search identified 204 articles, which we then screened for research focus; 37 were excluded because they did not focus on LS. The remaining 167 articles were directly related to LS as applied across many disciplines.
Of these 167 articles, about 51% (n = 85) were excluded because they were not related to mathematics. The remaining 82 articles were included in the review. Figure 2 shows the process of article identification and selection.
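The screening counts reported above can be checked as a simple subtraction flow, with the mathematics-related exclusion expressed as a proportion of the 167 LS articles:

```latex
\begin{aligned}
204 - 37 &= 167 && \text{articles directly related to LS}\\
167 - 85 &= 82  && \text{articles involving mathematics, retained for review}\\
85 / 167 &\approx 0.509 && \text{share excluded as unrelated to mathematics}
\end{aligned}
```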

Figure 2. Modified PRISMA diagram of the screening process, based on Page et al. (2021).
For the 82 LS articles that involved mathematics, we first coded the following aspects of each article: study type (e.g., empirical, theoretical, review), methods used (qualitative such as observation, quantitative such as correlational analysis, or mixed methods), grade level (elementary, middle, secondary), teacher level (preservice, in-service), country, and LS time (e.g., 1 month, 8 weeks, 3 years). We used an Excel spreadsheet to document these details, which enabled us to broadly understand the literature and quantify key elements from included studies.
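These codes were recorded in an Excel spreadsheet. As a minimal, hypothetical sketch of that kind of coding sheet (the `ArticleCode` fields, the `share` helper, and the sample rows are illustrative assumptions, not the review's actual instrument or data), each row can be treated as a record from which summary proportions, such as the share of studies involving in-service teachers, are computed:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArticleCode:
    """One row of a coding sheet; field names are illustrative only."""
    study_type: str     # e.g., "empirical", "theoretical", "review"
    method: str         # e.g., "qualitative", "quantitative", "mixed"
    grade_level: str    # e.g., "elementary", "middle", "secondary"
    teacher_level: str  # e.g., "preservice", "in-service"
    country: str
    ls_duration: str    # free text as reported, e.g., "8 weeks", "3 years"


def share(records: List[ArticleCode], field: str, value: str) -> float:
    """Proportion of coded articles whose `field` equals `value` (0.0 if none)."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if getattr(r, field) == value)
    return hits / len(records)


# Hypothetical placeholder rows for illustration; they are not data from the review.
sample = [
    ArticleCode("empirical", "qualitative", "elementary", "in-service", "Country A", "3 years"),
    ArticleCode("empirical", "mixed", "middle", "preservice", "Country B", "8 weeks"),
    ArticleCode("review", "qualitative", "secondary", "in-service", "Country C", "1 month"),
    ArticleCode("empirical", "quantitative", "elementary", "in-service", "Country A", "1 month"),
]

print(f"{share(sample, 'teacher_level', 'in-service'):.1%}")
```

The same tallies can be produced with spreadsheet formulas; the sketch only makes the coding scheme and the proportion calculation explicit.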
Next, we conducted an in-depth analysis of each article. One author compiled notes on the key information conveyed by the article titles and abstracts. Another author took notes on how the LS was conducted, recording the operational definition of LS, the study focus, and the LS procedures reported in each article. Given that our review focused on LS in mathematics education in a global context, we paid attention to mathematics-specific and cultural factors. Overall, this process resulted in 117 pages of notes. Guided by our re-conceptualized IMPG model, we memoed all the notes, referring back to the original articles as needed.
After the memoing process, we classified our comments and observations into three areas. First, we noted variations in understanding and conducting LS. For instance, we identified differences in LS team formation, the goal of LS, the LS stages, and the length of one LS cycle. Second, we summarized successes and challenges of teacher learning and professional growth in terms of each component of the re-conceptualized IMPG (see Section 3.2). This is a direct response to our research question. Finally, we identified possible solutions to the noted challenges and questions that indicated possible areas for future research.
Findings
Overview
These 82 articles described LS in a total of 28 countries; the United States appeared most frequently (20 articles). How the LS team was formed and how the LS process was conducted varied greatly across cultural contexts (Amador & Carter, 2018). LS was most often conducted within a single school (Takahashi & McDougal, 2016), although it could also occur at district, regional, and national levels (Groves et al., 2016). It was frequently conducted by in-service teachers (81.4%), particularly at the elementary and middle school levels (Corey et al., 2021), although a growing body of research has explored the implementation of LS in secondary classrooms (e.g., Huang, Barlow, et al., 2017) and with preservice teachers (e.g., Guner & Akyuz, 2020; Hernández-Rodríguez et al., 2021). LS cycles were often led by expert teachers (Seino & Foster, 2021) or supported by “knowledgeable others” who provided feedback during the process (Hernández-Rodríguez et al., 2021; Lewis & Perry, 2017). Outside of Japan, university faculty often initiated and facilitated LS cycles (Calleja & Camilleri, 2021). Below, we report findings across these studies in terms of the core aspects of Clarke and Hollingsworth's IMPG model (domains of change, mediating mechanisms, and the change environment), including the successes and challenges of LS implementation for each.
Domains of change in LS
According to the modified IMPG model, the domains of change (external, practice, consequence, personal/collective) should function together as a system. In other words, a change in one domain may trigger changes in the others. Some empirical studies reported successes of LS in fostering such changes. For instance, in a randomized controlled trial, Lewis and Perry (2017) found that LS supported by a resource kit had a significant impact on both teacher and student knowledge of fractions. This study showed clear evidence of change moving from the external domain to the domain of practice, resulting in salient outcome changes (improved student knowledge) and changes in teachers’ personal/collective domain (improved teacher knowledge). Nevertheless, studies of this sort are rare (Willems & Van den Bossche, 2019). In fact, systematic reviews of LS (e.g., Larssen et al., 2018; Seleznyov, 2019; Takahashi & McDougal, 2016; Willems & Van den Bossche, 2019) have identified several challenges to understanding LS outcomes (elaborated below) and concluded that the overall effectiveness of LS outside Japan remains unclear.
First, LS studies that reported positive changes tended to lack rigorous designs and consistent outcome measures. For instance, previous reviews (Seleznyov, 2019; Willems & Van den Bossche, 2019) highlighted that positive findings in the current LS literature were based mainly on small-scale qualitative research (often case studies conducted in the United States). Indeed, of the empirical articles we collected, 56.4% employed case study methods. Case study is, of course, an important research method that enables in-depth investigation of teachers’ learning processes and supportive factors. However, because of their limited scope, findings from case studies cannot readily be generalized and are therefore inconclusive, suggesting the need for large-scale studies of LS outcomes. Furthermore, Seleznyov (2019) noted that previous efforts to measure the effects of LS had many shortcomings, including an overemphasis on short-term outcomes and a failure to analyze LS in relation to its multiple goals in PD, student learning, and systemic change. These findings suggest the need for well-controlled designs (e.g., randomization) with consistent and validated outcome measures to investigate the impact of LS on teachers’ learning and growth.
Second, those LS studies that reported positive changes primarily emphasized the effects of LS on teachers’ changes in the personal/collective domain such as increased professional knowledge and more positive teaching attitudes and beliefs (e.g., Akiba et al., 2019; Corey et al., 2021; Leavy & Hourigan, 2016; Nguyen & Tran, 2022; Vermunt et al., 2019). However, there is little evidence that LS made a difference in the practice domain by shaping pedagogy or the external domain by influencing schools’ professional learning cultures and structures (Akiba & Wilkinson, 2016; Seleznyov, 2019; Willems & Van den Bossche, 2019). For instance, Fauskanger et al. (2019) reported that no changes took place in teachers’ initial and final lesson plans (practice domain) after one cycle of LS.
Third, many LS studies did not focus on “changes” across domains, a common issue in educational research on teacher change (Goldsmith et al., 2014). Although it is critical for a study to report teachers’ change in a certain domain (e.g., the personal/collective domain discussed above), it is more important to identify the pathways that lead to those changes (Clarke & Hollingsworth, 2002; Goldsmith et al., 2014). Many LS studies explored the features of effective practices or debriefing, which indicate potential paths toward teacher change (elaborated in the next section). However, these paths were often not linked to the domains of change in ways that illustrate the network of teacher growth. For instance, Aas (2021) focused on teacher talk during planning, drawing data mainly from the first, second, eighth, and ninth LS cycles. Although such data could have supported an investigation of changes in teacher practices, the study instead explored what triggers teacher talk with learning potential and how to characterize such talk. Such studies are necessary, but the current literature lacks a relatively complete picture of how LS activities result in changes across domains.
Mediating mechanisms in LS
As articulated in Section 3.2, we argue that the “planning and teaching” in LS serves as one mediating mechanism (enactment) while “debriefing” serves as the other (reflection). An examination of the recent LS literature indicated a great effort to explore relevant activities during these processes. However, some issues still exist in current LS practice.
Enactment: planning and teaching in LS
Existing literature has offered various insights into the enactment process in LS. Note that by the enactment process, we do not mean simple actions in the domain of practice; rather, we mean actions that potentially facilitate changes across domains. Scholars have expressed concern that, compared with teaching, the planning stage is largely underappreciated in LS outside Japan (e.g., Fujii, 2016), possibly because planning is less visible to outsiders than live research lessons. Accordingly, researchers (e.g., Fujii, 2016; Miyakawa & Winsløw, 2019) have called for attention to lesson planning as a central prerequisite for subsequent LS activities. Moreover, existing studies have reported challenges teachers face during lesson planning. In particular, Fujii (2016) found that teachers’ planning in other countries often lacked a research question, so the selected tasks lacked a clear, research-driven objective. Consequently, teachers’ follow-up discussions of the research lesson tended to be descriptive rather than analytical (Grimsæth & Hallås, 2015, as cited in Fauskanger et al., 2019). Such challenges may hinder the translation of what teachers learn from external sources into their practice.
Compared with lesson planning, the live research lesson, the heart of LS (Lewis, 2016), has drawn a great deal of attention. For instance, some studies explored how teachers may incorporate well-accepted learning theories (e.g., learning trajectories and variation; Huang, Zhang, et al., 2019) into teaching to enhance teacher learning. Despite these insights, research has also revealed inconsistent understandings of LS enactment, which may be a roadblock to the effectiveness of LS. First, the purpose of a research lesson is often unclear. In particular, some teachers have misconceived the goal of the research lesson as simply developing a good product (Takahashi & McDougal, 2016; Widjaja et al., 2021), a view that neglects teacher learning and change. Japanese and Chinese approaches to LS differ somewhat on the importance of a good product as an LS outcome; for example, Huang et al. (2021) specified exemplary lessons as a desired result of Chinese LS. Typically, however, producing a perfected lesson plan is not the ultimate goal (Leavy & Hourigan, 2016; Widjaja et al., 2021). In both Japan and China, enhancing teachers’ teaching competency is the common goal of a research lesson.
The second inconsistency in enactment relates to the duration of LS. Across the articles in our sample, LS duration varied widely, from a single cycle within 1 or 2 days (e.g., Confrey & Shah, 2021; Gero, 2015; Hernández-Rodríguez et al., 2021; Miyakawa & Winsløw, 2019) to multiple cycles over several years (e.g., Aas, 2021; Akiba et al., 2019; Bruce et al., 2016; Groves et al., 2016; Kager et al., 2022). This variability is noteworthy because Akiba et al. (2019) found that duration is one of the critical features of LS that contributes to teacher learning and growth. Although there is no magic bullet for LS duration and flexibility is often needed in local contexts, effective LS demands careful design and enactment, which take considerable time. In Japan, a school usually spends 4–6 weeks developing one lesson plan (Seino & Foster, 2021). This contrasts strikingly with LS implementation outside of Japan, where a research lesson plan may be developed in 1–2 hours (e.g., Confrey & Shah, 2021; Hernández-Rodríguez et al., 2021).
Last but not least, the role of “knowledgeable others” during enactment is inconsistent across LS studies. While both Japanese and Chinese LS have always involved knowledgeable others as a critical element (Groves et al., 2016; Huang & Shimizu, 2016; Lewis & Perry, 2017; Seino & Foster, 2021), some LS studies in other countries did not include this role (Takahashi & McDougal, 2016). Although the exact roles of knowledgeable others in Japanese and Chinese LS can differ (for example, they may or may not be part of the LS team or serve as facilitators), these experts play a critical role in LS by providing content, conditional, and practical knowledge (Gu & Gu, 2016). In Chinese LS, knowledgeable others may also serve as facilitators who provide support throughout the LS process (Huang & Shimizu, 2016). Internationally, university faculty often initiate, facilitate, and provide expert feedback on LS (Calleja & Camilleri, 2021). However, when such critical expertise is lacking, it is less clear whether the anticipated changes can be realized through the enactment process.
Reflection: debriefing in LS
Existing LS studies have also devoted a great deal of attention to the reflection process. The Chinese LS model involves ongoing reflection on changes, first comparing the “new design” of a lesson plan with the “existing action” and then comparing the “new action” in a research lesson with the “new design” of the plan (Gu & Gu, 2016; Huang & Shimizu, 2016). Across the literature, there is also a consensus that reflection activities should focus above all on students’ thinking and learning (e.g., Akiba et al., 2019; Bocala, 2015; Bruce et al., 2016; Confrey & Shah, 2021; Lewis, 2016). Guner and Akyuz (2020) reported a successful case study demonstrating how preservice teachers learned to notice students’ mathematical thinking within the context of LS.
However, challenges in the reflection process emerged when LS was explored outside of Japan. The first challenge related to the focus of the reflection. Gero (2015) found that US teachers tended to focus on social and less rigorous aspects of the process (e.g., creativity of a lesson, the observation of one another's teaching, collaboration in teaching) but conducted almost no analysis of students’ thinking. Bakker et al. (2022) reported similar findings with Dutch teachers who focused most on the identifying and planning elements of LS and least on interpreting the lesson implementation (e.g., explaining students’ problems with learning). In addition, Sekao and Engelbrecht (2021) reported that South African primary mathematics teachers’ reflection lacked clear focus, resulting in poor reflection quality. Some teachers in this study felt nervous and unconfident during debriefing because they did not specialize in teaching mathematics and consequently felt they lacked necessary content knowledge.
A second challenge is due to the lack of constructive feedback during reflection. For instance, Sekao and Engelbrecht (2021) found that some teachers felt they were being personally attacked during debriefing because they received some critical comments. Consequently, teachers viewed reflection as less useful and enjoyable than planning and teaching. This finding calls for culturally relevant LS that demands a great deal of collaboration and collegiality. In a country where teaching is perceived as individualized, LS must be conducted in a way that promotes a constructive environment for debriefing and feedback sharing.
The above findings indicate that more meaningful reflections should be encouraged to facilitate perceived changes across domains. Lewis and Hurd (2011) provided guidelines for conducting debriefings. Additionally, the aforementioned “knowledgeable others” may play an important role in this regard. As demonstrated in the literature, knowledgeable others in Japanese LS made final comments that connected the research lesson to broader content and pedagogical knowledge (Lewis & Perry, 2017). Their final comments often had specific foci during reflection (Seino & Foster, 2021). Other studies (e.g., Akiba et al., 2019; Huang et al., 2016; Huang, Fang, et al., 2017) also found that knowledgeable others and/or facilitators played crucial roles in ensuring the success of LS.
The change environment in LS
The change environment is the outside environment that supports anticipated changes. Across LS studies, the reported successes with the change environment are twofold. On one hand, there was encouraging school-, district-, and even state-level support for LS (Groves et al., 2016). For instance, in Aas (2021), LS was largely supported by the school leaders; in Akiba et al. (2019), LS was promoted statewide by the Florida Department of Education. On the other hand, the successes of LS also contributed to further enhancing the outside environment. For example, many studies have reported how LS contributed to the establishment of a professional learning community that facilitated teacher collaboration (e.g., Lewanowski-Breen et al., 2021; Sekao & Engelbrecht, 2021).
However, LS studies have also reported various challenges with the change environment. For instance, Sekao and Engelbrecht (2021) identified several external issues for South African teachers, such as time constraints and system-level challenges. Akiba and Wilkinson (2016) reported that even though LS was mandated by the state of Florida, insufficient funding was provided to support the effort. In addition, existing organizational structures and routines made it difficult to engage district leaders and teachers in LS. Furthermore, Gero (2015) noted that the district's high degree of control over the LS process threatened teachers’ ability to take responsibility for student learning. Such top-down approaches conflict with the nature of LS as a teacher-oriented PD format. These findings raise questions about how schools, districts, or states can provide appropriate support for LS.
In fact, LS reflects a systemic effort to improve teaching in both Japan (Hiebert & Stigler, 2017) and China (Huang & Shimizu, 2016; Pang & Marton, 2017). Without a supportive system, LS implementation may be hard to sustain even if there are some reported successes. Although the LS studied by Lewanowski-Breen et al. (2021) was successful, it was not continued after the study was over. Similarly, Takahashi and McDougal (2016) reported that “Despite the fact that public research lessons have been going on in the city for 12 years, all the schools that piloted lesson study in the early years discontinued after a few years” (p. 516).
The challenges in the change environment, along with the unsustained LS efforts in other countries, call for systemic support, particularly since LS has proven theoretically powerful and empirically successful in mathematically high-achieving countries. Because LS is new to many countries, one possible way to ensure a supportive change environment is cross-cultural collaboration. Currently, there are cross-cultural collaborations on LS in mathematics at the teacher or school level (e.g., Clivaz & Miyakawa, 2020; International Math-teacher Professionalization Using Lesson Study [IMPULS], 2022; Sekao & Engelbrecht, 2021). However, future efforts could extend to collaborations at the school district level. Many school districts outside Japan and China have faced challenges in implementing LS (e.g., Akiba & Wilkinson, 2016; Gero, 2015). In contrast, school districts in China (Gu & Gu, 2016) and Japan (Hiebert & Stigler, 2017), and even these nations as a whole, were found to provide effective, systematic support for LS.
Summary: mathematics teachers’ learning and professional growth in LS
In this review, we examined recent research on LS in mathematics education based on the modified IMPG model. We noticed that the enactment and reflection mechanisms in the IMPG model have been made explicit as collective activities in LS, which may have strengthened teacher learning. This may explain why LS is one of the most effective PD approaches (Gersten et al., 2014; Lewis & Perry, 2017) and has drawn global attention. Our review also shows that LS adapted in other countries could improve teachers’ knowledge (e.g., Corey et al., 2021; Lewis & Perry, 2017). In some places, with the needed support from the district, it could also improve school environments (e.g., Aas, 2021; Groves et al., 2016). Despite the reported successes of teacher learning through LS, our findings echoed prior reviews (Cheung & Wong, 2014; Willems & Van den Bossche, 2019), revealing various research gaps and implementation challenges through the lens of IMPG. For instance, with regard to the domains of change, LS processes often lacked clear research questions to drive the LS cycle. In addition, current research consisted mainly of qualitative studies, suggesting a need for more large-scale, quantitative research to understand the impact of LS. Existing LS studies that reported positive impact focused mainly on changes in the personal domain rather than in other domains, indicating a lack of research on changes across domains. Similarly, implementation challenges appeared in the mediating mechanisms. Regarding enactment, lesson planning is often less appreciated than teaching, and the role of knowledgeable others (KOs) is often underspecified. Regarding reflection, student learning often receives little attention, and constructive feedback is frequently missing. Addressing these research gaps and implementation challenges calls for systematic support for sustainable change from the change environment.
In Figure 3 we annotate the modified IMPG model with our main findings, highlighting the challenges indicated above.
Discussion and future directions
Employing the modified IMPG model as a lens, we examined teachers’ learning and professional growth through LS. Findings in this study shed light on the mutually beneficial relationship between IMPG and LS in mathematics education. On the one hand, LS is a practice that originated in the East and is now used in different educational systems globally, yet we found that IMPG was capable of capturing all LS stages. The IMPG model also enabled us to identify that the current LS literature focuses on certain domains but not on changes across domains. It also allowed us to detect issues in LS mechanisms and the change environment that may have hindered teacher learning and professional growth in LS. On the other hand, we found that LS research contributes insights that enrich the IMPG model and enhance its capability. For instance, we found that LS stages and activities could not be classified directly into the existing IMPG model. This may be due to the deeply dynamic and interconnected nature of LS, which goes beyond what IMPG was intended to capture. Additionally, LS activities demand collective effort at all stages, which the original IMPG cannot capture because it was developed in the West to model individual growth. Finally, LS originated in Japan but has been widely used in many other countries. This cross-cultural phenomenon calls for attention to cultural relevance, which also goes beyond IMPG's initial intent.
Through the lens of IMPG, our findings also revealed successes and challenges of LS in the domains of change, mediating mechanisms, and the change environment. These findings suggest future directions for this line of research in mathematics education: (a) What specific strategies or interventions promote systematic changes across domains in LS? (b) How may the mediating mechanisms function to bring about these changes? (c) How may the change environment provide systematic support for LS? Future studies may explore each of these questions to enrich the big picture of LS informed by the modified IMPG model. For instance, there is a need to explore whether and how LS implementation has led to changes in the practice and consequence domains, especially outside of Japan and China. What aspects of the mediating processes (e.g., enactment and reflection) contributed to the observed changes? How can the observed changes be scaled up and sustained with school district support? Lewis and Perry's (2017) study provides an example. This large-scale, rigorously designed study investigated “when” LS can best support teachers’ learning. It found that LS supported by a research-informed resource kit (a change in the external domain) could lead to systematic changes in teachers’ knowledge (personal domain), teaching (practice domain), and student learning (outcome domain). In addition to investigating ways to enhance the effect of LS, future studies may contribute in three further ways. They may conceptualize the key elements of LS (e.g., the role of knowledgeable others, their relation to facilitators, and their respective effects on teachers’ personal and collective learning). They may also address misconceptions (e.g., about the purpose of LS and its necessary duration) and establish validated outcome measures of LS. Such work matters because, even though LS should be adapted to different nations in culturally relevant ways, effective implementation of LS demands a deep understanding of its key elements (e.g., knowledgeable others) and underlying mechanisms (Lewis, 2016; Lewis & Hurd, 2011). Finally, we propose one question for researchers interested in theory construction: How may IMPG be modified with broader capabilities to capture deeply dynamic, interconnected, and collective PD activities like LS?

Figure 3. Recent LS research in mathematics education per the modified IMPG model.
Acknowledgements
The authors are grateful to Caroline Bryn-Mawr Driscoll for searching for the articles. Thanks also go to Rahma F Goran for helpful assistance with the manuscript development.
Contributorship
Yeping Li oversaw the entire development of this study and led discussions about the research focus. He reviewed and edited each draft, providing critical suggestions to ensure the study's direction. Meixia Ding collected the articles, analyzed them using the IMPG model, and synthesized the findings. She wrote the initial manuscript with contributions from other authors. Rongjin Huang guided the selection of the IMPG model and discussed with Ding how it relates to LS. He authored a section of the literature review on LS and offered critical feedback on each draft, along with recommended resources. Catherine Pressimone Beckowski conducted the initial screening and coding of each article. She wrote a portion of the Literature Review and Methods, as well as the Overview of findings, and also carefully proofread this paper. Xiaobo Li participated in all team discussions, contributed valuable ideas, and reviewed and edited the manuscript drafts. All authors read and approved the final manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Correction (December 2023):
Article updated to correct 3rd author Catherine Pressimone Beckowski's affiliation.
