Abstract
Lesson study (LS) is a professional development practice that has mainly been conducted by elementary, secondary, and preservice schoolteachers. However, in recent years, several studies have explored its practice among higher education (HE) faculty members. This article presents the first systematic review on LS among HE faculty members. Twenty-one studies published until December 2019 were analyzed. Among other findings, the review reveals that (a) most of these studies are of U.S. origin and belong to language and mathematics disciplines; (b) few faculty members participated in these studies; (c) most LS-related references used are not contextualized in HE; (d) LS had beneficial outcomes for lesson design, the participants’ pedagogical knowledge, and the participants’ approach to teaching; (e) results were mixed regarding the participants’ reflection and collaboration; and (f) outcomes were less positive regarding organizational issues when conducting LS. I discuss these results and present future lines of research and the limitations of this study.
Introduction
Lesson study (LS) is a teachers’ professional development (PD) practice that originated in Japan and has spread around the globe (Lewis & Lee, 2017). Although LS has so far been conducted mostly among elementary and secondary teachers, its practice has begun to seep into the PD of higher education (HE) faculty members. Nevertheless, we lack a global perspective on what has been done and on its results. In response, this study offers the first systematic review of the literature on LS among HE faculty members, aiming to understand the features of the published studies, their bibliographic and bibliometric data and, in particular, the results they report regarding the practice and effects of LS.
Lesson Study
LS, the English translation of the Japanese concept jugyou kenkyuu (授業研究), is a teachers’ PD practice with roots in Meiji-era Japan (1868–1912; Makinae, 2019; Nagashima, 2019) and a central component of the in-service training (kounai kenshuu) of today’s Japanese schoolteachers (C. Fernandez & Yoshida, 2004; Stigler & Hiebert, 1999).
The origin of LS is connected to the opening of Japan to the Western world and to the modernization of the country. Its practice is tied to broad reforms that included the modification of the educational system (Collins, 1975) and brought foreign pedagogical methodologies to Japan, among which the Herbartian five-step system is often highlighted (Ichimiya, 2011; Sato, 1991) for its procedural links to the practice of LS.
At present, LS is carried out in over 30 countries (Lewis & Lee, 2017), and most of the literature credits its international popularization beyond Japan to the work of Stigler and Hiebert (1999) and the unpublished doctoral dissertation of Yoshida (1999). The internationalization of LS at the end of the 1990s and the beginning of the 2000s brought LS to educational contexts that differed from the Japanese one. Consequently, misconceptions in its practice (Fujii, 2014) and variations have been observed, the most relevant being learning study, a popular approach that emerged in the early 2000s by combining LS and design experiments (Pang & Marton, 2003). As a result, and although the claim is debatable, Seleznyov (2018, p. 223) pointed out that “there is not an internationally shared understanding” of LS.
LS is a practice through which groups of teachers cooperate around the design of a lesson plan (Fujii, 2016) with the main goal of improving students’ learning (Lewis, 2009; Murata, 2011; Verhoef et al., 2013; Yoshida, 2012). LS offers a space for teachers to experiment (Fujii, 2015) in a cyclical process consisting of: (a) planning and designing a lesson plan (named research lesson [Takahashi & McDougal, 2016] or study lesson [C. Fernandez & Yoshida, 2004]), which is usually formulated in terms of students’ learning, addresses topics that teachers find interesting to delve into (Rock & Wilson, 2005), and involves a thorough study and analysis of teaching materials (Sarkar Arani, 2017; Takahashi et al., 2005); (b) in-class instruction of the lesson plan by one (or more) of the teachers, and observation of the instruction by the other members of the group, who gather evidence on how the lesson unfolds and on students’ reactions, performance, and learning (2009); (c) a postlesson discussion (Takahashi & McDougal, 2016) involving joint reflection by all members of the group on what happened during the previous stage, in connection with their goals and the design of the lesson plan, through which the group seeks to revise and improve the lesson plan in order to enhance their students’ learning experiences; (d) optional (Lewis, 2009; Weeks & Stepanek, 2001) in-class instruction and observation of the revised lesson plan with a different group of students; and (e) optional sharing with the educational community of a report containing the lesson plan, its rationale, and the teachers’ reflections (Hurd & Licciardo-Musso, 2005; Lewis, 2009).
These stages and their features make LS a practice consistent with the literature on the scholarship of teaching and learning: it involves teachers inquiring into their work (Boyer, 1990) to maximize students’ learning (Trigwell & Shale, 2004), it uses reflection as a key component (Schön, 1995), and it opens teaching practice to public scrutiny (Kreber, 2013).
Lesson Study and Teachers’ Professional Development
The international spread of LS owes much to the advantages that research keeps reporting in relation to teachers’ learning and PD. In this sense, the focus on students’ learning and the combination of cooperatively designing, observing, and reflecting on a lesson plan have led earlier studies to define LS as a useful approach for teachers to develop professionally and to improve the quality of their teaching (Bocala, 2015; Dudley, 2013; Hiebert & Stigler, 2017).
Previous literature has pointed out several benefits of LS, including: (a) its potential for curriculum reform, development, and innovation (Kuno, 2018; Lewis & Takahashi, 2013); (b) its usefulness for improving teachers’ practice, instruction (Hiebert & Stigler, 2017; Lewis et al., 2006), and efficacy (Chong & Kong, 2012); (c) its utility for the development of a professional knowledge base (Stigler & Hiebert, 1999) in terms of educational strategies (Rock & Wilson, 2005), content knowledge (Perry & Lewis, 2009), and pedagogical content knowledge (Coenders & Verhoef, 2019); and (d) its promotion of a student-centered approach to teaching (M. L. Fernández & Zilliox, 2011; Lee Bae et al., 2016; Takahashi & McDougal, 2016) and of greater insight into students’ learning needs (Chassels & Melville, 2009; Weeks & Stepanek, 2001).
These positive outcomes emerge through collaboration and conversation with peers (Bocala, 2015; Cajkler et al., 2014), thanks to the opportunities LS offers teachers to interact professionally with their colleagues (Vrikki et al., 2017). On account of this, LS is also considered a practice that contributes to improving interpersonal relationships among teachers, as it gives them the chance to appreciate the potential and needs of their peers (Lewis, 2009), and that reduces feelings of isolation and increases professional confidence (Rock & Wilson, 2005).
In summary, earlier findings show that LS offers teachers the chance to deepen and polish their personal and practical knowledge (as defined by Schön, 1983), leading to transformative learning (Wong, 2018) that affects the personal dispositions, mental habits, beliefs, and routines (Lewis & Perry, 2014) of those who participate in its practice. Nevertheless, these findings come from studies conducted among elementary and secondary teachers or preservice and prospective schoolteachers. Hence, we cannot assume that all of these outcomes occur, or that they occur in the same way, when LS is conducted in HE among faculty members.
The goal of this systematic review is to examine and analyze what the literature tells us about the practice and results of conducting LS among HE faculty members. Thus, this research adds to what previous theoretical papers have argued on this topic (e.g., W. Cerbin & Kopp, 2006; Chenault, 2017; Norton, 2018; Wood & Cajkler, 2016) and complements recent literature reviews on the benefits of LS among in-service and preservice teachers (Xu & Pedder, 2014), its effects on schoolteachers and students (Ming & Yee, 2014), its effectiveness for teachers’ learning (Willems & Van den Bossche, 2019), its use in the training of prospective secondary mathematics teachers (da Ponte, 2017), its use as a PD activity for language teachers (Uştuk & Çomoğlu, 2019), the challenges of translating it to contexts other than the Japanese (Seleznyov, 2018), the benefits, difficulties, and conditions of implementing it in preservice teacher training (Kanellopoulou & Darra, 2019), and how observation and learning are discussed in studies of LS in initial teacher education (Larssen et al., 2018).
Method
Focus of the Research
To carry out this study, I followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009). I conducted an ongoing systematic literature search that ended in December 2019. The search was carried out on the electronic databases EBSCOhost CINAHL, Educational Resources Information Center (ERIC), SCOPUS (particularly relevant for this research topic, as it indexes the International Journal for Lesson and Learning Studies [IJLLS]), and Web of Science. The search was also conducted on Google Scholar, which offers significant additional coverage compared with Web of Science and SCOPUS (Martín-Martín et al., 2018) and is the most comprehensive academic search engine and bibliographic database according to Gusenbauer’s (2019) recent scientometric study. Boolean operators (OR and AND) were used to combine the key search terms, in both English and Spanish, identified in Table 1.
Key Search Terms
Note. LS = lesson study; HE = higher education.
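The Boolean strategy described in this section (synonyms joined with OR, concept groups joined with AND) can be sketched in Python. This is a minimal, hypothetical illustration: the term lists below are placeholders rather than the actual Table 1 terms, and each database uses its own syntax for phrases and field restrictions.

```python
# Hypothetical term groups: placeholders only, NOT the actual Table 1 terms.
TERM_GROUPS = [
    ["lesson study", "estudio de clases"],
    ["higher education", "university", "educación superior", "universidad"],
]


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(groups: list[list[str]]) -> str:
    """Join synonyms with OR inside each group, then join the groups with AND."""
    clauses = ["(" + " OR ".join(quote(t) for t in group) + ")" for group in groups]
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_query(TERM_GROUPS))
```

Adding a new synonym widens the search only within its concept group, whereas adding a new group narrows the results, which is why the two operators play different roles.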
Procedure
A single database was created to download and identify all the hits. This made it possible to pinpoint most potential duplicates as they were downloaded, which also explains the low number of duplicates removed later (see Figure 1 for the procedure followed). Studies were initially excluded on the basis of relevance after a title and abstract screening. A second screening examined the context and participants sections of the studies, because several articles did not offer clear information about these aspects (crucial for this review) in the abstract. Next, the reference lists of the studies that passed the second screening were reviewed to conduct a backward snowballing search (Jalali & Wohlin, 2012), a method useful for finding less visible literature (Greenhalgh & Peacock, 2005). Additional studies identified through backward snowballing were also screened by examining their title, abstract, and context and participants information. Finally, the remaining studies were read in full and assessed for inclusion in this systematic review against the inclusion criteria (see Table 2) and quality indicators (see Table 3).
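Keeping every hit in a single database amounts to deduplicating records as they arrive. As a minimal sketch, assuming each hit is a dictionary with a hypothetical `title` field, duplicates can be detected by comparing normalized titles; this is an illustrative heuristic, not the procedure the author used, and real pipelines often also compare DOIs.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    letters_only = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(letters_only.split())


def deduplicate(records: list[dict]) -> tuple[list[dict], int]:
    """Keep the first record seen for each normalized title.

    Returns the unique records and the number of duplicates removed.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = normalize_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique, len(records) - len(unique)


if __name__ == "__main__":
    # Hypothetical hits exported from two databases.
    hits = [
        {"title": "Lesson Study in Higher Education", "source": "ERIC"},
        {"title": "Lesson study in higher education.", "source": "SCOPUS"},
        {"title": "Estudio de clases en la universidad", "source": "Google Scholar"},
    ]
    unique, removed = deduplicate(hits)
    print(len(unique), "unique hits;", removed, "duplicate(s) removed")
```

Keeping the first occurrence mirrors the idea of flagging duplicates at download time rather than in a later cleanup pass.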

Systematic review procedure followed.
Inclusion and Exclusion Criteria
Note. LS = lesson study; HE = higher education.
Quality Indicators and Related Questions
Inclusion Criteria
As Figure 1 shows, studies eligible for inclusion in this systematic review were screened and selected according to several inclusion criteria (see Table 2), each set for a specific reason.
Regarding the time period, 2019 (inclusive) was set as the final year of the review because it was the last full year that could be covered. Every review must stop at some point and may therefore leave out studies in the process of being published. To the author’s knowledge, at least two studies due to be published early in 2020 could potentially have been included in this review: Appelgate et al. (2020) and Hervas et al. (2020). As for the initial year, 1997 was set because, although much of the previous literature (e.g., Bjuland & Mosvold, 2015; Fujii, 2016; Lewis, 2009; Shimizu & Chino, 2015; Takahashi & McDougal, 2016) places the international popularization of LS after the studies of Stigler and Hiebert (1999) and Yoshida (1999), earlier literature exists from around that date (e.g., Lewis & Tsuchida, 1997).
As for the study focus, it was necessary to be specific about the participants in the studies. It is common to find research exploring the practice of LS among student teachers (that is, HE students) and preservice and prospective teachers, occasionally with HE faculty members participating as facilitators of the process; however, the focus of this research was the analysis of studies describing the practice of LS among those with a teaching role in HE: faculty members and/or teaching assistants (or graduate teaching assistants).
Finally, regarding the type of literature included in the review, gray literature was excluded for two reasons: (a) although it may help reduce publication bias (Paez, 2017), it also demands sensitivity analyses, as it may offer only preliminary findings, adding higher risks of bias to its results (Schmucker et al., 2017); and (b) an initial screening and review of a broad sample of these records (mainly conference papers) revealed a high percentage of manuscripts that did not satisfy the quality standards that were set (see Table 3). However, regarding conference papers, my search and preliminary analysis have revealed a relevant cluster of studies at Indonesian universities (e.g., Joni, 2019) that deserves further analysis regarding the practice and results of LS in that specific context.
Also, only empirical research was included, leaving aside theoretical manuscripts connecting LS and HE (e.g., W. Cerbin & Kopp, 2006; Chenault, 2017; Norton, 2018; Wood & Cajkler, 2016). The reason was that, although these conceptual papers may be based on practical experiences, they did not report specific results and methods that could be scrutinized.
Last, thesis dissertations were not included in the review for two reasons: (a) dissertations are reviewed and published following criteria different from those of journal papers and (depending on the case) book chapters, and (b) dissertations may later be published as journal papers or book chapters, which would increase the chance of including redundant, if not duplicate, results in the review. Such is the case of the dissertations of Lampley and of Dillard, who later published a related journal paper (Lampley et al., 2017) and a book chapter (Dillard, 2019), respectively. Both studies passed the quality assessment and were included in the final synthesis of this review. The bibliographic analysis also revealed that a thesis dissertation (Tasker, 2014) was cited in Dillard (2019) as an LS in HE-related reference. However, Tasker conducted his study at a private language school, never used the terms “higher” or “tertiary” education in his thesis, and wrote that one of the participants in his study was teaching “intermediate level students who had recently finished high school (aged 16–21 years) and were either preparing to take university entrance exams or find jobs” (Tasker, 2014, p. 92). For this reason, his study was not considered an example of LS conducted among HE faculty members. Finally, excluding doctoral dissertations from the analysis left out the work of Schmies (2011) and Lucas (2014); these dissertations did not lead to published studies but nonetheless deserve to be cited.
Quality Assessment
Studies that met the inclusion criteria were also assessed for quality. This was done using a series of 13 questions (see Table 3), established by the author and answered on a 0- to 2-point scale, where 0 = no, 1 = not entirely, and 2 = yes. This set of questions aimed to address five indicators considered relevant for assessing the rigor of the studies: clarity, consistency/congruity, data collection and features of analysis, discussion and conclusions, and ethical issues.
The highest possible score for the quality assessment was 26 points (2 points for each of the 13 questions). Studies were included in the final database for analysis when their score was 19 or greater. As Table 4 shows, every study also had to fulfill two conditions to pass the quality cut and be included in the final qualitative synthesis: (a) to receive 0 points in no more than three questions and (b) to receive 2 points in at least six questions (just under half of the 13 questions). These conditions were set considering that a study with four or more questions receiving 0 points (four being 30.8% of the 13 questions) would lack elements needed for it to be adequately analyzed and for its results to be taken into consideration, and that the least to be expected of a published study was that about half of the quality questions were answered with a yes (2 points). Because the number of questions is odd (half would be 6.5), the bar was set at six questions with 2 points so that more studies at the quality border could be included. As a result, studies could only obtain the minimum score (19 points) in the four cases shown in Table 4.
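The quality rule described above can be expressed as a short check over the 13 item scores. The following Python sketch is illustrative only (the function names are hypothetical and not part of the review’s procedure); it applies the 19-point cutoff together with the two conditions and enumerates which counts of 2s, 1s, and 0s produce exactly the minimum score.

```python
N_QUESTIONS = 13   # quality questions in Table 3
MIN_SCORE = 19     # inclusion cutoff (maximum is 26)
MAX_ZEROS = 3      # condition (a): no more than three questions scored 0
MIN_TWOS = 6       # condition (b): at least six questions scored 2


def passes_quality(scores: list[int]) -> bool:
    """Apply the cutoff and both conditions to one study's 13 item scores."""
    if len(scores) != N_QUESTIONS or any(s not in (0, 1, 2) for s in scores):
        raise ValueError("expected 13 scores, each 0, 1 or 2")
    return (
        sum(scores) >= MIN_SCORE
        and scores.count(0) <= MAX_ZEROS
        and scores.count(2) >= MIN_TWOS
    )


def minimum_score_profiles() -> list[tuple[int, int, int]]:
    """List (twos, ones, zeros) counts that pass with exactly the minimum score."""
    profiles = []
    for twos in range(N_QUESTIONS + 1):
        for ones in range(N_QUESTIONS + 1 - twos):
            zeros = N_QUESTIONS - twos - ones
            scores = [2] * twos + [1] * ones + [0] * zeros
            if sum(scores) == MIN_SCORE and passes_quality(scores):
                profiles.append((twos, ones, zeros))
    return profiles


if __name__ == "__main__":
    for twos, ones, zeros in minimum_score_profiles():
        print(f"{twos} x 2, {ones} x 1, {zeros} x 0 -> {MIN_SCORE} points")
```

Running the enumeration yields four profiles, from six 2s, seven 1s, and no 0s to nine 2s, one 1, and three 0s, consistent with the four cases the text attributes to Table 4.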
Only Possible Combinations to Achieve the Minimum Quality Score (19)
Given these inclusion criteria and quality assessment indicators, several works were excluded from the final review, including most book chapters. Although the author intended to also include empirical studies published as book chapters (e.g., Dillard, 2019; Garfield & Ben-Zvi, 2008; Kamen et al., 2011; Mohd-Yusof et al., 2019), only one (Dillard, 2019) was finally incorporated into the final qualitative synthesis, as it offered a clear, explicit, and extended explanation of the methods and procedures for data collection and analysis, making it possible to answer most of the quality assessment questions of Table 3 positively. Excluding the remaining book chapters from the final analysis does not reflect poorly on their quality; it is merely a consequence of the chosen quality indicators.
Analysis
Following the previous procedure, descriptive data were extracted from the studies included in the qualitative synthesis using an Excel database in which the author recorded the information listed in Table 5. This process made it possible to report results on the features and the bibliographic and bibliometric data of the studies analyzed. Their results were analyzed through conventional content analysis (Hsieh & Shannon, 2005), in a coding process based on the analytical procedures of grounded theory (Strauss & Corbin, 1998); this analysis resulted in the emergence of five themes regarding the practice and effects of LS among HE faculty members.
Data Extracted From the Selected Studies
Note. LS = lesson study; IF = impact factor; HE = higher education.
Results
The systematic review of the literature uncovered 21 studies that met the inclusion criteria and passed the quality assessment to be included in the final qualitative synthesis. Table 6 lists these studies in alphabetical order.
Studies Included for the Final Qualitative Synthesis
Note. IJLLS = International Journal for Lesson and Learning Studies; SJR = SCImago Journal Rank; LS = lesson study; JCR = Journal Citation Reports; EFL = English as a foreign language.
Following the data extraction scheme in Table 5, findings on the country of origin of the literature reveal that, of the 21 studies, 57.14% (n = 12) were carried out in the United States, 19.05% (n = 4) in Turkey, 9.52% (n = 2) in Indonesia, 4.76% (n = 1) in England, 4.76% (n = 1) in Ireland, and 4.76% (n = 1) in Spain (see Table 6).
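All percentages in this part of the results are shares of the 21 included studies. As a minimal sketch, the country shares above can be reproduced from the reported counts; the same hypothetical `percentages` helper would apply to the other categorical breakdowns (e.g., institutional origin or type of participants).

```python
# Study counts by country as reported in the text (Table 6); total = 21.
COUNTRY_COUNTS = {
    "United States": 12,
    "Turkey": 4,
    "Indonesia": 2,
    "England": 1,
    "Ireland": 1,
    "Spain": 1,
}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total, as a percentage rounded to 2 decimals."""
    total = sum(counts.values())
    return {key: round(100 * n / total, 2) for key, n in counts.items()}


if __name__ == "__main__":
    for country, pct in percentages(COUNTRY_COUNTS).items():
        print(f"{country}: {pct}% (n = {COUNTRY_COUNTS[country]})")
```

Because each share is rounded separately, the printed percentages add up to 99.99 rather than exactly 100.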
As for the year of publication, Table 6 also shows that, after the first study included in this review was published in 2011 (Dotger, 2011), the number of studies published grew during the past 4 years (2016–2019) and remained at similar levels (see Figure 2), although it never reached double figures. Nonetheless, I consider it ethically necessary to point out that there are studies or manuscripts reflecting the practice of LS among HE faculty members from before 2011 (e.g., Becker et al., 2008; Voetmann et al., 2007) that were not included in this review because they did not meet the inclusion criteria or did not pass the quality assessment.

Evolution in the number of studies published.
Also, combining the data on country of origin and year of publication reveals that the growth since 2016 is tied to the rise of studies from contexts other than North America: as Table 6 shows, from 2011 to 2015 all studies were set in the United States, but since 2016, studies set in the United States represent 40% (n = 6) of the 15 studies analyzed for that period.
Regarding the institutional origin of the literature, 90.48% (n = 19) of the studies were contextualized at universities (graduate or undergraduate studies), while 9.52% (n = 2) took place at Turkish university preparatory programs, which are also considered part of the HE system in Turkey (British Council, 2015). However, the analysis reveals that 57.14% (n = 12) of the studies did not state the HE institution where LS was carried out or were unclear about it (Bayram & Bıkmaz, 2018; Bayram & Canaran, 2019; Coşkun, 2017; Demir et al., 2012; Deshler, 2015; Dillard, 2019; Dotger, 2011; Gok, 2016; Lampley et al., 2017; Refaei et al., 2017; Soto et al., 2019; Wood & Cajkler, 2016). The other 42.86% (n = 9), the studies that indicated the institution where LS was carried out, comprise both Indonesian studies (at the University of Malang and Universitas Muhammadiyah Surakarta), the Irish one (Maynooth University), the Spanish one (University of Cantabria and University of Oviedo), and five U.S. studies conducted at the University of Wyoming (Burrows & Borowczak, 2019) and the University of Wisconsin System (Marshik et al., 2015; Murray & Knowles, 2014; Samaranayake et al., 2018; Strangman & Knowles, 2012), with the University of Wisconsin-La Crosse being the only institution at which more than one study was set (Marshik et al., 2015; Murray & Knowles, 2014; Strangman & Knowles, 2012).
As for the authors of the literature reviewed, Table 6 reveals that only two authors appear in more than one study: E. Knowles (Murray & Knowles, 2014; Strangman & Knowles, 2012) and İ. Bayram (Bayram & Bıkmaz, 2018; Bayram & Canaran, 2019). Otherwise, all studies were written by different researchers, always by four or fewer authors (M = 2.43; SD = 1.12).
Figure 3 shows the disciplines and combinations of related areas in which LS was conducted; it includes 20 of the 21 studies, as Samaranayake et al. (2018) did not specify the disciplinary origin of the participants. The figure reveals that languages and writing (33.33%; n = 7) and mathematics and statistics (23.81%; n = 5) are the most represented disciplines in which LS was carried out.

Disciplinary fields in which lesson study (LS) was conducted.
In relation to the number of HE faculty members conducting LS, Table 6 reveals that, with the exception of one Turkish study and the Spanish study (with 14 participants), there were always fewer than seven participants (M = 5; SD = 3.45). Again, the analysis of this feature leaves aside the study by Samaranayake et al. (2018): their research, a survey among schoolteachers and college teachers who had conducted LS in the past, included 27 participants from colleges but differs greatly from the rest of the studies, which directly analyze the implementation of LS. In addition, regarding the type of participants, 80.95% (n = 17) of the studies examined HE faculty members carrying out LS, while 19.05% (n = 4) addressed the implementation of LS by teaching assistants or graduate teaching assistants with a teaching load.
A second category for analysis in this review concerned the bibliographic and bibliometric data of the literature. As shown in Table 6, of the 21 studies analyzed, 95.24% (n = 20) were journal articles, with only one (Dillard, 2019) being a book chapter. These 20 journal articles were published in 15 different journals, with three journals accumulating 40% of the articles: 20% (n = 4) in the IJLLS, 10% (n = 2) in the Journal of University Teaching and Learning Practice, and 10% (n = 2) in Teaching in Higher Education.
The analysis of the Journal Citation Reports (JCR) and SCImago Journal Rank (SJR) impact factor (IF) of the 14 journals in the years in which all 20 articles were published shows that: (a) 10% (n = 2) were published in journals with an SJR IF in the year before publication (Burrows & Borowczak, 2019; Soto et al., 2019), since the information for 2019 had yet to appear when this research was conducted; (b) 15% (n = 3) in journals with an IF in both JCR and SJR in the year they were published; (c) 35% (n = 7) in journals with an IF in SJR only; and (d) 40% (n = 8) in journals without an IF in SJR or JCR at the time.
Table 6 also shows that all articles published in journals with an IF in both SJR and JCR are from the United States and that, among the articles published in journals without an IF, 50% (n = 4) are Turkish studies, 37.5% (n = 3) are from the United States, and 12.5% (n = 1) is from Indonesia.
As for the references cited, Table 6 shows that the total number of recognizable (by language) LS citations is 193; 26.94% (n = 52) of these are citations related to the practice of LS in HE. However, Table 7 shows that the number of different LS citations in each study varies greatly (SD = 6.02), and it also reveals that, on average, 26.99% of the LS citations in each study (M = 2.48 of M = 9.19) have to do with the practice of LS in HE.
LS Citations and LS in HE-Related Citations
Note. LS = lesson study; HE = higher education.
Also, Table 8 lists the most-cited LS references across the 21 studies and shows that seven references accumulate 27.46% (n = 53) of all 193 LS citations. Additionally, Table 9 shows that a single reference (W. Cerbin & Kopp, 2006), a theoretical article describing LS as a practice for building pedagogical knowledge and enhancing teaching practice in HE, accumulates 25% (n = 13) of the 52 citations related to LS in HE. Moreover, as seen in Table 9, five references account for 50% (n = 26) of the citations related to LS in HE.
Most Cited LS References
Note. LS = lesson study; HE = higher education.
Most Cited LS in HE-Related References
Note. LS = lesson study; HE = higher education.
Finally, a deeper analysis of the LS references and their authorship reveals that the most cited authors are C. C. Lewis, with 15 different publications cited and 20.73% (n = 40) of the total number of LS citations; R. Perry, with seven different publications cited (all of them with C. C. Lewis) and 10.36% (n = 20) of the total number of LS citations; and C. Fernandez, with six publications cited and 9.33% (n = 18) of the total number of LS citations. Alongside them is W. Cerbin, who, with a web reference and four theoretical publications (two of them found only as self-citations in an article he co-authored; Marshik et al., 2015), accumulates 11.92% (n = 23) of the total number of LS citations and 44.23% (n = 23) of the citations related to LS in HE.
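The citation shares reported in this part of the results are simple proportions, either of the 193 LS citations or of the 52 citations related to LS in HE. The Python sketch below reproduces them from the counts given in the text; the helper names are hypothetical and used only for illustration.

```python
# Counts reported in the text.
TOTAL_LS_CITATIONS = 193   # all recognizable LS citations across the 21 studies
LS_IN_HE_CITATIONS = 52    # subset related to LS in HE
N_STUDIES = 21

# Citations per author. Perry's 20 are all co-authored with Lewis, so these
# counts overlap and must not be summed into a combined share.
AUTHOR_CITATIONS = {"C. C. Lewis": 40, "W. Cerbin": 23, "R. Perry": 20, "C. Fernandez": 18}


def share(part: float, whole: float) -> float:
    """Percentage of `part` over `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


def mean_per_study(total: int, n_studies: int = N_STUDIES) -> float:
    """Mean number of citations per included study, rounded to two decimals."""
    return round(total / n_studies, 2)


if __name__ == "__main__":
    for author, n in AUTHOR_CITATIONS.items():
        print(f"{author}: {share(n, TOTAL_LS_CITATIONS)}% of all LS citations")
    print("Cerbin, share of LS-in-HE citations:", share(23, LS_IN_HE_CITATIONS))
    print("LS-in-HE share of all LS citations:", share(LS_IN_HE_CITATIONS, TOTAL_LS_CITATIONS))
    print("Mean LS citations per study:", mean_per_study(TOTAL_LS_CITATIONS))
    print("Mean LS-in-HE citations per study:", mean_per_study(LS_IN_HE_CITATIONS))
```

Because several authors co-publish, author-level shares describe visibility in the reference lists rather than a partition of the 193 citations.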
The third category for analysis in this review concerned the results regarding the implementation of LS among HE faculty members. The content analysis of the literature reveals five main themes (teaching and learning approach, lesson design, collaboration, participants’ knowledge and practice, and LS as a practice), each with various findings.
First, the beneficial outcome most reported across the studies is that LS promoted a shift in the participants’ approach from teaching to learning (Bayram & Canaran, 2019; Gok, 2016). Several studies show how conducting LS contributed to generating reflection on pedagogy (Wood & Cajkler, 2016) and to discussing theories and beliefs about teaching and learning, about the learners (Calvo et al., 2018; Dotger, 2011), and about the hurdles that learners face (Strangman & Knowles, 2012). As a consequence, through LS, HE faculty members became better at understanding and addressing students’ needs, their thinking processes, and the sources of their confusion (Demir et al., 2012; Murray & Knowles, 2014; Gok, 2016; Soto et al., 2019).
Despite this, not all studies report such favorable outcomes. Lampley et al. (2017) found gaps between what teachers said and what they did. Demir et al. (2012) reported that HE faculty members maintained a teacher-centered approach and did not really engage in self-reflection about teaching and learning. Finally, in a similar vein, Deshler (2015) showed that teaching assistants conducting LS mostly evinced a descriptive level of reflection, with no signs of higher levels of reflection.
Second, and closely related to the previous theme, the focus on students and their learning seems to have consequences for the lessons that the participants designed. In this regard, studies have found that this shift contributed to changes in the lessons (Calvo et al., 2018) and to more interesting activities (Coşkun, 2017), to the creation of more meaningful teaching and learning experiences (Marshik et al., 2015; Soto et al., 2019), and, in general, to the design of better learning processes (Khotimah & Masduki, 2016).
Third, another beneficial outcome that we find in the literature has to do with collaboration during LS. Samaranayake et al. (2018) reported a strong connection between collaboration and teachers’ change, Gok (2016) and Bayram and Bıkmaz (2018) evinced how participants learnt from each other, Calvo et al. (2018) pointed out the emergence of a collegial way of understanding the practice among the participants, and Dotger (2011) reported how the practice of LS led graduate teaching assistants to generate a community of practice.
However, as with reflection, not all studies found such positive outcomes regarding collaboration. This less optimistic perspective is mainly reported by Demir et al. (2012), who found a lack of cohesion and consensus, authority issues, and resistance to offering and accepting suggestions and critique from colleagues. Demir et al. (2012) and Dotger (2011) also showed that LS practitioners were reluctant to be, or outright refused to be, observed in the classroom and video-recorded. Furthermore, Dotger (2011) revealed that the alternative ways of thinking that LS might promote generated discomfort among graduate teaching assistants, as they had to work with experienced faculty members who did not share them.
Fourth, the literature also shows that a sustained practice of LS among HE faculty members had an impact on the participants’ knowledge and practice. In this sense, several studies report that LS (accompanied by workshops) promoted teachers’ conceptual development (Dillard, 2019), contributed to changes in teachers’ pedagogical content knowledge (Lampley et al., 2017), improved time management skills (Bayram & Bıkmaz, 2018), and built confidence (Gok, 2016), but also fostered self-validation (Demir et al., 2012).
Last, the fifth theme that emerged in the results of the literature reviewed has to do with LS as a practice. Different studies explicitly state how LS was perceived by the participants as a solid practice for their own PD (Coşkun, 2017) and that it generated a certain degree of interest in conducting research (Bayram & Canaran, 2019). However, it is in relation to this theme that the review evinces more drawbacks. The put into practice of LS among HE faculty members faced difficulties in relation to logistical issues (Dotger, 2011), was perceived as very demanding (Bayram & Bıkmaz, 2018; Bayram & Canaran, 2019) and time consuming (Demir et al., 2012; Gok, 2016), and was considered too rigid, leaving little space to creativity (Demir et al., 2012). To face different of these setbacks, Bayram and Canaran (2019) suggested the use of mentorship to facilitate the process, and Gok (2016) demanded management support.
Discussion
Results of this systematic review show that the studies analyzed are less diverse in their country of origin than research with schoolteachers and prospective teachers; see, for example, Fujii (2014) and Lewis and Lee (2017), and the reviews of Kanellopoulou and Darra (2019), Seleznyov (2018), and Uştuk and Çomoğlu (2019). Putting LS into practice among HE faculty members is still in the first steps of its international expansion, decades after LS became popular at earlier educational stages following the work of Stigler and Hiebert (1999). Nevertheless, the number of countries represented in this review (seven) is similar to (da Ponte, 2017; Larssen et al., 2018) or even exceeds (Ming & Yee, 2014; Willems & Van den Bossche, 2019) that found in earlier reviews on LS.
Results show that the United States remains the most relevant international source of literature, which concurs with previous reviews (da Ponte, 2017; Kanellopoulou & Darra, 2019; Larssen et al., 2018). Results also reflect the prominence of U.S.-originating studies among the references cited. This U.S. predominance reflects not only the work of U.S.-based researchers and institutions but also the language of publication of the studies included, a limitation of this research that I address later.
Regarding the country of origin of the studies, two other situations deserve attention. First, the presence of two Indonesian studies gives visibility to what seems to be a relevant trend at different Indonesian universities, and stands for many more studies (mainly conference papers) that did not meet the inclusion and quality criteria. Second, there are no studies from Japan or Hong Kong, despite their relevant presence in earlier reviews (Kanellopoulou & Darra, 2019; Larssen et al., 2018; Ming & Yee, 2014; Uştuk & Çomoğlu, 2019). Two reasons can explain this absence: regarding Japan, the search for studies only in languages other than Japanese and the exclusion of conference papers (e.g., Kato, 2011); regarding Hong Kong, the exclusion of studies on learning studies, a popular variation of LS that emerged there and incorporates elements of variation theory (Marton & Pang, 2006).
From an institutional standpoint, the results show that most researchers kept the institutions in which LS took place anonymous. Among those that disclose this information, the University of Wisconsin System is the only institution that appears repeatedly. This is undoubtedly related to the work of Professor Cerbin, who started an LS project at the University of Wisconsin-La Crosse during the 2000s, as evinced by his theoretical paper (W. Cerbin & Kopp, 2006), the most cited reference in the studies analyzed (see Table 8).
However, despite the work of Cerbin and his colleagues during the 2000s, it is not until 2011 that we find an empirical study (Dotger, 2011) that passes the inclusion and quality criteria of this research; this supports the view of Watanabe (2011), who considered that the practice of LS in HE was still in “unchartered water” (p. 175). As Figure 2 shows, this situation persisted until 2016, when the number of studies grew considerably in connection with the appearance of studies in contexts other than the United States.
From a disciplinary standpoint, mathematics is a relevant field when LS is conducted among HE faculty members, as in elementary and secondary education, where it is the most common discipline (e.g., Fujii, 2018; Huang et al., 2019; Takahashi & McDougal, 2016). However, linguistics is the area in which the most studies have been conducted, ahead even of mathematics; this is less common in other educational contexts, although we find examples in earlier literature reviews on LS (Kanellopoulou & Darra, 2019; Ming & Yee, 2014; Uştuk & Çomoğlu, 2019; Willems & Van den Bossche, 2019).
Finally, regarding the features of the studies analyzed, the research papers included in this review reported a low number of participants (see Table 6); in contrast, earlier reviews in other contexts included studies with over 50 participants (da Ponte, 2017; Kanellopoulou & Darra, 2019; Ming & Yee, 2014; Willems & Van den Bossche, 2019). This difference supports the argument that the practice of LS among HE faculty members is still at an early stage in its expansion and popularization.
A second category of analysis concerned the bibliographic and bibliometric data of the studies analyzed. Results reveal that only two researchers appeared more than once (twice) as authors. This lack of continuity (in terms of published studies), combined with the still low number of publications on the topic, has different implications for what and whom researchers cite. First, as Table 7 shows, most of the LS references cited did not report the practice of LS in HE; hence, researchers discuss their results using references from other educational stages. Second, even when they use a reference related to LS in HE, it is often a theoretical work rather than a report of actual empirical results. Third, as Table 8 and the results about the most cited authors show, a few studies and authors accumulate a high percentage of citations. This aspect needs to be considered critically when we talk about LS, as it could lead to the misconceptions revealed by Fujii (2014), Seleznyov (2018), and Wolthuis et al. (2020); this situation has led some authors to explore LS from different conceptual perspectives (e.g., Hervas & Medina, 2020; Saito & Atencio, 2013), responding to the call of Lewis et al. (2006) for further theoretical development to increase understanding of LS.
Regarding the journals in which the studies analyzed were published, findings show that the IJLLS, the journal of the World Association of Lesson Studies, acts as a publishing niche for the highest percentage of papers in this review. Beyond this, the IF of the journals in which the studies were published deserves further discussion. Even if the use of the IF as an index of quality for individual papers has been debated for decades (Vanclay, 2012), findings in this regard are relevant because they reveal that, until 2019, only papers from the United States had been published in journals with an IF in both JCR and SJR. To understand the reasons for this situation, rather than questioning the quality of the papers (in this review, they were all scrutinized under the same parameters), we should consider how much journals favor research from particular contexts, how publishing in journals with an IF is tied to the academic career in different contexts, and, especially, how the need to publish in English might affect authors. These are plausible reasons why only papers originating in the United States have been published in what the scientific community recognizes as journals of the highest quality. Regarding the latter consideration (publishing in English), it has already been argued that it might act as a limitation for non-English-speaking authors (González-Alcaide et al., 2012); however, Mueller et al. (2006) showed that linguistic bias is associated with language rather than country, which in turn should make us wonder why this review includes only two other studies from countries in which English is an official language (England and Ireland), even though LS is broadly practiced there at other educational stages (e.g., Dudley, 2013; Vrikki et al., 2017).
Finally, the third category of analysis concerned the results reported in the studies. Overall, the literature reports a favorable view of the effects of LS on the practice and PD of participants, with the study of Demir et al. (2012) reporting the least positive outcomes.
The most frequently reported effect is a shift toward a student/learning-centered approach to teaching. This is similar to what research with elementary and secondary education teachers has reported (Dudley, 2013; M. L. Fernández & Zilliox, 2011; Lee Bae et al., 2016; Takahashi & McDougal, 2016) and is congruent with the focus of LS: students and their learning (Lewis, 2009; Murata, 2011; Verhoef et al., 2013; Yoshida, 2012). This shift is linked to reflection and discussion among the participants about their teaching practice and about learning; in this sense, results support what research tells us about how LS participants learn in terms of their beliefs (Lewis & Perry, 2014) through conversations with colleagues (Bocala, 2015) when they analyze the lessons they have designed (Lumpe et al., 2012). Hence, as Lewis and Tsuchida (1999) described, LS can affect participants’ philosophy of teaching, even when they are HE faculty members.
Regarding collaboration, findings also reveal positive outcomes and endorse what an earlier review (Uştuk & Çomoğlu, 2019) showed about how LS contributes to improving working cultures. Collaboration is a key feature of LS (Takahashi & McDougal, 2016), because collaborating helps enhance teaching practices (Danielson, 2008) and because, through collaboration, professional conversations become learning tools (Readman & Rowe, 2016). In this manner, as Hervas and Medina (2020) showed, by collaboratively applying and combining their reasoning and their individual and group knowledge, participants in LS generate interactive learning paths.
Results also display the positive effects of LS on the participants’ conceptual and pedagogical content knowledge. This outcome has also been reported in earlier studies among schoolteachers and preservice schoolteachers (Coenders & Verhoef, 2019; Meyer & Wilkerson, 2011; Perry & Lewis, 2009) and allows us to extend to LS conducted by HE faculty members the words of Dudley (2013), regarding how LS helps elaborate pedagogical reasoning by making tacit knowledge explicit, and of Cajkler and Wood (2016), regarding how it makes it possible to develop pedagogical literacy. Both elements are of great relevance in HE, where faculty development initiatives try to respond to the international concern about teaching quality (Jacob et al., 2015).
All these positive outcomes are connected to another result of this review with an immediate impact on teaching and learning. As shown, different studies report that, through LS, lessons and activities became more engaging for students. This direct effect on teaching practice has also been reported in earlier reviews (da Ponte, 2017) and studies (Kuno, 2018; Schipper et al., 2018) and helps address a major concern in HE: student engagement (Rocca, 2010).
Yet the findings of this review have also revealed drawbacks in the practice of LS among HE faculty members, as well as divergences in the results reported by earlier studies.
Even if most studies have evinced a shift toward a student/learning-centered approach, Demir et al. (2012) and Lampley et al. (2017) showed that this might not always be the case and that participants had room to modify their approach to a greater degree, as has also been found among schoolteachers (Amador & Weiland, 2015; Bjuland & Mosvold, 2015; Larssen et al., 2017).
Results in this regard are related to other findings about how the participants engaged in reflection, which reveal a dichotomy in how LS contributed to generating reflection. While Wood and Cajkler (2016) revealed a positive effect, Demir et al. (2012) and Deshler (2015) referred to low and descriptive levels of reflection and self-reflection. The latter results have already been observed at other educational levels, for example, by Kvam (2018), who reported that primary education teachers often conducted descriptive and superficial analyses. The quality and levels of reflection deserve further exploration because, as Loughran (2010) pointed out, merely descriptive reflection with no connection to further pedagogical action might not lead to learning.
Also, even if collaboration has already been discussed in terms of its positive impact, Demir et al. (2012) and Dotger (2011) also revealed difficulties in reaching consensus and accepting feedback, as well as the reluctance of some participants to be observed and video-recorded, an aspect also evinced by Hervas et al. (2020). These situations might have consequences for reflection, because they diminish the chances to put into practice the talk type that Dudley (2013) defined as disputational, and they suggest that participants might have been unaccustomed to commenting on colleagues’ practice and receiving critique from them. This last aspect is not unusual in HE, often described as an excessively individualist context where academic isolation is a common trait (Calvo et al., 2018), but it has also been reported when putting LS into practice at other educational levels (Chassels & Melville, 2009).
Context and the lack of a habit of discussing teaching practice might also explain the authority issues and discomfort between participants at different career stages reported by Demir et al. (2012) and Dotger (2011). This situation had already been identified as a critical aspect in earlier reviews, which urged precautions to adjust power and working relationships among those conducting LS (da Ponte, 2017; Uştuk & Çomoğlu, 2019); as Tschannen-Moran (2001) stated, trust and collaboration are related.
Results also point to mentorship and management support as possible strategies to overcome some of these difficulties. In LS, it is not unusual to invite an outside expert, the knowledgeable other, mainly for the postlesson discussion (Takahashi, 2014). Earlier studies have suggested that this figure has an impact on participants’ noticing (Amador & Carter, 2016), facilitates deeper reflection (Lee Bae et al., 2016), and stimulates interthinking among participants (Bjuland & Helgevold, 2018). Hence, knowledgeable others can help participants get through the difficulties described and could also help overcome the tendency toward self-validation observed by Demir et al. (2012) and also found in Kvam (2018).
Finally, one of the major drawbacks found concerns the organization of LS and how demanding and time consuming it can be. However, this is not exclusive to HE; earlier reviews in other settings (da Ponte, 2017; Kanellopoulou & Darra, 2019; Seleznyov, 2018) have also shown that time, in particular, tends to be an issue, reporting pressures to simplify the process (da Ponte, 2017). In this sense, it might help to consider that Japan, where LS originated, recurrently appears as one of the countries where teachers report the longest working hours per week (Organisation for Economic Co-operation and Development, 2014).
Context, in the end, is a crucial factor. When weighing the outcomes and difficulties that arose during the practice of LS, we should consider that LS comes from a sociocultural context that tends toward collectivism and promotes the idea of organizations as families (Yufu, 2019), where continuous improvement (kaizen) is embedded in professional practices, and where “within the circle” inspection (a rough translation of ennai kenshou) and self-contemplation (hansei; Rohlen & LeTendre, 1996) are regularly expected and fully integrated into the day-to-day practice of Japanese teachers (Howe, 2014) and their institutions (Takahashi & McDougal, 2016). As a cultural activity (Stigler & Hiebert, 1999), LS is subject to the cultural relativity of organizational practices, as it involves the manipulation of symbols and conditions of a local nature (Hofstede, 1983). It is for this reason that Matoba (2005) urged fostering professional and training cultures to accompany the practice of LS.
Limitations and Suggestions for Future Research
Although this review provides a comprehensive picture, it has some limitations that future research might address. First, reviewing only published studies introduces publication bias (Schmucker et al., 2013). This limitation might be addressed by including quality-screened gray literature, which, in turn, might also help diversify the origin of the articles analyzed.
Second, there is also a linguistic bias resulting from analyzing studies in only two languages. Hence, including studies published in other languages, such as Japanese or Indonesian, might provide greater knowledge of this practice among HE faculty members.
Third, the inclusion and quality criteria set for this review are not universal. Consequently, even though they have been justified, other researchers might choose different and equally valid indicators.
Fourth, the studies included in this review were mostly conducted with a limited number of participants. As research in this field develops, future studies might have the chance to explore the benefits of LS with more representative samples.
Fifth, the studies reviewed mostly discuss their findings in light of research conducted in contexts and with participants (in general, schools and schoolteachers) that differ from their own. If research in this field continues to grow, further studies might be able to generate more contextualized discussions.
Last, when data are provided, future reviews could delve into how LS is put into practice, compare approaches, and address these potential differences to elaborate on their impact.
Conclusion
This review shows that, in recent years, the number of studies addressing the practice of LS among HE faculty members has increased, along with the number of countries where these studies are conducted. However, compared with other educational levels, research with HE faculty members conducting LS is still taking its first steps.
Findings of research to date are encouraging in terms of the potential benefits of LS for the PD of HE faculty members and show positive outcomes similar to those found among schoolteachers and prospective teachers. Nevertheless, results so far are mostly based on isolated experiences with few participants. In addition, some studies report less optimistic findings, especially regarding the type of reflection that takes place and how participants collaborate. These mixed findings make clear the need for further research to build a solid body of evidence on the practice of LS among HE faculty members, as teaching and learning in HE, and the professional practice and idiosyncrasies of HE faculty members, differ from those of schools and schoolteachers. Increasing the corpus of studies in HE will, in turn, help substantiate further practices of LS in this context and might help overcome the limited bibliographic range observed in this review.
Acknowledgements
I wish to express my gratitude to Professor José Luis Medina for his support and direction over these last years, which made this work possible. This work was supported by the Spanish Ministry of Economy and Competitiveness [grant reference EDU2015-63712-P-BES-2016-076824] and has received funding for open access publishing by the University of Barcelona. The funding sources had no involvement in any task related to this article. The data that support the findings of this study are available from the author on reasonable request.
Author
GABRIEL HERVAS is a postdoctoral researcher and lecturer at the School of Education of the University of Barcelona. His research focuses on academic development, teachers’ training, and teachers’ knowledge and reflection.
