Abstract

Introduction
The use of classroom video and dialogue analysis to support teaching improvement has a long-standing tradition and has become widely adopted in international classroom research (Howe & Abedin, 2013; Wang et al., 2024). The Trends in International Mathematics and Science Study stimulated cross-national research on classroom teaching through video-based analysis, enabling systematic comparisons of instructional practices across countries (Stigler & Hiebert, 1999). The Teaching and Learning International Survey (TALIS) Video Study further employed classroom video analysis to investigate teaching and learning processes in greater depth, thereby strengthening the empirical understanding of instructional quality (OECD, 2019, 2020). In addition, researchers have demonstrated that systematic analysis and coding of classroom dialogue provide a powerful means of assessing how high-quality interaction supports student learning (Hennessy et al., 2020; Mercer et al., 2019). Collectively, these international efforts have provided an important empirical foundation for evidence-informed approaches to teaching improvement through classroom analysis.
Similarly, China has developed a long-standing tradition of improving teaching through classroom observation and lesson study (Chen & Zhang, 2019). As early as the 1950s, primary and secondary schools in China established subject-based teaching and research groups (TRGs), within which teachers of the same subject regularly engaged in collective professional inquiry (Paine & Ma, 1993; Zheng et al., 2023). Teaching and research groups function as the formal and primary organizational units for teacher collaboration and professional learning within schools, playing a central role in improving classroom practice, supporting student learning, and fostering teachers’ professional development (Zheng & Luo, 2024). Common forms of these activities include public lessons, lesson observation and postlesson discussion, collective lesson planning, and the use of recorded lessons for reflective analysis (Paine & Ma, 1993; Zheng & Luo, 2024). Over time, such collective teaching-research practices have evolved into a key institutional arrangement supporting school reform and instructional innovation in China, taking on relatively stable and routinized forms in everyday school practice (Wong, 2010).
Classroom analysis and instructional improvement in Chinese schools can be broadly characterized as having evolved through three major phases (Cui, 2012; Gao & Yang, 2025; Yang, 2025). Prior to 2008, research and practice were predominantly grounded in experience-based lesson observation, commonly referred to as lesson listening and postlesson evaluation. Between 2008 and 2018, classroom observation increasingly shifted toward evidence-based classroom observation, supported by structured observation frameworks and analytic tools that sought to enhance rigor and comparability in classroom analysis (Cui, 2012). Although these approaches have generated a substantial body of theoretical insights and practical knowledge, traditional forms of classroom analysis remain heavily dependent on manual recording and experiential judgment, rendering them time-consuming, highly subjective, and difficult to standardize or scale across schools and regions (Lewis et al., 2009; Yuan et al., 2020).
Against this backdrop, classroom analysis research in China has increasingly integrated artificial intelligence (AI) technologies, marking a transition toward a third phase of development: AI-based classroom analysis. Artificial intelligence-based classroom analysis shows substantial advantages and potential for supporting instructional improvement (Wang et al., 2024; Yang, 2025). A particularly promising area for future research is how such systems and tools can be integrated into specific school contexts so that they not only contribute to teaching improvement but also become sustainably embedded in schools’ routine instructional practices. This report presents research progress from Shanghai, China, illustrating how schools draw on AI-generated classroom analysis reports to support everyday teaching improvement.
Using the High-Quality Classroom Intelligent Analysis System to Improve Teaching
Since 2018, the International Classroom Analysis Laboratory at East China Normal University has been dedicated to developing an AI-powered classroom analysis system, and during this period the research team has released the High-Quality Classroom Intelligent Analysis system. The system’s machine-learning models are trained on large-scale, expert-annotated classroom data comprising approximately 350,000 annotated discourse units across different educational stages and classroom types (Jia et al., 2025; Yang, 2025), which enables the automated analysis of classroom videos.
In practice, the platform automatically processes uploaded classroom videos to collect and analyze multimodal classroom data, including classroom discourse, teaching and learning behaviors, teacher–student (TS) interaction structures, and time allocation. In line with research ethics and data privacy and security requirements, participation is voluntary and based on informed consent. After a teacher voluntarily uploads a classroom video, a detailed AI classroom (AIC) report can be generated within approximately 12 min. The report translates raw classroom data into structured and visualized analytic outputs. At present, in accordance with research ethics considerations, AIC reports are generated primarily from classroom discourse data.
Specifically, the AIC report adopts a three-part analytic structure: (1) a macro-level overview, in which AI interprets lesson objectives and major instructional activities and examines the alignment between the designed lesson and the implemented lesson; this part includes the instructional sequence matrix and TS interaction diagrams (Figure 1), which depict the overall structure of the lesson, including instructional focus and interaction patterns; (2) classroom dialogue analysis, featuring automated IRE/F (Initiation–Response–Evaluation/Feedback) coding of TS interactions (Walsh, 2006), as illustrated in Table 1; and (3) visualized classroom profiling, which presents key characteristics of classroom teaching across multiple dimensions to support teachers’ instructional reflection and decision-making. Through these analytic representations, classroom interaction processes that were previously difficult to document and compare systematically become clearly visible.

Figure 1. Teacher–student interaction diagram.
Table 1. Initiation–Response–Evaluation/Feedback (IRE/F): Types and Definitions.
The integration of AI technologies has made the analysis of classroom data both more efficient and more accurate. Taking classroom dialogue as an example, the report’s analyses of teachers’ questioning, students’ responses, and teachers’ evaluation/feedback are derived from machine-learning models trained on a corpus of 161,035 annotated classroom dialogue units. Using these models, the system can infer how a given lesson’s dialogue is distributed across the three IRE/F categories within approximately 12 min, a task that takes far longer with manual recording and coding. The resulting data provide teachers with concrete evidence to support classroom reflection and guide instructional improvement.
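To make the idea of an IRE/F “distribution” concrete, the sketch below tallies coded dialogue units and reports, for each phase, the share of units at each level. It is a minimal illustration under stated assumptions, not the AIC system’s implementation: it assumes hypothetical codes made of a phase letter plus a level digit (e.g., I3, R3, E3, the levels named in this report; levels 1 and 2 are assumed), assumes shares are computed within each phase, and takes the codes as already assigned, whereas in the AIC system they are inferred by machine-learning models.

```python
from collections import Counter

PHASES = ("I", "R", "E")  # Initiation, Response, Evaluation/Feedback

def iref_distribution(labels: list[str]) -> dict[str, dict[str, float]]:
    """Share (%) of each level within each IRE/F phase, rounded to 2 decimals.

    Assumes hypothetical codes such as "I3" (phase letter + level digit).
    """
    counts = {phase: Counter() for phase in PHASES}
    for label in labels:
        phase, level = label[0], label[1:]
        if phase not in counts or not level.isdigit():
            raise ValueError(f"unrecognized code: {label!r}")
        counts[phase][int(level)] += 1
    shares = {}
    for phase, c in counts.items():
        total = sum(c.values())
        # An empty phase yields {} because the comprehension has no items.
        shares[phase] = {f"{phase}{lvl}": round(100 * n / total, 2)
                         for lvl, n in sorted(c.items())}
    return shares

# Illustrative codes for a short exchange (not data from this study).
codes = ["I1", "R1", "E1", "I3", "R3", "E3", "I2", "R2", "E1", "I1"]
print(iref_distribution(codes))
```

With these codes, I3 accounts for 25% of initiations. Whether a report’s percentages use a within-phase denominator or all dialogue units is a design choice that should be checked against the definitions in Table 1.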
Compared with traditional classroom observation and analysis approaches that rely entirely on manual coding, AIC reports demonstrate two main advantages. First, automated analysis substantially enhances the efficiency of classroom analysis, making it possible to process large-scale classroom data and provide timely feedback. Second, the analytic framework is grounded in classical models of classroom discourse and expert-designed annotation schemes, ensuring the theoretical coherence and interpretability of the analytic results, thereby supporting their usability in teaching research and professional learning contexts (Jia et al., 2025; Yang, 2025). By generating structured classroom analysis reports, the project has supported school-based teaching research and contributed to ongoing improvement in classroom teaching practice (Shi, 2025).
How Teachers and Schools Use AIC: Two Levels and Four Types
Building on the design of the High-Quality Classroom Intelligent Analysis system described above, this report further examines how teachers and TRGs in Chinese schools use AIC reports to improve teaching. To explore these processes, the report draws on empirical evidence from case schools and teachers, illustrating concrete practices through which AIC is used to improve teaching and promote teachers’ professional development.
In selecting cases, we employed purposeful sampling based on two criteria: (1) the schools and teachers had used AIC reports in practice for more than one year, and (2) their use of AIC in routine teaching had produced positive instructional impacts. Based on these criteria, three Shanghai schools and seven of their teachers were selected as exemplary cases. In total, 16 lessons and their corresponding AIC reports were analyzed. All case schools were pilot sites for the High-Quality Classroom Intelligent Analysis system and had promoted AIC-supported instructional improvement in a sustained and organized manner for over one year. Participating teachers had each completed at least two cycles of instructional redesign and reflection informed by system-generated reports and had shared their experiences of using AIC at academic conferences or in professional communities. It should be noted that these exemplary cases were selected to illustrate how AIC can be used effectively to support instructional improvement, rather than to establish causal evidence of effectiveness.
With respect to data sources, this report draws on two types of data. The first consists of automatically generated classroom data derived from AIC reports, including both descriptive indicators, such as teacher speaking rate and instructional time allocation, and inferential indicators produced by machine-learning models, such as the IRE/F distributions presented in Table 1. The second consists of interviews the authors conducted with participating teachers to examine how they interpreted and used AIC-generated data in their instructional reflection and improvement processes. Through integrated analysis of these data sources, the report identifies two analytical levels (the individual level and the group level) and four typical patterns of using AIC to support instructional improvement.
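Descriptive indicators such as instructional time allocation are, in principle, simple aggregations over time-stamped lesson segments. The following sketch assumes a hypothetical segment format of (start, end, activity) in seconds, with activity labels that mirror the categories reported in the case tables; the function name and data model are illustrative and are not the AIC platform’s.

```python
from collections import defaultdict

def time_allocation(segments):
    """Return each activity's share (%) of coded lesson time, rounded to 1 decimal.

    segments: iterable of hypothetical (start_s, end_s, activity) tuples, in seconds.
    Uncoded gaps between segments are ignored, so shares sum to about 100.
    """
    totals = defaultdict(float)
    for start, end, activity in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(start, end, activity)}")
        totals[activity] += end - start
    coded_time = sum(totals.values())
    if coded_time == 0:
        return {}
    return {activity: round(100 * t / coded_time, 1) for activity, t in totals.items()}

# Hypothetical 40-min lesson; labels and timings are illustrative only.
lesson = [
    (0, 600, "teacher-led"),
    (600, 1500, "student self-directed"),
    (1500, 2100, "TS interaction"),
    (2100, 2400, "teacher-led"),
]
print(time_allocation(lesson))
# {'teacher-led': 37.5, 'student self-directed': 37.5, 'TS interaction': 25.0}
```

Using coded time rather than wall-clock lesson length as the denominator is an assumption; a system that reports shares of the full recording would divide by the video duration instead.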
Individual Level
At the individual level, AIC reports help teachers use classroom data for self-evaluation and pedagogical improvement, with two typical patterns: the first focusing on comparing practices across lessons to identify effective instructional designs, and the second on tracking long-term progress to support sustained professional growth.
Type 1: Same Teacher, Optimized Designs (Tongshi Yougou)
Same teacher, optimized designs refers to situations in which a single teacher experiments with different instructional designs. By comparing classroom processes and learning outcomes across lessons, the teacher gains insight into which design features are more conducive to classroom improvement.
An illustrative example comes from a mathematics teacher at Qibao High School teaching An Extended Inquiry Into the Graphs and Properties of Trigonometric Functions. On the same day, the teacher implemented two different instructional designs for this lesson in two classes taught approximately 2 hr apart, which provided ideal conditions for immediate comparative analysis of classroom data. As shown in Table 2, AIC analysis revealed that, from the first to the second lesson, the proportion of teacher-led instruction decreased from 54% to 34%, while the proportion of student self-directed activity increased from 22% to 41%. The instructional focus thus shifted noticeably from teacher exposition toward student-led activity.
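Because the Table 2 figures are shares of the same lesson time, the natural comparison is the difference in percentage points. The small helper below (a hypothetical name, not part of the AIC system) computes that difference from the two sets of values reported above.

```python
def shift_in_points(before, after):
    """Percentage-point change for each category present in both profiles."""
    return {k: round(after[k] - before[k], 2) for k in before if k in after}

# Shares (%) reported for the two trigonometry lessons.
lesson_1 = {"teacher-led": 54, "student self-directed": 22}
lesson_2 = {"teacher-led": 34, "student self-directed": 41}
print(shift_in_points(lesson_1, lesson_2))
# {'teacher-led': -20, 'student self-directed': 19}
```

A 20-point drop from 54% corresponds to a relative decrease of about 37% (20/54). Reporting points rather than relative change also keeps comparisons meaningful when a baseline is 0%, as with E3 in the longitudinal case.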
Table 2. Comparison of Teacher–Student Interaction in the “Same Teacher, Optimized Designs” Case.
This shift did not appear to reflect differences in student ability; rather, it was primarily associated with changes in instructional design. In the second lesson, students were guided toward the core task approximately 8 min into the lesson. Through a sequence of coherent and focused questions, the teacher provided students with extended opportunities for thinking, exploration, discussion, and presentation. As a result, the lesson featured more understanding-oriented dialogue and a higher level of student agency. In contrast, although the first lesson attempted to guide students toward the learning focus through multiple rounds of TS interaction, the questions were relatively fragmented and lacked sustained follow-up, preventing the formation of a coherent problem chain.
Importantly, the purpose of comparing different lessons taught by the same teacher is not merely to judge which lesson is “better.” Rather, by analyzing different design orientations, teachers are able to identify the strengths of each approach and gain a clearer understanding of how instructional decisions shape classroom enactment. Through immediate comparison, teachers can directly observe how different instructional designs influence classroom processes and student learning quality, thereby supporting informed and selective instructional improvement.
Type 2: Same Teacher, Longitudinal Improvement (Tongshi Jinjie)
Same teacher, longitudinal improvement focuses on teachers’ sustained self-tracking and professional growth over time. In China, teachers typically record a certain number of public lessons each year; however, these recordings are rarely revisited once the lesson is over. The contribution of AIC reports in this context lies in providing teachers with comparable longitudinal evidence, enabling more objective judgments about changes in their teaching and supporting the gradual enactment of student-centered curriculum reform principles.
For novice teachers in particular, AIC reports offer a relatively stable and objective reference framework that helps them identify key changes in their classroom practices as they experiment and adjust their teaching. For example, longitudinal AIC analysis spanning six years for a novice mathematics teacher at Qibao High School revealed substantial changes in the structure of classroom activity time. As teaching experience accumulated, the proportion of TS interaction increased from 13% to 24%, while the proportion of student self-directed learning activities rose from 3% to 22% (Table 3). Classroom time gradually shifted away from teacher exposition toward supporting students’ thinking, inquiry, and expression, reflecting the teacher's ongoing effort to “return time to students.”
Table 3. Comparison of Classroom Activity Time Distribution in the “Same Teacher, Longitudinal Improvement” Case.
At the level of interaction quality, IRE/F distributions provided evidence of a shift from teacher-centered exposition toward dialogic inquiry. As shown in Table 4, with respect to questioning practices, the proportion of open or semiopen questions (I3) oriented toward deep learning increased from 1.92% to 11.11%, indicating that classroom questions increasingly moved beyond single correct answers to create space for student thinking and expression. In terms of feedback, the proportion of higher-order feedback (E3) that promoted student reflection or metacognition rose from 0% to 9.68%, consistent with a transition from task completion toward understanding- and reflection-oriented learning.
Table 4. Longitudinal Improvement in Initiation–Response–Evaluation/Feedback (IRE/F)-Based Classroom Interaction Patterns.
By continuously examining classroom time structures, interaction patterns, and IRE/F distributions, teachers are able to assess whether their instructional practices are developing stable characteristics rather than reflecting isolated successes or failures. As the novice teacher reflected: “These data let me view my teaching with a new pair of eyes, spot problems, and see my growth over the years as I return more time to students.” In this way, AIC reports reshape how teachers observe their own classrooms, shifting reflection from immediate instructional impressions toward data-informed, longitudinal evaluation of professional growth. Through such longitudinal comparisons, reform principles commonly associated with public lessons, such as student-centered teaching, become less an abstract expectation and more an empirically grounded part of everyday practice.
Group Level
At the group level, teachers collectively use AIC reports to conduct classroom observation and instructional improvement within TRGs. Two typical patterns can be identified. The first involves same-lesson, different-design approaches conducted by different teachers, while the second focuses on collective advancement within the group.
Type 3: Same Lesson, Different Designs (Tongke Yigou)
Same lesson, different designs is a common form of classroom inquiry in school-based teaching and research activities (Chen & Zhang, 2019). Within a TRG, different teachers teach the same lesson and examine how varying instructional approaches shape classroom structure and student learning. With the support of AIC reports, this form of inquiry shifts from experience-based comparison toward data-informed analysis of classroom structure and learning processes, enabling more grounded reflection on the appropriateness of instructional design.
An illustrative case comes from the Physics TRG at Jiangwan Junior High School, where two teachers conducted same-lesson, different-design teaching based on an eighth-grade lesson titled Linear Motion. The IRE/F analysis revealed marked differences in interactional orientation across the two lessons, each demonstrating distinct instructional strengths. In Teacher A's lesson, students’ open reasoning or explanatory responses (R3) reached 19.3% (Table 5). The classroom dialogue shows that, when presenting data with varying distances and times, the teacher guided students to construct the concept of speed through inquiry (e.g., “How did you compare them?”), effectively eliciting explanatory responses about controlling variables.
Table 5. Comparison of Classroom Q&A (IRE/F) Categories and Level Proportions.
By contrast, Teacher B demonstrated a clear strength in the feedback phase. In his lesson, feedback behaviors involving inviting evaluation, promoting reflection or metacognition, and facilitating discussion (E3) accounted for 21.05%. For example, during a unit conversion task (m/s to km/h), when a student hesitated because of calculation difficulties, the teacher withheld the direct answer and instead prompted the class to evaluate the approach: “He hesitated just now. Who can examine his method to find a better solution?” This practice reinforced students’ reflection on their problem-solving processes and helped sustain coherence and depth in classroom dialogue.
From the perspective of same-lesson, different-design analysis, these two lessons do not lend themselves to a simple judgment of superiority or inferiority. Teacher A's instructional design was more effective in guiding students to construct physical concepts through inquiry, thereby fostering their logical reasoning capabilities, whereas Teacher B's feedback and follow-up questioning prioritized comparing and evaluating problem-solving strategies, effectively supporting students’ metacognitive reflection on their own learning processes. Through the structured presentation of classroom data, AIC reports make these differences visible and discussable rather than leaving them at the level of experiential impression. On this basis, the TRG was able to consider how the two instructional orientations might be integrated in subsequent discussion, maintaining high-quality questioning while strengthening targeted feedback and follow-up, to achieve an overall improvement in the quality of classroom interaction.
Type 4: Collective Advancement of a TRG (Tongzu Jinjie)
Collective advancement of a TRG emphasizes how TRGs, as a Chinese version of professional learning communities (Wong, 2010; Zheng & Luo, 2024), engage in holistic reflection and sustained optimization of teaching practices across different lesson types in relation to shared disciplinary and educational goals. The key concern is not the evaluation of individual lessons but whether the TRG can develop a shared understanding of educational purposes at the conceptual level and continuously examine and refine this understanding through data-informed inquiry.
An illustrative case comes from the Grade 5 Chinese language TRG at Kongjiang No. 2 Primary School. Focusing on the theme of developing argumentative learning and higher-order thinking, the group conducted sustained classroom observation and collective lesson refinement around the text Tian Ji's Horse Racing. Following the initial lesson taught by Teacher Liang, the AIC report indicated that teacher-led instruction accounted for 69% of classroom time, while student self-directed learning accounted for only 20%. Based on these data, the TRG engaged in collective discussion, during which more experienced teachers proposed three key strategies for improvement: (1) using role-play to help students experience different character perspectives and stimulate argumentation; (2) employing mind maps to organize characters’ reasoning processes and reduce fragmented questioning; and (3) refining core questions and constructing a coherent problem chain.
In a revised lesson taught three days later, Teacher Liang integrated these strategies into the instructional design, and the lesson showed a noticeable overall improvement: the proportion of teacher-led instruction decreased to 55%, while student self-directed learning increased to 23%. More importantly, as shown in Figure 2, the TRG did not confine these refinements to a single lesson. Instead, practices such as core problem chains, mind mapping, and role-play were extended to three additional texts within the same unit, leading to a unit-level instructional design. Across the unit, student participation patterns and classroom interaction structures exhibited more stable and consistent improvement trends.

Figure 2. Strengthening unit-based design through core question chains by the teaching and research group (TRG).
In the following academic year, the Grade 4 Chinese language TRG at the same school built on this work by implementing the unit-based instructional plan developed by the previous cohort. Through lesson recording and AIC analysis of three lessons, the group observed further improvements in both classroom structure and the quality of student participation. The TRG subsequently disseminated these practices to a newly formed Grade 4 TRG, where core problem chains, mind mapping, and role-play were adopted as key instructional strategies for the unit. This unit-level iteration made the TRG’s implementation of unit-based teaching more efficient. Taken together, the collective advancement of a TRG illustrates how AIC can function as a tool for consensus-building, enabling teaching research to move from isolated lesson improvement toward unit-level optimization and the sharing of collective professional knowledge.
Takeaway Message
Artificial intelligence-powered classroom analysis systems can help overcome the limitations of experience-based and manual video-based approaches by delivering timely, precise, visualized, and theory-informed data.
Two levels and four types of AIC use for classroom analysis and improvement within schools were identified: at the individual level, same teacher, optimized designs and same teacher, longitudinal improvement, supporting reflection and continuous improvement; at the group level, same lesson, different designs and collective advancement of a TRG, supporting lesson refinement and whole-group instructional improvement within TRGs.
AIC reports should serve as a resource or partner rather than a set of prescriptive indicators; overreliance on AI-generated metrics may encourage “teaching to indicators” and weaken teachers’ professional judgment.
Both teachers and TRGs face boundary conditions when using AI for classroom evaluation and improvement.
While AI can reshape the modes, processes, and efficiency of classroom analysis and instructional improvement, teachers’ professional reflection, informed judgment, and collective wisdom remain irreplaceable.
Footnotes
Ethical Considerations
Informed consent was obtained from all participants prior to data collection. Ethical approval was granted by the Ethics Committee of East China Normal University (IRB No. HR2-0206-2026).
Author Contributions
Chenlu Liu was responsible for analyzing the data, writing the original draft, and responding to reviewers’ comments. Xin Zheng set up the framework and revised the draft. Bei Ding provided part of the data and guided the data analysis.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
