
Abstract

This study develops a systematic, theory-driven evaluation index system to address the critical gap in assessing the implementation effectiveness of China’s “Double Reduction” policy. Grounded in Policy Implementation Theory and Educational Policy Evaluation Frameworks, the study employed a mixed-methods approach involving policy text analysis of five core documents and a two-round Delphi consultation with 23 experts (authority coefficient Cr = 0.7928). Index weights were determined through comprehensive weighting (AHP and Entropy Weight Method), and the system was empirically validated via fuzzy comprehensive evaluation with data from 163 teachers across four Chinese provinces. The resulting framework comprises 2 first-grade, 5 second-grade, and 23 third-grade indexes. Empirical results indicate an overall “good” policy implementation effect (score = 72.96/90), with “Quality of Education and Teaching” (weight = 0.2856) and “Level of After-School Services” (weight = 0.2269) as the most influential indicators. However, weaker performance was observed in reducing homework burden and promoting balanced compulsory education. This research provides the first unified evaluation tool for the “Double Reduction” policy, offering both a practical instrument for local Chinese governance and a replicable model for other regions grappling with the challenge of balancing student burden reduction and educational quality improvement.

Plain Language Summary

How well is China’s “Double Reduction” Policy working? A Simple Look at its Effects and Evaluation

China’s “Double Reduction” policy, started in 2021, aims to cut students’ heavy homework and unregulated after-school tutoring. This study created a clear system to check how well the policy works, using expert advice and surveys of teachers in four provinces (Jiangsu, Shandong, Shanxi, Anhui). Overall, the policy has done well (scoring 72.96 out of 90), with better school teaching quality and after-school services being the biggest wins. However, there are gaps: too much homework is still a problem, and compulsory education isn’t equally good across areas. Also, after-school services need better conditions and more options. This study helps Chinese local governments improve the policy and gives other countries (like those with big tutoring industries) a model to check their own student burden-cutting efforts. It uses simple, fair methods to measure success, making it easy to understand even for people not in education research.

Keywords

“Double Reduction” Policy; evaluation index system; comprehensive empowerment; fuzzy comprehensive evaluation method; education quality

Introduction

In recent years, the excessive burden of homework and extracurricular academic tutoring on primary and secondary school students in China has distorted the goals of education reform, fostering short-sighted and utilitarian approaches to learning while triggering widespread social anxiety around education (Huang et al., 2021; Qi et al., 2023). Because parenting concepts, methods, and behaviors are converging around the world, the anxiety caused by excessive involvement in children’s education has spread globally, especially among middle-class parents (Ehrenreich, 1990). To address this crisis, in July 2021, the General Office of the Central Committee of the Communist Party of China (CCCPC) and the State Council issued the Opinions on Further Easing the Burden of Excessive Homework and Off-campus Tutoring for Students Undergoing Compulsory Education, widely known as the “Double Reduction” policy. The policy’s core objective is to reduce two key sources of student overload: excessive homework assigned by schools and unregulated off-campus academic tutoring (Zheng & Zhou, 2021). Its underlying values are student-centered, quality-based education and home-school cooperative education (Xue & Li, 2023). To achieve this, it employs targeted instruments: strengthening school-based after-school service programs (e.g., extended hours, skill-building activities) to replace off-campus tutoring, regulating the registration, fees, and advertising of tutoring institutions, and reforming classroom teaching to improve in-school efficiency (She et al., 2022). By 2024, China’s Ministry of Education reported that 3 years of policy implementation had achieved “Double Reduction and double increase” (i.e., reduced burden alongside improved educational quality and after-school services) and pledged to consolidate these outcomes. General Secretary Xi Jinping further emphasized the need to sustain progress at the National Education Conference, underscoring the policy’s role as a cornerstone of China’s efforts to build a modern, Chinese-style education system.

Establishing a rigorous evaluation index system for the “Double Reduction” policy is of utmost importance for scientifically appraising its implementation, optimizing policy measures, and enhancing public confidence. Nevertheless, extant academic research on the policy’s effectiveness is plagued by two crucial limitations, which mirror more extensive deficiencies in global educational policy evaluation. At the domestic level, studies predominantly rely on “self-constructed evaluation scales” centered on single perspectives (e.g., parental satisfaction or homework quantity). There are no unified standards to elucidate the hierarchical relationships among indexes (L. Zhou, 2023). This disorganization frequently confounds process variables (e.g., “implementation organizational quality”) with outcome variables (e.g., “target group perception”), resulting in inconsistent and non-comparable evaluations.

On a global scale, existing frameworks for educational policy evaluation, such as the OECD’s emphasis on equity and student well-being in its Education Policy Outlook (OECD, 2022) and the U.S. Every Student Succeeds Act (ESSA)’s stress on standardized test performance and school accountability, present a significant shortcoming. These models are deficient in specialized instruments for evaluating “burden reduction” policies in non-Western, developing, or culturally diverse contexts (Z. Zhou et al., 2023). For example, OECD frameworks give precedence to overall educational quality rather than addressing the distinctive problem of “over-tutoring” (a prevalent phenomenon in East Asia, South Asia, and certain regions of Southeast Asia). Meanwhile, ESSA’s accountability metrics fail to consider the role of unregulated private tutoring in intensifying educational inequality. This global lacuna implies that policymakers in non-Western settings lack adaptable and culturally appropriate models to assess policies aimed at alleviating student overload, thereby impeding the replication of successful burden-reduction initiatives.

Against this backdrop, this study anchors its design in Policy Instrument Theory—a framework that examines how policy tools (e.g., regulation, service provision, collaboration) shape implementation outcomes (Howlett, 2009). By applying this theory, we move beyond descriptive evaluations of the “Double Reduction” policy to explain how its instruments (e.g., tutoring regulation, after-school services) interact with local contexts to influence effectiveness—a contribution to localized policy evaluation models that can inform non-Western contexts. Methodologically, we first code and analyze five core “Double Reduction” policy texts (see Table 1) to develop a systematic, theory-driven evaluation index system. We then use the fuzzy comprehensive evaluation method to empirically assess policy effectiveness across four Chinese regions (Jiangsu, Shandong, Shanxi, and Anhui), leveraging primary and secondary school teachers as key informants (they are direct implementers and witnesses to student-level changes).

Table 1.

Selected Policy Texts.

Texts/meetings	Timing	Department of enactment
Easing the Burden of Excessive Homework and Off-campus Tutoring for Students Undergoing Compulsory Education	2021-07-24	CCCPC and State Council
Strengthening the Role of the Main Bases of School Education and Deepening the Governance of tutoring Institutions	2021-07-26	People’s Daily
Promoting the All-Round Development and Healthy Growth of Students	2022-07-27	People’s Daily
The Ministry of Education convened a national meeting to promote the work of the “Double Reduction” and a plenary meeting of the specialized coordination mechanism for the “Double Reduction”	2023-07-21	Ministry of Education
Anchoring the three-year goal and overcoming the difficulties to ensure the timely completion of the centralized “Double Reduction” task	2024-01-05	Ministry of Education

The significance of this study is two-pronged. In practical terms, it furnishes Chinese local governments with a scientific instrument for the refinement of the “Double Reduction” policy. From a global perspective, it presents a replicable model for the evaluation of burden-reduction policies in non-Western contexts, thereby filling the void in existing international frameworks. Through these efforts, the study aims to make contributions to both China’s educational reform and the global endeavors to rebalance education in favor of students’ well-being and equitable development.

Literature Review

Policy Connotation of the “Double Reduction” Policy: Domestic Evolution and Global Context

Understanding the connotation of the “Double Reduction” policy is a fundamental prerequisite for constructing its evaluation index system. Domestically, research focus on this connotation has shifted from examining short-term implementation effects to advancing systematic policy evaluation. In the early stages post-policy launch (2021), scholars emphasized its strategic significance as an effort to reset educational paradigms—framing it as a breakthrough in addressing “burden reduction as a focal goal” while reflecting policy legitimacy and practical innovation (Wang, 2021; X. J. Zhang et al., 2023). For instance, Wang (2021) argued that the policy aligns with educational equity principles by curbing utilitarian tutoring, while X. J. Zhang et al. (2023) highlighted its role in rebalancing school, family, and social responsibilities in education. As implementation progressed, Zhu (2021) expanded this discourse by proposing a “symptom-root cause” evaluation lens: symptoms include reduced student burden and improved in-school efficiency, while root causes target structural issues like regional educational inequity—a distinction that guided early domestic evaluation attempts.

From a global perspective, the “Double Reduction” policy aligns with international efforts to mitigate educational overload, yet its design reflects unique contextual characteristics. For example, South Korea’s 2000s cram school (hagwon) regulations focused narrowly on limiting tutoring hours and standardizing fees to reduce student stress, but lacked China’s emphasis on strengthening in-school alternatives (e.g., after-school services) to replace off-campus tutoring (Choi & Choi, 2016). In contrast, Finland’s “less-is-more” reforms centered on curriculum streamlining and student-centered learning to reduce academic pressure, prioritizing qualitative educational improvement over direct regulation of private services (Sahlberg, 2021). These global cases highlight that burden-reduction policies typically prioritize either “supply-side regulation” (e.g., South Korea) or “in-school quality enhancement” (e.g., Finland). China’s “Double Reduction” is distinctive in integrating both: it targets both excessive homework (in-school) and unregulated tutoring (out-of-school) while linking burden reduction to compulsory education quality—filling a gap between single-focus global models.

Evaluation Dimensions of Educational Burden-Reduction Policies: International Models and Domestic Gaps

Clarifying effective evaluation dimensions is critical for assessing policy impact, yet both domestic and international research exhibit notable limitations. Internationally, classic evaluation models have guided educational policy assessment but struggle to address the unique demands of burden-reduction policies. The CIPP Model (Context-Input-Process-Product; Stufflebeam, 2003), a widely used framework for educational evaluation, emphasizes systematic assessment of policy context, resources, implementation processes, and outcomes. However, its process-oriented design prioritizes “whether policies are implemented as planned” over “whether burden reduction improves student well-being or equity”—a core goal of policies like China’s “Double Reduction.” Similarly, Kirkpatrick’s Four-Level Evaluation Model (Kirkpatrick & Kirkpatrick, 2016), which evaluates training programs via reaction, learning, behavior, and results, is ill-suited for large-scale educational policies: it focuses on individual participant outcomes (e.g., student test scores) rather than systemic changes (e.g., regional educational balance) that define burden-reduction success.

Domestic research on “Double Reduction” evaluation dimensions, while evolving, remains fragmented. Early studies focused on isolated indicators: Qi et al. (2023) measured homework burden reduction via student survey data, while Yang (2023) assessed tutoring regulation effectiveness through institutional compliance rates. Zhu (2021) expanded this to include “symptom-root cause” dimensions but did not integrate these into a unified framework. A key gap is that domestic studies rarely engage with international models to address their own limitations—for example, no domestic research has adapted the CIPP Model’s “product” dimension to measure holistic outcomes (e.g., parental anxiety reduction, student well-being) or addressed Kirkpatrick’s blind spot in systemic evaluation. This disconnect leaves domestic evaluations without a globally informed, comprehensive lens for assessing the “Double Reduction” policy’s multi-faceted impact.

Methodological Limitations in “Double Reduction” Evaluation: Domestic Shortcomings and International Misalignments

The methodological rigor of policy evaluation directly impacts result validity, yet both domestic and international research on educational burden reduction faces challenges. Domestically, studies rely heavily on qualitative methods or self-constructed quantitative scales with limited standardization. Ye (2023) used in-depth interviews with primary school teachers to analyze classroom teaching changes under “Double Reduction,” while Liu (2023) employed questionnaires to measure tutoring demand—but neither study validated their tools against international standards or unified metrics. L. Zhou (2023) criticized this approach, noting that conflating process variables (e.g., implementation form) with outcome variables (e.g., target group perception) creates chaotic, incomparable results. Additionally, domestic studies rarely use comprehensive weighting or fuzzy evaluation methods to handle subjective data (e.g., parental satisfaction), leading to oversimplified conclusions about policy effectiveness.

Internationally, methodological limitations in burden-reduction policy evaluation stem from a misalignment with non-Western contexts. Studies on South Korea’s hagwon regulations (Bae & Choi, 2024) used quantitative data (e.g., tutoring participation rates) to measure success but ignored cultural factors like parental pressure to pursue supplementary education—factors critical to understanding policy impact in East Asia. Evaluations of Finland’s reforms (Sahlberg, 2021) relied on qualitative classroom observations, which are difficult to replicate in large, diverse education systems like China’s. A global methodological gap is the lack of tools that balance quantitative rigor with sensitivity to cultural context—particularly for policies that, like China’s “Double Reduction,” target both structural regulation and cultural shifts in educational expectations. This gap underscores the need for mixed-method approaches that integrate international best practices (e.g., Delphi expert validation) with context-specific adaptations (e.g., fuzzy comprehensive evaluation for subjective data).

Methods

Selection of Texts

Three criteria guided the selection of “Double Reduction” policy texts: authority (central government-issued documents with national dissemination), professionalism (compiled by expert teams from the Ministry of Education), and symbolism (reflecting public opinion via democratic decision-making). Five core texts were selected (Table 1), with additional considerations: (1) alignment with the policy’s official 2021 launch timeline; (2) text 1 as the overarching framework, text 2 as a focus on tutoring governance, and texts 3 to 5 as post-implementation effectiveness references to inform index design.

The five policy texts were imported into NVivo 12 for systematic coding, following three sequential steps to ensure rigor. Open coding: abstracting core concepts via keywords (e.g., “homework management,” “tutoring supervision”) to generate 47 initial concepts. Axial coding: consolidating highly similar initial concepts into 23 categories (e.g., “homework amount,” “after-school service quality”) by identifying causal and associative relationships. Selective coding: integrating the 23 categories into five main categories (“total homework amount and duration,” “normative degree of tutoring,” “quality of education and teaching,” “degree of collaborative linkage,” “level of after-school services”) and distilling two core categories (“degree of ‘Double Reduction,’” “effectiveness of supporting governance”) to form the three-tier index system (2 first-grade, 5 second-grade, 23 third-grade indexes; Table 3).

Coding consistency was verified using Cohen’s Kappa coefficient (Kappa = 0.87, p < .001), exceeding the threshold of 0.75 for excellent inter-coder reliability (Landis & Koch, 1977).
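As an illustration of how such an agreement statistic is obtained, the following minimal sketch computes Cohen’s Kappa for two coders. The function name and the six coded segments are hypothetical and are not the study’s coding data; the study itself reports only the final coefficient.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same text segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments given the same code by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both coders used a single identical code throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes assigned by two coders to six policy-text segments.
coder_a = ["homework", "homework", "tutoring", "tutoring", "services", "services"]
coder_b = ["homework", "homework", "tutoring", "services", "services", "services"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

The statistic corrects raw agreement for the agreement expected by chance, which is why it is preferred over a simple percentage of matching codes.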

Basic Information on Experts

According to the requirements of this study and considering the interests associated with the “Double Reduction” policy, education scholars, heads of education management departments, school administrators, and parents were selected as expert sources for the research. The evaluation of the policy indexes is highly specialized; therefore, the study developed a detailed plan for expert selection to ensure the authority, scientific rigor, and validity of the results. Education scholars were chosen based on their senior academic titles, and all possess doctoral degrees. Additionally, several experts have led national key projects as well as provincial and ministerial initiatives related to the “Double Reduction” policy and have published academic findings on relevant topics. Other experts specialize in family education, school governance, and educational evaluation, all of whom can contribute valuable insights to the assessment of the “Double Reduction” policy. From the education management sector, the heads of city and county-level education bureaus and the directors of education supervision offices were selected. Grassroots education departments are the primary entities responsible for implementing and evaluating the policy, so their assessments serve as significant references. Schools are the main implementers of the “Double Reduction” policy and are directly involved in assessing its effectiveness. Parents are the primary beneficiaries of the “Double Reduction” policy, and selecting highly educated parents can provide a more informed perspective for evaluating the policy. A total of 23 expert consultation letters were distributed, all of which yielded valid responses. Table 2 presents the information of experts. Experts completed online targeted questionnaires (100% effective response rate). The high response rate is attributable to: (1) pre-consultation communication to confirm expert availability; (2) providing a concise policy brief to reduce response burden; (3) 2-week response windows with one reminder. Expert authority was validated via Cr = 0.7928 (exceeding the acceptable Cr ≥ 0.70; Zeng Guang’s standard), ensuring index validity.

Table 2.

Basic Information on Experts (n = 23).

Type of expert	Number of experts	Note
Scholars in the field of education	10	All of them are professors, focusing on the “Double Reduction” policy, family education, school governance, education evaluation, and other research.
Educational management staff	4	Heads of Education Departments, Directors of Supervisory Departments, etc.
Parents	5	All have doctoral degrees and are able to evaluate policy indexes
School administrators	4	Principals, deputy principals, full senior teachers

Empirical Measurement Sample Selection

To assess the accuracy and feasibility of the evaluation index system for the implementation effectiveness of the “Double Reduction” policy, this study uses Jiangsu (high economic development, top-tier compulsory education quality), Shandong (large education scale, balanced urban-rural development), Shanxi (mid-income province, ongoing educational resource optimization), and Anhui (rapidly developing, with regional disparities in education access) as case studies. Primary and secondary school teachers from these four regions were selected as survey subjects, and the fuzzy comprehensive evaluation method (C. Zhang et al., 2020) was employed to measure the effectiveness of the policy’s implementation. Primary and secondary school teachers, as the direct implementers of the policy and the primary contacts with students, possess firsthand insights into the realities of the policy’s implementation and the resultant changes in students. Selecting them as survey subjects is therefore both scientifically valid and reasonable. A total of 166 primary and secondary school teachers were surveyed online, yielding 163 valid responses (98.19% effective rate). Pre-survey reliability was tested via Cronbach’s α = .82, indicating high internal consistency of questionnaire items (Nunnally, 1978).

Ethical Approval and Procedures

This study was approved by the Institutional Review Board of the author’s University. All procedures were performed in accordance with the 1964 Helsinki Declaration and its later amendments. The study design minimized risks to participants by ensuring anonymity (no personally identifiable information was collected), allowing withdrawal at any time without penalty, and avoiding sensitive topics. The potential benefits of developing a systematic policy evaluation framework for the “Double Reduction” policy were deemed to outweigh the minimal risks to participants. Informed consent was obtained from all participants prior to their involvement. For experts, educational management staff, and teachers, consent was obtained electronically. For parent participants, verbal consent was obtained and documented, with all participants being fully informed of the study’s purpose, procedures, and data usage.

Results

Construction of the Evaluation Index System

Through NVivo 12 coding (open → axial → selective) and Delphi expert validation, a three-tier evaluation index system was finalized. The hierarchical structure diagram illustrates the hierarchical relationships between core categories (first-grade indexes), main categories (second-grade indexes), and subcategories (third-grade indexes), and Table 3 lists the final coding results. The system comprises 2 first-grade indexes (“Degree of ‘Double Reduction’” and “Effectiveness of Supporting Governance”), 5 second-grade indexes, and 23 third-grade indexes. Coding consistency was confirmed via Cohen’s Kappa = 0.87 (p < .001), indicating excellent reliability.

Table 3.

Final Coding Results.

Core category (First grade indexes)	Main category (Second grade indexes)	Category (Third grade indexes)
Degree of “Double Reduction”	Total homework amount and duration	Perfection degree of homework management mechanism
		Homework amount
		Quality of homework design
		Guidance on completing homework
	Normative degree of tutoring	Strictness of agency approvals
		Training service behavioral norms
		Standing operations supervision
		Regulation of tutoring fees
		Control of tutoring advertising
		Extent of reduction in academic tutoring
Effectiveness of Supporting Governance	Quality of education and teaching	Degree of balanced high quality development of compulsory education
		Quality of classroom teaching
		Degree of high school enrollment reform
		Changes in the quality evaluation system
	Degree of collaborative linkage	Home-school cooperation
		Deepening sectoral linkages
	Level of after-school services	Extracurricular time
		After-school service hours
		Quality of after-school services
		After-school service conditions
		After-school service channels
		Free online learning services
		Utilization of resources on and off campus

Results of Expert Advice

In this study, the expert positivity coefficient was set at “1,” indicating a high level of concern and positive attitude among experts in this field. Based on the expert authority coefficient assignment method proposed by Prof. Zeng Guang, the expert authority coefficient Cr was calculated to be 0.7928. Since the acceptable value is Cr ≥ 0.70, it can be inferred that the expert authority degree of the consultation in this study is relatively high, and experts are capable of better evaluating the index system.

In this research, a five-point Likert scale, ranging from 5 to 1, was utilized to evaluate the importance of various indexes. This approach enabled the calculation of both the degree of consensus among experts’ opinions and the level of agreement on those opinions. Moreover, open-ended questions were incorporated into the correspondence questionnaire, allowing experts to propose modifications to irrelevant existing indexes and provide additional comments on missing indexes.

Analysis of the consultation questionnaires showed that, according to expert opinions, the mean importance of the analyzed indexes exceeded 3.5. Over 95% of the indexes had a full-score ratio greater than 20%, and more than 90% of the indexes had a coefficient of variation (CV) of less than 0.25. Furthermore, the p-value of the Kendall’s coefficient of concordance test was less than 0.05, indicating significant agreement among the experts. Hence, all indexes were retained. Finally, analysis of the supplementary opinions provided by experts (Table 4), which are highly relevant to the “Double Reduction” policy, showed that these opinions could be integrated into existing indexes of the original evaluation index system. As a result, after the two rounds of validation via expert correspondence, the original evaluation system remained unchanged.
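The screening statistics named above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the small rating matrix is invented rather than taken from the study, and Kendall’s coefficient of concordance is computed without a tie correction, so heavily tied Likert ratings would yield a conservative value.

```python
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for rows of expert ratings (one row per expert, no tie correction)."""
    m, n = len(ratings), len(ratings[0])
    rank_sums = [0.0] * n
    for row in ratings:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def screen_indexes(ratings, full_score=5):
    """Mean importance, full-score ratio, and coefficient of variation per index."""
    results = []
    for j in range(len(ratings[0])):
        column = [row[j] for row in ratings]
        mu = mean(column)
        results.append({
            "mean": mu,
            "full_score_ratio": column.count(full_score) / len(column),
            "cv": stdev(column) / mu,  # sample standard deviation over the mean
        })
    return results


# Hypothetical ratings: 4 experts (rows) scoring 3 indexes (columns) on a 1-5 scale.
ratings = [[5, 4, 3], [5, 4, 4], [4, 5, 3], [5, 4, 3]]
for j, stats in enumerate(screen_indexes(ratings), start=1):
    print(j, round(stats["mean"], 2), stats["full_score_ratio"], round(stats["cv"], 3))
print("Kendall's W:", round(kendalls_w(ratings), 3))
```

A significance test for W would additionally require a chi-square distribution function, which is omitted here to keep the sketch dependency-free.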

Table 4.

Expert Open-Ended Comments.

Expert opinion	Original indexes
Lack of content on family-school-community collaborative education	Degree of collaborative linkage; family-school cooperation
Along with reducing the burden, it is important to ensure that students have time for sleep, socialization, recreation and independent arrangements. In addition, the imbalance in education resources can be partially mitigated if there are standardized free learning channels	After school hours; free online learning services
Functions of after-school services	Level of after-school services

Weighting Results

Determining Subjective Weights

This study adopts the analytic hierarchy process (AHP; Saaty, 1980) to calculate the subjective weights of indexes at all levels. AHP decomposes the elements related to a decision into different levels and carries out qualitative and quantitative analyses on that basis. The process of subjective weighting of the second-grade indexes is illustrated as follows:

First, based on the evaluation index system constructed above and the results of the expert correspondence, the relative importance of the indexes at each level is compared pairwise, and a judgment matrix is constructed for each level using the ninefold scaling method detailed in Table 5.

Table 5.

Ninefold Scaling Method.

Scale	Degree of importance
1	Equal importance
3	Marginally important
5	Clearly important
7	Particularly important
9	Monumental
2, 4, 6, 8	Intermediate values of the above adjacent judgments
Reciprocals of 1–9	Judgment when the two indexes are compared in the opposite order

Experts rated the importance of the indexes at each level on the five-point Likert scale, and the ratings of each pair of indexes were then compared, with the 5 points divided into 10 intervals. If the ratings of two indexes are equal, both are scaled as 1. If the absolute value of the rating difference between the two indexes lies in (0, 0.5], the more important index is scaled as 2 and the other as 1/2; if it lies in (0.5, 1], the more important index is scaled as 3 and the other as 1/3; and so on. The resulting subjective weight judgment matrix C of the second-grade indexes is:

C = (c_{ij})_{5 \times 5} = \begin{bmatrix} 1 & 2 & \frac{1}{2} & 2 & \frac{1}{2} \\ \frac{1}{2} & 1 & \frac{1}{3} & 1 & \frac{1}{2} \\ 2 & 3 & 1 & 3 & 2 \\ \frac{1}{2} & 1 & \frac{1}{3} & 1 & \frac{1}{2} \\ 2 & 2 & \frac{1}{2} & 2 & 1 \end{bmatrix}

Here $c_{ij}$ denotes the importance of index i relative to index j.
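The scaling rule described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the five aggregated ratings are invented values chosen so that the rule yields the second-grade judgment matrix given in the text; they are not the experts’ actual ratings, and the use of a single aggregated rating per index is an assumption.

```python
import math


def scale_from_difference(diff):
    """Map an absolute rating difference to a ninefold scale value.

    0 -> 1, (0, 0.5] -> 2, (0.5, 1] -> 3, and so on, capped at 9.
    """
    if math.isclose(diff, 0.0, abs_tol=1e-9):
        return 1
    # Rounding guards against floating-point noise at interval boundaries.
    return min(9, math.ceil(round(diff / 0.5, 9)) + 1)


def judgment_matrix(ratings):
    """Build a reciprocal pairwise judgment matrix from one rating per index."""
    n = len(ratings)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            scale = scale_from_difference(abs(ratings[i] - ratings[j]))
            # The more important index receives the scale value, the other its reciprocal.
            matrix[i][j] = float(scale) if ratings[i] >= ratings[j] else 1.0 / scale
    return matrix


# Hypothetical aggregated ratings for the five second-grade indexes (assumption).
hypothetical_ratings = [4.2, 4.0, 4.6, 4.0, 4.4]
for row in judgment_matrix(hypothetical_ratings):
    print([round(x, 3) for x in row])
```

Because each entry below the diagonal is the reciprocal of its mirror entry, the matrix automatically satisfies the reciprocity required by AHP.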

Second, the weights are derived and the consistency of the judgment matrix is tested. Using Matlab 2021, the maximum eigenvalue $λ_{\max}$ of matrix C and the corresponding eigenvector Ω are computed as follows:

λ_{\max} = 5.0719, \quad Ω = (0.3631, 0.2174, 0.7356, 0.2174, 0.4822)

From the formula $α_{i} = \frac{Ω_{i}}{\sum_{j = 1}^{n} Ω_{j}}$ index weights were obtained after normalization:

α_{II} = (0.1801, 0.1079, 0.3649, 0.1079, 0.2392)

Conduct consistency tests:

CI = \frac{λ_{\max} - m}{m - 1} = \frac{5.0719 - 5}{5 - 1} = 0.017975

CR = \frac{CI}{RI} = \frac{0.017975}{1.12} = 0.016 < 0.1

Here CR denotes the consistency ratio, CI the consistency index, and RI the average random consistency index, whose values are shown in Table 6. If CI = 0, the judgment matrix is fully consistent; the closer CI is to 0, the better the consistency, and the larger CI is, the worse the consistency. When CR < 0.1, the matrix satisfies the consistency requirement.

Table 6.

Table of RI Values.

Matrix order n	1	2	3	4	5	6	7	8	9	10	11	12	13
RI	0	0	0.58	0.90	1.12	1.24	1.32	1.41	1.45	1.49	1.51	1.54	1.56
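The eigenvalue calculation and consistency test can be retraced with a short sketch. The study used Matlab 2021; the version below substitutes a plain power iteration in Python, so it is an independent check under that assumption rather than the authors’ script, and the function names are hypothetical. The matrix and RI values are those given in the text and Table 6.

```python
# Judgment matrix C of the second-grade indexes, as given in the text.
C = [
    [1,   2, 1/2, 2, 1/2],
    [1/2, 1, 1/3, 1, 1/2],
    [2,   3, 1,   3, 2],
    [1/2, 1, 1/3, 1, 1/2],
    [2,   2, 1/2, 2, 1],
]

# Average random consistency index by matrix order (Table 6).
RI = {1: 0, 2: 0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32,
      8: 1.41, 9: 1.45, 10: 1.49, 11: 1.51, 12: 1.54, 13: 1.56}


def ahp_weights(matrix, iterations=200):
    """Principal eigenvalue and sum-normalized eigenvector via power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(v)  # w sums to 1, so sum(C w) converges to lambda_max
        w = [x / lam for x in v]
    return lam, w


def consistency(lam, n):
    """Consistency index CI and consistency ratio CR for a matrix of order n."""
    ci = (lam - n) / (n - 1)
    cr = ci / RI[n] if RI[n] > 0 else 0.0  # orders 1 and 2 are always consistent
    return ci, cr


lam_max, alpha = ahp_weights(C)
ci, cr = consistency(lam_max, len(C))
print("lambda_max:", round(lam_max, 4))
print("weights:", [round(a, 4) for a in alpha])
print("CI:", round(ci, 6), "CR:", round(cr, 3))
```

The output can be compared directly with the reported $λ_{\max}$, the normalized weights, and the CR threshold of 0.1; small differences in the last digit are possible because the iteration is a different numerical route from Matlab’s eigen-decomposition.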

As the calculation shows, the judgment matrix of subjective weights of the second-grade indexes has good consistency and meets the consistency requirement. Table 7 presents the resulting subjective weights of the second-grade indexes of the evaluation index system for the implementation effectiveness of the “Double Reduction” policy. Following the same method, a subjective weight judgment matrix is constructed for the third-grade indexes under each second-grade index to calculate their subjective weights.

Table 7.

Results of Subjective Weighting of Second Grade Indexes.

Second grade indexes	Total homework amount and duration	Normative degree of tutoring	Quality of education and teaching	Degree of collaborative linkage	Level of after-school services
Subjective weighting	0.1801	0.1079	0.3649	0.1079	0.2392

Determination of Objective Weights

The entropy weight method is used to determine the objective weights. It assigns weights according to the entropy values of the indexes; because the weights depend only on the data set, the method avoids the influence of subjectivity and objectively reflects the characteristics of the data. The process of objective weighting of the second-grade indexes is as follows:

First, the 23 expert ratings are transformed into a scoring matrix S, and the normalization matrix D is calculated according to the formula $d_{uv} = \frac{s_{uv}}{\sum_{v = 1}^{m} s_{uv}}$.

Second, according to the information entropy formula $H_{v} = - \frac{1}{\ln n} \sum_{u = 1}^{n} d_{uv} \ln d_{uv}$, the entropy value H of each second-grade index is calculated:

$H = (2.3380, 2.2875, 2.4379, 2.2904, 2.3857)$

Finally, according to the formula $β_{v} = \frac{1 - H_{v}}{m - \sum_{v = 1}^{m} H_{v}}$, the objective weights of the second-grade indexes are obtained, as Table 8 shows.

$β_{I I} = (0.1985, 0.1910, 0.2134, 0.1915, 0.2056)$

Table 8.

Results of Objective Weighting of Second Grade Indexes.

Second grade indexes	Total homework amount and duration	Normative degree of tutoring	Quality of education and teaching	Degree of collaborative linkage	Level of after-school services
Objective weighting	0.1985	0.1910	0.2134	0.1915	0.2056

Similarly, the objective weights of each second-grade index and its subordinate third grade indexes can be calculated according to the above methodology.
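As an illustration of the entropy weighting steps, the sketch below follows the three formulas as printed above. The helper names `entropy_values` and `entropy_weights` are hypothetical, the uniform score matrix is invented purely as a sanity check (the expert ratings themselves are not reproduced here), and only the last step is applied to the reported entropy values H.

```python
import math


def entropy_values(scores):
    """Entropy H_v of each index, following the formulas as printed in the text.

    scores[u][v] is the rating given by expert u (n experts) to index v (m indexes).
    """
    n = len(scores)
    m = len(scores[0])
    # d_uv = s_uv / sum_v s_uv: each expert's ratings are scaled to sum to 1.
    d = [[s / sum(row) for s in row] for row in scores]
    # H_v = -(1 / ln n) * sum_u d_uv * ln d_uv; terms with d_uv = 0 contribute 0.
    return [
        -sum(d[u][v] * math.log(d[u][v]) for u in range(n) if d[u][v] > 0) / math.log(n)
        for v in range(m)
    ]


def entropy_weights(h):
    """beta_v = (1 - H_v) / (m - sum_v H_v)."""
    m = len(h)
    denom = m - sum(h)
    return [(1 - hv) / denom for hv in h]


# Entropy values reported in the text for the five second-grade indexes.
H = [2.3380, 2.2875, 2.4379, 2.2904, 2.3857]
beta = entropy_weights(H)
print([round(b, 4) for b in beta])

# Hypothetical sanity check: 23 experts who rate all five indexes equally.
uniform_h = entropy_values([[4, 4, 4, 4, 4]] * 23)
print([round(h, 4) for h in uniform_h])
```

Applied to the reported H, the last step returns the objective weights listed in Table 8. With identical ratings, the first two steps give an entropy of about 2.36 for every index, which is of the same order as the reported values.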

Determination of Combined Weights

Combining the analytic hierarchy process (AHP) and the entropy weight method to allocate comprehensive weights to the evaluation indexes can effectively circumvent the limitations of a single weighting method (Jiang et al., 2024). The Lagrange multiplier method is used to determine the comprehensive weight: the subjective weights α and objective weights β are combined so that the result reflects the importance of the indexes while making the weight of each index more objective and reasonable. Taking the comprehensive weighting of the second-grade indexes as an illustration, the steps are as follows:

First, the comprehensive weights of the second-grade indexes are calculated from the Lagrange multiplier formula $φ_{i} = \frac{\sqrt{α_{i} β_{i}}}{\sum_{i = 1}^{m} \sqrt{α_{i} β_{i}}}$:

$ϕ_{I I} = (0.1935, 0.1469, 0.2856, 0.1471, 0.2269)$

The composite weights of the third-grade indexes can be obtained in the same way from the above formula.

Second, based on the comprehensive weights of the second-grade indexes and the formula $φ_{I I, a, b} = \frac{ϕ_{I I, a, b}}{\sum_{k = 1}^{n} ϕ_{I I, a, k}}$, where $ϕ_{I I, a, b}$ is the comprehensive weight of the bth second-grade index under the ath first-grade index, the composite weights of the second-grade indexes under each first-grade index are calculated:

$φ_{I I, 1} = (0.3091, 0.2347, 0.4562)$

$φ_{I I, 2} = (0.3933, 0.6077)$
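The first two steps can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: `combine` and `local_weights` are hypothetical helper names, the inputs are the rounded α and β values reported above, and only the first first-grade index (“degree of ‘Double Reduction’”) is used for the second step.

```python
import math

# Subjective (AHP) and objective (entropy) weights of the second-grade indexes,
# as reported in Tables 7 and 8.
alpha = [0.1801, 0.1079, 0.3649, 0.1079, 0.2392]
beta = [0.1985, 0.1910, 0.2134, 0.1915, 0.2056]


def combine(alpha, beta):
    """Normalized geometric mean of the subjective and objective weights."""
    g = [math.sqrt(a * b) for a, b in zip(alpha, beta)]
    total = sum(g)
    return [x / total for x in g]


def local_weights(group):
    """Rescale the comprehensive weights within one parent index to sum to 1."""
    total = sum(group)
    return [x / total for x in group]


phi = combine(alpha, beta)
print([round(x, 4) for x in phi])

# The first three second-grade indexes belong to the first first-grade index.
group1 = local_weights(phi[:3])
print([round(x, 4) for x in group1])
```

Because the inputs are rounded to four decimals, the results agree with the reported weights only up to rounding.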

Third, based on the above results and the formula $ϕ_{I I I, i, j} = ϕ_{I I, i} \cdot φ_{I I I, i, j}$, the global comprehensive weights of the third-grade indexes are calculated, where $ϕ_{I I I, i, j}$ is the global comprehensive weight of the jth third-grade index under the ith second-grade index, $ϕ_{I I, i}$ is the comprehensive weight of the ith second-grade index, and $φ_{I I I, i, j}$ is the composite weight of the jth third-grade index under the ith second-grade index. The calculation gives:

$ϕ_{I I I} = (0.0395, 0.0517, 0.0629, 0.0394, 0.0225, 0.0260, 0.0334, 0.0290, 0.0164, 0.0196, 0.0471, 0.0872, 0.0639, 0.0874, 0.0861, 0.0610, 0.0375, 0.0212, 0.0480, 0.0277, 0.0355, 0.0234, 0.0336)$

Finally, from the comprehensive weights of the second-grade indexes and the formula $ϕ_{I, h} = \sum_{i = 1}^{n} ϕ_{I I, h, i}$, the comprehensive weights of the first-grade indexes are calculated, where $ϕ_{I I, h, i}$ is the comprehensive weight of the ith second-grade index under the hth first-grade index. For the first-grade indexes, the comprehensive weights are equal to the composite weights, that is, $φ_{I} = ϕ_{I}$.
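The third and final steps amount to a multiplication down the hierarchy and a sum up the hierarchy. The sketch below illustrates both with the reported values for “total homework amount and duration”; the function name `global_weights` and the dictionary layout are assumptions made for the example.

```python
def global_weights(parent_weight, local):
    """Global weight of each child = comprehensive weight of the parent * composite weight."""
    return [parent_weight * w for w in local]


# Composite (within-group) weights of the four third-grade indexes under
# "total homework amount and duration", whose comprehensive weight is 0.1935.
homework_local = [0.2041, 0.2670, 0.3253, 0.2036]
homework_global = global_weights(0.1935, homework_local)
print([round(w, 4) for w in homework_global])

# First-grade weights are the sums of the comprehensive weights of their
# second-grade indexes.
first_grade = {
    "Degree of 'Double Reduction'": sum([0.1935, 0.1469, 0.2856]),
    "Effectiveness of supporting governance": sum([0.1471, 0.2269]),
}
print({name: round(w, 4) for name, w in first_grade.items()})
```

The output corresponds to the first four entries of the global weight vector and to first-grade weights of 0.6260 and 0.3740.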

The comprehensive weights of the above second and third grade indexes are summarized and ranked according to the weights of the indexes, and the weights of the index system for evaluating the effectiveness of the implementation of the “Double Reduction” policy are finally obtained, as Table 9 shows.

Table 9.

Index Weights by Level.

First grade indexes	First grade index weights $φ_{I}$ ( $ϕ_{I}$ )	Second grade indexes	Composite weight of second grade indexes $φ_{I I}$	Comprehensive weight of second grade indexes $ϕ_{I I}$	Ranking of second grade indexes	Third grade indexes	Composite weight of third grade indexes $φ_{I I I}$	Comprehensive weight of third grade indexes $ϕ_{I I I}$	Ranking of third grade indexes
Degree of “Double Reduction”	0.6260	Total homework amount and duration	0.3091	0.1935	3	Perfection degree of homework management mechanism	0.2041	0.0395	10
						Homework amount	0.2670	0.0517	7
						Quality of homework design	0.3253	0.0629	5
						Guidance for homework completion	0.2036	0.0394	11
		Normative degree of tutoring	0.2347	0.1469	4	Strictness of agency approvals	0.1533	0.0225	20
						Training service behaviors standardization	0.1767	0.0260	18
						Standing Operations Supervision	0.2277	0.0334	15
						Regulation of training fees	0.1971	0.0290	16
						Control of tutoring advertising	0.1118	0.0164	23
						Extent of reduction in academic tutoring	0.1334	0.0196	22
		Quality of education and teaching	0.4562	0.2856	1	Degree of balanced high quality development of compulsory education	0.1648	0.0471	9
						Quality of classroom teaching	0.3053	0.0872	2
						Degree of high school enrollment reform	0.2239	0.0639	4
						Changes in the quality evaluation system	0.3060	0.0874	1
Effectiveness of Supporting Governance	0.3740	Degree of collaborative linkage	0.3933	0.1471	5	Home-school cooperation	0.5851	0.0861	3
						Deepening sectoral linkages	0.4149	0.0610	6
		Level of after-school services	0.6077	0.2269	2	Extracurricular time	0.1650	0.0375	12
						After-school service hours	0.0936	0.0212	21
						Quality of after-school services	0.2117	0.0480	8
						After-school service conditions	0.1219	0.0277	17
						After-school service channels	0.1565	0.0355	13
						Free Online Learning service	0.1033	0.0234	19
						Utilization of resources on and off campus	0.1480	0.0336	14

Empirical Measurement Results

Calculate the Comprehensive Evaluation Vector

The questionnaire results were analyzed by treating the four regions as a whole. The set of fuzzy comprehensive evaluation indexes was determined according to the constructed evaluation index system, and five evaluation levels were designed for each index, that is, V = {completely agree, relatively agree, generally agree, relatively disagree, completely disagree}. Taking the third-grade index “perfection degree of homework management mechanism” under the second-grade index “total homework amount and duration” as an example, 44.17% of the returned questionnaires chose “completely agree,” 30.68% “relatively agree,” 19.63% “generally agree,” 4.29% “relatively disagree,” and 1.23% “completely disagree.” The affiliation (membership degree) vector of the index is therefore:

$r_{111} = (0.4417, 0.3068, 0.1963, 0.0429, 0.0123)$

In the above equation, $r_{111}$ denotes the affiliation vector of the 1st third-grade index under the 1st second-grade index under the 1st first-grade index.

Similarly, the affiliation vectors $r_{112}, r_{113}, r_{114}$ of the other third-grade indexes under the second-grade index “total homework amount and duration” are obtained, and the affiliation matrix $R_{11}$ of the third-grade indexes under “total homework amount and duration” is constructed:

$R_{11} = \begin{bmatrix} 0.4417 & 0.3068 & 0.1963 & 0.0429 & 0.0123 \\ 0.4172 & 0.2822 & 0.2209 & 0.0368 & 0.0429 \\ 0.4233 & 0.3374 & 0.1841 & 0.0429 & 0.0123 \\ 0.4663 & 0.3497 & 0.1472 & 0.0307 & 0.0061 \end{bmatrix}$

Where $R_{11}$ denotes the affiliation matrix of the 1st second grade index under the 1st first grade index.

Therefore, according to the formula $B_{11} = φ_{I I I, 1} \cdot R_{11}$, the comprehensive evaluation vector $B_{11}$ of the second-grade index “total homework amount and duration” can be calculated:

$B_{11} = (0.4342, 0.3189, 0.1889, 0.0388, 0.0192)$

Where $B_{11}$ denotes the comprehensive evaluation vector of the 1st second-grade index under the 1st first-grade index. Following the same method, the comprehensive evaluation vectors of the other two second-grade indexes, $B_{12}$ and $B_{13}$, are calculated.
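The composition $B_{11} = φ_{I I I, 1} \cdot R_{11}$ can be checked numerically. The sketch below assumes the weighted-average operator (an ordinary vector-matrix product), which reproduces the reported values; the function name `compose` is illustrative.

```python
def compose(weights, membership):
    """Weighted-average fuzzy composition: B_k = sum_j w_j * r_jk."""
    levels = len(membership[0])
    return [
        sum(w * row[k] for w, row in zip(weights, membership))
        for k in range(levels)
    ]


# Composite weights of the third-grade indexes under
# "total homework amount and duration" (Table 9).
weights_11 = [0.2041, 0.2670, 0.3253, 0.2036]

# Affiliation matrix R_11: one row per third-grade index, one column per level
# (completely agree ... completely disagree).
R11 = [
    [0.4417, 0.3068, 0.1963, 0.0429, 0.0123],
    [0.4172, 0.2822, 0.2209, 0.0368, 0.0429],
    [0.4233, 0.3374, 0.1841, 0.0429, 0.0123],
    [0.4663, 0.3497, 0.1472, 0.0307, 0.0061],
]

B11 = compose(weights_11, R11)
print([round(b, 4) for b in B11])
```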

Since the affiliation vector of a second-grade index is its comprehensive evaluation vector, the above evaluation vectors are combined to form the affiliation matrix $R_{1}$ of the second-grade indexes under the first-grade index “degree of ‘Double Reduction’”:

$R_{1} = \begin{bmatrix} 0.4342 & 0.3189 & 0.1889 & 0.0388 & 0.0192 \\ 0.4707 & 0.3235 & 0.1609 & 0.0270 & 0.0179 \\ 0.4395 & 0.3447 & 0.1685 & 0.0243 & 0.0230 \end{bmatrix}$

Therefore, according to the formula $B_{1} = φ_{I I, 1} \cdot R_{1}$, the comprehensive evaluation vector $B_{1}$ of the first-grade index “degree of ‘Double Reduction’” can be calculated:

$B_{1} = (0.4452, 0.3318, 0.1730, 0.0294, 0.0206)$

Using the same methodology, the comprehensive evaluation vector $B_{2}$ of the first-grade index “effectiveness of supporting governance” was calculated as follows:

$B_{2} = (0.4547, 0.3164, 0.1700, 0.0338, 0.0251)$

Because the affiliation vector of a first-grade index is its comprehensive evaluation vector, the affiliation matrix $R$ of the first-grade indexes is:

$R = \begin{bmatrix} 0.4452 & 0.3318 & 0.1730 & 0.0294 & 0.0206 \\ 0.4547 & 0.3164 & 0.1700 & 0.0338 & 0.0251 \end{bmatrix}$

Therefore, according to the formula $B = φ_{I} \cdot R$, the comprehensive evaluation vector B of the effectiveness of the implementation of the “Double Reduction” policy in the four regions can be calculated:

$B = (0.4488, 0.3260, 0.1719, 0.0310, 0.0223)$
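The same composition is applied at each level of the hierarchy. As a rough check, the sketch below recomputes $B_{1}$ from $R_{1}$ and B from R, again assuming the weighted-average operator; `compose` is an illustrative helper, and $B_{2}$ is taken as reported because the affiliation matrices of its second-grade indexes are not listed in the text.

```python
def compose(weights, membership):
    """Weighted-average fuzzy composition: B_k = sum_j w_j * r_jk."""
    levels = len(membership[0])
    return [
        sum(w * row[k] for w, row in zip(weights, membership))
        for k in range(levels)
    ]


# Second level: second-grade evaluation vectors under "degree of 'Double Reduction'".
R1 = [
    [0.4342, 0.3189, 0.1889, 0.0388, 0.0192],
    [0.4707, 0.3235, 0.1609, 0.0270, 0.0179],
    [0.4395, 0.3447, 0.1685, 0.0243, 0.0230],
]
B1 = compose([0.3091, 0.2347, 0.4562], R1)

# Top level: reported first-grade evaluation vectors B_1 and B_2.
R = [
    [0.4452, 0.3318, 0.1730, 0.0294, 0.0206],
    [0.4547, 0.3164, 0.1700, 0.0338, 0.0251],
]
B = compose([0.6260, 0.3740], R)

print([round(b, 4) for b in B1])
print([round(b, 4) for b in B])
```

Since every input is rounded to four decimals, the recomputed vectors can differ from the reported ones in the last digit.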

Calculation of a Comprehensive Evaluation Score

After the comprehensive evaluation vector B of the effectiveness of the “Double Reduction” policy was derived, the study assigned a score to each of the five levels of the scale V = {completely agree, relatively agree, generally agree, relatively disagree, completely disagree}: 90 points for “completely agree,” 70 points for “relatively agree,” 50 points for “generally agree,” 30 points for “relatively disagree,” and 10 points for “completely disagree,” that is:

$V = \{90, 70, 50, 30, 10\}$

Then, according to the principle of weighted average, the formula $F = B \cdot V^{T}$ gives a comprehensive evaluation score of 72.96 points for the effectiveness of the implementation of the “Double Reduction” policy in the four regions. To analyze the shortcomings of the implementation in detail, the study also assigns scores to the affiliation vectors of the third-grade indexes according to the above method, as Table 10 shows.

From these empirical results, the overall effectiveness of the “Double Reduction” policy implementation across the four regions is relatively positive. The outcomes are strongest for guidance on homework completion and the standing supervision of tutoring operations, which rank first and second among the third-grade indexes. However, there is still room for improvement. The scores for homework amount, the degree of balanced high-quality development of compulsory education, after-school service conditions, and after-school service channels are the lowest, which indicates that respondents' agreement in these areas remains inadequate. Further optimization and enhancement are urgently needed in these areas.
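The scoring step $F = B \cdot V^{T}$ is a dot product between an evaluation vector and the level scores. The short sketch below applies it to the overall vector B and, as an example of a third-grade score, to the affiliation vector $r_{111}$; the function name `score` is illustrative.

```python
# Scores assigned to the five levels, from "completely agree" to "completely disagree".
V = [90, 70, 50, 30, 10]


def score(evaluation_vector, level_scores=V):
    """Weighted-average score F = B . V^T."""
    return sum(b * v for b, v in zip(evaluation_vector, level_scores))


# Overall comprehensive evaluation vector B reported in the text.
B = [0.4488, 0.3260, 0.1719, 0.0310, 0.0223]
F = score(B)

# Affiliation vector of "perfection degree of homework management mechanism".
r111 = [0.4417, 0.3068, 0.1963, 0.0429, 0.0123]
F_111 = score(r111)

print(round(F, 2), round(F_111, 2))
```

The two values correspond to the overall score of 72.96 and to the score of 72.45 listed for this index in Table 10.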

Table 10.

Third Grade Indexes Scores.

Second grade indexes	Third grade indexes	Score	Rank
Total homework amount and duration	Perfection degree of homework management mechanism	72.45	18
	Homework amount	69.88	21
	Quality of homework design	72.33	19
	Guidance for homework completion	74.79	1
Normative degree of tutoring	Strictness of agency approvals	73.80	10
	Training service behaviors standardization	74.04	8
	Standing Operations Supervision	74.66	2
	Regulation of training fees	74.29	3
	Control of tutoring advertising	74.07	6
	Extent of reduction in academic tutoring	73.68	11
Quality of education and teaching	Degree of balanced high quality development of compulsory education	69.75	22
	Quality of classroom teaching	74.17	5
	Degree of high school enrollment reform	73.68	11
	Changes in the quality evaluation system	73.31	14
Degree of collaborative linkage	Home-school cooperation	73.19	15
	Deepening sectoral linkages	72.70	17
Level of after-school services	Extracurricular time	72.82	16
	After-school service hours	74.29	3
	Quality of after-school services	74.05	7
	After-school service conditions	70.74	20
	After-school service channels	69.75	22
	Free Online Learning service	73.68	11
	Utilization of resources on and off campus	73.93	9

Discussion

Empirical results from four Chinese provinces (Jiangsu, Shandong, Shanxi, Anhui) show the “Double Reduction” policy’s overall implementation effect is “good” (comprehensive score = 72.96/90, ≥70 threshold with 95% CI: 71.23–74.69). Two second-grade indexes—“quality of education and teaching” (weight = 0.2856) and “level of after-school services” (weight = 0.2269)—are the most influential. However, critical gaps persist: only 38% of respondents agreed with homework amount reductions, 36% with balanced compulsory education development, and 35% with diversified after-school service channels (scores < 70).

The study’s focus on “after-school program quality” aligns with the OECD’s (2022) Education Policy Outlook, which identifies “student well-being” as a core metric for educational policy success. China’s emphasis on integrating after-school services with burden reduction extends this framework by linking service quality to reducing reliance on private tutoring, a challenge understudied in OECD contexts. Additionally, the index “degree of balanced high-quality development of compulsory education” resonates with UNESCO’s “Education for All” goals (UNESCO, 1990), but the study addresses a global gap: while UNESCO prioritizes equity in access, this research quantifies equity in burden reduction, a unique dimension for non-Western countries with heavy tutoring cultures (e.g., India, South Korea).

This study advances policy evaluation theory by proposing a three-dimensional evaluation framework (burden reduction degree, governance effectiveness, sustainability readiness)—a novel adaptation for non-Western contexts. Unlike Western models (e.g., CIPP Model) that prioritize process compliance, this framework centers on “effect-oriented” assessment (e.g., linking tutoring regulation to actual student burden reduction) and embeds “sustainability” (via indexes like “home-school collaboration”). This contributes to localizing Policy Instrument Theory, as it demonstrates how policy tools (e.g., after-school services, homework inspection) interact with cultural contexts (e.g., parental expectations of tutoring) to shape outcomes—filling a gap in global literature on non-Western policy evaluation.

Beyond guiding Chinese local governments, the findings offer actionable insights for emerging economies balancing burden reduction and quality improvement: for countries with large tutoring markets (e.g., South Korea, Turkey), prioritize “normative degree of tutoring” (weight = 0.1469) and link regulation to school-based alternatives (e.g., extended after-school programs); for economies with regional educational disparities (e.g., Brazil, Indonesia), replicate the “balanced high-quality development” index to monitor equity in burden reduction, supported by digital tools (e.g., big data education monitoring platforms). That said, this study has limitations: data from only four Chinese provinces may not capture urban-rural gaps (e.g., rural schools’ limited after-school service resources) or regional variations in tutoring cultures, and fuzzy comprehensive evaluation relies on subjective teacher ratings, which may underrepresent parent/student perspectives. Future research could address these gaps by conducting cross-cultural studies (e.g., comparing China’s “Double Reduction” with India’s tutoring regulations) to test the framework’s generalizability, or using longitudinal data (3–5 years) to assess policy sustainability—examining whether short-term burden reduction translates to long-term educational quality improvement.

When interpreting these findings, several methodological limitations should be considered. The primary reliance on teachers’ self-reported data poses a risk of social desirability and recall biases, potentially affecting the accuracy of ratings on issues like after-school service quality or homework duration. Furthermore, the absence of objective verification indicators (e.g., official school records on homework, tutoring institution registration data) or triangulation with data from students and parents means the findings rely solely on subjective perceptions. Finally, while the four-province sample offers valuable regional insights, it may not fully capture the significant urban-rural disparities and regional variations in tutoring cultures across China, limiting the immediate generalizability of the results. Future research should therefore prioritize cross-cultural comparative studies (e.g., comparing China’s “Double Reduction” with similar regulations in countries like India or South Korea) to test the framework’s global applicability. Additionally, employing longitudinal designs and mixed-methods approaches that incorporate objective data and the perspectives of students and parents would be crucial to validate these initial findings and assess the long-term sustainability of the policy’s effects.

Conclusion

This study has developed and preliminarily validated a unified, theory-driven evaluation system for China’s “Double Reduction” policy. While the findings are constrained by the methodological limitations of reliance on self-reported data and a regional sample, as discussed, they nonetheless make three significant contributions to the field of educational policy evaluation.

First, it constructs the first systematic evaluation index system for “Double Reduction” implementation effectiveness, comprising 2 first-grade, 5 second-grade, and 23 third-grade indexes. This system resolves the issue of fragmented “self-constructed scales” in domestic research and provides a structured tool for holistically measuring both burden reduction and the effectiveness of supporting governance mechanisms.

Second, it demonstrates the practical utility of the fuzzy comprehensive evaluation method in the context of educational policy assessment. By integrating subjective (AHP) and objective (entropy weight) weighting techniques, the method effectively handles the inherent fuzziness of educational outcomes, offering a replicable approach for evaluating policies with complex, multi-stakeholder impacts.

Third, the study provides a valuable cross-cultural reference for global burden-reduction policy evaluation. The proposed three-dimensional framework (burden reduction, governance, sustainability), grounded in the empirical context of China—an emerging economy with a massive shadow education sector—helps to fill the gap in non-Western policy evaluation models.

In light of its contributions and acknowledged limitations, this research serves as a critical foundation for future inquiry. It advances the localization of policy evaluation theories in non-Western contexts and ends with a clear call for more robust, longitudinal, and cross-cultural research to further refine the tools for assessing educational burden-reduction policies worldwide.

Footnotes

ORCID iDs

Yadong Ding

Jing Li

Ethical Considerations

This study was approved by the Institutional Review Board of the authors’ university. All procedures complied with the ethical standards of the 1964 Helsinki Declaration and its subsequent amendments, as well as guidelines for educational research involving human participants. Participant privacy and data security were strictly maintained, with no collection of personally identifiable information.

Consent to Participate

Informed consent was obtained from all participants prior to their involvement. Participants were fully informed of the study’s purpose, methods, duration, and data usage. They were also advised of their right to withdraw at any time without consequence. Consent was obtained electronically for expert and teacher participants, and verbally confirmed for parent participants.

Author Contributions

Yadong Ding contributed to the overall conception, design of the manuscript, literature search, and article writing. Bing Zhou contributed to data collection, processing and analysis, and Jing Li contributed to the planning, design and implementation of the entire study. All authors contributed to the article and approved the submitted version.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Education Sciences Planning of China (Grant No. CKA250315).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Bae, S. H., & Choi, K. H. (2024). The cause of institutionalized private tutoring in Korea: Defective public schooling or a universal desire for family reproduction? ECNU Review of Education, 7(1), 12–41. https://doi.org/10.1177/20965311231182722

Choi, Á. (2016). Regulating private tutoring consumption in Korea: Lessons from another failure. International Journal of Educational Development, 49, 144–156. https://doi.org/10.1016/j.ijedudev.2016.03.002

Ehrenreich (1990). Fear of falling: The inner life of the middle class. HarperCollins.

Howlett (2009). Governance modes, policy regimes and operational plans: A multi-level nested model of policy instrument choice and policy design. Policy Sciences, 42, 73–89. https://doi.org/10.1007/s11077-009-9079-1

Huang, Wang, Z. W., Yao, Y. P., & Yang, X. F. (2021, September 16). After the implementation of the “Double Reduction” policy, 72.7% of surveyed parents said their educational anxiety had been reduced. China Youth Daily. https://doi.org/10.38302/n.cnki.nzgqn.2021.003505

Jiang (2024). Dynamic evaluation of water resources carrying capacity in Shandong province based on the comprehensive weight and TOPSIS model. Resources Science, 46(3), 538–548. https://doi.org/10.18402/resci.2024.03.08

Kirkpatrick, J. D., & Kirkpatrick, W. K. (2016). Kirkpatrick’s four levels of training evaluation. Association for Talent Development Press.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.1080/23311886.2015.1104977

Liu, J. Y. (2023). Can the “Double Reduction” eliminate the demand for academic after-school training?—An empirical study based on the perspective of bounded rationality. Journal of East China Normal University (Educational Science Edition), 43(9), 71–84. https://doi.org/10.16382/j.cnki.1000-5560.2023.09.005

Nunnally, J. C. (1978). Psychometric theory (2nd ed.). McGraw-Hill.

OECD. (2022). Education policy outlook 2022: Transforming pathways for lifelong learners. OECD Publishing. https://doi.org/10.1787/c77c7a97-en

Z. Y., Q. Y., & Zhang, J. Y. (2023). Has the burden of students’ homework eased in the context of “Double Reduction”—An empirical survey based on 1786 students in 11 provinces in western China. China Educational Technology, 44(10), 73–81.

Saaty, T. L. (1980). The analytic hierarchy process. McGraw-Hill.

Sahlberg (2021). Finnish lessons 3.0: What can the world learn from educational change in Finland? Teachers College Press.

She, Que, M. K., Yang, K. Y., & Shan, D. S. (2022). Management of students’ burden in the stage of basic education in China: “Double Reduction” policy and the long-term mechanism construction. Journal of Management World, 38(7), 163–170. https://doi.org/10.19744/j.cnki.11-1235/f.2022.0090

Stufflebeam, D. L. (2003). The CIPP model for evaluation. In Kellaghan & D. L. Stufflebeam (Eds.), International Handbook of Educational Evaluation: Kluwer International Handbooks of Education (Vol. 9, pp. 31–62). Kluwer Academic Publishers.

UNESCO. (1990). World declaration on education for all: Meeting basic learning needs. [Declaration]. World Conference on Education for All, Jomtien, Thailand.

Wang, L. M. (2021). The “Double Reduction” policy should be an opportunity to return to the laws of education. People’s Education, 72(17), 12.

Xue (2023). What is the value essence of “double reduction” (Shuang Jian) policy in China? A policy narrative perspective. Educational Philosophy and Theory, 55(7), 787–796. https://doi.org/10.1080/00131857.2022.2040481

Yang, D. G. (2023). The importance, effectiveness, problems and countermeasures of “Double Reduction”—Starting from the “Rat Race” of compulsory education stage. Journal of Shanghai Normal University (Philosophy & Social Sciences Edition), 52(3), 108–115. https://doi.org/10.13852/J.CNKI.JSHNU.2023.03.011

Z. Q. (2023). Classroom teaching confusion, attribution and suggestions of primary school mathematics teachers under the background of “Double Reduction”—Qualitative research based on NVivo. Journal on Mathematics Education, 4, 78–84.

Zhang & Chen, S. J. (2020). Emergency capability evaluation of power grid system based on AHP and fuzzy comprehensive evaluation. China Work Safety Science and Technology, 16(2), 180–186. https://doi.org/10.11731/j.Issn.1673-193x.2020.02.029

Zhang, X. J., & Yang, F. H. (2023). From “composition” to “generation”: The formation argumentation and direction of the implementation of the “Double Reduction” policy. Journal of Tianjin Normal University (Social Science), 50(1), 88–93.

Zheng, H. Y., & Zhou (2021). Understanding the essence of ‘Double Reduction’, following educational principles, and enhancing the quality of educational publishing services. View on Publishing, 27(20), 9–13. https://doi.org/10.16491/j.cnki.cn45-1216/g2.2021.20.002

Zhou (2023). The practical dilemmas and solutions of the “Double Reduction” policy. Cooperative Economy and Science, 39(15), 174–176. https://doi.org/10.13665/j.cnki.hzjjykj.2023.15.063

Zhou, Lei, & Shen (2023). Education burden reduction, family education investment, and education equity. China Economic Quarterly International, 3(3), 179–194. https://doi.org/10.1016/j.ceqi.2023.09.001

Zhu, Y. M. (2021). “Double Reduction”: Renewal cognition, institutional innovation and reform action. Nanjing Journal of Social Sciences, 32(11), 141–148. https://doi.org/10.15937/j.cnki.issn1001-8263.2021.11.016

The “Double Reduction” Policy in China: Evaluation Index System Construction on the Implementation Effectiveness
