Sage Journals: Discover world-class research

Abstract

The application of language proficiency scales (LPS) in education validates their function, as it reveals their value in depth. However, little systematic research on applying LPS has been conducted, owing to the complex intertwining of stakeholders and a lack of theoretical frameworks and practical approaches. Adopting the framework proposed by Y. Jin and Jie (2020), this study explored how the Common European Framework of Reference (CEFR) and China’s Standards of English (CSE) were used by, and affected, various stakeholders in the education context. Literature published from 2018 to 2022 was retrieved from WoSCC, Scopus, and CNKI. Qualitative content analysis was used for the systematic review. Results showed that policymakers used LPS in education policy guidance; teachers applied them as the benchmark of diagnostic assessment to get accurate language profiles of students and create new approaches to teaching; students used them as goal-setting guidance and self- or peer assessment criteria to track progress; test developers aligned them with tests to obtain reliable results; curriculum designers tailored descriptors and scales from CEFR to develop new curricula or to align or revise existing ones; researchers used LPS as references to develop new rubrics, frameworks, and assessment models. This study could provide insight into the scientific application of LPS. However, it focused mainly on the CEFR and CSE, using a framework designed for exploring the impact of language testing. Studies covering more scales and theorizing the framework of the aftereffects of LPS should be encouraged.

Keywords

language proficiency scales, systematic literature review, stakeholders, education context, application

Introduction

Language proficiency scales (LPS) are extensively used for different purposes, such as course, syllabus, and materials design (Nikolaeva, 2019) and language learning, teaching, and assessment (J. Liu & Yang, 2021). The application value of an LPS is established when it is designed, whether it is learning-oriented, assessor-oriented, or constructor-oriented (Jones, 2014). It is therefore important to study the application of LPS in the educational context. Evidence collected within the first few years after a scale's publication is of prime importance and a key decision-making basis for its further implementation (Zhu, 2016), because it can reveal the value of LPS in depth and uncover its positive role and impact on different stakeholders.

Exploring the application of LPS is a validation of its usefulness. The feedback from the stakeholders can back up the scale’s validity and provide evidence for its revision or adaptation (Y. Jin & Jie, 2020). For example, the Common European Framework of Reference (CEFR) is the most widely used LPS for planning and evaluating curricula, certifications, examinations, and textbooks (Byram, 2020). After its publication, studies on its impacts multiplied (Brunfaut & Harding, 2020; Byram, 2020; Green, 2018; Sahib & Stapa, 2022), revealing its strengths and weaknesses in education. The feedback contributed to the release of the Companion Volume (CV) in 2020 (Council of Europe, 2020). Exploring the application of LPS can also offer a better understanding of its usefulness in teaching, learning, testing, and curriculum design, providing guidance for teachers, students, testers, and policymakers, as the exploration can demonstrate how challenges are addressed and offer new perspectives on moving the field further (Harsch, 2014).

Several influential language proficiency scales have been widely used for many years in international language education, such as ILR (Interagency Language Roundtable), ACTFL (American Council on the Teaching of Foreign Languages), CLB (Canadian Language Benchmarks), and CEFR (Common European Framework of Reference) (Zhou & Liu, 2021). CEFR is the most influential officially published scale worldwide and is widely used for curriculum planning and evaluation, assessment, textbook development, teaching, and learning.

China’s Standards of English Ability (CSE), released in 2018, is a new scale developed to coordinate teaching, learning, and assessment (J. Liu, 2017). It is the first officially published scale in China, which has the world’s largest population of English learners (Bolton & Bacon-Shone, 2020; H. Liu, 2016). Since its publication, CSE has been widely applied from primary schools to colleges (M. Liu & Huang, 2019; M. Liu & Liu, 2022; J. Liu & Yang, 2021; Peng & Liu, 2021; Xiong & Liu, 2020).

Considering their influence and number of users, this study selects CEFR and CSE as representatives of LPS to explore their application and impact on education. Research exploring the use of CEFR and CSE has covered a wide range of stakeholders and approaches. However, up to now, no systematic literature review has been conducted to collect evidence of their usefulness from different stakeholders. To bridge this gap, this study aims to reveal the value of LPS in the educational domain by systematically reviewing related papers and providing a solid foundation for future academic research in education.

Framework of Systematic Review

It is challenging to learn how LPS is used in the educational domain because there is a lack of theoretical frameworks and practical approaches. The general method is adopted from language testing because they share much in common. Bailey (1996) proposed a basic model that identified participants, processes, and products (3Ps) which may influence or be influenced by washback in language testing. Based on this 3Ps theory, Y. Jin and Jie (2020) constructed a model to study the application and impact of the CSE speaking scale, as shown in Figure 1.

Figure 1.

Framework for exploring the application of LPS.

This model illustrates seven types of stakeholders and the impact of LPS in the educational and social domains. The solid arrows (numbered 1 and 2) represent the impact of scales on stakeholders and their education practice; the dotted ones (numbered 3 and 4) indicate the washback of applied research to the scale. X and Y refer to impacts outside the education domain. According to this model, the impact of LPS in education can be explored from the perspective of six different stakeholders, as shown in Figure 1: stakeholders 1 to 6, from top to bottom, are in the education domain, and the seventh stakeholder is concerned with how societies select talent by using scales.

Although Y. Jin and Jie’s (2020) framework was designed for a speaking scale, it contains the core elements of the mechanism by which impacts are generated: how stakeholders use LPS in teaching, learning, testing, and selecting talent, and the effects of that implementation. Hence, this study adopted this framework, and the research route is outlined in Table 1.

Table 1.

Method of Exploring the Impact of LPS.

Perspective	Stakeholders	Guidance of research
Application of LPS in Education	Policymakers	How does the use of LPS affect educational policies?
	Learners	How does the use of LPS affect students’ learning?
	Teachers	How does the use of LPS affect instruction?
	Curriculum designers and resource developers	How does the use of LPS affect curriculum design and resource development?
	Tester-developers	How does the use of LPS affect test development?
	Researchers	How do researchers improve or operationalize LPS?

Since this study focused on implementing LPS in the educational domain, it will explore the application of LPS by stakeholders in education and their effects. The social effect was excluded. The questions guiding this study were:

How is LPS used in the educational domain?

What is the effect of using LPS?

Method

Resources and Database

To answer these questions, this study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol, and three databases were selected: China National Knowledge Infrastructure (CNKI) core journals, Web of Science Core Collection (WoSCC), and Scopus. Since CSE is a scale applied in China, literature on it is likely to be better covered and indexed by CNKI. To ensure the quality of articles, only core collections were selected. WoS, the majority of which is a core collection (Carloni et al., 2018), is one of the two most important and comprehensive sources of publications and impact indicators worldwide (Pranckutė, 2021); the other is Scopus. WoSCC and Scopus are trusted, publisher-independent global citation databases (Baas et al., 2020; Birkle et al., 2020) containing many peer-reviewed, high-caliber academic journals published worldwide. These databases provide useful tools for a systematic literature review.

PRISMA

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Statements) is the most commonly used reporting guideline for systematic reviews (Page et al., 2021). The methods and results are reported in sufficient detail to enable users to evaluate the applicability and credibility of the review findings. Besides, the PRISMA statement can make systematic review reporting more transparent, comprehensive, and accurate. Hence, it enables a thorough search for information and scientific techniques relevant to the use of LPS in education. The retrieval process is illustrated in Figure 2.

Figure 2.

An overview of the search protocol based on the PRISMA statement.

Systematic Review Process

Identification

This systematic review was initiated in 2022, and its first stage mainly involved choosing keywords for the information search. In the CNKI database, “CSE,” “China’s Standard of English Proficiency Scales,” and “application” were used as keywords. In WoSCC and Scopus, “Common European Framework of Reference for Languages,” “application,” or “CEFR,” or “China’s Standard of English Proficiency Scales,” were used as keywords. Because this review focused on the application of language proficiency scales in the education context, “validation” was excluded (see Table 2). As a result, 1,501 papers containing the keywords were identified: 55 from CNKI, 689 from Scopus, and 757 from WoS, as seen in Figure 2.

Table 2.

Keywords and Information Search Strategy.

Database	Keywords
CNKI (PKU core journals and CSSCI journals)	“Language proficiency scale” OR “CSE,” OR “China’s Standard of English Proficiency Scales,”“application”
Web of science (core collection)	TS = (common European framework of reference for languages)) AND (application) OR TS = (China’s Standard of English Proficiency Scales)) NOT TS = (validation))
Scopus	TITLE-ABS-KEY Common European Framework of Reference for Languages OR China’s Standard of English Proficiency Scales AND NOT validation

Screening (Inclusion and Exclusion Criteria)

Following the inclusion and exclusion criteria in Table 3, the second stage of the systematic literature review involved screening. The first criterion was time, which was limited to 5 years (from 2018 to April 2022) for the following reasons: restricting the search to 5 years ensures the currency of the literature; CSE was released in 2018, and its application followed its publication; and the finalized ‘CEFR Companion Volume with New Descriptors’ was also published in 2018 (Eaquals, 2018).

Table 3.

Inclusion and Exclusion Criteria.

Criterion	Included	Excluded
Timeline	2018–2022	<2018 and >2022
Language	English and Chinese	Other languages
Research field	Psychology, social science, education and educational research, linguistics	Other research fields

The second criterion was language. Only English and Chinese publications in Scopus, WoSCC, and CNKI were included, to avoid distortion of meaning caused by translation. Third, the search was restricted to psychology, education, and linguistics; other research fields were excluded on grounds of relevance. As a result, 45 articles remained from the CNKI core journal database, 110 from Scopus, and 192 from WoS, as seen in Figure 2.
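The three screening criteria in Table 3 can be read as a simple filter over bibliographic records. The following is a minimal sketch of that reading, not the procedure the authors actually ran: the `Record` class, its field names, and the `passes_screening` function are hypothetical, and only the criteria values themselves are taken from Table 3.

```python
from dataclasses import dataclass

# Inclusion criteria copied from Table 3.
INCLUDED_YEARS = range(2018, 2023)  # 2018 to 2022 inclusive
INCLUDED_LANGUAGES = {"english", "chinese"}
INCLUDED_FIELDS = {
    "psychology",
    "social science",
    "education and educational research",
    "linguistics",
}


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record; field names are assumptions."""
    title: str
    year: int
    language: str
    field: str


def passes_screening(record: Record) -> bool:
    """Return True only if the record meets all three Table 3 criteria."""
    return (
        record.year in INCLUDED_YEARS
        and record.language.lower() in INCLUDED_LANGUAGES
        and record.field.lower() in INCLUDED_FIELDS
    )


# Illustrative records (invented for the example, not from the review).
sample = [
    Record("CEFR-aligned curriculum", 2021, "English", "Linguistics"),
    Record("Older CEFR study", 2016, "English", "Linguistics"),
    Record("CSE in engineering", 2020, "Chinese", "Engineering"),
]
kept = [r.title for r in sample if passes_screening(r)]
print(kept)
```

In practice the timeline, language, and field limits would normally be applied through each database's own filters; the sketch only makes the logical conjunction of the criteria explicit.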

Eligibility

Eligibility refers to the authors’ manual inclusion or exclusion of literature according to criteria in line with the research questions and the study objectives. Among these 356 items, 23 papers from Scopus that duplicated WoSCC records were removed. Therefore, 333 papers were retained for manual appraisal. In this process, the authors reviewed the abstracts and full texts to confirm their relevance: all the research had to focus on applying LPS in education. Studies focusing on the development, validation, perception, and semantic analysis of CEFR and CSE were excluded. Reviews were also removed. Finally, 48 papers (14 articles from CNKI core journals, 9 from Scopus, and 25 from WoS) were retained for review. Sixteen of them focused on the application of CSE, and 32 on CEFR.
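The record counts reported across the identification, screening, and eligibility stages can be tallied as a quick consistency check. The sketch below uses only the figures stated in the text; the variable and function names are illustrative. Note that the per-database screening counts as printed sum to 347, whereas the eligibility paragraph starts from 356 items; the sketch reports both figures and does not attempt to reconcile them.

```python
# Per-database record counts as reported in the text.
identified = {"CNKI": 55, "Scopus": 689, "WoS": 757}
after_screening = {"CNKI": 45, "Scopus": 110, "WoS": 192}
included = {"CNKI": 14, "Scopus": 9, "WoS": 25}

# Figures stated in the eligibility paragraph.
reported_before_dedup = 356
duplicates_removed = 23


def total(counts: dict) -> int:
    """Sum the per-database counts for one stage."""
    return sum(counts.values())


retained_for_appraisal = reported_before_dedup - duplicates_removed

print("identified:", total(identified))
print("after screening (sum of per-database counts):", total(after_screening))
print("stated before duplicate removal:", reported_before_dedup)
print("retained for manual appraisal:", retained_for_appraisal)
print("included in the review:", total(included))
```

The identification total (1,501), the number retained for appraisal (333), and the final inclusion total (48) all agree with the figures given in the text.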

Categorizations

The 48 remaining articles were categorized by the authors following the framework shown in Figure 1 and Table 1. Through qualitative content analysis of the abstracts and full texts, these articles were categorized into six groups according to stakeholders: policymakers, teachers, learners, curriculum and resource developers, testers, and researchers.

Synthesis and Findings

This section aims to show how different stakeholders apply LPS. After synthesis, results indicated that the application of CEFR covered five groups of stakeholders, all except students; that of CSE covered four groups, excluding policymakers and curriculum and resource developers. Among the reviewed articles, 8 papers concentrated on policymakers (CEFR only); 12 on teachers (8 on CEFR, 4 on CSE); 17 on testers (11 on CEFR, 3 on CSE, and 3 on both); 3 on curriculum and resource developers (CEFR); 3 on students (CSE); and 5 on researchers (2 each on CEFR and CSE, and 1 covering both), as seen in Figure 3. The effects of their application are also described in the following paragraphs.

Figure 3.

Number of grouped articles.
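The distribution of the 48 reviewed articles across stakeholder groups and scales can be checked with a short tally. This is a sketch built only from the counts stated in the synthesis paragraph; the dictionary layout and the label "both" (for articles covering CEFR and CSE together) are illustrative choices.

```python
# Article counts per stakeholder group and scale, as stated in the text.
counts = {
    "policymakers": {"CEFR": 8},
    "teachers": {"CEFR": 8, "CSE": 4},
    "testers": {"CEFR": 11, "CSE": 3, "both": 3},
    "curriculum and resource developers": {"CEFR": 3},
    "students": {"CSE": 3},
    "researchers": {"CEFR": 2, "CSE": 2, "both": 1},
}

# Total number of articles in each stakeholder group.
per_group = {group: sum(by_scale.values()) for group, by_scale in counts.items()}

# Total number of articles per scale label across all groups.
per_scale = {}
for by_scale in counts.values():
    for scale, n in by_scale.items():
        per_scale[scale] = per_scale.get(scale, 0) + n

# Stakeholder groups in which each scale appears on its own.
groups_with_cefr = [g for g, s in counts.items() if "CEFR" in s]
groups_with_cse = [g for g, s in counts.items() if "CSE" in s]

print(per_group)
print(per_scale)
print(sum(per_group.values()))
```

The group totals add up to 48, and CEFR appears in five groups and CSE in four, matching the text. The split of 32 CEFR and 16 CSE articles reported in the eligibility stage is reproduced if the 4 articles covering both scales are counted with the 12 CSE-only articles; that grouping is an inference from the numbers, not something the text states.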

Policymakers

CEFR is claimed to provide a common basis for elaborating language syllabuses, curriculum guidelines, and teacher development. The studies in Table 4 show how CEFR has been adapted in different countries. Nishimura-Sahi (2020) analyzed the educational trends and domestic needs for practical communicative proficiency in English to increase Japan’s economic competitiveness on the global stage. The CEFR-Japan was developed and successfully implemented by assembling various actors: government officers, researchers, commercial actors, administrators, and teachers. The author suggested that CEFR should be borrowed selectively to serve as a viable solution to further long-term educational and political agendas. To ensure its viability, all actors, including different stakeholders, publishing houses, and materials (such as guidelines and books), should be brought together.

Table 4.

Application of LPS by Policymakers.

Authors	Focus	Methods	Results
Nishimura-Sahi (2022)	Analyze the “context-specific reasons” for CEFR borrowing in the Japanese context	Qualitative content analysis of the policy documents	The CEFR was borrowed selectively from different stakeholders. The CEFR served as the framework of the new curriculum for the course of study; besides, the CEFR reference levels were adapted to reform university entrance examinations.
Brunfaut and Harding (2020)	How Luxembourg’s educational contexts may influence standard-setting practices using the CEFR	Thematic analysis	Four key sources of influence on the adoption of the CEFR: Luxembourg’s distinct language ecology, streamed schooling, national curriculum, and ongoing exam reform project.
Savski (2019)	How the CEFR was used in Thailand and Malaysia	Literature method	Implementing communicative language teaching (CLT) in Thailand and Malaysia has been unsuccessful; three other alternatives are outlined.
Nguyen and Hamid (2021)	Explore what factors induced the employment of the CEFR in Vietnam	Document analysis	The following conditions induced the employment of CEFR: English language policy changes, the need for economic and political innovations, the initiatives to reform higher education, and administrators’ tendency to solve domestic issues by looking outward.
Piccardo et al. (2019)	Successful strategies for introducing CEFR in Canada and Switzerland	Mixed method: qualitative interview and quantitative	Teacher education and CEFR-based examinations are essential to present the CEFR project
Franz and Teo (2018)	Teachers’ perceptions of the introduction of CEFR by the Ministry of Education (MOE) of Thailand	Grounded Theory Methodology, qualitative analysis	CEFR was introduced as an assessment tool for teachers, 94% of whom failed to reach the targeted level of B2. Moreover, it was not applied to classroom teaching or learners’ assessment.
Aziz et al. (2018)	Problems of implementing CEFR in pre-primary and secondary schools in Malaysia	Qualitative	Teacher training needs improvement. All stakeholders must be adequately synchronized, aware of their responsibilities, and updated on the most recent information. Superficial training should be complemented by more support from the government.
Deygers et al. (2018)	The impact of CEFR on European university entrance policies, tests, and testers	Qualitative	The B2 level is the most adopted, and CEFR levels are frequently abused for marketing purposes or to restrict university entrance.

Savski (2019) discussed how CEFR should be used in Thailand and Malaysia. The communicative orientation had been unsuccessful in these two countries, and he argued that a post-communicative philosophy should be advocated. He proposed that content and activities should be developed for learners to reflect on their identities as individuals and members of society. The content-based instruction (CBI) approach matched closely with the action-based concept of CEFR, thus having great potential as an alternative to CLT. Besides, policymakers should also consider how to empower students with CEFR criteria.

Nguyen and Hamid (2021) explained the historical and social context of adopting CEFR in Vietnam. They claimed that the following factors accelerated the adoption of CEFR in local milieus: English language policy changes, the need for economic and political innovations, the initiatives to reform higher education, and administrators’ tendency to solve domestic issues by looking outward. Nguyen and Hamid’s study demonstrated how the CEFR unfolded on the ground and interacted with the local educational context. It also highlighted the importance that educational actors at different levels attach to global standards. Piccardo et al. (2019) examined successful strategies for introducing CEFR in Canada and Switzerland. Results demonstrated that teacher education and CEFR-based examinations were the most important practices.

However, linking to CEFR is not always successful. Luxembourg is a critical case presenting the conflicts between international language proficiency standards and local realities (Brunfaut & Harding, 2020). The distinct language ecology, streamed schooling, national curriculum, and ongoing exam reform project constrained standard-setting practice using the CEFR. A dogmatic approach to CEFR as a common currency proved costly in this country. Hence, a better way of theorizing should be proposed to incorporate local knowledge into the standard-setting process without compromising procedural validity when international standards collide with local educational cultures.

Another example is the introduction of CEFR in Thailand (Franz & Teo, 2018). CEFR failed to achieve its stated aims regarding teaching in basic education and teachers’ linguistic and instructional skills. Most instructors felt that CEFR was introduced as a tool for evaluating their own proficiency, not for classroom teaching and assessment. Additionally, they claimed this tool was suitable for Europeans, not for them, as they failed to meet the targeted B2 level, causing them to lose face.

Aziz et al. (2018) revealed problems with implementing CEFR in pre-primary and secondary schools in Malaysia, where teacher training still needed improvement. They stated that all stakeholders must be adequately coordinated, aware of their roles, and informed of recent developments. Superficial training should be complemented by more support from the government. Deygers et al. (2018) explored the impact of CEFR on European university admission exams. Their findings indicated that B2 is the most adopted level of university entrance. However, the CEFR levels are frequently abused for marketing purposes or to restrict university entrance.

Teachers

Teachers are the primary users of LPS, as seen in Table 5. Generally, they use LPS in assessment and teaching.

Table 5.

Application of LPS by Teachers.

Authors	Focus	Method	Results
Mazlaveckienė (2018)	Assessing English grammar proficiency in terms of CEFR scales in a university in Lithuania	Qualitative	Lithuanian English Philology students often have a limited repertoire of grammatical structures ranging from level B1 to B2.
Zhao and Zhao (2023)	Teachers and learners co-constructed writing criteria based on CEFR to improve learning	Quantitative	The collaborative effort increased the viability and application of the ELP descriptors, and developed students’ cognitive and metacognitive knowledge, and their skills on setting up assessment criteria and evaluating their performance against the criteria.
Shi and Zheng (2021)	Apply CSE-based intelligent autonomous diagnostic APP in English teaching in China	Quantitative	The teaching mode assisted by an adaptive learning system is conducive to implementing formative assessment, the effect of the mode is remarkable, and students have high satisfaction with the teaching mode.
He et al. (2021)	Cognitive diagnosis models (CDM) based on CSE to assess the writing abilities of Chinese undergraduates	Quantitative	Diagnostic results could distinguish masters from non-masters. Students in the high-proficiency group scored higher than low-proficiency students on all attributes.
Zhong (2019)	Apply CSE in English listening and speaking course in China	Qualitative	Integrating CSE in teaching can enhance vocational college students’ sense of learning responsibility and produce customized learning objectives and strategies.
Xiong and Liu (2020)	Use CSE in teaching adult English in open learning for ESP	Qualitative	The course tailored for ESP adult learners based on CSE proved effective in enhancing students’ interest and learning outcomes.
Rehner et al. (2021)	How CEFR training impacts teachers’ French Instruction	Quantitative: retrospective reports	Teachers shifted their planning priority and time, classroom delivery, and assessment practices after the CEFR-related training.
Choong et al. (2021)	Assessment of Grade 5 and 6 pupils before and after the introduction of CEFR amid COVID-19	Qualitative	Before the introduction of the CEFR, not all teachers conducted speaking assessments. However, their teaching and assessment changed as the CEFR emphasized the need for teaching and conducting speaking examinations.
Poonpon et al. (2022)	Develop a model named TIGA based on CEFR and Thailand’s basic education and core curriculum for low English proficiency students in rural secondary schools	Quantitative	The results revealed a significant difference in the experimental and control groups’ English abilities. The research revealed that the teaching strategy might encourage and engage low-ability students in improving their English proficiency.
Juan Muñoz Andrade	Use CEFR to facilitate learning in universities in Seville	Report	CEFR was used as a methodological and evaluative tool to chart students’ progress and give feedback. Students’ language proficiency was greatly improved, and they were more confident in speaking.
Sidhu et al. (2018)	The use of CEFR-aligned school-based assessment (SBA) in the Malaysian primary ESL classroom	Mixed method	SBA implementation was far from formative assessment; teachers held positive attitudes toward SBA but had limited comprehension of the CEFR-aligned ESL curriculum. They offer little or no feedback on tasks. Students were discouraged from reflecting on their work, and no self- and peer assessment was found.
Yüce and Mirici (2022)	Implementation of CEFR self-assessment in EFL classes in secondary education in Turkey	Qualitative method	The checklist of self-assessment based on CEFR was provided at the end of each unit in the textbooks; however, they were in low compatibility with CEFR and were not implemented by teachers.

For Assessment

Mazlaveckienė (2018) used the CEFR grammar scales to assess Lithuanian English Philology students. Results indicated that these students had a limited repertoire of grammatical structures ranging from level B1 to B2. The study shed light on important trends in developing English Philology students’ foreign language competency in Lithuania. Zhao and Zhao (2023) explained how teachers and students in China co-constructed writing assessment criteria based on CEFR. The findings supported the efficacy and significance of developing these criteria for improving learners’ cognitive and metacognitive knowledge of writing and assessing. They highlighted the importance of learners’ competence in developing assessment criteria and implementing a future-driven self-assessment using the CEFR or LPS in local settings.

Shi and Zheng (2021) developed an intelligent diagnostic learning app based on CSE, in which sports majors practise English adaptively. T-tests and questionnaires revealed its effectiveness in motivating students and improving their learning outcomes. He et al. (2021) used CSE-based cognitive diagnosis models (CDM) to assess the writing abilities of Chinese undergraduates. The linear logistic model analysis demonstrated that diagnostic results could distinguish masters from non-masters and facilitate learning by increasing students’ competency through feedback and remedial activities. As the authors stated, using CSE for diagnostic purposes could provide methodological support for a CDM-based approach to diagnostic assessment; it could also provide diagnostic feedback for L2 learners to improve learning.

For Teaching

The above articles concern how LPS were used in assessment, while the following articles focus on teaching practice. Zhong (2019) applied CSE in a listening and speaking course in a vocational college by constructing a model combining self-assessment, peer assessment, AI assessment, and teachers’ assessment. This model successfully enhanced students’ sense of learning responsibility and produced customized learning objectives and strategies. It stressed the importance of applying CSE as guidance in teaching planning and instruction. Xiong and Liu (2020) reported a CSE-based reform of ESP teaching for open universities in China. The teaching content adapted from CSE, with reference to students’ work backgrounds, and the assessment criteria based on CSE descriptors made this course suitable for learners in distance education. This method proved effective in stimulating their interest and enhancing learning outcomes. Rehner et al. (2021) showed how K-12 teachers’ planning, classroom delivery, and assessment practices changed after CEFR-related professional learning. Teachers prioritized speaking and listening, with less time allotted to writing and reading, after learning about CEFR; they also shifted their focus away from language structure and error correction toward real-life situations; besides, they focused more on students’ sociolinguistic and pragmatic competencies. As for assessment, teachers prioritised functional competence and pragmatic and sociolinguistic appropriateness, in contrast to their initial focus on grammatical accuracy and orthographic control. These shifts in teachers’ planning, classroom delivery, and assessment practices after CEFR-related training signaled a clear change from a grammar-based model to an action-oriented approach in which language learning takes place through genuine communication in authentic everyday situations.

Likewise, the study of Choong et al. (2021) illustrated how CEFR affects primary school English teachers’ behaviors in Japan. Before the CEFR was introduced, not all teachers taught or assessed speaking. Their concepts of instruction and evaluation changed with the incorporation of CEFR in the elementary English curricula. As a result, speaking was given greater emphasis in teaching and assessment.

In Thailand, Poonpon et al. (2022) reported a model named TIGA based on CEFR and core curriculum for low English proficiency students in rural secondary schools. Results from their quasi-experiment found a significant improvement in the experimental group. This model emphasized the importance of teaching strategy in engaging low-proficient students, especially the authenticity of learning tasks.

Infante Mora et al. (2019) reported how the CEFR was used as a methodological and evaluative tool to chart students’ progress and give feedback at a university in Seville. Students’ language proficiency was greatly improved, and they were more confident in speaking. This report suggested that feedback based on standards is crucial in learning and that teachers’ role as facilitators should be amplified.

As with the linking of CEFR to local policies, not all cases are successful. Despite the introduction of CEFR in education policy, some ESL teachers in Malaysia (Sidhu et al., 2018) and Turkey (Yüce & Mirici, 2022) had limited comprehension of the CEFR-aligned curriculum and could not use CEFR properly. Moreover, students were discouraged from reflecting on their work based on the tasks. More work should be done to overcome ESL teachers’ constraints and help them bridge the gap between policy and practice.

Learners

Table 6 shows how students used CSE. Zhang and Wang (2022) explored the scaffolding role of the CSE writing scale for college students. Results showed that with the intervention of CSE, students’ assessing ability, writing skills, and learning confidence were significantly improved. Li (2022) examined the effect of CSE-based peer assessment and task value on Chinese undergraduates’ self-regulated learning (SRL). Results indicated that learners’ SRL was significantly improved. Another study, by He and Zhang (2021), incorporated the CSE in diagnostic assessment, goal setting, and remedial instruction to facilitate learning. The quasi-experiment indicated significant improvement in listening skills. Students’ self-reports demonstrated that they held positive attitudes toward this approach to learning, especially the function of CSE in setting SMART (specific, measurable, attainable, realistic, and time-bound) goals.

Table 6.

Application of LPS by Learners.

Study	Focus	Method	Results
Zhang and Wang (2022)	Scaffold CSE in college English writing and the effect of its application	Mixed method: correlational analysis of students’ self-assessment and quantitative study of students’ report	Students’ self-assessment and writing abilities improved significantly; their learning confidence also improved.
Li (2022)	Use CSE-based peer assessment and task value in writing	Quantitative method: ANOVA	Assessment for learning based on CSE and task value significantly improved students’ writing ability and enforced self-regulated learning.
He and Zhang (2021)	Incorporate CSE in self-diagnostic assessment, goal setting, and remedial instruction and learning	Quasi-experiment and qualitative (students report)	Students’ listening skills were significantly improved, and they held positive attitudes about this approach to learning, especially the function of CSE in setting SMART goals

Course Designers and Resource Developers

Table 7 shows how curriculum and resource developers use LPS. The study by Mohamed (2021) offers a practical model for constructing a CEFR-aligned curriculum. First, it should be essentially action-oriented and concentrate on supporting students in putting their competence into practice. Second, dividing each CEFR level into two sub-levels (e.g., A2 into A2.1 and A2.2) would be helpful; courses designed to help learners track and monitor their progress could improve their sense of achievement and motivate them. Third, introducing different themes and integrating similar functions that produce equivalent results can be more successful. Fourth, a grammar syllabus should be practical and accessible for classroom learning. Mohamed provided an example of options and modifications that teachers may need to consider in implementing CEFR in their contexts. Kalnberzina (2018) compared the intercultural component in secondary and tertiary education curricula. The comparison revealed the compatibility of these documents despite their differences in terminology, context, and level of impact. Little (2018) explored how the CEFR was adopted in designing a curriculum framework for Irish immigrant primary schools. Some of the CEFR scales and descriptors were tailored to the Irish context. In addition, the mediation skills in the CEFR can compensate for the deficiency in analytical thinking and problem-solving abilities in secondary education.

Table 7.

How Curriculum and Resource Developers Use LPS.

Author	Focus	Methods	Results
Mohamed (2021)	Compiling a list of salient features for curriculum development that would be a basis for designing a framework for a CEFR-aligned Arabic curriculum in UK universities	Inductive research approach using qualitative, interpretive methods.	The study described the context and technique for developing a CEFR-aligned Arabic curriculum framework using a collection of curriculum salient features from the CEFR.
Kalnberzina (2018)	Aligning intercultural components in the English curricula for secondary and tertiary education in Latvia with the CEFR	Documentary analysis.	Despite the differences in terminology, context, and level of impact, these documents were generally compatible.
Little (2018)	How the CEFR was applied in the design of a curriculum framework in Irish primary schools	Report	Borrowing part of scales and descriptors from CEFR to develop the Irish curriculum.

Test Developers

Test developers generally use LPS to align tests and other frameworks and as a criterion for rating.

Alignment

Table 8 indicates that test developers mainly use LPS to align international tests such as IELTS and TOEFL, large-scale tests in specific contexts, and in-house tests. Alignments between different LPS were also conducted.

Table 8.

Application of LPS by Test Developers.

Authors	Focus	Methods	Results
Green (2018)	The general alignment of the CEFR with IELTS, TOEFL, CAE, and PTE-A	Qualitative analysis of IELTS, TOEFL iBT, PTE-A, and CAE documents	Testing agencies seldom used CEFR categories to interpret test content; they depicted the relationships between their tests and the CEFR in different terms and reached conflicting conclusions about the correlation between test scores and CEFR levels.
Fleckenstein et al. (2020)	Alignment of CEFR with TOEFL rubrics in upper secondary education in Germany and Switzerland	The standard-setting methodology was used to establish the linkages.	The TOEFL test results can be meaningfully expressed within the framework of the CEFR, which underlies educational standards in both countries.
Hidri (2021)	Alignment of CEFR with International English Language Competency Assessment (IELCA) in listening, reading, speaking, and writing	Familiarization, specification, standardization training and benchmarking, standard setting, and validation	The five linking stages mapped the four levels of the IELCA suite examinations (B1, B2, C1, and C2) onto the CEFR, providing fair judgments and informed decisions about the practical consequences and validity arguments of this mapping task.
Peng (2021)	Alignment of CSE with CEFR writing scales	Rasch model analysis	CSE levels 1 and 2 correspond primarily to CEFR levels A1 and A2, level 3 to A2, levels 4 and 5 to B1, level 6 to B2, level 7 to C1, level 8 to C1 and C2, and level 9 to C2 and above.
Peng and Liu (2021)	Alignment of CSE with CEFR listening scales	Rasch model analysis	CSE level 1 matches mainly to the CEFR A1 level, level 2 to A2, level 3 to A2 and B1, level 4 to B1, level 5 to B1 and B2, level 6 to B2, level 7 to C1, and level 9 to C2.
Peng et al. (2022)	Aligning CSE with CEFR	Rasch model analysis	CSE level 1 corresponds primarily to the CEFR level below A1, level 2 to A1, level 3 to A2, level 4 and level 5 to B1, level 6 to B2, level 7 to B2 and C1, level 8 to C1 and C2, and level 9 to C2.
Wang (2020)	Alignment of CSE with in-house English proficiency tests in reading, listening, and writing skills	Correlation analysis of students’ self-assessment and teachers’ assessment based on the CSE description	The seven reported levels of the SJTU-EPT can be linked to the CSE levels four to eight.
Min and Jiang (2020)	Alignment of the listening subtest of an in-house English test and CSE	Standard setting using the Modified Angoff Method, the Contrasting Groups Method, and Multi-Facet Rasch Analysis	The listening subtest of the in-house test aligns with level 5 of CSE; the two standard-setting approaches produce congruent results.
Harsch and Kanistra (2020)	Align the Integrated Skills in English (ISE) suite of Trinity College London to the CEFR	Item-descriptor-matching (IDM) method, Cronbach’s alpha, and multi-faceted Rasch modeling analysis	High agreement for task judgments, acceptable reliabilities and consistency for examinee-centered ratings, and varying levels of agreement for descriptor choices.
Baharum et al. (2021)	Alignment of CEFR with English Language Competence Score Average (ELCSA) in a university in Malaysia	Quantitative method: correlation analysis	The results showed a significant positive correlation that varied in strength, with writing showing the strongest correlation.
Sufi et al. (2021)	Mapping English writing skills tests in English Proficiency Tests (EPT) with CEFR in International Islamic University of Malaysia (IIUM)	Quantitative method: correlation analysis	EPT writing bands correlated positively to scales of the CEFR.
Shak and Read (2021)	Aligning English for Occupational Purposes (EOP) meeting assessment in Malaysia to the CEFR level.	Qualitative: NVIVO-coding	A revised set of language assessment criteria was introduced; results showed how the scoring criteria could be aligned with the CEFR scale through a systematic comparison of language functions generated in the meeting task.
Shermis (2018)	Provide a crosstalk between the CEFR and an automated writing evaluation (AWE) system	Regression model approach	The CEFR traits and the machine scoring system aligned in fluency, coherence, and accuracy, whereas the traits of range and interaction were less well aligned.
Al Habbash et al. (2021)	Alignment of CEFR standards with Emirates Standardized Test (EmSAT) in the United Arab Emirates and IELTS	Quantitative and qualitative	EmSAT and IELTS are not rigorously aligned with the CEFR standards. Furthermore, the EmSAT mostly aligned with the lower levels of the CEFR, whereas the IELTS mostly aligned with the higher levels of the CEFR.
Jie (2019)	Alignment of CET-4 with speaking scales of CSE	Multi-facet Rasch model	Through the test task analysis, the panelists could select relevant descriptors. Following thorough training, they demonstrated good consistency and accuracy at each level of standard setting.
Holzknecht et al. (2018)	Raters from Finland and Austria use the CEFR-based rating scale to measure students’ writing abilities	Rasch model analysis	Although the Austrian raters were marginally more lenient than the Finnish raters, the range of disagreement was tiny. Thus, the two teams mostly agreed upon the participants’ CEFR levels.
Silveira and Martins (2020)	How experienced raters use CEFR holistic and analytic scales to assess oral proficiency progress in English as a second language	Quantitative method, correlation analysis of analytic and holistic ratings	Significant positive correlations existed between holistic and analytic assessment, and raters rated consistently with the analytic scales. Better speaker performance over time was detected in fluency, while improvement in pronunciation and grammar was insignificant.

Alignment With International Tests

Fleckenstein et al. (2020) aligned the TOEFL writing rubrics with the CEFR using a standard-setting methodology in Germany and Switzerland. Results indicated that TOEFL results could be meaningfully expressed within the framework of the CEFR. However, the study by Green (2018) showed that the agencies behind IELTS, TOEFL, CAE, and PTE-A made little use of CEFR categories to explain test content and arrived at conflicting conclusions about the relationship between test scores and CEFR levels. Among these tests, PTE-A was the only one that defined being “at” a level in terms of the likelihood of success in relation to “Can Do” descriptors; the band boundaries of the other tests did not correspond directly to CEFR levels. Green highlighted the importance of the content and quality of assessment procedures. Hidri (2021) aligned the International English Language Competency Assessment (IELCA) with the CEFR in listening, reading, speaking, and writing. He demonstrated that an alignment carried out in five major stages (familiarization, specification, standardization training and benchmarking, standard setting, and validation) could provide abundant evidence of dependable results and make the skills and items in the test specific enough to reflect the CEFR descriptors.

Hidri prioritized using CEFR to map tests by addressing different mapping stages. It could help teachers effectively use the CEFR descriptors to align IELCA tests and empower them to implement curriculum activities in class transparently and coherently.

Alignment With In-House Tests

Wang (2020) showed how CSE levels 4 to 7 aligned with the SJTU-EPT (Shanghai Jiaotong University English Proficiency Test). Ratings by teachers and students based on the CSE descriptors indicated that the B and B+ levels of the SJTU-EPT corresponded with CSE level 6 and the C and C+ levels with level 5. Most A-level students corresponded with level 7, and D-level students with level 4. Min and Jiang (2020) aligned the listening subtest of an in-house English test at Zhejiang University in China with the CSE using the Modified Angoff Method and the Contrasting Groups Method. The congruent results of the two methods indicated that the subtest aligned with CSE level 5.
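The source does not describe how Min and Jiang (2020) carried out their computations, so the following is only a minimal sketch of one common formulation of an Angoff-style cut score. It assumes that each panelist estimates, for every item, the probability that a minimally competent test taker at the target level answers correctly; the cut score is then the mean of the panelists' summed estimates. The function name `angoff_cut_score` and all ratings are hypothetical and are not taken from the study.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return an Angoff-style cut score (hypothetical helper).

    ratings: one list per panelist, holding that panelist's estimated
    probability (0..1) that a borderline test taker answers each item
    correctly. All panelists must rate the same number of items.
    """
    if not ratings:
        raise ValueError("at least one panelist is required")
    n_items = len(ratings[0])
    for panelist in ratings:
        if len(panelist) != n_items:
            raise ValueError("every panelist must rate every item")
        if any(p < 0 or p > 1 for p in panelist):
            raise ValueError("ratings must be probabilities between 0 and 1")
    # Each panelist's expected raw score for a borderline test taker.
    panelist_totals = [sum(panelist) for panelist in ratings]
    # The cut score is the average of those expected scores.
    return mean(panelist_totals)


# Hypothetical ratings from three panelists on a four-item subtest.
example_ratings = [
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.6, 0.6, 0.7],
    [0.7, 0.8, 0.4, 0.9],
]
cut_score = angoff_cut_score(example_ratings)
```

With these invented ratings, the cut score comes out at roughly 2.6 of 4 raw-score points. In practice, a result of this kind would be compared with the outcome of a second procedure, such as the Contrasting Groups Method mentioned above, to check whether the two approaches agree.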

Harsch and Kanistra (2020) aligned the ISE suite of Trinity College London in Britain to the CEFR with an item-descriptor-matching (IDM) method and a complementary benchmarking approach. Results showed high agreement for task judgments, acceptable reliabilities and consistency for examinee-centered ratings, and varying levels of agreement for descriptor choices. In Malaysia, scholars aligned the CEFR with the English Language Competence Score Average (ELCSA), English Proficiency Tests (EPT) in universities, and English for Occupational Purposes (EOP) meeting assessments (Baharum et al., 2021; Shak & Read, 2021; Sufi et al., 2021). Results showed an overall positive correlation between the CEFR and these in-house tests, supporting the acceptability and credibility of these tests. Shermis (2018) established a crosstalk between the CEFR and an automated writing evaluation (AWE) system in America. The CEFR traits and the machine scoring system were most clearly aligned in fluency, followed by coherence and accuracy, whereas the traits of range and interaction were less well aligned. The author highlighted that operationalizing “good writing” and advocating the traits in the CEFR could help machines score accurately.

Alignment With Large-Scale National Tests

In China, CET-4 is the test with the largest test-taker population, and most college students take it. Jie (2019) aligned CET-4 with the speaking scales of the CSE, and the panelists demonstrated good consistency and accuracy at each standard-setting level. Another alignment study by Al Habbash et al. (2021) revealed that a large-scale test in the United Arab Emirates, the Emirates Standardized Test (EmSAT), was not rigorously aligned with the CEFR standards.

Alignment Between CEFR and CSE

A series of alignments between these two scales was made by the same research group (Peng, 2021; Peng & Liu, 2021; Peng et al., 2022). The overall alignment (Peng et al., 2022) indicated that CSE level 1 corresponds primarily to the CEFR level below A1, level 2 to A1, level 3 to A2, levels 4 and 5 to B1, level 6 to B2, level 7 to B2 and C1, level 8 to C1 and C2, and level 9 to C2. As the authors stated, these studies contributed to the internationalization of the Chinese assessment system and provided references for the alignment of language standards and language education in China.
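The overall correspondence reported for Peng et al. (2022) can be written down as a simple lookup table. The sketch below only encodes the levels listed in the text; the names `CSE_TO_CEFR`, `cefr_for_cse`, and `cse_for_cefr` are hypothetical, and the skill-specific alignments for writing and listening in Table 8 differ in places and are not covered.

```python
# CSE-to-CEFR correspondences as summarized in the text for Peng et al. (2022).
# A CSE level may span two CEFR levels, so each value is a list.
CSE_TO_CEFR = {
    1: ["below A1"],
    2: ["A1"],
    3: ["A2"],
    4: ["B1"],
    5: ["B1"],
    6: ["B2"],
    7: ["B2", "C1"],
    8: ["C1", "C2"],
    9: ["C2"],
}


def cefr_for_cse(cse_level):
    """Return the CEFR level(s) reported for a CSE level (hypothetical helper)."""
    if cse_level not in CSE_TO_CEFR:
        raise ValueError(f"CSE levels run from 1 to 9, got {cse_level!r}")
    return CSE_TO_CEFR[cse_level]


def cse_for_cefr(cefr_level):
    """Return every CSE level whose reported range includes the CEFR level."""
    return [cse for cse, cefr in CSE_TO_CEFR.items() if cefr_level in cefr]


levels_for_cse_7 = cefr_for_cse(7)      # ["B2", "C1"]
cse_levels_at_b1 = cse_for_cefr("B1")   # [4, 5]
```

Because the correspondence is many-to-many at the upper levels, a lookup in either direction returns a list rather than a single level.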

Rating

Another type of application concerns how raters use CEFR scales to assess learners’ proficiency levels. Silveira and Martins (2020) explored how experienced raters used the CEFR holistic scale and four analytic scales (vocabulary, grammar, fluency, and pronunciation) to measure students’ oral proficiency progress. The results demonstrated that the five scales were positively correlated and that raters were consistent in using these criteria. However, the subscales detected significant progress in fluency only; improvement in grammar and pronunciation was hardly seen. The authors claimed that even in a communicative teaching context, grammar and pronunciation should be emphasized to coordinate the development of the subcomponents of oral proficiency. The research by Holzknecht et al. (2018) also showed that although raters in Finland and Austria differed slightly in leniency, they agreed to a large extent on the CEFR levels of the participants when they were trained and experienced in the CEFR-based rubric.

Researchers

Researchers are concerned with the development and improvement of scales (Y. Jin & Jie, 2020). As seen in Table 9, a study by Y. Liu et al. (2021) demonstrated how the English proficiency standard for adult learners in open universities (OUSE) was developed based on the CSE and CEFR and how this scale was used to certify their learning outcomes. The authors reported the specific steps of designing and applying this scale, showing that the OUSE provided a benchmark for assessing adult learners in open universities. Ma and Chen (2021) constructed a pragmatic competence assessment model and standard based on the CSE, complementing current assessment forms. Yang et al. (2021) developed Typical Interpreting Activity Scales (TIAS) based on the CSE.

Table 9.

Application of LPS by Researchers.

Authors	Focus	Methods	Results
Ma and Chen (2021)	Construct a pragmatic competence assessment model and standard based on the CSE	Mathematical modeling, Delphi method, quantitative method	A college English pragmatic competence assessment model was developed and tested in practice. This model proved to be complementary to current assessment forms.
Y. Liu et al. (2021)	Developing a framework of English proficiency standards at the open university of China based on CSE and CEFR	Delphi method, Rasch analysis	A five-level scale was developed, and the policies and procedures for accrediting adult learners’ English learning outcomes were formulated.
Yang et al. (2021)	Developing Typical Interpreting Activity Scales (TIAS) based on CSE	Quantitative analysis	Interpreting activities were categorized into eight groups; the “Can do” descriptors present interpreting performance at different topics, interpreting models, and skills in ascending levels.
Yannakoudakis et al. (2018)	Developing an Automated Writing Placement System for ESL Learners based on CEFR full scales	Quantitative method	The system is developed to assess learners’ proficiency levels on the CEFR scale. This model was incorporated into Cambridge English Write & Improve system to offer diagnostic feedback for learners.
Schmidt et al. (2019)	Developing a guidebook and tools to implement the CEFR for course design	Report	The CEFR-related resources were thematically rearranged based on the following function: curriculum and course design, assessment, and learner autonomy as a guidebook.

Yannakoudakis et al. (2018) developed an Automated Writing Placement System for ESL Learners based on the full CEFR scales. This model was incorporated into the Cambridge English Write & Improve system to offer diagnostic feedback for learners, facilitating self-assessment, tutoring, and improvement in learning. Schmidt et al. (2019) developed a guidebook and tools to implement the CEFR for course design, simplifying the implementation of the CEFR and fostering its use by novices.

These studies operationalized standards in LPS and guided language teaching and assessment from theoretical to practical levels.

Discussion

This paper reviewed and summarized the application of LPS in education, focusing on the CEFR and CSE. After eligibility screening, 48 articles met our inclusion criteria; 16 studies were about the CSE, and 32 were on the CEFR. These studies showed how LPS were applied by policymakers, curriculum designers, researchers, test developers, teachers, and students.

Policymakers

Mohamed (2021) stated that the CEFR had been used more frequently at the macro level, that is, for policy-making. The results of this review, as seen in Figure 3, confirmed this finding. The CEFR is adopted globally for its open-endedness and vagueness, qualities that scholars often criticize. However, these qualities make the CEFR flexible for local contexts (Savski, 2019). The reviewed papers demonstrated how the CEFR was adopted selectively in different settings. The interaction of the CEFR with the local setting can be positively or negatively influenced by local realities. Social needs necessitated the adoption of global standards, and governments or policymakers selectively borrowed criteria from the CEFR (Nguyen & Hamid, 2021; Nishimura-Sahi, 2022).

However, foreign LPS should be introduced with great caution and tailored to the specific context; otherwise, conflicts may arise. The study of Brunfaut and Harding (2020) served as an extreme case of the tension between the CEFR and local realities. The standard-setting process for the Luxembourg Épreuve Commune for English was highly influenced by local realities, such as the multilingual learning ecology, the streamed schooling system, the national curriculum, and exam reform. These lessons on contextualizing the CEFR shed some light on policy-making in other countries. When global criteria and local context collide, a better way of theorizing how to integrate local knowledge and international standards should be proposed without compromising standard-setting procedures (Brunfaut & Harding, 2020).

Besides, the introduction of LPS in policy should be accompanied by an updated teaching philosophy. As Savski (2019) claimed, the older version of communicative teaching cannot match the concepts advocated by the CEFR in Thailand and Malaysia. New agendas were proposed for policymakers adapting to the practice of the CEFR: shift the teaching philosophy to a post-communicative concept, devitalize the teaching process, and empower students with criteria.

Furthermore, introducing foreign standards should also prioritize the training of teachers. The unsuccessful adoption and implementation of the CEFR in Thailand, Malaysia, and European universities (Aziz et al., 2018; Deygers et al., 2018; Franz & Teo, 2018) underscored the importance of teacher training and the synchronization of all stakeholders. Otherwise, the CEFR would be misused or abused.

Teachers

Teachers adopted LPS for assessment for learning. LPS serve as benchmarks for assessments that evaluate students’ language proficiency more accurately and pinpoint essential trends in the development of students’ language competency (Mazlaveckienė, 2018). They could also be references for diagnostic assessments, whose feedback could inform students what remedial work should be done to improve their language proficiency. This type of application highlighted the merits of advanced psychometric techniques in providing diagnostic feedback for L2 learners (He et al., 2021; Shi & Zheng, 2021).

The application of LPS could also lead to new approaches to teaching. Zhong (2019) constructed a model combining self-assessment, peer assessment, AI assessment, and teachers’ assessment to facilitate learning. Zhong’s study exemplified the goal-setting function of the CSE in the teaching process. Shi and Zheng (2021) designed an adapted smart testing system based on the CSE to meet the objective and subjective needs of a “learning, teaching, and testing” integrated teaching model. This model highlighted how diagnostic assessment could be used to facilitate learning. Xiong and Liu (2020) explored using rubrics and contents adapted from the CSE in assessing and improving English proficiency in open universities. The study of Zhao and Zhao (2023) demonstrated that the collaborative process improved the feasibility and usefulness of the CEFR descriptors and developed students’ cognitive and metacognitive knowledge and skills for setting up assessment criteria. These studies supported the effectiveness of LPS as a tool for improving learning outcomes and stimulating interest in learning.

Teachers practised LPS creatively to facilitate students’ learning, and in turn, their concepts and teaching philosophies were affected by LPS. As the CEFR promoted an action-oriented approach in language teaching, it changed teachers’ concepts from grammar-oriented learning to authentic-task-based learning and from input-focused to output-focused teaching (Poonpon et al., 2022; Rehner et al., 2021). It also gave birth to the student-centered concept, emphasising the collaborative learning process of teachers and students (Zhao & Zhao, 2023).

Students

Generally, students used LPS as self- or peer-assessment tools and as a goal-setting benchmark. Self-assessment based on LPS plays a scaffolding role in learning and improves students’ language proficiency, assessment literacy, and confidence (Zhang & Wang, 2022). When integrated into learning, LPS improve the efficacy of self- and peer-evaluation, enhance students’ self-regulation, and boost the value of assessment for learning (Li, 2022). Moreover, LPS could guide goal-setting and offer students a benchmark to analyze and reflect on their learning progress critically and actively and to remedy their learning (He & Zhang, 2021). In the long term, LPS could be an essential learning guidance and assessment tool for cultivating independent lifelong learners.

Curriculum and Resource Developers

Adapting descriptors from LPS to the local context and adding them to an existing curriculum document should be encouraged. Compiling descriptors from the CEFR, Mohamed (2021) developed a generic, concise Arabic curriculum of salient features and aligned it with the CEFR. This curriculum conformed to the CEFR’s philosophy of being transparent, coherent, and flexible. Kalnberzina (2018) added some CEFR cultural components and standards to the secondary school curriculum to develop learners’ intercultural decision-making abilities. These two studies set good examples of complementing existing curricula by aligning them with CEFR standards, and both highlighted the importance of adaptation to the local context and content alignment. However, the small number of retained papers indicates that studies of this type need to be fleshed out by further exploration.

Test Developers

Test scores alone are insufficient to support administrators or teachers in making meaningful decisions, nor can the test takers be well-informed of their proficiency levels. Aligning the scores with the “can do” descriptors in LPS is significant in teaching, learning, and assessment (Wang, 2020).

The alignment of tests to LPS levels could provide learners with a valuable sense of their current language ability and a more detailed and comprehensive view of students’ linguistic profiles (Sufi et al., 2021). With the help of a competent teacher, this alignment might form the basis for further study or remedial learning (Fleckenstein et al., 2020). It could also reflect how to use unified standards to interpret students’ authentic language proficiency, provide feedback for teaching, and back up learning plans and objectives (Wang, 2020).

Alignment with tests also accentuates the importance of using LPS to ensure the credibility and reliability of testing results (Hidri, 2021), especially for in-house tests, which differ from school to school. The same score in different schools does not indicate the same level of proficiency. Aligning in-house tests with LPS can bridge this gap by measuring students’ ability more accurately against a common benchmark and can promote the accreditation of academic scores across schools (Min & Jiang, 2020); it could also provide evidence for further improvement of language tests (Baharum et al., 2021).

Studies on alignment between different LPS are also crucial. Such alignments could promote the recognition of standards from other areas and cultures, highlight the significance of language scales in use and construction, and support the mutual recognition of different standards (Peng & Liu, 2021; Peng et al., 2022).

Furthermore, LPS serve as a benchmark or tool for language assessment. Slightly modified LPS can be a valid and reliable tool for assessing language proficiency. However, training in LPS standards and raters’ own language proficiency should be reinforced (Holzknecht et al., 2018; Silveira & Martins, 2020). Under these premises, using the LPS descriptors for rating can ideally lead to the same results across different contexts and achieve high congruence for all scales.

Researchers

LPS can provide a theoretical framework, methodology, and source of descriptors for developing new tools. The CEFR is well-known as a benchmark for designing contextualized language assessment frameworks or systems. The CSE also proved to be a practical reference for constructing new scales and assessment models (Y. Liu et al., 2021; Ma & Chen, 2021; Yang et al., 2021). The new tools help to operationalize the practice of LPS (Yang et al., 2021), simplify their implementation, and usher in novice use of LPS (Schmidt et al., 2019). Finally, the new tools could help students reflect on their errors, track their progress, and learn more effectively (Yannakoudakis et al., 2018).

Conclusion

This research adopted the PRISMA systematic review method for an in-depth review of 48 articles on how different stakeholders use LPS in the education domain and the effects of applying LPS. The findings revealed that policymakers used LPS selectively to underpin their education decisions and reforms (Nishimura-Sahi, 2020). The adoption of the CEFR should take into account specific social contexts, such as language ecology, streamed schooling, the national curriculum, ongoing exam reform, and the concrete economic and political situation (Nguyen & Hamid, 2021). Teachers used LPS to assess students’ language proficiency, gain an overall profile of students’ competence, and diagnose their problems, achieving the goals of assessment for learning (He et al., 2022; Shi & Zheng, 2021). New teaching models based on LPS were also adopted to improve learning confidence and outcomes (Shi & Zheng, 2021; Xiong & Liu, 2020; Zhong, 2019). Students used LPS as a benchmark to obtain feedback from their self-assessments, track their progress, and set learning goals (He & Zhang, 2021; Li, 2022; Zhang & Wang, 2022). The intervention of LPS in learning could help cultivate self-regulated learners and enhance students’ motivation and learning outcomes. Curriculum designers tailored descriptors and scales from the CEFR to develop new curricula, align the existing ones, and make revisions where necessary (Kalnberzina, 2018; Mohamed, 2021). Test developers aligned LPS with tests to make the results more reliable and credible, measure students’ ability more accurately, and provide evidence for improving language tests (Hidri, 2021; Sufi et al., 2021). Raters used LPS to assess more accurately (Holzknecht et al., 2018; Silveira & Martins, 2020). With their methodology and theoretical framework, LPS could offer researchers references for developing new rubrics, frameworks, and assessment models (Ma & Chen, 2021; Yang et al., 2021).

Despite this potential in education, the successful implementation of LPS requires all stakeholders, including trainers, teachers, testers, and raters, to update their knowledge and information, improve their language skills, and take up their responsibilities; the government should also make more effort (Aziz et al., 2018). The application of LPS in education is a systematic project supported by wide-ranging stakeholders and by updated concepts and behaviors; it calls for capturing political interests, developing contextually relevant resources, and providing sufficient teacher training (Nishimura-Sahi, 2020).

This review offers a panoramic view of how LPS are used in education, providing evidence for the application validity of LPS. However, the study has some limitations, as it mainly concerns the CEFR and CSE. Besides, owing to the paucity of theory on LPS application, it explored the application and impact of LPS by adopting a model based on language testing. Studies theorizing the validation of the aftereffects of other LPS should be encouraged.

Footnotes

Acknowledgements

I would first like to thank my supervisor Dr Samah Ali Mohsen Mofreh, whose expertise was invaluable in formulating the research questions and methodology. Her insightful suggestion pushed me to sharpen my thinking and brought my work to a higher level. I also would like to thank Dr Zhong for helping appraise the articles and the anonymous peer reviewers whose feedback offered insightful points in revising this article.

Author contributions

AZ wrote the original draft and revised the article. SM conceived the initial idea, supervised the literature review and methodology, and revised the first complete draft. SS helped in conceiving the idea, provided guidance and supervision on the theoretical and empirical sides, and offered suggestions on the final revision.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Aihua Zhu

Samah Ali Mohsen Mofreh

Sultan Salem

References

Al Habbash, Alsheikh, Liu, Al Mohammedi, & Al Othali (2021). A UAE standardized test and IELTS vis-à-vis international English standards. International Journal of Instruction, 14(4), 373–390. https://doi.org/10.29333/iji.2021.14422a

Aziz, A. H. A. A., Rashid, R. A., & Zainudin, W. Z. W. (2018). The enactment of the Malaysian common European framework of reference (CEFR): National master trainer’s reflection. Indonesian Journal of Applied Linguistics, 8(2), 409–417.

Baas, Schotten, Plume, Côté, & Karimi (2020). Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies. Quantitative Science Studies, 1(1), 377–386. https://doi.org/10.1162/qss_a_00019

Baharum, N. N., Ismail, Nordin, & Razali, A. B. (2021). Aligning a university English language proficiency measurement tool with the CEFR: A case in Malaysia. Pertanika Journal of Social Sciences & Humanities, 29, 157–178. https://doi.org/10.47836/pjssh.29.s3.09

Bailey, K. M. (1996). Working for washback: A review of the washback concept in language testing. Language Testing, 13(3), 257–279. https://doi.org/10.1177/026553229601300303

Birkle, Pendlebury, D. A., Schnell, & Adams (2020). Web of Science as a data source for research on scientific and scholarly activity. Quantitative Science Studies, 1(1), 363–376. https://doi.org/10.1162/qss_a_00018

Bolton & Bacon-Shone (2020). The statistics of English across Asia. In Bolton, Botha, & Kirkpatrick (Eds.), The handbook of Asian Englishes (pp. 49–80). Wiley. https://onlinelibrary.wiley.com/doi/10.1002/9781118791882.ch3

Brunfaut & Harding (2020). International language proficiency standards in the local context: Interpreting the CEFR in standard setting for exam reform in Luxembourg. Assessment in Education: Principles, Policy & Practice, 27(2), 215–231. https://doi.org/10.1080/0969594X.2019.1700213

Byram (2022). Politics, origins and futures of the CEFR. The Language Learning Journal, 50(5), 586–599. https://doi.org/10.1080/09571736.2020.1845392

Carloni, Tsenkulovsky, Carloni, & Mangan (2018). Web of science core collection descriptive document. https://www.google.com.hk/search?client=aff-cs-360se&ie=UTF-8&q=Web+of+Science+Core+Collection+Descriptive+Document

Choong, E. E. C., Manoharan, & Rethinasamy (2021). Speaking assessments by Japanese English teachers pre and post implementation of CEFR in the midst of a global pandemic. Pertanika Journal of Social Sciences & Humanities, 29, 335–349. https://doi.org/10.47836/pjssh.29.s3.17

Council of Europe. (Ed.). (2020). Common European framework of reference for languages: Learning, teaching, assessment; companion volume. Council of Europe Publishing.

Deygers, Zeidler, Vilcu, & Carlsen, C. H. (2018). One framework to unite them all? Use of the CEFR in European university entrance policies. Language Assessment Quarterly, 15(1), 3–15. https://doi.org/10.1080/15434303.2016.1261350

Eaquals. (2018, June 20). The CEFR companion volume launch conference. https://www.eaquals.org/2018/06/20/the-cefr-companion-volume-launch-conference-may-2018-tim-goodier/

Fleckenstein, Keller, Kruger, Tannenbaum, & Koller (2020). Linking TOEFL iBT® writing rubrics to CEFR levels: Cut scores and validity evidence from a standard setting study. Assessing Writing, 43, 33–47. https://doi.org/10.1016/j.asw.2019.100420

Franz & Teo (2018). ‘A2 is Normal’ – Thai secondary school English teachers’ encounters with the CEFR. RELC Journal, 49(3), 322–338. https://doi.org/10.1177/0033688217738816

Green (2018). Linking tests of English for academic purposes to the CEFR: The score user’s perspective. Language Assessment Quarterly, 15(1), 59–74. https://doi.org/10.1080/15434303.2017.1350685

Harsch (2014). General language proficiency revisited: Current and future issues. Language Assessment Quarterly, 11(2), 152–169. https://doi.org/10.1080/15434303.2014.902059

Harsch & Kanistra, V. P. (2020). Using an innovative standard-setting approach to align integrated and independent writing tasks to the CEFR. Language Assessment Quarterly, 17(3), 262–281. https://doi.org/10.1080/15434303.2020.1754828

Jiang & Min (2021). Diagnosing writing ability using China’s Standards of English Language Ability: Application of cognitive diagnosis models. Assessing Writing, 50, 1–14. https://doi.org/10.1016/j.asw.2021.100565

Zhang (2021). The application of the China’s Standards of English Language Ability in remedial instruction and learning. Foreign Language Testing and Teaching, 3, 1–11.

Hidri (2021). Linking the International English Language Competency Assessment suite of examinations to the Common European Framework of Reference. Language Testing in Asia, 11(1), 1–24. https://doi.org/10.1186/s40468-021-00123-8

Holzknecht, Huhta, & Lamprianou (2018). Comparing the outcomes of two different approaches to CEFR-based rating of students’ writing performances across two European countries. Assessing Writing, 37, 57–67. https://doi.org/10.1016/j.asw.2018.03.009

Infante Mora, Muñoz Andrade, Greenwood, Feldman, Ivanchikova, Cívico Gallardo, & García Saez (2019). Part 2: The development of an active pedagogy. Learning and Teaching, 12(3), 18–45. https://doi.org/10.3167/latiss.2019.120303

Jie (2019). Relating English speaking tests to China’s Standards of English: CET-SET 4 as a case study. Foreign Language World, 190(1), 71–80.

Jin & Jie (2020). The application and impact of language proficiency scales: A case of the CSE Speaking Scale. Foreign Language World, 198(3), 52–60.

Jones (2014). Multilingual frameworks (Vol. 40). Cambridge University Press.

Kalnberzina (2018). Intercultural/pluricultural communication construct and its levels. Baltic Journal of English Language Literature and Culture, 8, 40–55. https://doi.org/10.22364/BJELLC.08.2018.03

(2022). Effects of CSE-based assessment for learning and task value on self-regulated learning in college English writing classroom. Journal of China Examinations, 12, 19–26. https://doi.org/10.19360/j.cnki.11-3303/g4.2022.12.003

Little (2018). Functional approaches to syllabus design: From the threshold level to the Common European Framework of Reference for Languages. In A. Faravani (Ed.), Issues in syllabus design (pp. 99–109). Brill.

Liu (2016). Language policy and practice in a Chinese junior high school from global Englishes perspective [PhD Thesis]. University of Southampton.

Liu (2017). China’s standards of English and English learning. Foreign Languages in China, 14(6), 4–11. https://doi.org/10.13564/j.cnki.issn.1672-9382.2017.06.002

Liu & Yang (2021). Exploring the applicability of the CSE in language testing and assessment. Foreign Language Testing and Teaching, 42(2), 1–11.

Liu & Huang (2019). Primary English story teaching based on China’s standards of English language ability. Basic Foreign Language Education, 21(3), 91–98+111.

Liu (2022). Building college English translation teaching model based on China’s standards of English language ability. Journal of Tonghua Normal University, 43(03), 139–144. https://doi.org/10.13877/j.cnki.cn22-1284.2022.03.022

Liu (2021). Development of foreign language proficiency standards for adult learners and certification of their learning outcomes based on China’s standards of English proficiency: A project on the development of English proficiency standards at the open university of China. Technology Enhanced Foreign Language Education, 199(03), 63–69+10.

Chen (2021). An empirical study on construction of the assessment model of college English pragmatic competence based on China’s standards of English. Journal of Northeast Normal University, 310(02), 151–164. https://doi.org/10.16164/j.cnki.22-1062/c.2021.02.020

Mazlaveckienė (2018). Assessment of university students’ English grammar proficiency in terms of CEFR criterial achievement levels: The case of Lithuanian university of educational sciences. Theoria et Historia Scientiarum, 15, 35–50. https://doi.org/10.12775/ths.2018.003

Min & Jiang (2020). Aligning an in-house listening test to China’s Standards of English Language Ability. Foreign Language Education, 41(4), 47–51. https://doi.org/10.16362/j.cnki.cn61-1023/h.2020.04.009

Mohamed (2021). The development of an Arabic curriculum framework based on a compilation of salient features from CEFR level descriptors. The Language Learning Journal, 51(1), 1–15. https://doi.org/10.1080/09571736.2021.1923781

Nguyen, V. H., & Hamid, M. O. (2021). The CEFR as a national language policy in Vietnam: Insights from a sociogenetic analysis. Journal of Multilingual and Multicultural Development, 42(7), 650–662. https://doi.org/10.1080/01434632.2020.1715416

Nikolaeva (2019). The common European framework of reference for languages: Past, present and future. Advanced Education, 6(12), 12–20. https://doi.org/10.20535/2410-8286.154993

Nishimura-Sahi (2020). Policy borrowing of the common European framework of reference for languages (CEFR) in Japan: An analysis of the interplay between global education trends and national policymaking. Asia Pacific Journal of Education, 42(3), 574–587.

Nishimura-Sahi (2022). Assembling educational standards: Following the actors of the CEFR-J project. Globalisation, Societies and Education, 21(3), 392–404. https://doi.org/10.1080/14767724.2022.2037071

Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, Hoffmann, T. C., Mulrow, C. D., Shamseer, Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, Glanville, Grimshaw, J. M., Hróbjartsson, Lalu, M. M., Loder, E. W., Mayo-Wilson, McDonald, … Moher (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, n71. https://doi.org/10.1136/bmj.n71

Peng (2021). Aligning the CSE with the CEFR: Level alignment in writing ability. Foreign Language World, 5, 84–93.

Peng & Liu (2021). The listening skill level alignment of the CSE with the CEFRA. Foreign Language Education, 42(5), 43–50. https://doi.org/10.16362/j.cnki.cn61-1023/h.2021.05.008

Peng, Liu, & Cai (2022). Aligning China’s standards of English language ability with the common European framework of reference for languages. The Asia-Pacific Education Researcher, 31(6), 667–677. https://doi.org/10.1007/s40299-021-00617-2

Piccardo, North, & Maldina (2019). Innovation and reform in course planning, teaching, and assessment: The CEFR in Canada and Switzerland, a comparative study. Canadian Journal of Applied Linguistics, 22(1), 103–128. https://doi.org/10.7202/1060908ar

Poonpon, Satthamnuwong, & Sameephet (2022). The effectiveness of task-based and genre-based integrated learning on English language proficiency of Thai rural secondary school students. Theory and Practice in Language Studies, 12(9), 1736–1747. https://doi.org/10.17507/tpls.1209.05

Pranckutė (2021). Web of Science (WoS) and Scopus: The titans of bibliographic information in today’s academic world. Publications, 9(1), 1–59. https://doi.org/10.3390/publications9010012

Rehner, Popovich, & Lasan (2021). How the CEFR is impacting French-as-a-second-language in Ontario, Canada: Teachers’ self-reported instructional practices and students’ proficiency exam results. Languages, 6(1), 15. https://doi.org/10.3390/languages6010015

Sahib, F. H., & Stapa (2022). Global trends of the common European framework of reference: A bibliometric analysis. Review of Education, 10(1), 1–26. https://doi.org/10.1002/rev3.3331

Savski (2019). Putting the plurilingual/pluricultural back into CEFR: Reflecting on policy reform in Thailand and Malaysia. Journal of Asia TEFL, 16(2), 644–652. https://doi.org/10.18823/asiatefl.2019.16.2.13.644

Schmidt, M. G., Nagai, Naganuma, & Birch (2019). Teacher development: Resources and devices to promote reflective attitudes toward their profession. Language Learning in Higher Education, 9(2), 445–457. https://doi.org/10.1515/cercles-2019-0024

Shak & Read (2021). Aligning the language criteria of a group oral test to the CEFR: The case of a formal meeting assessment in an English for occupational purposes classroom. Pertanika Journal of Social Sciences and Humanities, 29(S3), 133–156. https://doi.org/10.47836/pjssh.29.s3.08

Shermis, M. D. (2018). Establishing a crosswalk between the common European framework for languages (CEFR) and writing domains scored by automated essay scoring. Applied Measurement in Education, 31(3), 177–190.

Shi & Zheng (2021). Practice of the intelligent autonomous diagnostic applications based on CSE in English teaching for sports major students. Journal of Guangzhou Sport University, 41(3), 114–119. https://doi.org/10.13830/j.cnki.cn44-1129/g8.2021.03.028

Sidhu, G. K., Kaur, & Chi, L. J. (2018). CEFR-aligned school-based assessment in the Malaysian primary ESL classroom. Indonesian Journal of Applied Linguistics, 8(2), 452–463. https://doi.org/10.17509/ijal.v8i2.13311

Silveira & Martins (2020). Assessing second language oral proficiency development with holistic and analytic scales. Ilha Do Desterro-A Journal of English Language Literatures In English And Cultural Studies, 73(3), 227–249. https://doi.org/10.5007/2175-8026.2020v73n3p227

Sufi, Abu, & Ibrahim, E. H. E. (2021). Mapping IIUM students’ English language writing proficiency to CEFR. Pertanika Journal of Social Sciences & Humanities, 29(S3), 85–101. https://doi.org/10.47836/pjssh.29.S3.05

Wang (2020). Aligning school-based English proficiency tests with China’s standards of English language ability: A case study. Foreign Language World, 200(5), 72–79.

Xiong & Liu (2020). Development of foreign language proficiency standards for adult learners and certification of their learning outcomes based on China’s standards of English proficiency: A project on the development of English proficiency standards at the open university of China. Chinese Vocational and Technical Education, 199(8), 81–86.

Yang & Zhang (2021). Classifying typical interpreting activities based on CSE-interpreting competence scales. Foreign Language Research, 220(3), 96–102. https://doi.org/10.16263/j.cnki.23-1071/h.2021.03.016

Yannakoudakis, Andersen, O. E., Geranpayeh, Briscoe, & Nicholls (2018). Developing an automated writing placement system for ESL learners. Applied Measurement in Education, 31(3), 251–267. https://doi.org/10.1080/08957347.2018.1464447

Yüce & Mirici, I. H. (2022). Self-assessment in EFL classes of secondary education in Türkiye: The Common European Framework of Reference for Languages (CEFR)-based implementations. Pegem Journal of Education and Instruction, 13(1), 349–359. https://doi.org/10.47750/pegegog.13.01.38

Zhang & Wang (2022). The application of CSE in students’ self-assessment in a college English writing course: From the perspective of dynamic assessment theory. Foreign Languages in China, 19(1), 71–78. https://doi.org/10.13564/j.cnki.issn.1672-9382.2022.01.003

Zhao (2023). Co-constructing the assessment criteria for EFL writing by instructors and students: A participative approach to constructively aligning the CEFR, curricula, teaching and learning. Language Teaching Research, 27(3), 765–793. https://doi.org/10.1177/1362168820948458

Zhong (2019). The application of CSE in listening and speaking courses in vocational colleges in China. Chinese Vocational and Technical Education, 17, 92–96.

Zhou & Liu (2021). Developing a scale of language proficiency for specific purposes. Journal of Foreign Languages, 44(4), 33–41.

Zhu (2016). A validation framework for the national English proficiency scale of China. China Examinations, 8, 3–13. https://doi.org/10.19360/j.cnki.11-3303/g4.2016.08.001

The Application of Language Proficiency Scales in Education Context: A Systematic Literature Review
