Abstract
In this three-phase, systematic review, we comprehensively synthesize research on bilingual education in higher education institutions over a 19-year period, starting from the initiation of this top-down educational provision across mainland China. In this context, although English has no official status, it is highly regarded and widely used as a medium of instruction alongside Chinese. We critically examine studies (n = 1,632) published in both English and Chinese outlets on academic discourse that deliberate on what was studied, how it was studied, and what the publication trend was. We argue that scholarly attention should continue to revolve around the benefits of bilingual programs, due to the lack of rigorous empirical evidence that attests to the effectiveness of bilingual education in Chinese higher education. Our review echoes similar research syntheses conducted in Europe and worldwide and is, therefore, expected to shed light on the policy implications and the practice of bilingual education in higher education on a global scale. Recommendations for future research are provided.
The year 2019 marked the 19th year since the Ministry of Education (MOE, 2001) officially mandated the implementation of bilingual education in higher education institutions across the People’s Republic of China. Following this initiative, institutions of higher education began to offer courses in both Chinese and a foreign language (predominantly English), particularly in fields that are directly related to national development and internationalization (e.g., biotechnology, information technology, finance, and law; MOE, 2001). Since then, bilingual education in mainland China has received both favorable attention and critical scrutiny (X. Gao & Ren, 2019). Proponents endorse this educational policy on the basis that it could help to prepare a new generation that is bilingual, bicultural, and biliterate in various disciplines. They also suggest that bilingualism will make Chinese graduates more competitive in the global market, which is expected to support an increase in national power (e.g., Y. Feng, 2009; D. Zheng & Dai, 2013; Zhu & Yu, 2010). Critics, however, have presented the compelling argument that the issues of access to opportunity and equity encoded in this educational reform will result in further divides in China by accentuating the vertical structure of society (e.g., G. Hu, 2008; G. Hu et al., 2014).
Despite such controversy, this language provision has gained popularity and momentum and has made its way into higher education in China (Y. Feng, 2005) through the subsequent government policies that promote it (e.g., MOE, 2005, 2010). According to N. Yang and Zhang (2015), there have been 150 to 200 different courses taught in Chinese and English in a handful of highly ranked universities. An elite university, located in Beijing, offered 200 disciplinary bilingual courses, including some with 100% of the instruction in English (MOE, 2017).
The evolution and expansion of bilingual courses is a product of internationalization in which English plays a crucial role as the lingua franca, resetting the de facto boundary of many English-speaking countries. Consequently, there is a growing interest in the medium of instruction (MOI) in higher education, particularly in regions where English either coexists with another official language (e.g., South Africa) or has no official status (even if it is widely used and highly regarded, such as in China and Japan; Baker & Wright, 2017; X. Gao & Wang, 2017; Palfreyman & van der Walt, 2017; Van der Walt, 2017; J. Zhao & Dixon, 2017). However, in China, the spread of bilingual education has outpaced the existing empirical research on the topic (such as its feasibility and effectiveness; G. Hu et al., 2014; Tong & Tang, 2017). Comprehensive literature reviews have been conducted to address the effectiveness of bilingual education in the United States (see a narrative review by Rossell and Baker, 1996; a best-evidence synthesis by Slavin and Cheung, 2005; and meta-analyses by Willig, 1985, Greene, 1997, and Rolstad et al., 2005), Europe (see a meta-analysis by Reljić et al., 2015), Australia (see a systematic review by Silburn et al., 2011), and elsewhere (see a systematic review by Macaro et al., 2018). Although the body of English-medium instruction (EMI) research is largely dominated by work conducted in Europe and Asia, only three articles discuss higher education in mainland China (see Macaro et al., 2018). There is little else available on bilingual education in a Chinese, postsecondary context, perhaps because such work is relatively uncommon and difficult to identify in the region (e.g., literature published in non-English languages, as acknowledged by Macaro et al., 2018, and Reljić et al., 2015).
As such, the present study arises from the need to address the aforementioned publication bias to contribute to our understanding of the complex nature of bilingual education in mainland China. This article aims to comprehensively and systematically review research that has been conducted over the past 19 years and published either in English or Chinese outlets on the subject of bilingual education (with both Chinese and English as the MOI) in Chinese higher education. To this end, we synthesized the academic discourse of what was studied, how it was studied, and what the publication trend was in the academic community, since this top-down language and educational provision came into effect.
A Terminological Choice: Bilingual Education versus EMI
It is worth mentioning that this research uses the term bilingual education instead of EMI, content and language integrated learning (CLIL), and content-based instruction (CBI) for the following reasons. First, our choice is aligned with the wording that appeared in the headings of Chinese government policy documents (i.e., shuang-yu, which translates to bi-lingual; see MOE, 2001, 2005). Although the 2001 government policy documents stated that the use of English as the MOI is encouraged, statements in 2005 and 2007 were titled “shuangyu jiaoxue” (bi-lingual instruction). Research in English-speaking countries has demonstrated that 100% English instruction may not yield the highest gain for students whose native language is not English (Lindholm-Leary, 2016; Tong et al., 2008). Furthermore, this nationwide educational undertaking intends to solidify knowledge in both languages, as well as enhance college students’ English language proficiency, which coincides with the commonly held goal of bilingual education in English-speaking environments (Y. Feng, 2005; Genesee, 1999). Although differences in ideology, as well as in sociopolitical and sociocultural implications, exist between China and other English-speaking countries, a more holistic view of bilingualism was recently conveyed in Baker and Wright’s work (2017), which found that two thirds of the world population is bilingual to some degree; their work conceives of language ability as a spectrum, with some falling into the category of incipient bilingual (Diebold, 1964) and others falling into the category of maximum bilingual (Bloomfield, 1933). Baker and Wright embraced the concept of selective bilingualism, a characteristic of learners from the language majority group, who choose to learn a second language without losing their mother tongue. Such language choice is typical in the Chinese context. 
On account of the many caveats that exist in language learning, we argue that bilingual education is a broad term that subsumes various forms.
Similar to the use of CLIL, a fast-developing phenomenon in Europe (Lasagabaster, 2015) that has already spread to South America (Siqueira et al., 2018), and CBI, popular in North America (Tedick & Cammarata, 2012), the term EMI more frequently refers to the language used in teaching within Asia, Europe, Africa, and the Middle East (Macaro et al., 2018; J. Zhao & Dixon, 2017). Such linguistic and policy choices reflect the elite status of the English language, as well as the socioeconomic value of being able to speak English that has been upheld in these regions. Although EMI does not preclude the use of learners’ native language (L1) in practice, it signifies a preference that downplays the importance of L1 instruction, which has demonstrated promise for young immigrants in North America (e.g., Cheung & Slavin, 2012). The fundamental theoretical proposition establishes that learning a second language (L2) can be facilitated when command of a native language reaches a certain threshold (see Cummins’ Common Underlying Proficiency [CUP] theory, 1976; 1979), at which point the linguistic and content knowledge in L1 can be transferred to an L2 (see Krashen’s second-language acquisition hypotheses, 1985). For Chinese students, such a preference may imply linguistic imperialism, discount the benefit of material in their L1, and make it even more challenging to reconcile the tension between the culture and values embedded in the Chinese language versus in a foreign language such as English (Kirkpatrick, 2014; J. Liu & Fang, 2017), thereby defeating the purpose. Finally, these terminologies reflect an approach or means of implementing bilingual programs, whereas bilingual education points to a clear goal of bilingualism and biliteracy.
Our choice of terminology for bilingual education is also congruent with the way it is defined by researchers. Among many conceptualizations and theorizations of bilingual education (Baker & Wright, 2017), we adopt the delineation of Lasagabaster’s (2015) bilingualism and Palfreyman and van der Walt’s (2017) biliteracy in our study. In addition, we regard bilingual education as the use of two languages as a shared MOI in the academic context of higher education, with the objective of promoting learners’ communicative competence (both orally and in written form) in discipline-specific knowledge.
Models of Bilingual Education in Chinese Higher Education
A traditional view of bilingual education models in China can be summarized in three ways: immersion, transitional, and maintenance or infiltration (H. Xu, 2008). In the immersion program, which was adopted from the Canadian model, English is acquired in the process of content learning. Instructors use English as the instructional language most of the time, and textbooks and other learning materials are in English. In the transitional program, Chinese is used as the primary MOI at the initial stage and gradually gives way to English as the language of instruction. In a maintenance or infiltrative program, Chinese serves as the MOI for the majority of the time (e.g., 90%), but textbooks and materials are in English. Although the forms of bilingual education can vary significantly across different contexts, there is a general consensus that the ultimate goal of bilingual education in the Chinese context is to equip bilingual people with specialized knowledge in academic fields, so that they can use English to communicate with English-speaking specialists and professionals as needed (Y. Feng, 2005).
Reviews on Chinese–English Bilingual Education in Mainland China
After an extensive literature search, we identified six reviews of Chinese–English bilingual education in mainland China (see Table 1). Four (Fan, 2014; H. Xu, 2008; D. Zheng & Dai, 2013; Zhu & Yu, 2010) were published in Chinese outlets, and two (i.e., F. G. Fang, 2018; G. Hu, 2008) were published in English outlets. Five major themes can be drawn from this body of literature to inform our own review. First, G. Hu (2008) argued that the craze of bilingual education has perpetuated the inequalities of accessing education in Chinese society. His argument was reiterated by D. Zheng and Dai (2013) as most English–Chinese bilingual courses are implemented in top-tier universities. According to the official definition, there are 151 key universities included in “211” or “985” projects, both of which are initiatives of the Chinese government to promote higher education and world-class universities in the 21st century (MOE, 2008, 2011, 2013). G. Hu and his colleagues (2014) claimed that highly qualified bilingual instructors, with strong communicative English proficiency and overseas experience, were more likely to be attracted by elite universities that offered competitive recruitment packages with the privilege of central/local funding support, which is a significant contributor to disparities in economic, cultural, and social capitals.
Table 1. Research Reviews on Bilingual Education in China.
Note. CNKI = China National Knowledge Infrastructure.
Second, among the six reviews, only one (i.e., F. G. Fang, 2018) examined seven studies published in English. Drawing from these seven articles, Fang argued that further assessment of the benefits and cost of bilingual courses in Chinese higher education was essential. He also called for a contextualized policy that considers the landscape of multilingualism in China and provides language support and guidance to both students and instructors so as to evaluate the impact of bilingual education on students’ English language and disciplinary learning, and to unpack the future development of bilingual education in Chinese higher education. It is apparent that on this particular topic, more articles are published in Chinese than English. This observation is also confirmed by two reviews of bilingual education in Europe (Reljić et al., 2015) and worldwide (Macaro et al., 2018), in which the authors acknowledged not only the existence of a large number of studies written in Chinese and other non-English languages, but also a lack of manpower and resources to review these articles.
Third, there was a reported scarcity of empirical research. For example, D. Zheng and Dai (2013) explained that of 23 articles published between 2003 and 2012, only 17% were empirical, with only one experimental study and one classroom observation. The percentage of data-driven research was even smaller in Fan’s (2014) count (i.e., 9%; n = 8), in which students’ attitudes toward bilingual education, implementation of bilingual education models, students’ command of English for academic purposes (EAP), bilingual instructors’ code-switching, and students’ self-efficacy were studied. According to these authors, despite the widespread educational practices and the considerable body of theoretical investigation from diverse perspectives, many studies focus on duplicate topics that do not enrich the knowledge base.
Fourth, except for F. G. Fang (2018), the other five reviews discussed various models of bilingual education and students’ attitudes and perceptions toward the focal program related to students’ English proficiency and bilingual instructors’ qualification. Furthermore, four Chinese articles, while highlighting challenges in research and practice, were unequivocally positioned in a supportive stance because bilingual education in the Chinese context is transitioning from a pedagogical/curricular alternative to a policy imposition. They called for an ongoing dialogue between English teaching at colleges and bilingual instruction, as well as interdisciplinary collaboration among research experts, English instructors, and content specialists so as to improve the quality of bilingual education and its ability to support learning (Fan, 2014; H. Xu, 2008; D. Zheng & Dai, 2013; Zhu & Yu, 2010).
Finally, Zhu and Yu (2010) highlighted that a lack of breadth and depth in bilingual education research has presented more problems than solutions. They pointed to the urgent need for scientific investigation through comparative approaches and repeated-measures designs with advanced statistical analyses that can generate rigorous evidence on the practice of bilingual education. Similarly, G. Hu’s (2008) narrative review scrutinized K–12 bilingual education in China and questioned the methodological rigor of program evaluations that reported only favorable findings. An imperative recommendation for future research is to attend to students’ affective domains, such as motivation, interest, learning anxiety, behavior, and efficacy, as part of the evaluation system (D. Zheng & Dai, 2013).
The Present Study
None of the aforementioned reviews, however, is comprehensive in terms of coverage period, types of journals, themes, and language of publication. For example, H. Xu (2008), G. Hu (2008), D. Zheng and Dai (2013), and Fan (2014) limited their search to core periodicals in either higher education or foreign language education and, therefore, excluded discipline-specific journals where a larger number of relevant articles exist. G. Hu (2008) and Zhu and Yu (2010) targeted either earlier phases of education (K–12) or K–16 without an emphasis on higher education, the level that is most responsive to MOE policies and where the operation of bilingual courses is an obligation rather than a choice. Another limitation of earlier reviews is the small number of articles (i.e., n = 7 in F. G. Fang, 2018; n = 23 in D. Zheng & Dai, 2013, within a 10-year span) or lack of information (i.e., H. Xu, 2008) that might lead to a biased or ungrounded conclusion on the characteristics of scholarly investigation. Equally problematic was the absence of articulated research questions, conceptual frameworks, screening and sifting processes, and inclusion/exclusion criteria, all of which are typical best practices in conducting systematic literature reviews (Siddaway et al., 2019). With the exception of G. Hu (2008), the reviews failed to engage in a substantial discussion, which is likely due to constraints on the length of articles that are normally accepted in Chinese periodicals. The only English review of published studies in mainland China (i.e., F. G. Fang, 2018) failed to specify the sources identified for the search.
Finally, to the best of our knowledge, no previous literature review has systematically and comprehensively scrutinized the academic discourse and new developments of Chinese–English bilingual education in postsecondary schools in mainland China while drawing on databases in both languages to avoid the language and availability bias that could occur in research synthesis (Borenstein et al., 2009). Given the internationalization of higher education in China, such a systematic synthesis “can promote dialogue between the East and the West” (H. Guo et al., 2018, p. 13).
In this study, to address the above issue of publication bias, we synthesize research studies examining bilingual education in postsecondary education in mainland China over the past 19 years (2001–2019), through a systematic review of “the research literature using systematic and explicit accountable methods” (Gough et al., 2012, p. 261). We follow the major themes outlined in existing reviews, by considering what the publication trend was, what was studied, and how it was studied. By doing so, we present a historic, impartial, and informative account of how the academic trajectory has advanced in terms of the challenges and promises of bilingual education in Chinese colleges and universities. This review intends to shed light on the policies and practices of this educational provision in the global community.
Approach for Systematic Synthesis
Inclusion and Exclusion Criteria
In this article, we employed a multiphase systematic approach to analyze data collected through a search of two sets of databases and other online sources, to comprehensively cover articles published in both English and Chinese. We adopted and adapted a process described by H. Cooper (2009) for conducting a systematic review and leveraged the characteristics of that process that were most relevant to our purposes. Furthermore, based on the work of I. D. Cooper and Crum (2013), which identified the central role of librarians in systematic reviews of health and medical science, we collaborated with two librarians that specialized in information management, one that was a native English speaker and one that was a native Chinese speaker. With their assistance and expertise, we developed the terms required for searching appropriate sources and managing articles, as well as for documenting the search, retrieval, and archival processes. We believe that such a practice is also beneficial and applicable in educational research, as it ensures that the pursuit is both exhaustive and reliable. Inclusion criteria for screening included the following:
Public institutions of higher education in mainland China;
Research designated/entitled/described as EMI, CLIL, CBI, or bilingual education;
Research published in peer-reviewed journals and book chapters;
Research in settings where English is used as the language of instruction.
In addition to the inclusion criteria, we also applied the following exclusion criteria for screening:
Nonaccredited/private institutions of higher education or K–12;
Research on English language teaching/EAP, or English for specific purposes (ESP; unless it focused on content learning);
Master’s theses and doctoral dissertations;
Other systematic reviews, meta-analyses, meta-syntheses, and best-evidence syntheses (unless used for this article’s literature review and discussion);
Research conducted in Hong Kong, Macao, and other specially administered Chinese-speaking regions outside of mainland China;
Ethnic minority language (Tibetan, Miao, Korean, etc.) as an MOI.
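The criteria above can be read as a simple decision procedure applied to each record. The following Python sketch is one possible encoding and not the procedure the review team actually used; the `Record` fields, the category labels, and the `exclusion_reason` function are hypothetical names introduced only for illustration. In particular, treating a content-focused EAP/ESP study as eligible is an assumption drawn from the "unless it focused on content learning" clause.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels; the review does not publish a coding scheme.
BILINGUAL_APPROACHES = {"EMI", "CLIL", "CBI", "bilingual education"}
LANGUAGE_COURSE_APPROACHES = {"ELT", "EAP", "ESP"}
EXCLUDED_PUBLICATION_TYPES = {
    "master's thesis", "doctoral dissertation", "systematic review",
    "meta-analysis", "meta-synthesis", "best-evidence synthesis",
}


@dataclass
class Record:
    region: str                # e.g., "mainland China", "Hong Kong", "Macao"
    level: str                 # "higher education" or "K-12"
    public_accredited: bool    # public, accredited institution
    approach: str              # how the study labels its program
    content_focus: bool        # does the study address content learning?
    publication_type: str      # e.g., "journal article", "book chapter"
    instruction_language: str  # e.g., "English", "Tibetan"


def exclusion_reason(record: Record) -> Optional[str]:
    """Return the first exclusion reason that applies, or None if eligible."""
    if record.region != "mainland China":
        return "outside mainland China"
    if record.level != "higher education" or not record.public_accredited:
        return "nonaccredited/private institution or K-12"
    if record.publication_type in EXCLUDED_PUBLICATION_TYPES:
        return "excluded publication type"
    if record.instruction_language != "English":
        return "English is not the language of instruction"
    if record.approach in LANGUAGE_COURSE_APPROACHES and not record.content_focus:
        return "language-teaching study without a content focus"
    if record.approach not in BILINGUAL_APPROACHES | LANGUAGE_COURSE_APPROACHES:
        return "not designated as EMI, CLIL, CBI, or bilingual education"
    return None


example = Record("mainland China", "higher education", True,
                 "EMI", True, "journal article", "English")
print(exclusion_reason(example))  # prints None: the record is eligible
```

Returning the reason rather than a bare true/false keeps a trail of why each record was dropped, which is the kind of information a PRISMA flowchart summarizes.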
Rationale of the Inclusion and Exclusion Criteria
Given the purpose of this study, which is to investigate the academic discourse of postsecondary Chinese–English bilingual education, we established the aforementioned inclusion and exclusion criteria for several reasons. First, we excluded studies conducted outside mainland China (such as Hong Kong, Macao, and other special administrative regions) because the educational policies in these areas differ from those of the mainland. For example, the Basic Law of the Hong Kong Special Administrative Region of the People’s Republic of China (1997) clearly states that
On the basis of the previous educational system, the Government of the Hong Kong Special Administrative Region shall, on its own, formulate policies on the development and improvement of education, including policies regarding the educational system and its administration, the language of instruction, the allocation of funds, the examination system, the system of academic awards and the recognition of educational qualifications. (p. 43)
We also included research in settings where the language of instruction was English, instead of an ethnic minority language (such as Tibetan, Miao, Korean, etc.), and excluded nonaccredited/private institutions of higher education or K–12.
Moreover, to control for the quality of studies included in our review, we set “research published in peer-reviewed journals and book chapters” as the inclusion criterion and “Master’s theses and doctoral dissertations” as the exclusion criterion because dissertations and theses are defined as gray literature that has not been published in a traditional format (Adams et al., 2017). Such unpublished literature is also often hard to locate through common search protocols and strategies; therefore, it is more difficult to archive, analyze, synthesize, and integrate (Scherrer & Preckel, 2019). We included “research designated/entitled/described as EMI, CLIL, CBI, or bilingual education” and excluded “research on English language teaching/English for academic purposes (EAP) or English for specific purposes (ESP) (unless it had a focus on content learning)” because EMI, CLIL, CBI, and bilingual education share a common interest in the learning outcomes of both subject content and language proficiency (Brown & Bradford, 2018), whereas EAP and ESP place emphasis on providing students with language skills to master subject content and are more often designed as language courses in ESL/EFL settings (Airey, 2016; W. Yang, 2016).
Three-Phase Review
Informed by the themes derived from existing reviews, this review entails three phases. Phase I involves records that met the inclusion and exclusion criteria; Phase II focuses on empirical studies; and Phase III is devoted to empirical research that involves a comparison group, in which we seek to gain insight into the effectiveness of bilingual education. Unique to this study is a loosening of the restriction on empirical research in Phase I, so as to establish an optimal boundary, instead of a conservative one, and better understand the actual practice that is prevalent in bilingual programs that respond to the government initiative. In addition, Chinese journals in foreign language research and higher education publish articles that are restricted in length. Such brief articles do not normally have the space to substantively elaborate on data exploration, as typical empirical studies do (Tierney & Kan, 2016). Instead, more common forms of inquiries include recounts, essays, narratives, argumentative pieces, and reviews that may also reflect the trend of research in bilingual education. This manifestation of local scholarly literacy is oftentimes neglected as a result of internationalization, and instead redirected toward a Western intellectual tradition (Alatas, 2006; Mok, 2007).
We present flowcharts in Figures 1 (Chinese sources) and 2 (English sources) to outline the decision-making process following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Moher et al., 2010). A four-member team participated in the review process. When there was disagreement, reviewers discussed it until they reached a consensus. Agreement was established at 90% on the Chinese database (Kendall’s tau value above .75, p < .001) and 95% on the English database (Kendall’s tau value above .98, p < .001), suggesting high interrater agreement (Landis & Koch, 1977).

Figure 1. PRISMA flowchart on Chinese sources. Note that 2,413 records were originally retrieved from CNKI, one of which was written in English with a Chinese abstract; we therefore counted this article in the English database.

Figure 2. PRISMA flowchart on English sources.
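The review reports interrater agreement as a percentage together with a Kendall's tau value but does not describe how either was computed. As a minimal sketch of what such statistics measure, the Python below computes simple percent agreement and Kendall's tau-b for two raters' include/exclude decisions. The function names and the ten ratings are hypothetical, invented for illustration; they are not the study's data, and reducing a four-member team to rater pairs is an assumption.

```python
from math import sqrt


def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters made the same decision."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties (common with 0/1 ratings)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                  # tied on both: ignored by tau-b
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + tied_x_only)
                 * (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical screening decisions (1 = include, 0 = exclude) for ten records.
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
rater_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

print(percent_agreement(rater_a, rater_b))  # 0.9 (the raters differ on one record)
print(kendall_tau_b(rater_a, rater_b))
```

Percent agreement ignores agreement that would occur by chance, which is one reason a rank-based or chance-corrected statistic is usually reported alongside it.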
Research Questions
We began the process of this review with the following questions, associated with the three phases of review. Each research question is mapped onto the five themes identified in the literature review that was presented earlier in this article:
What was the publication trend? (Phase I)
1. What is the type of institution and funding support where bilingual education is implemented?
2. Is there a difference between Chinese and English publications?
What was explored? (Phase II)
3. What are students’ attitudes/perceptions toward bilingual education, and what challenges do they perceive?
4. What form of bilingual education is most popular?
How was it studied? (Phase III)
5. What are the characteristics of the research design in comparative studies (per H. Cooper, 2009)?
Type of assignment
Baseline equivalence
Threats to internal validity (e.g., confounding, selection bias)
Type, language, and evidence of validity/reliability of the outcome measure
Chinese Database
The Chinese database was built from a search covering January 2001 to September 2019 in the Periodical Database of the China National Knowledge Infrastructure Net (CNKI, zhongzhiwang), which is the largest integrated knowledge resource system in China. Most of the publications in this database are in Chinese, but it also includes articles that are written in English. To capture all the research that is related to bilingual education in colleges and universities, two specific criteria for inclusion were developed to index optimal references. A primary search based on the combination of controlled terms was conducted, including “bilingual education (shuangyu jiaoyu),” “bilingual instruction/teaching (shuangyu jiaoxue),” or “English instruction (yingyu jiaoxue).” The initial search yielded 2,413 bibliographic entries, which were imported into Rayyan (Elmagarmid et al., 2014) to check for appropriateness and duplication. Each article, indexed by searching primary keywords, was assigned to four raters for screening on title, keywords, and abstract. During this phase, 16 duplicates were removed, and 743 articles met the exclusion criteria, resulting in 1,653 entries for the full-text review. After the second round of screening, a total of 1,572 Chinese articles and one English article (with a Chinese abstract) were retained in Phase I of this review. In Phase II, we applied an additional criterion to exclude nonempirical studies, resulting in 271 empirical studies, among which 31 comparative (experimental/quasi-experimental/nonexperimental) research studies were eligible for the in-depth review of Phase III (Figure 1).
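The screening counts reported for the Chinese database can be checked with a short tally. The sketch below uses only figures stated in the text; the function name and dictionary keys are hypothetical. It rests on one labeled assumption: that the single English-language record mentioned in the note to the PRISMA flowchart for Chinese sources was moved to the English database before title and abstract screening, which is the reading under which the reported 1,653 full-text entries reconcile with the other counts.

```python
def chinese_database_flow():
    """Tally the screening counts reported for the CNKI search."""
    retrieved = 2413
    # Assumption: one English-language record (with a Chinese abstract)
    # was counted in the English database before screening.
    moved_to_english = 1
    duplicates = 16
    excluded_on_title_abstract = 743

    full_text_reviewed = (retrieved - moved_to_english
                          - duplicates - excluded_on_title_abstract)

    retained_phase_1 = 1572      # Chinese articles retained in Phase I
    empirical_phase_2 = 271      # empirical studies (Phase II)
    comparative_phase_3 = 31     # comparative studies (Phase III)

    return {
        "full_text_reviewed": full_text_reviewed,
        "excluded_at_full_text": full_text_reviewed - retained_phase_1,
        "nonempirical": retained_phase_1 - empirical_phase_2,
        "empirical_share": empirical_phase_2 / retained_phase_1,
        "comparative_share": comparative_phase_3 / retained_phase_1,
    }


flow = chinese_database_flow()
print(flow["full_text_reviewed"])     # 1653, matching the reported figure
print(flow["excluded_at_full_text"])  # 81 records dropped at full-text review
print(f"{flow['empirical_share']:.1%} empirical")
```

Keeping each stage as a named quantity makes it easy to see which reported number depends on which exclusion step.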
English Database
The English database was built using a search from January 2001 through to September 2019 from popular online libraries, which can yield the most peer-reviewed academic journals in the social science field, such as Education Resources Information Center (ERIC), Academic Search Ultimate, Education Source, and Linguistics and Language Behavior Abstracts (ProQuest). In addition, we also conducted a search on Google Scholar, as well as in journals that are most likely to publish studies on this topic (e.g., International Journal of Bilingual Education and Bilingualism). To capture all the research that is relevant to bilingual education in Chinese higher education, we conducted our search with all possible terms, including “China,” “Bilingual Education,” “Bilingual Program,” “Bilingual Schools/Student/Teachers,” “Bilingual Instructional Materials,” “Bilingualism,” and “English Medium Instruction”. To avoid translation ambiguity, we also searched for “Chinese,” “Mandarin Chinese,” “Bilingual Education Programs,” “Bilingual Teaching Materials,” “Bilingual Students,” “English Instruction/Immersion/Learner/Medium,” and “English Language Learners” as substitutable keywords. Following the same steps to screen and select articles as described above for the Chinese database, we retained 60, 30, and 3 studies in the three phases, respectively (Figure 2).
Results and Discussion
In this section, we report the main results of the synthesis of Chinese and English publications so as to describe developments in the research of bilingual education in Chinese higher education. To clarify the organization of the presented data, we have merged the discussion with the findings by presenting them in the order of each review phase: (a) what the publication trend was (Q1–Q2); (b) what was explored (Q3–Q4); and (c) how it was studied (Q5).
Phase I: Trends in Publications
In this phase, we address Research Question 1, the types of universities and funding support, and Research Question 2, the epistemological disparity between Chinese and English publications. A total of 1,632 studies, including nonempirical research, were reviewed and are synthesized below.
Types of universities and funding support (Q1)
Among the 647 studies that specified the university where research was conducted, fewer than a quarter (18%, n = 116) were conducted at key universities in either the “211” or “985” project, and the rest were conducted at nonelite universities that offered bilingual courses. A total of 286 studies reported receiving support through central/provincial (n = 136, 48%) or local (n = 150, 52%) funds. Figure 3 presents a comparison between key and non-key universities regarding allocation of funding. For example, resources from central/provincial government were distributed to 31 (51%) key universities and 105 (47%) non-key universities.

Figure 3. Percentage of funding sources by university type.
Discussion
D. Zheng and Dai (2013) reported that most of the English–Chinese bilingual courses were implemented in top-tier universities; however, according to our review, although these courses may have been initiated in more privileged institutions, they were expanded to lower tiered institutions with varying quality. As such, bilingual education has been embraced across the nation (A. Feng et al., 2017) and is not exclusively a service that is available to the elite. Furthermore, our data support the fact that research on bilingual education in colleges and universities has been funded by central, provincial, and local resources. In fact, a slightly larger proportion of financial support was distributed through local agencies, particularly to non-key universities that fall outside the top 100 national rankings. The well-balanced allocation of funds between key universities and their counterparts reflects the current practice of lower tier institutions allocating more resources for the implementation of bilingual programs. Such a trend also stands in contrast with the general critique that bilingual education perpetuates inequality in social capitals because only the elite can afford it (A. Feng et al., 2017; G. Hu, 2008; G. Hu et al., 2014), and supports Wong’s (2008) recommendation for a decentralized policy shift in terms of funding. Therefore, we contend that the issue of equal access to education should not be used as a reason to oppose bilingual education in Chinese higher education.
The epistemological disparity between Chinese and English publications (Q2)
As presented in Figure 4, over the 19-year span, there has been a growing volume of work published in Chinese (n = 1,572, 95%) compared with English. We discerned that 83% of these Chinese articles were nonempirical and included essays, reviews, recounts, and interpretative, reflective, commentary, or argumentative pieces. Although much smaller in number (n = 60), half (50%) of the English articles included at least some data that were acquired through a quantitative or qualitative epistemological approach. This trend remains unchanged (Figure 5). Figure 6 also illustrates that a similarly small proportion of studies in both Chinese (n = 31, 2%) and English (n = 3, 5.2%) were experimental and involved a comparison/reference group; these studies are reviewed in Phase III.

Figure 4. Total number of studies by year by language.

Figure 5. Percent of empirical studies by year by language.

Figure 6. Percent of studies by methodological approach and language.
Discussion
The second trend that we observed in the reviewed publications is the disparity between Chinese and English articles in terms of quantity, as well as their epistemological approach to researching bilingual education. Kirkpatrick (2011) expressed a concern that the focus on foreign language as an MOI is inevitably accompanied by the requirement of disseminating knowledge in that foreign language. However, his concern was not supported by our systematic review; except for a volume of work most recently published in English (see Zhao & Dixon’s edited book, 2017, and two studies in a special issue edited by A. Gao in the International Journal of Bilingual Education and Bilingualism, 2019), the number of English articles does not accommodate non-Chinese scholars’ increasing interest in, and demand for, an understanding of bilingual education in Chinese higher education. At the time when Baker (2007) stated, “much is known about bilingual education in North America and in Western Europe. The world knows very little about bilingual education in China” (p. vii), there were four English articles related to bilingual education (Figure 6). After a further decade of scholarship, a global systematic review identified only three studies in Chinese universities (see Macaro et al., 2018). We observed the same pattern in our own review, mainly due to the lack of dissemination in English outside the Chinese community (although there was a preponderance of literature published in Chinese).
The language of publication differs not only in quantity but also in the research paradigm. More specifically, there is a consistently reported paucity of empirical or data-based studies in Chinese (e.g., 7% in J. Liu et al., 2006; 7% in D. Zheng & Dai, 2013). This may be explained by the holistic and dialogical thinking of “me” in Chinese, compared with the Western tradition that detaches the authors to a third-person stance, and seeks the analytic investigation prevalent in Western academia (Tierney & Kan, 2016; Y. Zhao et al., 2008). We agree with Kirkpatrick’s (2011) recommendation that the dissemination of knowledge and scholarship should maintain a balance between local language and English, and that bilingual journals should be established in colleges and universities. At the very least, English abstracts should be searchable and made available to an international audience. To this end, our synthesis also increases awareness of the lack of Asian studies on this subject and addresses the criticism on the overrepresentation of American studies in major educational and psychological journals. Another observation worth mentioning is that the English articles included in this review were all authored by native English speakers or Chinese scholars who received doctoral or postdoctoral training in Western countries. The nature of bilingual education relies on research that can generate practical evidence and, therefore, identifies a need for more empirical studies with rigorous designs that can contribute to “a solid knowledge base for policymaking” (X. Gao & Wang, 2017, p. 228). We speculate that this epistemological form of inquiry will be realized as more Chinese scholars with overseas credentials and Western research dispositions return. 
As many higher education institutions in China increase their efforts to recruit these scholars, and as the internationalization of higher education advances more generally (Hughes, 2008; Tierney & Kan, 2016), we expect a considerable volume of work on the topic to appear in English outlets.
Phase II: What Was Explored
In this section, we examined 301 empirical studies to answer Research Question 3 “What are students’ attitudes/perception toward bilingual education, and what are their perceived challenges?” as well as Research Question 4 “What form of bilingual education is most popular?” A discussion follows each question.
Students’ attitudes/perceptions and perceived challenges in bilingual education (Q3)
Among all 301 data-driven studies, 123 (92 Chinese articles, 6%; 31 English articles, 52%) addressed the topic of Chinese college students’ attitudes toward bilingual education. Responses were expectedly and predominantly in favor of bilingual education; however, further investigation revealed a few important issues. First, students perceived learning a content area in English as challenging, due to the highly specialized vocabulary of the discipline (e.g., B. Peng, 2016; W. Yu et al., 2016). Second, students also expressed concerns that their English proficiency played a critical role in the success of learning their subject in English (e.g., G. Hu & Lei, 2014; J. Li & Zhang, 2016; X. Xiao et al., 2011), and there was a lack of opportunity to enhance communicative competency in English (W. Wang & Curdt-Christiansen, 2019). A third issue was associated with five studies (Bolton & Botha, 2015; L. Guo et al., 2016; G. Hu & Lei, 2014; Ouyang & Gao, 2016; P. Wang et al., 2016) that reported contradictory responses from (a) architecture and chemistry majors who expressed little interest in bilingual course content and (b) medical students who did not believe that participating in the course had facilitated learning academic English or improved their general English skills. These findings reveal that students’ attitudes were associated with the challenge of quality bilingual programs (i.e., students’ English proficiency and instructors’ qualifications), which takes us to the next section.
Students’ English proficiency
Among the 301 data-driven studies, 170 (128 Chinese articles, 8%; 42 English articles, 70%) addressed the topic of college students’ English proficiency. These studies suggested that students’ English proficiency was a determinant of the quality of bilingual education (e.g., L. Guo et al., 2011; N. Wang & Du, 2012; L. Yu & Han, 2011). In addition, this body of literature pointed to a disconnect between students’ general English proficiency and their academic English in the specific areas of content (J. Li & Zhang, 2016; T. Wang, 2015; X. Zhang et al., 2015), as well as a great variation among students’ English proficiency (Z. Wang, 2016). Some researchers reported that students had limited listening and oral skills in English (e.g., X. Chen, Lv, et al., 2016; W. Yang, 2016). Therefore, to benefit the most from bilingual courses, it was recommended that students demonstrate an initial threshold of proficiency (e.g., J. Han & Yu, 2007; L. Yu & Han, 2011), which was normally measured by two nationally standardized assessments (i.e., College English Test-Band 4 [CET-4] and College Entrance Exam-English test). CET-4 is mandatory for all non-English majors to test their general English ability in listening, speaking, reading comprehension, and writing (Y. Yang & Qian, 2017), and is usually taken in the second semester of the sophomore year (J. Xu & Fan, 2017). Students who pass CET-4 are considered to have mastered a sufficient amount of language (i.e., 4,500 words and 700 phrases; MOE, 2004) to participate in a bilingual program (J. X. Han, 2009; J. Jiang, 2004; X. Li et al., 2009; Z. Wu et al., 2017). Regarding the College Entrance Exam, G. Hu et al. (2014) proposed a cutoff score of 120 (80%) as an eligibility criterion for program participation. Their proposal was consistent with an earlier empirical study in which 80% on CET-4 (or 60% on CET-6, an advanced English level) was associated with a solid linguistic repertoire that is deemed appropriate for learning content in English (J. Han & Yu, 2007).
Instructors’ English proficiency
Another finding derived from a survey of students’ perception concerns bilingual instructors’ qualifications, which was the main focus of almost all the English studies (n = 27, 90%), and a little over a quarter of Chinese studies (n = 84, 31%). Strong recommendations were presented in these studies to enhance instructors’ content area knowledge and skills in English, including (a) ongoing workshops/training on native pronunciation and communication, and pedagogy in teaching bilingual courses (e.g., Z. Chen & Goh, 2014; Y. Feng, 2009; G. Hu & Duan, 2019; Yin & Chen, 2016); (b) internal collaboration with faculty in foreign language/linguistic departments to form team teaching (e.g., Bi, 2011; K. Feng, 2016; L. Jiang, Zhang, & May, 2016; H. Li, 2011; Yan, 2016), or external/vertical collaboration with bilingual instructors from tier-one institutions in the same discipline (L. Zhao et al., 2010); (c) international experiences in English-speaking academic settings, such as attending professional conferences (Z. Chen & Goh, 2014; Du & Zhao, 2013; Y. Fang, 2009; Hou & Hu, 2014; C. Wu, 2011; K. Xiao, 2013); and (d) direct recruitment of candidates with advanced degrees from these institutions (e.g., A. Feng et al., 2017; Y. Huang, 2006; J. Li et al., 2016; B. Liu et al., 2016; Peng, 2016; Yin & Chen, 2016).
Discussion
Findings from this phase suggest that notwithstanding a large body of survey research, no detailed process was adopted by higher education institutions in consulting and engaging their faculty and/or students, who are stakeholders that are directly affected by this educational movement. This finding echoes the current trend in bilingual education worldwide (Macaro et al., 2018). Moreover, despite the generally positive attitudes toward bilingual education identified among college students majoring in diverse disciplines, their increased awareness of the challenges associated with offering quality bilingual courses was consistently reported in previous reviews (e.g., F. G. Fang, 2018; D. Zheng & Dai, 2013; Zhu & Yu, 2010).
First, the shortage of qualified instructors has become a major roadblock for the successful continuation and expansion of bilingual education in Chinese universities (Cheng, 2017). Nevertheless, most of the empirical studies in our review were conducted through survey research, capturing students’ perspectives. Few studies directly addressed the best practices for improving pedagogy (e.g., overseas training in Cheng, 2017; training in the use of interactional/high-cognitive strategy, G. Hu & Duan, 2019) or provided a profile of the instructor’s professional background when the effect of bilingual education was examined (e.g., Tong & Shi, 2012). While teaching abroad may be a strong desire for bilingual teachers (Werther et al., 2014), the “effectiveness of a borrowed idea, practice or innovation depends crucially on its appropriateness for the specific, local, and dynamic reality of teaching and learning in a particular educational context” (G. Hu, 2009, p. 131). Therefore, an overseas training program is not the solution, but rather a first step in building up a support mechanism for bilingual instructors’ professional development, which is an ongoing process (Cheng, 2017; E. Zhou & Ding, 2012) that requires significant resources (Macaro et al., 2018). Equally important is the belief that for these instructors to be agents of change, they need to feel a sense of entitlement in this educational investment, rather than a passive role of participation. More evidence-based research can further such an understanding and reflect practice before a high-quality bilingual course is offered, for the purpose of maximizing student learning.
Second, in regard to students’ English proficiency, we raise concerns that stem partly from the absence of a clear definition of English proficiency. To be more specific, a large proportion of the studies cited a perceived improvement in students’ English proficiency as a great benefit of bilingual education (e.g., J. Li et al., 2016) without psychometrically sound instruments to measure such proficiency. Although a small number of studies showed a high passing rate of CET-4 among bilingual participants (e.g., Ma et al., 2016), there was no mention of the rate among nonbilingual participants, or their initial English levels prior to participation. In addition, little evidence exists that CET scores are indicative of higher academic achievement. As a result of these limitations, the bulk of the literature reviewed in this study failed to contribute to the discussion on the effectiveness of bilingual education. Instead, the literature speaks to a timely pursuit in defining and evaluating English proficiency so as to address the question of whether participation in bilingual education can truly improve students’ English competence, as proposed by Macaro et al. (2018) in their global systematic review. We want to remind the reader that scholarly attention should not only be allocated to English proficiency as a gatekeeper of bilingual education; what is more vital and beneficial, we argue, is to conduct research on how to provide English support and integrate English into curriculums, so that students can continue developing their academic English proficiency and thus be prepared for content taught in English.
On a different note, although not a specific focus of this review, we found that the vast majority of Chinese articles published with content area instructors being the lead authors (reflecting on their practices) included neither a coauthor that had expertise in second language acquisition and pedagogy nor one that was trained in research methodology. More than a decade later, the recommendation of H. Xu (2008) and D. Zheng and Dai (2013) that collaboration should occur not only between subject and language specialists but also between practitioners and researchers in bilingual education is yet to be realized.
Bilingual program models (Q4)
Among all empirical articles, only 40 (13%) mentioned the types of bilingual models that were implemented, with the majority being immersion (n = 33, 85%; e.g., Lian et al., 2011; H. Zhang, 2012) and a small proportion being transition (n = 4, 10%; e.g., Ma & Liao, 2013) or maintenance models (n = 3, 5%, e.g., Y. Zhao, Yang, et al., 2007). A similar pattern was observed among the larger pool in Phase I, where the immersion bilingual model turned out to be the most popular (71%). However, according to Y. Li (2012), in some immersion programs, Chinese accounted for at least 50% of the language of instruction. Such a mismatch between program label and actual implementation led us to further explore classroom deliveries. Among the 301 empirical studies reviewed, we identified an emerging trend with three Chinese studies (i.e., L. Guo & Wang, 2017; L. He et al., 2016; Kang, 2015) and eight English studies (i.e., Chang, 2017; G. Hu & Duan, 2019; G. Hu & Li, 2017; A. L. Jiang & Zhang, 2017; L. Jiang, Zhang, & May, 2016; Lei & Hu, 2014; Tong & Tang, 2017; W. Wang & Curdt-Christiansen, 2019; X. Yang, 2017) that described pedagogical occurrences in bilingual classrooms, effectively enriching our understanding of bilingual education in practice. For example, some studies reported instructors’ inadequate use of higher order thinking questions that have been proven to promote bilingual students’ academic English language in ESL settings (e.g., G. Hu & Duan, 2019; G. Hu & Li, 2017; Tong & Tang, 2017).
Discussion
Findings regarding bilingual program models suggest that specifications of program models are far from adequate. Despite the popularity of the immersion model among the small percentage of studies that discussed models of implementation, S. Zhang (2015) strongly promoted the transitional bilingual model that takes into consideration the challenge of the authentic English language environment, instructors’ qualification, and instructional material. Some researchers suggested a combined distribution of the two languages, for example, having 30% in English (X. Liu et al., 2012; M. Lu & Ma, 2016). However, according to G. Hu’s (2008) criticism, these terms for program models (i.e., immersion, transitional) are misaligned with the international literature on bilingual education. This is because there are fundamental differences in sociocultural, educational, linguistic, political, economic, and historical contexts between China and the countries where these models originated (Q. Qu, 2015; Tong & Shi, 2012). Taking a transitional bilingual model as an example, the concept was imported from North America, where the language of instruction transitioned from a minority language to majority language; this contrasts with the transition program in Chinese higher education, which aims to use Chinese (majority language) as a bridge to English (minority language), for the purpose of developing students’ English proficiency in an academic context (P. Wang, 2017). Due to such a distinction, the exclusive use of English became a disservice to bilingual/multilingual students, particularly in a context where much more information is available in L1. These activities are dangerous in that they contribute to a form of linguistic hegemony that can be disruptive to the ecology of a local language (F. G. Fang, 2018; Kirkpatrick, 2014; D. Li, 2013).
A substantial distinction has been uncovered between what a program is labeled as and what is actually practiced, not only in an English-speaking setting (Irby et al., 2007) but also in China (Y. Li, 2012) and other Eastern countries (Barnard & McLellan, 2014). Although our review points out such a distinction, research in this area is still scarce. Without more information on observed practices, program evaluation stands on no ground. Therefore, it is imperative to objectively capture pedagogical practices in bilingual classrooms (H. Guo et al., 2018; Tong, Luo, et al., 2017). We urge Chinese academics to purposefully reconsider a framework with appropriate designation or variation of forms of bilingual education that is analogous to terms widely known to English audiences; more importantly, such a framework ought to accommodate the needs of stakeholders (i.e., students and instructors) and fit into the nativized landscape of Chinese higher education.
Relatedly, H. Guo et al. (2018) reasoned that the yet-to-be proved effectiveness of bilingual education in China is due to “a lack of a commonly adopted, comprehensive evaluation framework that draws from, and is informed by, empirical evidence produced through quality research” (p. 13). We assert that a localized bilingual education theory with model specifications can significantly contribute to guiding data-driven research in Chinese higher education institutions.
Phase III: How Was It Studied
From a methodological perspective and the review in Phase I, we found that 18.7% of studies were data driven. It is also worth mentioning that 73% (n = 203) of the empirical studies in Phase II involved survey research with mostly researcher-developed measures based on authors’ experiences or adapted from existing instruments. In general, there was a lack of information on psychometric properties, such as the reliability and validity of survey instruments that were used to collect the data. Among these survey articles, only nine reported reliability, including five in English (i.e., M. Li, 2017; Tong et al., 2017; Tong & Shi, 2012; Wei et al., 2017; Xu, 2017) and four in Chinese (i.e., M. Lu & Ma, 2016; Wan et al., 2015; H. Zhang & Zhang, 2011; X. Zhang et al., 2015), with two using a structural equation modeling approach in which reliability is conventionally calculated as part of the statistical model. In this section, we continue our review of the methodological characteristics of 34 comparative studies in Phase III following H. Cooper’s (2016) elements: (a) type of assignment; (b) baseline equivalence; (c) threats to internal validity (e.g., confounding, selection bias); and (d) type, language, and evidence of validity/reliability of an outcome measure. Detailed coding of these elements is presented in Table 2.
Table 2. Coding Sheet of 34 Studies in Phase III Review.
CET = College English Test; RCT = randomized controlled trial; QED = quasi-experimental design; PBL = project-based learning.
RCT/QED that reported statistically positive effect in both English and content area (n = 2).
RCT/QED that reported statistically positive effect in either English (n = 2) or content area (n = 6).
Random assignment and baseline equivalence
After careful scrutiny, we identified only 10 studies (nine in the field of medicine and one in business) that were randomized controlled trials (RCTs), of which only three used intact classes as the unit of assignment (i.e., Y. He et al., 2018; Sha et al., 2014; Shi et al., 2016). The other seven randomly assigned individual students to either bilingual or monolingual instruction (i.e., L. He et al., 2016; A. Liu, 2019; Long et al., 2019; Mi, 2018; Xing et al., 2012; Yuan, 2016; X. Zhao et al., 2016). Yuan (2016), for instance, applied a block randomized design based on students’ test scores in the content area, which were first divided into five categories from levels A to E. Within each category, students were then randomly assigned to bilingual or monolingual Chinese classes. In the remaining 24 articles, random assignment was either falsely claimed (i.e., Sun & Xiao, 2006; G. Zhang, 2012) or not claimed (e.g., Z. Liu, Luo, & Han, 2012). For example, G. Zhang (2012) randomly selected one class (from a total of six) to receive bilingual instruction and another class to receive Chinese-only instruction, which amounts to random selection of classes rather than random assignment of participants.
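To make the block randomized procedure described above concrete, the following is a minimal Python sketch rather than a reconstruction of any reviewed study's actual procedure. The function name `block_randomize`, the roster, and the arm labels are hypothetical; the only element taken from the text is the idea of grouping students into score levels A to E and randomizing within each level.

```python
import random
from collections import defaultdict

def block_randomize(students, seed=0):
    """Randomly assign students to two arms within each score level (block).

    `students` is a list of (student_id, level) pairs; `level` is a
    hypothetical label 'A'..'E' derived from a content-area test.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    by_level = defaultdict(list)
    for student_id, level in students:
        by_level[level].append(student_id)

    assignment = {}
    for level in sorted(by_level):
        ids = by_level[level]
        rng.shuffle(ids)       # random order within the block
        half = len(ids) // 2   # with an odd count, the extra student goes to 'monolingual'
        for i, student_id in enumerate(ids):
            assignment[student_id] = "bilingual" if i < half else "monolingual"
    return assignment

# Hypothetical roster: four students at each of the five levels.
roster = [(f"{level}{n}", level) for level in "ABCDE" for n in range(4)]
groups = block_randomize(roster, seed=42)
```

Because shuffling happens inside each level, the two arms are balanced on the blocking variable by construction, which is the property that distinguishes this design from randomly picking one intact class.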
In addition to this, when random assignment did not occur, an examination of initial equivalence was required to ensure the comparability of the two groups from the outset (Campbell & Stanley, 2015). However, only three quasi-experimental designs (QEDs) reported the baseline of participants’ gender distribution, age, and English proficiency (i.e., J. X. Han, 2009; Lei & Hu, 2014; G. Zhang, 2012), in which J. X. Han’s (2009) study was quantitative with 274 participants (137 in treatment and 137 in control condition). The author conducted an independent t test on participants’ English language proficiency measured by a university-level English placement test that was administered during the first week of the semester. No statistically significant difference was found between the two groups. Although there were another five articles that compared age, gender, English proficiency, and attitudes between bilingual and monolingual classes, no descriptive or inferential statistics were presented to support their statements (e.g., N. Zhang et al., 2012).
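The baseline check described above can be illustrated with a short Python sketch of an independent-samples t test using pooled variance. This is an assumption-laden illustration: the scores are invented, the function name `independent_t` is hypothetical, and the reviewed studies may have used different software or a different variant of the test.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Return (t, df) for Student's independent-samples t test (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)  # sample variances
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of the mean difference
    return (mean_a - mean_b) / se, df

# Hypothetical English placement-test scores collected before instruction began.
bilingual = [62, 70, 68, 75, 66, 71, 69, 73]
monolingual = [64, 69, 67, 74, 65, 72, 70, 71]
t_stat, df = independent_t(bilingual, monolingual)
```

For these invented scores the statistic is close to zero and well below the two-tailed critical value at the .05 level for 14 degrees of freedom (about 2.14), so the groups would be treated as comparable at baseline. A nonsignificant result does not prove equivalence, which is one reason reporting the descriptive statistics alongside the test matters.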
Evidence of validity/reliability and types of outcome measures
Among 34 comparative studies, there were three RCTs (i.e., L. He et al., 2016; A. Liu, 2019; Shi, Chen, et al., 2016) that measured participants’ outcomes in both English and content knowledge at the end of the program. Other outcomes included participants’ satisfaction (i.e., L. He et al., 2016; L. Zhang, 2016), anxiety, confidence, interest, learning initiative, memory (i.e., Zhan et al., 2016), interest (e.g., L. Chen et al., 2016; G. Zhang, 2012), and self-efficacy, motivation, and metacognition (i.e., Shi et al., 2016). These outcomes were all collected through self-reported instruments in the respective studies, in which the researchers failed to provide psychometric evidence.
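As an illustration of the kind of psychometric evidence that was missing, internal consistency of a self-report scale is often summarized with Cronbach's alpha. The sketch below assumes a small, invented set of Likert-type responses; the function name `cronbach_alpha` and the data are hypothetical and are not drawn from any reviewed study.

```python
import statistics

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_vars = [statistics.variance(item) for item in items]
    total_var = statistics.variance([sum(r) for r in responses])  # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert responses: 5 respondents x 4 items.
survey = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
alpha = cronbach_alpha(survey)
```

Reporting a coefficient of this kind for the study's own sample, together with evidence of validity, would allow readers to judge how far differences in satisfaction, anxiety, or motivation scores can be trusted.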
Based on the previously mentioned outcomes in Table 2, a total of 20 studies demonstrated a significant difference in favor of bilingual programs in English, specific subjects, or affective domains, including eight RCTs and two QEDs that all came from the medical science field (e.g., anesthesiology, nephrology, and physiology in Chinese medicine), except for one that was in the field of math. A closer examination of Table 2 reveals that among these 10 methodologically sound RCTs and QEDs, two (i.e., L. He et al., 2016; G. Zhang, 2012) found statistically significant positive outcomes in both English and content knowledge. The other eight reported a positive effect of bilingual courses in terms of students’ performance either on their specific subject (n = 6) or in English (n = 2).
Finally, in studies that failed to detect a statistically significant difference in the content area, interpretations were formed from a contrasting perspective. J. X. Han’s (2009) QED concluded that bilingual instruction was as effective as Chinese-only instruction in supporting students’ academic achievement in mathematics. Lei and Hu (2014) concluded that the quality of the focal program was undetermined, despite bilingual students’ overall satisfaction with the program. As was mentioned above, a serious weakness in Lei and Hu’s study is the initial inequivalence between the two groups of students, which raises questions about their comparability and leads to an unfavorable conclusion about bilingual education.
Discussion
We now turn to a discussion on the comparative studies reviewed in this article. First, G. Hu and Li (2017) summarized that there was virtually no empirical investigation that involved a comparison group of monolingual Chinese instruction to address the effect of bilingual education on students’ English proficiency and academic outcome. Their concern was supported in our comprehensive review. Despite a substantial amount of work, research over the course of nearly two decades has only produced a total of 34 studies that compared bilingual education with a monolingual, Chinese-only approach. Furthermore, only 13 articles attempted a randomized technique at the student/class level or identified comparable counterparts, which are the most rigorous designs for testing causality (Campbell & Stanley, 2015). Unfortunately, our in-depth review of these studies revealed recurring methodological issues, such as nonrandom assignment, group incomparability, missing information, a lack of statistical control for baseline inequivalence on participants’ knowledge and skills in the subject, or a lack of information on its implementation. These flawed approaches compromised the nature of internal validity, one of the most critical elements in experimental design (as it is associated with random assignment and, thus, causality; Campbell & Stanley, 2015; Coleman, 2018), which consequently undermined the credibility of the findings.
Second, it is not surprising that the two most commonly examined outcomes were English language proficiency (measured by CET or other English tests) and grades on content knowledge (measured by instructor-developed, nonstandardized instruments). However, problems still exist. For example, the studies that attend to both outcomes are scarce, which is problematic as the ultimate goal of the Chinese government is to prepare people with both a strong communicative ability in English and knowledge and skills in their respective subject area. The six studies that reached a certain level of consensus on the positive outcome in the medical science disciplines were overshadowed by abundant, nonempirical research, which corresponded to the conclusions of existing reviews presented earlier in this article (e.g., F. G. Fang, 2018; D. Zheng & Dai, 2013). What is more, none of the 34 studies reported any reliability (e.g., internal consistency) or validity (e.g., construct validity) indicators of the measures used for comparison. Although CET is nationally normed with strong psychometrics (College English Test Band 4 and Band 6, 2018), no information regarding the sample was presented in these studies. There is a common understanding that reliability and validity are critical psychometric features of an instrument, and that findings on them inform the professional community, as well as the policymakers who make high-stakes decisions (Gitomer et al., 2019). The lack of such information hinders meaningful interpretation of the results, rendering them inadequate and unconvincing. We agree with Lei and Hu’s (2014) recommendation that more discipline-specific measures of English should be developed and validated.
Third, after a long debate in the United States, the positive effect of bilingual education among young children has been documented in quality research and acknowledged by researchers and practitioners (Irby et al., 2010; Lindholm-Leary, 2016) through a well-controlled, randomized design with a high level of implementation fidelity. This is not the case when it comes to an inquiry into the effectiveness of bilingual education in China. Studies with cognitive/affective domains (such as self-identity, self-efficacy, learning anxiety, and learning motivation) are rarely conducted, which corresponds to the previous reviews by H. Xu (2008) and D. Zheng and Dai (2013). These constructs are expected to affect the quality of bilingual education (D. Zheng & Dai, 2013) and, thus, deserve comprehensive exploration.
These findings not only resonate with previous reviews in the context of mainland China (i.e., Fan, 2014) and Hong Kong (Lo & Lo, 2014), but are also applicable to academic discourse of bilingual education worldwide (Macaro et al., 2018). The aforementioned issues have significantly hindered the establishment of a cause-effect relationship that is typically derived from rigorous randomized controls to address the impact of bilingual education; this suggests a need for more scientific exploration before any research synthesis that involves statistical approaches (such as best-evidence synthesis and meta-analysis) can be undertaken to quantify the effectiveness of bilingual programs in Chinese higher education. Conducting experimental research in bilingual education is challenging (X. Gao & Wang, 2017; G. Hu & Lei, 2014); nevertheless, it is only through solid design and evaluation that research scholarship can be enriched with compelling evidence to address the ultimate question: is bilingual education effective?
Recommendations and Conclusion
From the insight provided by this systematic review, we suggest that even after almost two decades of research and practice in bilingual education in Chinese higher education, there is still a dearth of strong evidence from a contextualized body of research that can attest to the effectiveness of bilingual education as a result of ideological and epistemological orientation, as well as a lack of rigorous research design and implementation. We believe that the distorted academic discourse elaborated in G. Hu’s (2008) study a decade ago, that was rife with misunderstanding, misrepresentation, and misinterpretation, is partly due to this. The well-intended and far-reaching policy provision of bilingual education has not resulted in significant and favorable conclusions. However, before such a definitive and convincing statement can be made, we insist that scholarly attention should continue to revolve around the quality and implementation of bilingual programs with the following recommendations, which are derived from our findings and supported by the existing body of literature:
An instruction/curriculum/evaluation team to be formed including a content specialist, language teacher, and researcher (Fan, 2014; H. Xu, 2008; D. Zheng & Dai, 2013; Zhu & Yu, 2010);
Examination of local policy and resource allocation that has the potential to shape educational practices;
A scientific program evaluation framework to be established and reinforced (H. Guo et al., 2018);
Observational research on the interplay of the distribution of two languages, content of each language, and instructor–student interaction (H. Guo et al., 2018; G. Hu & Duan, 2019; Macaro et al., 2018; W. Wang & Curdt-Christiansen, 2019);
Large-scale empirical/data-based research and longitudinal (more than 1 year) investigations that address linguistic, academic, and affective outcomes (such as self-identity, learning motivation, learning strategy, and attitudes; D. Zheng & Dai, 2013);
Well-controlled experimental studies comparing bilingual courses with L1-instructed courses, controlling for learners’ initial level of content area knowledge, as well as access to instructional and learning material outside the classroom (or socioeconomic status; X. Gao & Wang, 2017);
Inclusion of psychometrically sound instruments in survey research and standardized outcome measures, the results of which can be compared across studies (Bray et al., 2014; Macaro et al., 2018);
Dissemination in English to reach a broader audience of researchers and practitioners who are interested in bilingual education through an international lens that promotes scholarly exchange.
To conclude, to the best of our knowledge, this study is the first to systematically synthesize research studies (published in both English and Chinese) of postsecondary bilingual education and provides ample evidence of students’ learning in mainland China, which has 3.5 million English learners. Results from this review can, in turn, shed light on the debate on the educational benefits of bilingual education in mainland Chinese society and in the scholarly community, as well as on the government’s formulation of bilingual education policies in the coming decades. In addition, by scrutinizing the academic discourse in bilingual education, this study may provide policy implications and suggestions on how to put L2 learning and bilingual education programs into practice in a global context. Again, it is worth mentioning that this article is not intended to be viewed as advocacy for bilingual education; more rigorous research that can generate credible evidence speaking to the effectiveness of bilingual education may emerge elsewhere. Instead, this article responds to an urgent need to substantively portray the scholarly trajectory so as to guide further research on this top-down educational movement, which continues to increase its presence, momentum, and inevitability with unquestioned institutionalized policies and practices.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
