Abstract
This study tests how initial English proficiency (Gaokao English), gender, province of origin, and semester timing relate to undergraduate GPA at a Sino-foreign English-medium instruction (EMI) university in China. Design/Methodology/Approach: We conducted an exploratory analysis of institutional records from 2019 to 2022 (≈4,000 undergraduates; 16,281 GPAs), modelling GPA as a function of Gaokao English scores, gender, province, and semester (spring vs autumn), and applying regression and non-parametric tests with effect sizes. Findings: All focal variables showed statistically detectable but small associations with GPA, indicating theoretical rather than practical salience. Females earned higher GPAs than males; GPAs differed by province yet not in line with regional economic levels; and spring GPAs modestly exceeded fall GPAs. The English proficiency–GPA association was positive but weak and diminished across years of study. Research limitations and implications: Evidence derives from a single institution and one EMI context, with pandemic-period semesters and limited covariates; results invite multi-institution replication and richer controls. Practical implications: Given the small effects, proportionate, program-level supports are indicated (e.g. early-semester academic-language scaffolding, light-touch diagnostics) rather than high-stakes screening. The patterns can inform resource allocation and the timing of supports. Originality/value: Using a large multi-year dataset from a Chinese EMI university, the study isolates language-of-instruction influences from cultural relocation and examines how English proficiency and background factors jointly relate to achievement, filling a gap on EMI outcomes in non-Western contexts and providing evidence to guide English-medium higher-education practice.
Plain Language Summary
This study looked at what relates to grades at a Chinese university that teaches in English. We focused on four things: students’ English level when they enter (their Gaokao English score), gender, home province, and whether classes are taken in spring or fall. We analysed routine records from 2019–2022 for about 4,000 students (over 16,000 GPAs) and asked how much each factor is linked to GPA. The links were real but small. Students with higher entry English tended to get slightly better grades at first, and that advantage faded as they adapted to studying in English. Female students earned higher GPAs than male students across most subjects. Grades differed by province, but not simply in line with regional wealth. GPAs were a little higher in spring than in fall, which may reflect adjustment after the long summer break or other calendar effects. Because these differences are modest, the best actions are low-stakes and program-level: short, early-semester help with academic English and study skills, light diagnostics to spot students who might benefit from support, and attention to transitions between terms. The results come from one institution during years affected by COVID-19, so they should be tested elsewhere with richer information about students’ backgrounds. Still, the patterns offer a clearer view of how entry English, background, and timing relate to success in an English-medium university without the added challenge of moving to a new country.
Keywords
Introduction
Among the many quantifiable indicators of student success in higher education, GPA is one of the most widely used measures. A previous report emphasises pre-college factors (e.g. demographics, academic preparation) alongside college experiences and post-college outcomes in a broad student success framework (Kuh et al., 2006). Prior research has shown that various background factors influence educational achievement. Socioeconomic status (SES) and family background (e.g. parental education) correlate with student performance, such that students from more educated or higher-income families tend to achieve higher grades. In the People’s Republic of China (PRC), regional economic disparities mean educational resources and quality vary by province, and students from urban, wealthier regions often outperform those from rural areas. For instance, regional SES differences have been linked to achievement gaps. Similarly, gender differences in academic performance are well documented. Female students have often been found to earn higher grades overall (including in humanities), whereas male students sometimes excel in specific fields like mathematics and engineering. However, these trends can vary: some studies report females outperforming males in general GPA, while others find male advantages in specific domains.
External environmental factors can also impact achievement. For example, seasonal timing has been associated with performance differences: some research noted students perform better in spring than in autumn, possibly due to health or behavioural patterns. Seasonal illnesses or post-holiday adjustment periods might affect learning in fall semesters. The winter of 2019 to 2020 introduced an extreme seasonal disruption with the COVID-19 pandemic, which caused one of the most significant educational disruptions in recent history. Such disruptions can confound typical seasonal performance patterns (Pokhrel & Chhetri, 2021).
Learning in a second language environment poses additional challenges for students whose first language is not the medium of instruction. From a sociocultural perspective (Vygotsky, 1978), language learning and academic success are contextually mediated social processes. Students studying in a foreign language often experience linguistic and cultural hurdles that native-language peers do not. Research on international students has long documented stressors related to language barriers. Difficulties in understanding lectures, expressing ideas in the second language, and writing academically can increase cognitive load and impede learning. These challenges can lower academic performance: for example, non-native learners have been found to score roughly 9% to 10% lower on assessments than native-language peers, on average. The distinction between basic interpersonal communicative skills (BICS) and cognitive academic language proficiency (CALP) is instructive here (Cummins, 1979). Students may have conversational English fluency (BICS) yet still struggle with the more demanding academic English skills (CALP) required for university study (Cummins, 2008). Under this framework, second-language students might perform worse academically until their CALP develops. Indeed, language barriers can reduce meaningful class participation and integration. Studies have observed that language-related stress is a key issue among international students in non-native environments (Ali et al., 2020; Chen, 1999; Nazir & Özçiçek, 2022). Writing in a second language often leads to grammatical and expressive compromises, affecting the quality of assignments. These findings highlight the need to understand how studying in a non-native language impacts academic outcomes beyond general cultural adaptation issues.
Research in English-medium instruction (EMI) contexts has yielded mixed results (Polat et al., 2024). Some studies found no significant differences in content learning outcomes between EMI students and those taught in their native language (i.e. no penalty for learning in English). For example, a study in Spain reported no statistical difference in academic achievement between students in EMI programs and those in Spanish-medium programs (Arroyo-Barrigüete et al., 2022). Other studies have found that English language proficiency correlates significantly with academic success in EMI programs. In a Turkish university context, students’ English test scores or self-rated proficiency predicted their GPA and course performance (Budi & Panmei, 2021; Lee et al., 2025; O’Dwyer et al., 2018; Schoepp, 2018). For instance, higher IELTS scores have been linked to higher university GPAs in EMI settings (Budi & Panmei, 2021; Schoepp, 2018). These discrepancies suggest that the impact of second-language proficiency may depend on context and on how academic success is measured. This underscores the importance of examining second-language effects on performance while accounting for other factors.
In Mainland China, essentially all university-bound students take the National College Entrance Examination (Gaokao). English is a tested subject, providing a standardised measure of English proficiency at university entry. The Gaokao system has regional variations: most provinces administer their own version of the exam, adapted to local curricula. The exam is high-stakes and comprehensive, covering Chinese, Mathematics, English (or another foreign language), and electives. Higher Gaokao scores facilitate admission into better-ranked universities and are believed to lead to better career outcomes. The English Gaokao score for each student in our study offers a quantifiable proxy of their English proficiency upon entering the university. It allows us to investigate how this initial English proficiency (a form of CALP at entry) relates to subsequent academic performance (university GPA). Notably, the Gaokao is taken in the student’s home country, so unlike TOEFL/IELTS for international study, it reflects English ability developed in the Chinese school context.
Wenzhou Kean University (WKU) is a Sino-foreign cooperative university in China that offers American-style higher education with English as the medium of instruction. It implements the curriculum and academic standards of its partner institution in the United States, including a 4.0 GPA grading system. WKU provides a unique context for studying second-language academic performance. Students are Chinese and remain in China (thus, their social/cultural environment is mainly Chinese), but all coursework is delivered in English by an international faculty following an American curriculum. This context minimises typical confounding variables present when studying international students abroad, such as culture shock or adaptation to a new country, especially since students operate in their native cultural environment while learning in a second language. In other words, the setting allows us to isolate the impact of English language proficiency on academic performance without the full spectrum of challenges students overseas face. Students’ English Gaokao scores serve as a baseline proficiency measure, and their GPAs over time indicate academic achievement in an English-medium setting. This scenario aligns with a sociocultural and academic literacies perspective: students must acquire the academic literacy practices of an American university (e.g. critical thinking, academic writing in English) while still in their home culture. Developing this academic literacy in a second language is a known challenge (Lea & Street, 1998), making WKU an informative case for research.
Previous studies on second-language learners often involve students studying abroad, where language challenges intertwine with adapting to a new culture. Our study removes the intercultural adaptation factor, focusing purely on the language of instruction within a constant cultural setting. While much research on educational inequality in China examines primary and secondary education, fewer studies have explored how provincial origin (and associated educational backgrounds) affects performance in an English-medium higher education context. We aim to address how students from different provinces fare in the same EMI university. Furthermore, prior research on gender differences in academic performance often focuses on specific majors or single cohorts (e.g. engineering students or one graduating class). By analysing a comprehensive dataset across all majors and four academic years, including the period impacted by COVID-19, we provide a broader assessment of gender effects in an EMI environment. Finally, the literature on seasonal academic performance differences (“summer learning loss” or term timing effects) has not been examined within a Chinese university context, especially under an EMI curriculum. Our study covers multiple fall and spring semesters, including those during the pandemic when academic schedules and student experiences were disrupted, offering new insight into seasonal performance variation in higher education.
Against this backdrop of American-style education implemented in a Chinese context, we examine how initial English proficiency, geographic background, biological sex, and academic semester timing influence university GPA. Understanding these factors is critical for the holistic development and support of students in EMI programs. For this, we formulated four hypotheses:
Hypothesis 1: Students’ English proficiency at entry (measured by Gaokao English score) can predict their tertiary GPA in the EMI university setting.
Hypothesis 2: After accounting for English proficiency, GPA differences exist among students from different provinces.
Hypothesis 3: There are differences in GPA between male and female students.
Hypothesis 4: Students’ GPA differs between spring and fall semesters.
Methodology
This study utilised a quantitative ex-post facto (quasi-experimental) design to analyse how pre-college factors and term timing relate to academic performance. The independent variables were English proficiency (Gaokao English score), home province, biological sex, and semester (spring vs. fall). The dependent variable was students’ GPA at the university. We retrospectively analysed student records from WKU over 3 years (Fall 2019 through Spring 2022). All data were obtained from the university’s registrar database, where they are held as part of routine institutional data collection. The dataset included each student’s GPA for each term, their Gaokao English and total scores, biological sex, home province, and semester/academic year. The study was exempt from additional ethics review because the data were pre-existing, de-identified, and collected as part of normal educational administration.
Testing Samples
The sample consisted of undergraduate student records spanning eight main semesters (Fall 2019 to Spring 2022). We analysed N = 16,281 GPA entries from approximately 4,000 individual students. Of these records, 96.4% were from students who entered WKU via the Gaokao (national exam), while the remaining 3.6% were from students with other entrance exams (e.g. from autonomous regions or international curricula). We included only PRC domestic students in the analysis to maintain a consistent educational background. The sample represented students across all four year levels of study and all four colleges at WKU. Specifically, 6,836 GPA records were from male students and 9,445 from female students (about 42% male vs. 58% female). These records encompassed the College of Business and Public Management (CBPM), the College of Liberal Arts (CLA), the College of Architecture and Design (CAD), and the College of Science, Mathematics, and Technology (CSMT). The largest share of data points came from CBPM (54% of records), with smaller proportions from CLA (∼17%), CAD (∼11%), and CSMT (∼17%). Each student appeared multiple times in the dataset (once per semester enrolled), allowing us to observe performance over time. First-year students (Freshmen) contributed 5,413 records, Sophomores 4,321, Juniors 3,235, and Seniors 3,203 (our data still classified students beyond the fourth year as “Senior”). Table 1 summarises the sample distribution by college, sex, and academic year.
Table 1. Biological Sex and College Distribution of Undergraduates.
CBPM = College of Business and Public Management; CLA = College of Liberal Arts; CAD = College of Architecture and Design; CSMT = College of Science, Mathematics and Technology.
Procedure
Student data were separated and analysed by relevant subgroup factors: year of study, semester (fall or spring), province of origin, sex, and Gaokao scores. We used IBM SPSS Statistics (Version 26.0) for all analyses. All hypothesis tests were two-tailed with an alpha level of .05. Preliminary checks for normality (Shapiro–Wilk test) and homogeneity of variance (Levene’s test) were conducted to determine the appropriate statistical tests.
For Hypothesis 1 on the relationship between English proficiency and GPA, we employed a linear regression analysis (ordinary least squares). The predictor was Gaokao English score, and the outcome was cumulative GPA. This analysis was initially performed on the full sample and then further examined with interaction terms to see if the relationship changed over the years of study. Gaokao English scores served as a pragmatic proxy for initial proficiency, though provincial format/difficulty/scaling differences introduce noise and may attenuate coefficients toward zero.
For Hypothesis 2 (the effect of home province), the GPA data violated the normality and equal-variance assumptions (Shapiro–Wilk and Levene p < .05). Therefore, instead of ANOVA, we used the non-parametric Kruskal–Wallis H test to compare GPA distributions across provinces. We first tested all students collectively and then conducted separate Kruskal–Wallis tests within each college to see whether provincial differences were consistent across disciplines. We also performed a follow-up regression to explore whether provincial economic status was related to student GPA: each province was assigned its Gross Domestic Product (GDP) per capita ranking as a proxy for development level, and mean GPA was regressed on this ranking. Effect sizes used
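To make the Kruskal–Wallis procedure concrete, the following is a minimal standard-library sketch of the H statistic (with the usual tie correction) and an H-based eta-squared. The effect-size formula used here, (H − k + 1)/(n − k), is a common choice and is an assumption on our part, since the exact formula is not stated in the text. The province labels and GPAs are made up.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, eta_squared_H) for a list of samples."""
    pooled = [v for g in groups for v in g]
    n, k = len(pooled), len(groups)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    eta_sq = (h - k + 1) / (n - k)  # assumed H-based effect size
    return h, eta_sq


# Hypothetical GPAs for three provinces (labels A-C are made up).
gpa_by_province = {
    "A": [3.1, 3.4, 3.6, 3.8],
    "B": [2.7, 2.9, 3.0, 3.3],
    "C": [3.2, 3.5, 3.7, 3.9],
}
h_stat, eta_sq = kruskal_wallis(list(gpa_by_province.values()))
print(f"H = {h_stat:.3f}, eta^2_H = {eta_sq:.3f}")
```

For real analyses a library routine such as `scipy.stats.kruskal` would normally be preferred, since it also returns the p-value.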
For Hypothesis 3 (sex differences) and Hypothesis 4 (semester differences), independent samples t-tests were unsuitable due to non-normal GPA distributions and unequal variances between groups. Thus, we used the Mann–Whitney U test (a non-parametric equivalent) to compare male versus female GPA distributions and fall versus spring GPA distributions. These comparisons were made for the overall sample and within each college to identify any interaction between sex or semester and college (major field). For Kruskal–Wallis tests, we report
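The Mann–Whitney comparison and its rank-biserial effect size can be illustrated with a short sketch. It counts pairwise "wins" directly, which is only practical for small samples, and it omits the p-value. The GPAs are made up, and the convention r = 2U/(n1·n2) − 1 is one common definition of the rank-biserial correlation, assumed here because the text does not give the formula.

```python
def mann_whitney_u(x, y):
    """U statistic for x: number of (x, y) pairs with x > y, ties count half."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def rank_biserial(x, y):
    """Rank-biserial r = 2U / (n1 * n2) - 1, ranging from -1 to 1."""
    return 2 * mann_whitney_u(x, y) / (len(x) * len(y)) - 1


# Hypothetical GPAs for two groups (values are made up).
group_female = [3.6, 3.4, 3.8, 3.2, 3.5]
group_male = [3.1, 3.3, 2.9, 3.4, 3.0]

u_stat = mann_whitney_u(group_female, group_male)
r_rb = rank_biserial(group_female, group_male)
print(f"U = {u_stat}, rank-biserial r = {r_rb:.2f}")
```

A positive r means the first group tends to have the higher values; r = 0 means neither group dominates. In practice, `scipy.stats.mannwhitneyu` provides the test statistic together with a p-value.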
Results
Table 1 details the composition of the dataset by college and sex. Below, we present each hypothesis’s findings and relevant test statistics. Overall, the data consisted of 16,281 GPA records from 2019 to 2022. Analyses revealed multiple significant effects of the investigated factors on GPA, although effect sizes were consistently small, and baseline model R² values were near zero (e.g. ∼.02), indicating that the findings are theoretically informative but of limited practical significance.
Descriptive Statistics
Before hypothesis testing, we note some descriptive trends. The mean GPA across all students and semesters was approximately 3.30 on a 4.0 scale. Female students had a higher average GPA (around 3.45) than male students (around 3.12). Regarding English proficiency, the Gaokao English scores ranged widely (as expected from a national exam) but showed a slight positive association with GPA. By year in school, average GPAs tended to decrease slightly from freshman to senior year. The average GPA in fall semesters was about 3.28, compared to 3.34 in spring semesters, suggesting a modest overall difference by term.
Hypothesis 1: Effect of English Proficiency on GPA
We hypothesised that students’ English proficiency at entry (Gaokao English score) would predict their university GPA. A simple linear regression confirmed a statistically significant but weak positive relationship. Gaokao English score was a statistically significant predictor of GPA, F(1, 11966) = 26.6, p < .001, with an unstandardised coefficient B ≈ 0.004 (SE = 0.001). This indicates that for each one-point increase in the English exam score, the predicted GPA increased by only 0.004 on a 4-point scale. The effect size was very small: R² = .002, meaning English proficiency explained only about 0.2% of the variance in GPA. Thus, while Hypothesis 1 was supported, the practical impact of initial English proficiency on academic performance was minimal.
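The reported statistics can be cross-checked with a little arithmetic. For an OLS model, R² = (F · df1) / (F · df1 + df2), so the F statistic and its degrees of freedom imply the R² directly. The sketch below applies this identity to the values reported above; the function name and the 10-point score gap are hypothetical choices made for illustration.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover R^2 from an OLS F statistic and its degrees of freedom."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


# Values reported in the text for the simple regression.
r2 = r_squared_from_f(26.6, 1, 11966)

# Predicted GPA gap implied by B = 0.004 for a hypothetical 10-point score gap.
gap_10_points = 0.004 * 10

print(f"R^2 implied by F = {r2:.4f}")
print(f"GPA gap for a 10-point score difference = {gap_10_points:.2f}")
```

The implied R² agrees with the reported .002, and a 10-point difference in exam score corresponds to only about 0.04 GPA points under the reported coefficient.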
To explore whether the influence of English proficiency changed over time, we included an interaction term between Gaokao English score and year of study in a regression model. The interaction was significant (p = .022), showing that the predictive power of English proficiency decreased as students progressed through university. Specifically, the coefficient for Gaokao English × Year was negative (B ≈ –0.002 per year). By the fourth year, English exam scores had little to no predictive value for GPA. After centring the variables to address multicollinearity, the regression with the interaction term had R² = .017 (1.7% of variance explained). These results suggest that students with higher initial English proficiency earned slightly better grades in the first year, but that this advantage diminished as all students adapted to the English-medium environment and improved their academic English. In short, English proficiency at entry was a statistically reliable predictor of GPA, particularly early on, but the association was small, explained minimal variance, and waned over time.
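A moderated regression of this kind can be sketched as follows. The example is a toy illustration under stated assumptions, not the study's model: the data are generated without noise from made-up coefficients (so the fit recovers them exactly), both predictors are mean-centred, and the small `solve` and `ols` helpers are hypothetical stand-ins for a statistics package.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


scores = [100, 110, 120, 130, 140]  # hypothetical Gaokao English scores
years = [1, 2, 3, 4]                # year of study
mean_score = sum(scores) / len(scores)
mean_year = sum(years) / len(years)

# Made-up generating coefficients (not the paper's estimates), no noise added.
TRUE_B = [3.30, 0.005, -0.05, -0.002]

rows, gpa = [], []
for s in scores:
    for yr in years:
        cs, cy = s - mean_score, yr - mean_year  # centred predictors
        rows.append([1.0, cs, cy, cs * cy])      # intercept, score, year, score x year
        gpa.append(TRUE_B[0] + TRUE_B[1] * cs + TRUE_B[2] * cy + TRUE_B[3] * cs * cy)

b0, b_score, b_year, b_inter = ols(rows, gpa)


def score_slope(year):
    """GPA change per exam point for a student in the given year of study."""
    return b_score + b_inter * (year - mean_year)


for yr in years:
    print(f"year {yr}: slope of GPA on score = {score_slope(yr):.4f}")
```

With a negative interaction coefficient, the slope of GPA on exam score shrinks with each additional year, which is the pattern the paragraph above describes. Centring changes the meaning of the main-effect coefficients (they become effects at the mean of the other variable) but leaves the interaction coefficient unchanged.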
Hypothesis 2: Provincial Differences in Academic Performance
Hypothesis 2 posited that students from different provinces would have different GPA outcomes, even after accounting for their English proficiency. We first ensured that any such differences were not due to English proficiency disparities: Gaokao English scores were included as a covariate in our analysis (and, as noted, had only a minor effect). Using the Kruskal–Wallis H test, we found a significant effect of home province on GPA. When considering all student records together, the Kruskal–Wallis test was statistically significant (H = 333.353, df = 21 provinces, p < .001). This indicates that at least one province’s students had differing GPA distributions. We then stratified by college to see if these differences persisted within each field. Table 2 shows the Kruskal–Wallis results for the whole sample and for each college. All four colleges showed significant GPA differences among provinces (H values ranging from 79.938 to 204.054, all p < .001). The effect sizes (η² based on H) were in the small range for all cases (η² ≈ .02–.05), suggesting that while statistically reliable, provincial origin accounted for only a small fraction of GPA variance.
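As a rough consistency check on the whole-sample result, the H statistic can be converted to an effect size by hand. The calculation below rests on three assumptions that the text does not confirm: that the effect size is the commonly used H-based estimate (H − k + 1)/(n − k), that df = 21 corresponds to k = 22 province groups, and that all n = 16,281 records entered the test (excluding small provinces would lower n slightly).

```latex
\eta^2_H = \frac{H - k + 1}{n - k}
         = \frac{333.353 - 22 + 1}{16281 - 22}
         = \frac{312.353}{16259}
         \approx .019
```

Under these assumptions the value sits at the lower end of the reported range, in line with the conclusion that province accounts for only a small fraction of GPA variance.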
Table 2. Kruskal–Wallis Test for Differences in Students’ GPAs Among Provinces.
To understand the pattern, Table 3 presents the average GPA (±standard error) of students from each province, overall and by college. We excluded provinces with very few students (e.g. Gansu, Jilin) from comparisons due to insufficient data. The data show that students from certain provinces consistently earned higher GPAs. For example, students from Shandong and Zhejiang provinces were among the top performers on average, whereas some other provinces had lower averages. Notably, students from Shanxi province (not to be confused with Shaanxi) had the highest overall mean GPA (around 3.53), although the sample from Shanxi was relatively small. In contrast, students from Beijing (very few at WKU) had a lower average GPA (∼2.49). Broadly, coastal provinces like Zhejiang and Shandong did well, but the pattern was not strictly geographic.
Table 3. The Average GPA of Students From Each Province, Overall and by College.
Figure 1 provides a visual representation of these provincial GPA differences. Figure 1b–f maps the mean GPA of students from each province for the entire sample (Figure 1b) and for each of the four colleges (Figure 1c–f), with darker colours indicating higher GPA (maximum 4.0). The maps show no clear geographic gradient from south to north or from east (coastal) to west (inland) regarding GPA. In other words, academic performance did not simply mirror China’s well-known regional economic divides. To test this, we examined the relationship between provincial economic level and student GPA. We took each province’s GDP per capita ranking as an indicator of its development and regressed the mean GPA of students from that province on it. Surprisingly, this analysis revealed a significant but weak negative association: students from wealthier provinces tended to have slightly lower GPAs on average (unstandardised coefficient B ≈ –0.01 GPA per rank; R² = .018, p < .001). In other words, coming from a province with a higher economic status was associated with marginally lower academic performance at this university, the opposite of what one might expect, although the very low R² means the relationship explains little of the variation in GPA.

Figure 1. (a–f) Distribution of students’ mean GPA by province in the whole database and each college (max GPA is 4). The intensity of the colour reflects the GPA level. (a) Enrollment of students by province in years 2019 to 2022. (b) Distribution of mean GPA of all students by province. (c) Distribution of mean GPA of students in CBPM by province. (d) Distribution of mean GPA of students in CLA by province. (e) Distribution of mean GPA of students in CAD by province. (f) Distribution of mean GPA of students in CSMT by province.
Overall, the data supported Hypothesis 2: statistically significant GPA differences by province. However, the magnitude of these differences was small, and the trend with respect to province-level wealth was inverse to expectations. We accepted Hypothesis 2, noting that provincial background had a detectable, though limited, influence on academic outcomes in this EMI context.
Hypothesis 3: Gender Differences in GPA
Hypothesis 3 stated that male and female students would differ in academic performance. The Mann–Whitney U test comparing GPA distributions of male versus female students was highly significant for the combined sample (U ≈ 3.61 × 10⁷, p < .001). On average, female students earned higher GPAs than male students, supporting Hypothesis 3. The mean GPA for females was 3.45 (SE = 0.007) versus 3.12 (SE = 0.011) for males, a mean difference of about +0.33 grade points in favour of females. This gender gap was consistent across all academic years and most majors. We further examined the differences within each college (disciplinary context). Figure 2 illustrates the GPA distribution by gender. In all four colleges, females had higher median and mean GPAs than males. A breakdown by college showed the gender GPA gap (female minus male) was roughly 0.32 in CBPM, 0.34 in CLA, 0.27 in CAD, and 0.19 in CSMT. All these differences were statistically significant (Mann–Whitney p < .001 for each college, except CAD, p = .0062).

Figure 2. GPA distribution of male and female students.
We also calculated rank-biserial effect sizes for the gender differences. The effect of sex was small in CAD (r ≈ .28), slightly smaller than in the other colleges, where effect sizes were moderate (r in the .3–.6 range). Across the whole dataset, the effect size for females versus males was very small and practically negligible.
Hypothesis 3 was supported: female students consistently outperformed male students in GPA at the institution. The gender disparity was present across all majors and was statistically robust. While the overall effect on GPA was modest in magnitude, it was remarkably consistent, making a strong case that gender is associated with academic performance in this EMI context.
Hypothesis 4: Semester (Fall vs. Spring) Performance Differences
Lastly, Hypothesis 4 predicted that student performance would differ between spring and fall semesters. A Mann–Whitney U test comparing GPA in fall versus spring terms revealed a significant difference in the overall sample (U = 3.61 × 10⁷, p < .001). The median and mean GPA were slightly higher in the spring semesters than in the fall. Specifically, the average GPA in the fall (all students, all years) was about 3.28 (SE = 0.008), and in spring, it was about 3.34 (SE = 0.009). This pattern suggests that students perform better academically in the spring term, supporting Hypothesis 4.
Mann–Whitney tests of fall versus spring GPA within each college (Table 4) showed a significant difference in three of the four colleges: in CBPM, CLA, and CSMT, spring semester GPAs were significantly higher than fall GPAs (p < .001 in each case). In CAD, the difference did not reach significance at the .05 level (p = .062), suggesting that architecture/design students performed similarly across semesters. Even where significant, the semester effect sizes were small (rank-biserial r ≈ .09 overall; college-specific r values < .1). This indicates that the semester effect, while consistent, was not large.
Table 4. Mann–Whitney U Test for GPA: Fall Versus Spring Semesters.
In practical terms, the spring advantage amounted to roughly 0.05 to 0.07 GPA points on average. Although modest, this difference was observed over multiple years and cohorts. We thus accept Hypothesis 4: there is a recurring pattern of slightly better academic performance in spring semesters compared to fall semesters among WKU students. This finding aligns with prior observations of seasonal performance shifts in education.
Having confirmed all four hypotheses with statistical significance, we next discuss these results in the context of existing literature and theoretical frameworks.
Discussion
This study explored several factors related to students’ academic performance in an English-medium instruction (EMI) university in China. Focusing on Chinese students’ English proficiency (Hypothesis 1), geographic origin (Hypothesis 2), biological sex (Hypothesis 3), and semester timing (Hypothesis 4), we found that each factor had a statistically significant association with GPA. Within our university sample, GPA varied by province, female students outperformed male students, and students achieved slightly higher GPAs in spring than in fall; these effects were very small, with the exception of the gender effect. Students’ English proficiency at entry had a significant but very small effect on their college GPA, and this effect diminished as students spent more time in the EMI environment. Although many associations are statistically significant, their effect sizes are small (model R² near zero). Our focus is therefore on providing a theoretical basis, documenting small-but-reproducible patterns consistent with EMI-related academic language development. A practical reading of these small effects (e.g. R² ≈ .002–.05) favours low-stakes programme adjustments, such as semester-start diagnostics for adaptation signals, over high-stakes individual interventions, and can inform EMI administrators’ decisions on resource allocation.
For Hypothesis 1, the finding that initial English proficiency significantly influenced academic performance is expected in an English-taught university. Language proficiency is a foundational component of academic success, and students must understand lectures, participate in discussions, and complete assignments in English. However, our study found this effect to be weak and to wane over time. Higher English Gaokao scores corresponded to marginally better GPAs in the first year, but the correlation nearly vanished in later years. This suggests that students gradually adapted to the English language environment or developed coping strategies to overcome their initial language limitations. From a BICS/CALP perspective (Cummins, 1979), many students likely entered with limited academic English proficiency (CALP), initially hindering their performance. Over time, through immersion and coursework, they improved their academic language skills, reducing the gap between higher- and lower-proficiency students. From another perspective, this indicates that baseline conversational English (BICS) does not, on its own, determine the development of academic language (CALP). When educational resources, pedagogy, and assessment are held constant, students’ CALP trajectories converge over time through sustained disciplinary reading and writing, iterative feedback cycles, and genre practice. Another possible explanation lies in modern translation technology and peer support: students might use translation apps or bilingual resources extensively in early semesters, mitigating the disadvantages of lower English proficiency. As they progress, even those with lower English ability gain the skills (or at least find effective strategies) to succeed in their studies. In our context, all students must also take required English language courses in the first year, which could help equalise their academic language proficiency.
Thus, while English proficiency at college entry is a predictor of GPA, its predictive power diminishes over time, and by the senior year, other factors dominate academic performance. This finding resonates with the idea that second-language academic skills can be acquired on the job, as it were. It also suggests that initial language support and effective study skill programs can help lower-proficiency students catch up, aligning with the broader literature that finds language proficiency is necessary but not solely sufficient for long-term academic success (Schoepp, 2018). By graduation, factors like individual motivation, study habits, and mastery of content likely outweigh the influence of initial English test scores. In our EMI context, sociocultural adaptation (Vygotsky, 1978) mediates BICS-to-CALP progression via immersion in American-style coursework and peer interactions, fostering academic literacies despite initial barriers. This mechanism explains the waning proficiency effect, as lower-proficiency students leverage institutional supports to bridge gaps.
Turning to Hypothesis 2, we observed significant differences in average GPA among students from different provinces. This result aligns with studies documenting regional disparities in student performance within China’s education system (Banda et al., 2023). Students from provinces with traditionally strong K–12 education (e.g. Zhejiang, Jiangsu, Shandong) tend to have higher GPAs at WKU, suggesting that regional educational quality and student preparedness continue to matter at the tertiary level. Our interpretations emphasise the small magnitudes involved and avoid claims of socioeconomic causation, focusing on associational channels such as selection rather than direct wealth effects.
To probe potential drivers, we next examined provincial economic levels. Interestingly, our regression indicated that students from higher-GDP provinces had slightly lower GPAs, a pattern that appears counterintuitive at first glance. We interpret this as an exploratory, non-causal association consistent with two plausible mechanisms. First, a selection/composition mechanism: economically stronger provinces host a denser concentration of high-prestige universities and often maintain meaningful local admission advantages (Borsi et al., 2022). High-achieving students from these provinces therefore face more attractive outside options and are less likely to enrol at our institution; those who do matriculate may occupy a lower within-province percentile of prior attainment, yielding slightly lower average GPAs in our setting. Second, a time-allocation/opportunity-cost mechanism: students from richer provinces typically encounter a richer off-campus opportunity set (internships, competitions, entrepreneurship, metropolitan networking) (K. H. Li & Lau, 2023). Time invested in these activities, while valuable for labour-market preparation, can displace coursework effort at the margin and modestly depress term GPAs even when academic ability is comparable.
Complementing these structural accounts, prior work also points to family environment and motivation. Under rapid economic growth and the one-child policy context, some affluent urban families may inadvertently foster lower perceived academic pressure, potentially reducing day-to-day academic drive (Sang, 2017), whereas students from less developed areas or lower-SES families may feel a stronger imperative to study as a route to mobility (Kim et al., 2018). Taken together, our findings challenge the simplistic assumption that coming from a richer province automatically confers an academic advantage in college; instead, they highlight a nuanced interplay between structural factors (schooling quality, higher-education supply, opportunity structures) and individual factors (motivation, aspirations).
Curricular alignment may also matter. Differences across provincial Gaokao systems, some more oriented toward rote learning and others toward problem-solving, could yield varying levels of preparedness for an American-style curriculum, further shaping early university performance. Overall, Hypothesis 2 underscores that geographic background is a noteworthy factor in an EMI university, but not always in expected ways. This suggests that educators should recognise heterogeneity in student preparation and provide targeted support for those from under-resourced regions, while also designing strategies to keep students with abundant outside options meaningfully engaged in coursework. These provincial patterns align with sociocultural theory, where regional preparation influences initial CALP adaptation in EMI settings.
For Hypothesis 3, we found that female students outperformed male students in GPA across all majors. This aligns with a large body of research in both Chinese and international contexts that reports females achieving higher grades on average (Erdem et al., 2007; Kurek & Górowski, 2020). It is notable that the female advantage at WKU held even in fields like science and technology, where males are often presumed to excel. In our data, the GPA gap was present in STEM (CSMT) and non-STEM fields alike, though slightly smaller in CSMT. These results echo findings from other studies suggesting that female students tend to have better study habits and may be more diligent in continuous assessment environments (Woodfield et al., 2005). One possible reason is that the American-style education at WKU involves extensive coursework, assignments, class participation, and exams, and prior research has suggested that such continuous assessment formats may favour female students, who typically engage more consistently with coursework. In contrast, male students might fare better when performance hinges on high-stakes exams alone.
Additionally, females generally have stronger language skills on average, which could confer an advantage in an EMI setting (Božinović & Sindik, 2011; Kaushanskaya et al., 2011). If female students have even slightly better English proficiency or communication skills, this could help them grasp material and express understanding more effectively. Another factor could be differences in extracurricular activities or time use: our study took place during years when online gaming and internet use among youth have been noted to affect academic focus. Prior research in China reported that male students spend more time online gaming, which can detract from study time (Teng et al., 2014). If that pattern holds, it could partially explain why male students underperform academically relative to females. It is also worth noting that this female advantage in grades does not necessarily contradict studies that find males scoring higher in standardised math tests or specific cognitive skills; instead, it suggests that academic performance is multifaceted, and in holistic university evaluation (projects, papers, participation, etc.), females currently have an edge. Our findings make a strong case for the persistence of the gender gap favouring females in academic achievement (supporting Hypothesis 3). This has implications for educators and policymakers: while it is positive that female students are thriving, attention might be needed to understand and support male students’ engagement and learning strategies. The goal should be to ensure all students reach their potential, possibly by addressing factors that particularly hinder male academic performance (for example, by promoting positive study behaviours or providing mentorship for at-risk male students). To deepen this, future mixed-methods work could test speculations via surveys on study habits/gaming, revealing if EMI assessment formats exacerbate gaps (e.g. via regression on time-use data).
The 2019 to 2022 period overlapped with COVID-19, with partial online/remote delivery, on-campus isolation periods, and temporary Pass/Fail policies (grades of C or above could be recorded as Pass and excluded from GPA). This likely reduced GPA variance and attenuated small associations, complicating semester comparisons. For Hypothesis 4, we observed that students performed better in spring than in fall semesters, albeit very marginally. This mirrors findings by others who noted seasonal performance patterns in another context (Beşoluk & Önder, 2011). One explanation relates to the academic calendar: fall semesters are preceded by the long summer break, during which students may experience “learning loss” or need time to readjust to academic work. Spring semesters, in contrast, follow only a short winter break and start soon after fall, so students may carry forward their momentum. Our data are consistent with this interpretation: the first semester back (fall) saw slightly lower average GPAs, and students were in full stride by spring (the second continuous semester). This aligns with summer learning loss, where students forget some of what they learned or lose academic skills over an extended break (Kuhfeld, 2019). Another contributing factor could be health and well-being: the fall semester runs through late summer and autumn, whereas the spring semester includes winter and early spring. Some studies suggested that illness patterns (e.g. flu season in winter) could affect academic outcomes (Shephard & Aoyagi, 2009), but if anything, one might expect winter illnesses to hurt spring performance. However, by spring, first-year students have also adjusted to university life, and others have chosen courses or schedules better suited to them, possibly improving performance. It is also plausible that instructors apply stricter standards in the fall, when courses begin, and are somewhat more lenient by spring, or that students are simply more accustomed to expectations by spring.
The pandemic adds another wrinkle: during 2020 to 2021, many fall classes were online or disrupted, while spring 2021 resumed more normal operations in China, potentially influencing grades. Setting those anomalies aside, the “spring effect” we observed remained consistent across multiple years. While the effect size is small, it suggests that academic support may need to be enhanced for students returning from summer. Some scholars have debated whether such differences reflect learning loss or different course-taking patterns (Kuhfeld, 2019). In our case, the finding invites further investigation: do students take harder courses in the fall and easier ones in the spring? Or is it indeed that they are a bit rusty in the fall? Future studies could look at measures of student engagement or diagnostic tests at the start of each term to quantify any decline in skills over the summer. If summer learning loss is a factor, one practical implication could be to offer optional refresher workshops or coursework reviews at the beginning of the fall term or encourage students to engage in academic activities during summer. Overall, Hypothesis 4’s confirmation adds to the growing evidence that timing matters in academic performance, even at the college level. Alternatives like course difficulty distribution were not tested here due to data limits; future analyses could regress GPA on course loads by semester to disentangle adjustment versus selection effects.
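The adjustment-versus-selection question raised above can be sketched as a two-step regression. This is an illustration only, not an analysis performed in this study: the records, the `difficulty` score, and the function names are hypothetical, and the toy numbers are constructed so that GPA depends on course difficulty alone. The sketch compares the raw spring-minus-fall GPA gap with the gap after partialling out difficulty (the Frisch–Waugh–Lovell residual approach, which gives the same coefficient as including difficulty as a covariate in ordinary least squares).

```python
from statistics import mean


def slope_intercept(x, y):
    """Ordinary least-squares slope and intercept for y ~ a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, my - b * mx


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b, a = slope_intercept(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical records: (spring indicator, course difficulty, term GPA).
# By construction GPA = 3.0 - 0.1 * difficulty, and spring courses are easier.
records = [
    (0, 3, 2.7), (0, 4, 2.6), (0, 5, 2.5),
    (1, 2, 2.8), (1, 3, 2.7), (1, 4, 2.6),
]
spring = [r[0] for r in records]
difficulty = [r[1] for r in records]
gpa = [r[2] for r in records]

# Raw gap: slope on a 0/1 indicator equals the difference in group means.
raw_gap, _ = slope_intercept(spring, gpa)

# Adjusted gap: regress residualised GPA on residualised spring indicator.
gpa_resid = residuals(difficulty, gpa)
spring_resid = residuals(difficulty, spring)
adjusted_gap, _ = slope_intercept(spring_resid, gpa_resid)

print(f"raw spring gap: {raw_gap:.3f}")
print(f"difficulty-adjusted spring gap: {adjusted_gap:.3f}")
```

In this constructed example the raw gap arises entirely from easier spring courses, so it disappears once difficulty is held constant. With real records, a gap that survives such adjustment would be more consistent with an adjustment or learning-loss account than with course selection.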
Limitations and Further Research
While this study analysed a large dataset and yielded significant findings, several limitations should be acknowledged. Non-parametric tests suited non-normal data but limited advanced modelling (e.g. interactions); parametric alternatives post-transformation could enable deeper analyses in future. As a single-institution study, external validity is limited; multi-institution replication in other Chinese EMI contexts is needed.
First, although significant, the effect sizes for provincial and semester differences were small; the Kruskal–Wallis tests indicated statistically reliable variation among provinces, but these differences explained only a small share of GPA variance. Non-linguistic attributes (e.g. self-efficacy) are likewise critical to how EMI outcomes are formed (Tang & Curtis, 2025). Thus, caution is warranted in inferring practical impact; unmeasured factors such as individual student aptitude, study habits, or instructor grading differences likely account for more variance. In addition, the analyses draw on records from a single EMI institution, which limits external generalisability to other EMI contexts in China or elsewhere. Further research could incorporate additional individual-level variables such as high-school GPA, motivation, or English communication skills upon entry to provide a fuller picture of academic success predictors.
A second limitation is that our “provincial effect” measure is somewhat indirect. We attributed differences to the province of origin, but provinces differ in many ways: education systems, cultural attitudes, and possibly student self-selection (certain universities attract students from certain regions). Our exploratory finding of a negative correlation between provincial GDP rank and GPA is non-causal and may reflect (i) selection/composition effects (e.g. local university supply and admissions advantages) or (ii) time-allocation costs (e.g. off-campus opportunities in wealthier provinces). Qualitative inquiry into these channels is a future direction. Future research should examine why students from wealthier provinces performed below expectations. One hypothesis is differing parenting or learning cultures. Surveys or qualitative studies could explore students’ study attitudes and values in college relative to their upbringing. For instance, investigating parenting styles by province (e.g. more authoritarian vs. indulgent) could shed light on academic behaviours. China’s provinces also have varying “gaokao immigration” policies, and some students attend high school outside their home province to gain an advantage; such complexities were beyond our scope (X. Li & Zhang, 2023). A follow-up study might control for students’ high school locations or specific Gaokao test versions to refine understanding of regional preparation differences.
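The distinction between statistical reliability and variance explained, raised under the first limitation above, can be illustrated with a minimal sketch. The data below are simulated, not the study's records: three hypothetical province groups of 3,000 GPAs each whose means differ by 0.06 grade points. The code computes the Kruskal–Wallis H statistic (without tie correction, which is adequate for continuous simulated values) and the rank-based effect size ε² = H/(n − 1). The group means, spread, sample sizes, and function names are all assumptions chosen for illustration.

```python
import random
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, without tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def epsilon_squared(h, n):
    """Rank-based effect size for Kruskal-Wallis: H / (n - 1)."""
    return h / (n - 1)


# Simulated GPAs for three hypothetical provinces (assumed means and spread).
rng = random.Random(0)
groups = [[rng.gauss(mu, 0.4) for _ in range(3000)] for mu in (3.00, 3.06, 3.12)]

h = kruskal_wallis_h(groups)
n_total = sum(len(g) for g in groups)
eps2 = epsilon_squared(h, n_total)
CHI2_CRIT_DF2 = 5.991  # chi-square critical value for df = 2, alpha = .05

print(f"H = {h:.1f} (critical value {CHI2_CRIT_DF2}), epsilon^2 = {eps2:.4f}")
```

With groups this large, even a shift of this size would be expected to push H past the critical value while ε² remains small, which is the pattern the limitation describes: a detectable difference that accounts for little of the variation in GPA.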
Regarding the English proficiency measure (Gaokao scores), one limitation concerns how “initial proficiency” was measured. Our use of Gaokao English as the entry proficiency indicator entails cross-province comparability limits: paper formats, difficulty, scaling, and cut-scores vary across provincial examinations and cohorts. Consequently, the “initial proficiency” variable should be interpreted as a coarse proxy that blends true language skill with province-specific test features. Future studies could incorporate an independent, external English diagnostic (e.g. EAP/IELTS) upon college entry and again later during college to directly measure language gains, although this could be logistically challenging. This could validate our interpretation that students’ English (especially academic English) improved over time, contributing to the diminishing effect of initial proficiency. It would also help identify students who might benefit from targeted language support early in their studies.
Across models, associations that reached statistical significance were small in magnitude with limited variance explained. This constrains generalisability and practical salience. Such small effects may reflect restricted measurement of language proficiency, unobserved confounding (e.g. socioeconomic status, prior attainment, parental education), between-programme/course heterogeneity, and possible attenuation due to grading and delivery changes during COVID-19.
The gender difference we observed, while robust, leads to a somewhat uncomfortable question: what can be done about it? We acknowledge that our study did not examine the causal factors behind male students’ lower GPAs. We speculated on factors like language, study behaviour, and distractions (e.g. gaming) but did not measure these. Future research could include surveys on study time allocation, extracurricular activities, or academic engagement broken down by gender. There may also be pedagogical implications: if certain teaching methods or assessments inadvertently favour one gender’s learning style, adjustments could be considered. However, given that this gender gap is found in many contexts (not just EMI), it likely reflects broader educational and social trends. Although our research demonstrated the gap across all majors at WKU, interventions to reduce this disparity are not straightforward. Designing them may require understanding motivational and learning differences, potentially through qualitative research (interviews or focus groups with male and female students about their academic challenges).
Another limitation is the unique context of this study: a Sino-foreign university with an American curriculum in China. While this provided an interesting setting to isolate language effects, it is not a typical university environment. Caution is needed in generalising findings to other EMI programs or regular Chinese universities. For example, WKU’s grading practices and student population (who chose an English-taught program) might differ from those of a regular Chinese university or a Western university’s Chinese international student population. Replicating this study at other EMI universities in non-English-speaking countries would strengthen the validity of these results. It would also be beneficial to compare these results with data from foreign students in anglophone countries to see if similar patterns (e.g. spring vs. fall differences, gender gaps) exist when cultural adaptation is a factor.
Pandemic-period exposure (2019–2022) is an additional limitation; the direction and magnitude of its impact on GPA remain contested. Many universities adopted flexible and lenient grading policies during the pandemic, and some of these policies were retained even after it ended (Kuperman et al., 2025). Our study window overlapped with COVID-19 disruptions: although instruction at WKU was predominantly in person, some terms and modules were delivered partly online or remotely due to entry/travel restrictions, with periods in which students remained on campus but attended classes online. In the same period, some courses temporarily allowed a Pass/Fail option (grades of C or above could be converted to Pass and were excluded from GPA). Several studies suggest that, during COVID-19, upper-year (senior) students were more likely to opt for flexible grading policies (e.g. Pass/Fail), which may have reduced GPA dispersion among seniors (Mostafa et al., 2023). Lower-scoring students likewise tended to use flexible grading to diminish the impact of low-GPA courses (Rodríguez-Planas, 2022). International mobility was largely suspended, so the cohorts included few or no study-abroad/exchange students. These features may have reduced observable GPA variance (via Pass/Fail), introduced self-selection into which grades entered the GPA, curtailed peer interaction and feedback cycles during isolation, and increased heterogeneity in study time and assessment conditions (e.g. quarantines, stress, connectivity constraints). As a result, the already small associations (e.g. entry-English → GPA, semester contrasts) should be read conservatively, and external generalisability beyond the pandemic context is limited. Future research conducted under more stable conditions could test whether the patterns reported here are artefacts of this extraordinary period.
Finally, due to data-availability constraints, we could not observe several established predictors of academic success (e.g. high-school GPA or entrance percentile, socioeconomic status, and parental education). Accordingly, our estimates should be read as partial associations conditional on the available controls, with scope for omitted-variable bias. In addition, the province-related analyses cannot adjust for potentially important confounders such as course/major mix and grading severity, high-school location (urban/rural), or province-specific Gaokao-migration policies; thus, the observed province–GPA patterns are best interpreted as compositional signals rather than causal effects. Likewise, although we consistently observe a gender gap, we lack direct measures of hypothesised mechanisms (e.g. study habits and time use, gaming/screen time, preferences for assessment formats, attendance, feedback uptake), so any mechanistic discussion remains tentative. Future work that links institutional records with survey/administrative covariates and course-level assessment metadata would enable richer controls and sharper tests of these mechanisms.
In terms of further research directions, one area could be examining academic literacy development more closely. We inferred that students improved their academic English, but how did they achieve this? Did their writing quality or critical thinking in English improve markedly from freshman to senior year? Conducting a longitudinal study of a subset of students and evaluating samples of their academic work over time could provide insight. Another promising area is the impact of support systems: since WKU is an EMI institution, it likely has tutoring, writing centres, or bilingual support for students. Investigating which supports are most utilised and effective (and by which students) would inform best practices for EMI programs. For instance, do lower-proficiency students heavily use office hours or tutoring, and does that correlate with narrowing the GPA gap?
Our findings open new questions about the interplay of language, culture, and education. By focusing on a context that strips away some usual confounds, we were able to observe more subtle effects. Given the consistently small effects (e.g. H1: R² ≈ .002), these patterns are better understood as programme-level signals than as individual decision rules. For EMI practice, they are suited to low-stakes, proportionate adjustments: embedding short, discipline-specific academic-language scaffolds in core courses; front-loading supports in the first year; piloting light-touch bridging activities for cohorts from particular curricular backgrounds; and trialling optional, gender-responsive engagement strategies, with attention to participation and uptake rather than placement or gatekeeping. More broadly, the results pose cautious questions about how language, culture, and educational structures interact. Observing a relatively homogeneous institutional context helped surface subtle associations; continuing this line of inquiry with clearer measures of academic-language practice and policy exposure could inform the design of language training, region-sensitive bridging, and targeted engagement in multilingual higher education without overstating effects.
Conclusion
In conclusion, our findings provide several insights into factors affecting academic success in an English-medium tertiary program in China. First, while English language proficiency at entry does play a role in an EMI university, its effects were surprisingly small and diminished over the college years. This suggests that initial language disadvantages can be overcome and should not be considered deterministic barriers to success. Second, we found that students’ provincial backgrounds were related to their university performance, but not in the straightforward way one might anticipate: students from less economically developed regions performed as well as or better than those from wealthy areas, highlighting the possible importance of individual motivation and perhaps challenging assumptions about resource advantage. Third, we observed that female students consistently outperformed male students across all fields of study, reinforcing findings of a gender gap in continuous academic assessment environments. This pattern calls attention to how educational practices can effectively support all genders. Fourth, we identified a seasonal academic performance pattern, with students doing better in spring than in fall semesters, hinting at phenomena like summer learning loss or adjustment effects that merit further exploration in higher education.
Ultimately, this study highlights that improving tertiary education outcomes in EMI settings is a multifaceted task. By shedding light on how English proficiency, region, gender, and semester timing relate to performance, we hope to inform more holistic strategies to foster student success. Our research opens avenues for future inquiry into the mechanisms behind these effects, and we anticipate that ongoing research will continue to uncover how best to support learners in an increasingly English-mediated yet culturally diverse landscape of higher education.
Footnotes
Acknowledgements
ChatGPT 4 was used to improve the readability and clarity of the writing.
Consent to Participate
Informed consent was waived because the data were collected in aggregate during routine university processes.
Author Contributions
B.L. performed the study and wrote the draft. S.K.-E.G., A.T., and C.Q. edited the manuscript. C.Q. verified the statistics. H.Z. provided resources and administration of the project. Q.Z. provided the data and resources for the work.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was financially supported by Wenzhou-Kean University High Level Talent Program (No. WB20220901000091) and the Zhejiang Province Fourteenth Five-Year Plan “Education Reform Project.”
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The raw data are available upon reasonable request made to the corresponding author.
