Abstract
Aims:
This study set out to test the hypothesis that proficiency in a second language can lead to emotional advantage, via increased Emotional Intelligence and improved Facial Emotion Recognition (FER).
Design:
Unlike previous studies, this project adopted a within-subject design, rather than comparing bi- and monolinguals. We investigated the participants’ performance on FER tasks, as a function of their second-language English proficiency and trait emotional intelligence.
Data and analysis:
Using an online experimental task, we tested FER in static posed photographs in 256 adult participants with a wide range of native languages. To examine the role of task type, multiple-choice and free-labelling protocols were used. We collected self-reported measures of L2 English proficiency and administered a direct proficiency measure, as well as a measure of trait emotional intelligence. Multiple regression analysis was used to examine the relationship between the variables.
Findings:
The analysis revealed only a relationship between the direct proficiency measure and the multiple-choice FER task, but no effect of trait emotional intelligence or self-reported L2 English proficiency.
Originality:
This study contradicts previous findings based on across-subject comparisons in linguistically and culturally homogeneous populations.
Implications:
The results suggest that the relationship between bilingualism and FER is sensitive to methodological, cultural, and linguistic differences. Future investigations of the relationship between language and emotion in bilinguals should take that into consideration.
Introduction
Facial emotion recognition
Facial emotion recognition (FER) is the ability to infer emotional states from facial expressions. Since awareness of the feelings of others is essential for successful social functioning, FER can be expected to constitute an important part of socioemotional competence (Saarni, 1999). Indeed, research has revealed an association between FER deficits and such diverse mental health concerns as autism spectrum disorders (Ola & Gullon-Scott, 2020), history of childhood physical abuse (Pollak & Sinha, 2002), borderline personality disorder (Daros et al., 2013), mood disorders (Suddell et al., 2021), and psychopathic traits (Cooper et al., 2020). In children, low emotion recognition ability has been shown to predict emotional and behavioural problems (Blair & Coles, 2000; Shields et al., 2001) and lower social adjustment (Garner et al., 1994). Furthermore, SSRIs have been shown to positively impact FER, and there is some evidence that FER coaching may improve psychological outcomes (Suddell et al., 2021), implying a causal relationship. While the associations between FER and wellbeing are likely complex, and may be moderated by additional factors, their robust and lifelong nature strongly suggests that FER is a fundamental component of healthy psychological functioning.
FER and language
Nonetheless, the psychological foundations of the mechanism of FER are not yet fully understood. The classic early studies of emotion recognition suggested that FER is an innate, universal skill (Ekman, 1992; Ekman & Friesen, 1971). These studies used a range of posed facial expressions for the six basic emotions: anger, sadness, happiness, surprise, fear, and disgust, and tasked participants from different cultural backgrounds to match them with the labels. That task became a gold standard in emotion research, and the high levels of performance led Ekman and his colleagues to conclude that these basic emotions are biologically wired. However, later research, extending the range of cultures, individuals, and tasks indicated wide-ranging variability in performance (Barrett et al., 2011).
First, cross-cultural convergence on labels varies significantly across emotions, with happiness achieving the highest agreement between and across Western and non-Western cultures, followed by surprise, sadness, disgust, anger, and fear (Russell, 1994). For the latter emotions, inter-subject agreement often does not exceed 60%–80% (Chan, 1985; Matsumoto, 1992), and some cross-cultural studies indicate agreement rates approaching chance (Winkelmayer et al., 1978). It is only the general difference between positive and negative faces that appears to be universally recognised – a pattern that extends to developmental studies with babies (Barrett et al., 2011; Lindquist & Gendron, 2013).
Second, performance is significantly affected by the research methodology used, including variation in participants, design, and task. When investigating non-Western cultures, large differences are observed between literate and illiterate, urban and rural subjects, presumably due to the greater exposure of the urban, literate groups to Western culture (Ducci et al., 1982). Similarly, results differ between studies employing within-subject versus across-subject designs (Russell, 1991). When the original forced-choice task is replaced by free naming, accuracy can drop by more than 50% (Izard, 1971; Russell et al., 1993). Perhaps most remarkably, even a forced-choice task can elicit different responses, depending on the number and prototypicality of the options offered (Russell, 1993). Furthermore, the cross-cultural overlap in picture-based emotion categorisation drops significantly when participants are asked to sort pictures based on similarity, rather than label them (Gendron et al., 2014).
Such findings motivated the emergence of constructivist approaches to emotion (e.g., Barrett, 2012, 2014, 2017), which propose that both emotion expression and emotion recognition are not innate, but rather learnt in development, with language serving as the essential factor connecting multiple different experiences and observations of an emotion to a single label. In this view, emotion words serve as ‘glue’ that binds distinct expressions and experiences of emotional states together to form a culture-specific, coherent emotion category. Instead of being innately specified, emotions are thus learnt in a similar way to other abstract concepts (Hoemann et al., 2020), and emotion recognition is then guided by both bottom-up (interoception) and top-down (conceptual knowledge) processes (Barrett, 2012).
Indeed, much developmental research seems compatible with the idea that emotion concept development is gradual and affected by language: not only is emotion recognition linked to general language development (Pons et al., 2003; Widen & Russell, 2008), but also to breadth of emotion vocabulary in particular (Streubel et al., 2020), and young children are better at matching emotion illustrations to a linguistic label than to a prototype (Camras & Allison, 1985; Russell & Widen, 2002). In adults, there is experimental evidence that people’s perception of emotions is affected when language is manipulated. Specifically, when access to verbal labels is blocked through repetition, participants become worse at emotion categorisation (Doyle et al., 2021; Lindquist & Gendron, 2013). Overall, it appears that language plays at least a facilitatory role in emotion recognition.
FER and bilingualism
The role of language in emotion recognition can be further illuminated by looking at second-language acquisition. While FER in bilinguals is still a fairly new research area, it has already revealed evidence of lifelong emotional development, as well as differences between mono- and multilinguals in the area of emotion knowledge and experience. For example, recent findings indicate that there is progressive synchronisation of emotional reactions and perceptions which occurs through a person’s prolonged contact with a non-native language and culture (De Leersnyder, 2017). This process, termed emotional acculturation, has been observed in immigrants (De Leersnyder et al., 2011; Hammer & Dewaele, 2015), cultural minorities (De Leersnyder et al., 2020; Senft et al., 2022), and foreign language learners (Alqarni & Dewaele, 2020; Lorette & Dewaele, 2015).
Furthermore, the small number of studies which have investigated the FER skills of adults in their non-native language report within-subject differences in performance across languages, often related to L2 language proficiency. Somewhat counterintuitively, several of these studies have found an advantage in emotion recognition in English for unbalanced bilinguals for whom English is their less dominant language. For example, Spanish-English and Hindi-English bilinguals showed higher FER accuracy in English than in their L1 (Matsumoto & Assar, 1992; Matsumoto et al., 2008). On the other hand, other findings across different native languages and modalities suggest varying patterns, including an advantage for non-native speakers over native speakers (Alqarni & Dewaele, 2020; Zhou et al., 2021), or over balanced bilinguals (Lorette & Dewaele, 2020), lack of significant difference (Dromey et al., 2005), or an inverse relationship between L2 English proficiency and emotion recognition (Graham et al., 2001).
This pattern of conflicting results is similar to the one observed in research into cognitive dis/advantages of bilingualism in general (Bak, 2016). While some studies report positive associations between bilingualism and various cognitive skills (Bialystok, 2015), many fail to report any effects, which has led to a heated academic debate (cf. Adesope et al., 2010; Lehtonen et al., 2018; Novitskiy et al., 2019), often focusing on the methodological issues of studying the relationship. Indeed, it is conceivable that the diversity of results may be related to the differences in tasks and demographics of the specific studies. For example, group comparisons between bilinguals and monolinguals or L1 and L2 speakers tend to be notoriously plagued by confounding factors, due to the inevitable self-selection of participants (Grosjean, 1998). Furthermore, language differences are often difficult to disentangle from cultural differences (Bak, 2016), which may be particularly important in studies of emotion recognition, as it is related to emotional acculturation. Finally, study design also plays a part – in the realm of FER studies, some use the traditional FER paradigm, while some vary either the response format (e.g., MCQ vs free labelling), task type (L1 or L2) or the nature of the stimuli (e.g., static vs dynamic materials, in visual vs auditory modality). The fact that subjects come from very different demographic populations might also have an effect, especially since emotional acculturation is likely to play a part as well.
In addition – partly due to these complications, and partly due to the nature of the scientific endeavour – even decisively showing an association does not in itself have explanatory power. While proponents of a general cognitive bilingual advantage propose improvements in the executive function as the mediating process, this is an unlikely explanation for any advantage in emotion recognition. Therefore, other possible links have been proposed in the emotion recognition literature, such as emotional intelligence or emotion regulation (Alqarni & Dewaele, 2020; Matsumoto et al., 2020). Indeed, links between trait Emotional Intelligence and FER in monolinguals have previously been reported (Dewaele et al., 2019), suggesting that bilingualism could increase emotional intelligence, leading to improved emotion recognition ability. Such an interpretation would be compatible with the constructivist approach to emotion, as people with a wider emotion vocabulary would be expected to predict and perceive emotions in a more nuanced and accurate way. It is therefore plausible that bilinguals, with their large combined emotion vocabularies, might experience a boost in trait Emotional Intelligence, leading to improved emotion recognition ability.
Nonetheless, it is difficult to imagine how to disentangle improvements in emotional intelligence and emotional acculturation. As Russell (1994) persuasively argued in his critique of cross-cultural FER studies in general, literate participants from Western and Western-adjacent cultures are likely to develop a similar understanding of emotion through exposure to similar content in drama, television, or other media. This issue is perhaps even more relevant to 21st-century second-language English speakers, who have much more extensive exposure to Western content on social media. Therefore, their performance in English could simply reflect their better adjustment to native-like expectations set by researchers in that language. In addition, L2 speakers of English will most certainly have been taught to label emotion expressions in the course of their language learning. It is not inconceivable that under some conditions, the fairly artificial and stereotypical nature of the FER tasks makes them resemble practices typical of the second-language classroom, which could explain participants’ better performance in their L2.
Summary
In summary, FER is an essential socioemotional skill, which is still not fully understood. The large across- and within-participant differences in performance across cultures, ages, and tasks, indicate that this skill is acquired in development through exposure to language and culture. This suggests that we can learn more about the complex nature of FER by studying second-language learners, to reveal the precise effects of language on the skill, without resorting to group comparisons. However, existing research to date has delivered mixed results, and thus the question of the relationship between bilingualism and emotion recognition remains open. Therefore, the current study set out to contribute to this growing research area by investigating the relationship between L2 language proficiency, emotion recognition, and trait emotional intelligence, in a population of unbalanced bilinguals with English as L2.
Using a within-subject design, we set out to (1) test the participants’ FER using both a forced-choice and a free-labelling task, to identify potential effects of task type, and (2) collect direct and indirect measures of their second-language proficiency and general emotional competence, to reveal individual differences. In this way, the study aims to contribute to existing research by closely examining patterns of bilingual performance across two different tasks, as related to linguistic and psychological factors. Specifically, the study aims to answer the following three research questions:
If emotion perception is shaped by language, then we expect to see a correlation between individuals’ language proficiency and their emotion recognition across different native languages and cultural backgrounds.
If second-language proficiency offers an advantage in emotion recognition due to better developed language categories, then we expect to see an effect of language proficiency that is independent of general emotional competence. In contrast, if bilingualism is independently linked to emotional intelligence, then we expect to find that emotional intelligence is better able to explain the difference between participants.
If higher second-language proficiency affects emotion recognition due to increased conceptual granularity of emotion concepts, then we expect to see a difference in performance between a task that rewards fewer, rigid categories (forced choice), and a task that allows for nuanced responses (free labelling).
Method
Design
This study used an experimental design, with forced-choice and free-label FER accuracy as the two dependent variables, and trait emotional intelligence and English lexical knowledge as the two independent variables. An online questionnaire was created and administered via Qualtrics, which consisted of four direct measures: two emotion recognition tasks, an English verbal knowledge test, and an emotional intelligence questionnaire. The questionnaire also included indirect, self-report measures of L2 English proficiency for comparison.
Materials
The first two tasks tested the participants’ FER, using images from the Karolinska Directed Emotional Faces (KDEF, Lundqvist et al., 1998). The KDEF is an open-access set of colour photographs of 70 amateur actors aged 20–30, posing facial expressions for the six basic emotions, which was developed for the purpose of emotion recognition research. In a validation study (Goeleven et al., 2008), the set has been found to have a mean recognition rate of 72%, which is comparable with similar materials used in FER research, as well as good test–retest reliability of 88%. Front-facing pictures from three male (AM06, AM25, AM29) and three female actors (AF01, AF03, AF14) were selected for use in the current study. Sample pictures are shown in Figure 1.

Sample KDEF images (image IDs in parentheses) for (a) anger (AF01ANS), (b) happiness (AF03HAS), (c) sadness (AM25SAS), and (d) disgust (AM29DIS).
The third task was the Lexical Test for Advanced Learners of English (LexTALE, Lemhöfer & Broersma, 2012), which is a short measure of English proficiency based on a yes/no lexical decision task. LexTALE consists of 20 nonwords and 40 words of systematically varied input frequency, which participants have to classify as real or not. The test has a high reported reliability of .67–.81, and has previously been found to offer better predictions than self-reported English proficiency in experimental studies (Lemhöfer & Dijkstra, 2004; Lemhöfer et al., 2008). Appendix 1 includes all the items used in the study.
However, since LexTALE only measures vocabulary, additional indirect measures of L2 English proficiency were included in the study. Specifically, participants answered questions related to five factors commonly identified in research as predictive of bilingual language proficiency: self-rated proficiency, reported standardised test result (CEFR or IELTS), age of acquisition, length of residence in an English-speaking country, and daily exposure to English. These questions served as an additional measure to be used for comparison with LexTALE. Appendix 2 lists all the questions used.
The final task was the short form of the Trait Emotional Intelligence Questionnaire (TEIQue-SF, Petrides, 2009). TEIQue-SF is a list of 30 items describing different facets of emotional intelligence (e.g., ‘I can deal effectively with people’, ‘Others admire me for being relaxed’), which participants have to rate on a 7-point Likert-type scale ranging from (1) – Completely Disagree to (7) – Completely Agree. The items are selected from the full-form TEIQue which has a high reported reliability (Cronbach’s alpha = .89 and .92 for females and males, respectively) (Cooper & Petrides, 2010).
The decision to include LexTALE and TEIQue-SF was based on the fact that both measures have been used in recent research on emotion recognition in bilinguals (Alqarni & Dewaele, 2020; Dewaele et al., 2019; Lorette & Dewaele, 2015, 2020).
Participants
Participants were recruited online through social media and the survey exchange portal SurveyCircle.com. The advert included the study title and a link to the questionnaire. In total, 351 participants started the questionnaire, but 95 did not complete all questions – a drop-out rate to be expected in a fairly extensive study with no participant rewards. Therefore, the final data set included responses from 256 subjects. Of these, 72 were male, 175 were female, and 9 identified as non-binary or did not wish to specify their gender. Table 1 presents age distribution in the different gender groups.
Summary of demographic data of the participants.
The participants represented 41 different native languages, of which the most common ones were Arabic (N = 35), Polish (N = 25), German (N = 25), Spanish (N = 22), French (N = 19), and Chinese (N = 18). Overall, the sample was thus culturally and linguistically diverse. Table 2 presents the list of all the native languages spoken by the participants.
Native languages of the participants.
Procedure
After clicking on the advert link, participants were directed to Qualtrics, where they first read the information sheet and were asked to indicate their agreement to participate on the consent form. After that, they completed a demographic survey, which asked for non-identifying personal information such as age, gender, and native language, as well as the five self-report questions related to L2 English proficiency.
Next, they were presented with the two FER tasks: half of the participants started with the MCQ task, and the other half with the Free-Label task. The pictures were counterbalanced between three male and three female actors, such that each of the two tasks included the same six actors, albeit modelling different emotions. For example, the actor portraying ‘anger’ in the MCQ task could be portraying ‘fear’ in the Free-Label task. This was meant to reduce the chances that the difference between tasks could be attributed to specific actors. Each participant saw all six emotions twice, each once per task (for 12 pictures altogether), presented in a random order within the task. Both tasks included the same prompt ‘What is this person feeling?’. In the MCQ task, participants had to select the emotion label from a drop-down list of the six emotion words, ordered alphabetically. In the free-label task, participants had to type the label into the text box.
Following the FER tasks, the participants completed LexTALE, with test items presented in a fixed, pseudo-randomised order (specified by test authors), each on a separate page. Finally, they filled in the TEIQue, with all the items presented simultaneously on one page. All the data was thus collected in a single session, which took on average 15 minutes to complete. After the session, participants were presented with the debrief sheet. The study procedure was evaluated and approved by the Northumbria University Ethics Committee.
Analysis
The data were analysed using IBM SPSS Statistics version 28, to reveal the relationship between the two independent and two dependent variables by performing correlation and multiple regression. The first independent variable was L2 English proficiency as measured by LexTALE score, computed as prescribed in Lemhöfer and Dijkstra (2004), that is, as the average % correct on the two types of items: words and nonwords, to correct for the uneven number of these items on the test (40 vs 20). The second independent variable was trait Emotional Intelligence, as measured by TEIQue-SF, calculated according to the scoring key (Petrides, 2009).
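The averaged scoring rule described above can be illustrated with a minimal Python sketch. Only the 40/20 item split and the averaging of the two percentages come from the text; the function name and the example counts are hypothetical.

```python
def lextale_score(correct_words: int, correct_nonwords: int,
                  n_words: int = 40, n_nonwords: int = 20) -> float:
    """Average of % correct on words and % correct on nonwords."""
    if not (0 <= correct_words <= n_words
            and 0 <= correct_nonwords <= n_nonwords):
        raise ValueError("counts must lie between 0 and the number of items")
    pct_words = 100.0 * correct_words / n_words
    pct_nonwords = 100.0 * correct_nonwords / n_nonwords
    # Averaging the two percentages weights both item types equally,
    # even though there are twice as many words as nonwords.
    return (pct_words + pct_nonwords) / 2.0


# Hypothetical participant: 36 of 40 words and 15 of 20 nonwords correct.
example_score = lextale_score(36, 15)
print(example_score)  # 82.5
```

For the same hypothetical participant, the raw proportion correct would be 51 of 60 items (85%), which counts words twice as heavily as nonwords; the averaged score of 82.5 removes that imbalance.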
The dependent variables were scores on the MCQ and Free-Label FER task. To score the Free-Label task, we followed the procedure used by Widen and Russell (2008). Two native speakers were independently presented with the list of all responses and asked to blindly categorise them into the six basic emotion categories (or as not categorisable). The inter-rater agreement obtained in this way was .71. In the cases where the raters disagreed, a third native rater was employed to resolve the conflict. Out of the 193 different responses elicited in the Free-Label task, 100 were judged as not directly compatible with any of the categories (a full list can be found in Appendix 3). The remaining 93 were judged to match at least one of the six categories, and all of these are presented in Table 3 (in their original spelling). These labels were then treated as ‘correct’ responses to their respective emotion pictures. Both FER task scores were thus calculated as the number of correct responses out of 6.
A list of the labels obtained in the Free-Label task which were judged by native speakers of English as belonging to one of the six basic emotion categories.
In addition, an exact match score was computed for the FL task, to reflect the number of times that the participants used the specific target label (i.e., one of the six labels used in the MCQ task, for example, ‘anger’, ‘sadness’).
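The two Free-Label scores can be sketched as follows. This is an illustration under stated assumptions, not the scoring script used in the study: the rater lookup table contains a handful of made-up entries standing in for the rater-approved labels, and the function and variable names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rater decisions: free-text label -> basic emotion category.
# None marks a label judged as not categorisable. Illustrative entries only.
RATED_LABELS: Dict[str, Optional[str]] = {
    "angry": "anger",
    "mad": "anger",
    "joy": "happiness",
    "scared": "fear",
    "confused": None,
}


def score_free_label(responses: Dict[str, str]) -> Tuple[int, int]:
    """Return (regular_score, exact_score) for one participant.

    `responses` maps each target emotion to the label the participant typed.
    The regular score accepts any rater-approved label for the target;
    the exact score counts only uses of the target label itself.
    """
    regular = 0
    exact = 0
    for target, typed in responses.items():
        label = typed.strip().lower()
        if label == target:
            exact += 1
        if label == target or RATED_LABELS.get(label) == target:
            regular += 1
    return regular, exact


# Hypothetical responses to three of the six pictures.
participant = {"anger": "Mad", "happiness": "happiness", "fear": "confused"}
print(score_free_label(participant))  # (2, 1)
```

Labels missing from the lookup table are treated as incorrect, which mirrors the handling of responses judged as not compatible with any category.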
After performing qualitative analyses to discover trends in the data and validate parametric assumptions, correlation was computed between the dependent and independent variables, to investigate potential relationships. Then, multiple regression analysis was performed to evaluate the relative contribution of each of the factors, using two models – one for each dependent variable.
Results
Descriptive statistics
Table 4 summarises the descriptive statistics for the dependent and independent variables. The Z-scores of skew and kurtosis did not exceed the threshold of 3.29 recommended for medium-sized samples of 50–300 (Kim, 2013), except for the somewhat elevated negative skew of LexTALE and Free Labelling scores. The details of each of the variables are discussed separately in the sections that follow.
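A skewness Z-score of this kind is the skewness statistic divided by its standard error. The sketch below uses the bias-adjusted sample skewness and its usual standard error under normality. That these are the exact formulas behind the reported values is an assumption, and the function name and example data are hypothetical.

```python
import math
from statistics import mean, stdev

SKEW_Z_THRESHOLD = 3.29  # threshold cited in the text for samples of 50-300


def skewness_z(values):
    """Return sample skewness divided by its standard error."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 in the denominator)
    if s == 0:
        raise ValueError("skewness is undefined for constant data")
    # Bias-adjusted sample skewness.
    g1 = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)
    # Standard error of skewness under normality.
    se = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return g1 / se


# Hypothetical, negatively skewed scores (most values near the ceiling).
scores = [95, 92, 98, 90, 97, 60, 94, 96, 93, 99]
z = skewness_z(scores)
print(round(z, 2), abs(z) > SKEW_Z_THRESHOLD)
```

A negative Z indicates a longer left tail, as would be expected when a test is relatively easy for the sample.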
Descriptive statistics for all variables.
SD: standard deviation; FER: facial emotion recognition.
Facial emotion recognition
Table 5 presents descriptive statistics for the participants’ performance on the two FER tasks. Presentation order of the tasks was randomised to control for priming effects, and we conducted independent samples t-tests to compare the performance of those who completed the MCQ task first and those who started with the FL task. There was no significant difference between the two display order groups on the MCQ scores (t(254) = .796, p > .05, Cohen’s d = .100, 95% CI [−.146, .346]) and the FL regular scores (t(254) = 1.944, p > .05, Cohen’s d = .244, 95% CI [−.003, .490]) (Cohen, 1988).
Means and standard deviations of scores on the two FER tasks by display order.
However, there was a significant difference between the groups on the FL exact scores (t(254) = 4.918, p < .001, Cohen’s d = .617, 95% CI [.365, .868]) such that the participants who started with the MCQ task had significantly higher exact FL accuracy than those who were given the FL task first, which was a medium effect size. This is unsurprising, since the exact scores were based on the participants’ use of the exact target labels in the free-labelling task. Predictably, the participants who just saw the labels in the MCQ task were more likely to use them in the FL task.
Next, to determine whether the FL task penalised higher-proficiency speakers for higher category granularity, we compared the regular and exact FL FER scores. Specifically, we wanted to see whether higher L2 proficiency would lead to lower exact scores, as it could be predicted that more advanced speakers were likely to have broader vocabulary, and might therefore be less likely to use the exact target labels. However, there was no effect of L2 proficiency as measured by LexTALE on the exact FL FER scores (r = −.008, p = .895).
Given that the difference between the exact and the regular FL FER scores was shown to be mostly attributable to the order of task display, rather than any language-related factors, the exact FL match scores were not used in further analyses. Furthermore, since there were no significant differences in performance between the task order groups in terms of either MCQ or regular FL scores, all remaining analyses were performed on the entire cohort, collapsing across display order.
To compare the participants’ overall performance on the two FER tasks, we conducted a paired-samples t-test. As evident from Table 5, accuracy rates for the free-labelling task were slightly lower than for the forced-choice task, which is to be expected. The test revealed that the MCQ scores were significantly higher than the FL scores (t(255) = 11.360, p < .001, Cohen’s d = .710, 95% CI [.572, .847]). This was a medium effect size.
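The reported effect size is consistent with the common conversion for paired designs, in which Cohen’s d equals the t statistic divided by the square root of the number of pairs. Whether the value was obtained in exactly this way is an assumption; the function name below is hypothetical.

```python
import math


def paired_cohens_d(t_value: float, n_pairs: int) -> float:
    """Cohen's d for paired samples, recovered from t and the number of pairs."""
    if n_pairs < 2:
        raise ValueError("need at least two pairs")
    return t_value / math.sqrt(n_pairs)


# Values reported in the text: t(255) = 11.360, so n = 255 + 1 = 256 pairs.
d = paired_cohens_d(11.360, 256)
print(round(d, 3))  # 0.71
```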
As in previous research, participants’ performance was not consistent across emotions. Table 6 presents the proportion of correct responses by emotion and by task in the entire sample.
Per cent correct responses by FER task and by emotion.
As evident from Table 6, the two emotions that were consistently correctly recognised were Surprise and Happiness – both eliciting over 90% correct responses in both tasks, followed by Sadness, which reached over 80% accuracy across tasks. On the other hand, the emotion that was recognised with the lowest accuracy was Fear, which did not exceed 30% in either task. The remaining two emotions, Anger and Disgust, were usually correctly identified in the MCQ task (93% and 95%, respectively), but had much lower recognition rates in the Free-Labelling task – 61% for Anger and 42% for Disgust.
These recognition rates are somewhat higher than those reported for the entire KDEF database (Goeleven et al., 2008), which is to be expected, given that only a small selection of stimuli from the database was used in our study. However, the low recognition rates for Fear are compatible with those reported by Goeleven and colleagues, who found a 43% hit rate across all stimuli in the set. Figure 2 presents scores across tasks in this study, with the previously reported overall KDEF scores for comparison.

Per cent accuracy in MCQ task, FL task, and Goeleven et al. (2008) forced-choice task using stimuli from the same corpus.
Table 7 presents the number of errors by target emotion and response, across tasks and stimuli. The emotions that were most commonly mistaken for one another were Surprise and Fear (211 responses), followed by Anger and Disgust (124 responses). Fear and Disgust were confused in 89 instances. Finally, errors involving swapping between Sadness and Anger (43), Sadness and Disgust (42), and Sadness and Fear (49), appeared at similar rates. The remaining emotions were only substituted for one another rarely or not at all.
Number of errors confusing one emotion for another across tasks.
English proficiency
The LexTALE scores of lexical knowledge in English had a slight negative skew, which can be seen on the histogram in Figure 3. Unsurprisingly, this was due to the task being relatively easy for our participants. Since the ad specifically invited proficient speakers, this distribution was to be expected.

A histogram of LexTALE scores.
In addition to LexTALE, we also included indirect measures of L2 English proficiency. The first question asked participants to estimate their proficiency on a 1–6 scale. Figure 4 presents the distribution of responses to that question, showing that the majority of the participants rated their ability as Very Good or above.

A histogram of responses to the question ‘How would you rate your English proficiency?’, on a scale of: 1, limited; 2, modest; 3, good; 4, very good; 5, fluent; 6, near-native.
This corresponded to the reported standardised test results. Out of the participants who had completed such tests, the great majority passed them at either Advanced (21.9%) or Proficient level (28.1%). Figure 5 presents the distribution of responses to this question.

A histogram of reported standardised test results.
Furthermore, we asked the participants about their age of acquisition and the longest time they had spent living in an English-speaking country. The vast majority of the participants reported learning English before the age of 12 (91.8%), although only 23.4% spent more than 5 years in an English-speaking country, while 26.2% never visited an English-speaking country for more than a month. Nonetheless, 84% of the participants indicated using English multiple times per day in their daily lives, or using English as their main language. Again, this was to be expected from a sample recruited on the internet, via an English-language website.
Trait emotional intelligence
Trait Emotional Intelligence was measured using TEIQue-SF. The scores were mostly normally distributed. A Pearson’s correlation revealed a relationship with age (r = .183, p < .01, two-tailed), suggesting that Emotional Intelligence increased with age, which was a small effect size. There was no significant difference in scores between men (M = 4.57, SD = .69) and women (M = 4.56, SD = .75). These means were comparable (albeit slightly lower) to the population averages reported by Petrides (2009) for men (M = 4.95, SD = .61) and women (M = 4.82, SD = .57).
Correlations
Given that the parametric assumptions were not met in the data, we conducted Spearman’s nonparametric correlations to investigate the relationships between the variables. The results are presented in Table 8. As expected, the two dependent variables were significantly correlated (r = .141, p < .05), suggesting that performance on the MCQ and Free-Labelling tasks was related, although the effect size was small according to Cohen’s criteria (r < .3). Furthermore, the MCQ score (but not the FL score) was positively correlated with LexTALE (r = .120, p < .05), which was also a small effect.
Table 8. Spearman’s correlations between all variables.
Correlation is significant at the 0.05 level (two-tailed).
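For readers who wish to reproduce this type of analysis, the following is a minimal sketch of a Spearman correlation (a Pearson correlation computed on average ranks) together with Cohen's conventional size labels. The function names are illustrative, the "negligible" band below .1 is an added assumption, and the demonstration values are invented and are not data from this study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cohen_label(r):
    """Cohen's conventional labels for a correlation (.1, .3, .5 cut-offs)."""
    size = abs(r)
    if size < 0.1:
        return "negligible"  # assumption: Cohen gives no label below .1
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "medium"
    return "large"


# Invented illustration values, not data from this study.
mcq_scores = [10, 12, 9, 11, 12, 8]
lextale_scores = [70.0, 85.0, 72.5, 90.0, 80.0, 65.0]
rho = spearman_rho(mcq_scores, lextale_scores)
print(round(rho, 3), cohen_label(rho))
```

Under these labels, both coefficients reported above (.141 and .120) fall in the small band.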
Multiple regression
To further explore the relative contribution of each of the two independent variables to task performance, multiple linear regression was performed in two models, with LexTALE and TEIQue scores as predictor variables and MCQ score as the outcome in the first model, and FL score as the outcome in the second model.
The MCQ model was significant (R² = .31, F(2, 253) = 4.04, p < .05), and according to Cohen’s recommendations for effect sizes in multiple regression, this was a large effect size (> .25, Cohen et al., 2003). However, as predicted from the correlation analysis, only the LexTALE score was a significant predictor in the model, as illustrated in Table 9.
Table 9. Beta coefficients with 95% confidence intervals, and tests of significance of the two predictors and MCQ FER performance.
CI: confidence interval.
As predicted from the previous correlational analysis, the FL model was not significant (R² = .002, F(2, 253) = .248, p > .05).
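As a consistency check on the two models, the overall F statistic of a linear regression and its R² are linked by a standard identity, sketched below. The function names are illustrative; the sample size of 256 and the two predictors are taken from the text. Applied to the values above, the FL model's R² and F agree with each other, whereas F(2, 253) = 4.04 corresponds to an R² of roughly .03, so the MCQ R² and its effect-size label may be worth rechecking against the original output.

```python
def f_from_r2(r2, n_predictors, n_obs):
    """Overall F statistic implied by R-squared, with its degrees of freedom."""
    df1 = n_predictors
    df2 = n_obs - n_predictors - 1
    f_value = (r2 / df1) / ((1 - r2) / df2)
    return f_value, df1, df2


def r2_from_f(f_value, df1, df2):
    """R-squared implied by an overall F statistic (inverse of the above)."""
    return (f_value * df1) / (f_value * df1 + df2)


# Values as reported in the text for the two models.
print(round(r2_from_f(4.04, 2, 253), 3))   # MCQ model
print(round(r2_from_f(0.248, 2, 253), 3))  # FL model
```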
Effects of other measures of L2 proficiency
Since LexTALE was a significant predictor of MCQ performance, we wanted to investigate whether indirect, self-reported measures of L2 English proficiency would also show an effect. For ease of reference, we repeat the self-report questions below:
How would you rate your English proficiency? (Prof)
What is your most recent test result? (Test)
How old were you when you started learning English? (AoA)
What is the longest period of time you have spent in an English-speaking country? (Exp1)
How often do you use English in your daily life? (Exp2)
Table 10 summarises Spearman’s correlations between direct and indirect measures of proficiency.
Table 10. Spearman’s correlations across all measures of English proficiency.
The columns with values related to the FER tasks are highlighted in grey for ease of reference.
Correlation is significant at the 0.05 level (two-tailed).
Correlation is significant at the 0.01 level (two-tailed).
As might be expected, all but one of the indirect L2 proficiency measures correlated significantly with each other. The only exception was AoA, which was not related to any of the other measures. In addition, all indirect proficiency measures except AoA correlated significantly with LexTALE. However, none of them appeared to be related to performance on either of the FER tasks.
Effects of L1 alphabet
Given that LexTALE appeared to be unique among the L2 proficiency measures in its relationship with FER, we conducted a post hoc group comparison to investigate the possibility that the impact of LexTALE scores is related to the fact that the test relies heavily on reading ability (unlike any of the indirect measures of proficiency). We divided the participants into those whose native language uses the Latin alphabet (N = 161) and those whose language does not (N = 95). This was meant as a shorthand estimate of reading ability, as those whose native language uses a different alphabet than English might be expected to perform worse on a test such as LexTALE, which includes nonwords with minimal distance from real words (e.g., ‘alberation’, ‘spaunch’). To examine the potential influence of native alphabet category, we conducted an independent samples t-test, comparing the MCQ performance of the two groups of participants. The descriptive statistics are presented in Table 11.
Table 11. Descriptive statistics of task performance on MCQ FER and LexTALE across participants from the Latin-alphabet and non-Latin-alphabet groups.
There was a significant difference between the groups on the MCQ task (t(254) = 2.152, p(two-sided) = .032), which was a small effect size (Cohen’s d = .278, 95% CI [.023, .533]). There was also a significant difference between the groups on the LexTALE scores (p(two-sided) < .001), and this was a large effect size (Cohen’s d = .894, 95% CI [−.126, .381]).
There was no statistically significant interaction between LexTALE and native alphabet (p = .880). In both language groups, those with higher LexTALE scores performed better at the MCQ task.
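The relationship between the t statistic, Cohen's d and its confidence interval can be sketched as follows. This is a minimal illustration using the common large-sample normal approximation for the standard error of d, which is an assumption: the statistical package used for the analysis may compute intervals differently (for example, from the noncentral t distribution). The function names are illustrative; the group sizes (161 and 95) come from the text. With these inputs the sketch reproduces the MCQ values reported above. For d = .894 the same approximation gives an interval of roughly [.63, 1.16], which differs from the interval printed for the LexTALE comparison, so that interval may be worth rechecking.

```python
from math import sqrt


def d_from_t(t_value, n1, n2):
    """Cohen's d implied by an independent-samples t statistic."""
    return t_value * sqrt((n1 + n2) / (n1 * n2))


def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for Cohen's d (large-sample normal approximation)."""
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# Group sizes from the text: Latin alphabet (161), non-Latin alphabet (95).
n_latin, n_non_latin = 161, 95

d_mcq = d_from_t(2.152, n_latin, n_non_latin)
print(round(d_mcq, 3), [round(v, 3) for v in d_confidence_interval(0.278, n_latin, n_non_latin)])
print([round(v, 3) for v in d_confidence_interval(0.894, n_latin, n_non_latin)])
```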
Discussion
This study set out to investigate the relationship between second-language proficiency, FER, and trait emotional intelligence in unbalanced bilinguals. Specifically, we wanted to examine the interactions between (1) FER and second-language proficiency, (2) FER and trait Emotional Intelligence, and (3) FER in MCQ versus free-labelling tasks.
Is FER performance related to L2 proficiency?
We hypothesised that if emotion perception is shaped by language, then we would find a correlation between individuals’ second-language proficiency and their emotion recognition ability. We collected indirect measures of proficiency and also conducted a direct test of lexical knowledge (LexTALE). While the L2 proficiency measures other than AoA were correlated with one another (as expected), it was only the LexTALE score that appeared to have an effect on facial emotion recognition ability. In particular, LexTALE performance was positively correlated with accuracy on the forced-choice MCQ task, but not on the FL task. Therefore, although multiple regression analysis did suggest a relationship between FER performance and L2 proficiency, it was only one specific measure of proficiency (LexTALE) that appeared to correlate with one specific measure of FER (MCQ).
One possible explanation for the apparently unique nature of LexTALE as compared to other measures of L2 English proficiency comes from a post hoc examination of group differences between speakers of different native alphabets in our study. Bilinguals whose first language used the Latin script significantly outperformed their non-Latin-script counterparts on both LexTALE and – to a lesser extent – the forced-choice FER task. Indeed, it is known that LexTALE relies heavily on reading ability, which is unsurprising given the nature of the test. Lemhöfer and Broersma (2012) report that LexTALE scores were most highly correlated with other measures of reading ability, and less so with other language skills (e.g., speaking). Interestingly, they also report a difference in performance between their Dutch and Korean participants, such that the Korean participants’ LexTALE scores showed lower correlations with other proficiency tests than did those of the Dutch participants. Since Korean uses a non-Latin script, this resembles the pattern of our results, suggesting that LexTALE specifically reflects reading ability.
However, this still did not explain the unique impact of LexTALE scores on MCQ FER rates in our data, as both Latin- and non-Latin-alphabet groups showed this effect. While both tasks relied on reading ability, reading skills are unlikely to have played a significant role in the MCQ FER task given the distinctiveness of the six emotion labels. While LexTALE nonwords often differ from real words by a single letter (and therefore require precise reading), the six emotion labels are all very different from one another. Unless there is a specific deficit in visual processing, slight differences in reading proficiency are not expected to affect differentiating between written words as different as, for example, ‘Anger’ and ‘Disgust’ (one of the most common errors) in advanced learners of English, especially when accuracy (rather than, for example, response time) is concerned. In order for reading proficiency to directly affect a forced-choice emotion label task, the participants’ command of English would need to be quite restricted, as the six basic emotion words are high-frequency basic lexical items. While the non-Latin participants did report lower overall proficiency, over 90% of them still reported at least ‘good’ command of English. In addition, if it were the case that some participants had such low proficiency in English that they were unable to select correct emotion labels, this would be expected to affect both tasks similarly – or perhaps be even more evident in the free-labelling task. Instead, no significant difference between the groups was observed on the free-labelling FER task.
Perhaps a more plausible interpretation is that LexTALE simply served as a proxy for cultural distance between the groups, since Latin versus non-Latin script languages can be partially mapped onto the broadly defined West-East cultural divide. Although the mapping is, of course, imperfect, as some European languages use non-Latin script (e.g., Greek), and vice versa (e.g., Tagalog), it does nonetheless reflect a general cultural difference. It is possible that these cultural differences could affect emotion perception, as reported in previous research (Russell, 1994). If so, then the relationship between LexTALE and the MCQ FER task could, in fact, be an artefact of analysis, with both variables reflecting an effect of cultural differences.
Is the relationship moderated by individual differences in emotional intelligence?
We wanted to determine whether there was an effect of trait emotional intelligence on the FER tasks, and ensure that any language proficiency effect was not in fact moderated by any changes in emotional intelligence, which may be linked to bilingualism. However, the only measure we collected which seemed significantly related to trait emotional intelligence was age. We found no effect of trait emotional intelligence on FER performance, and it also was not related to any measure of language proficiency.
This result contrasts with the recent findings by Alqarni and Dewaele (2020), who reported a significant correlation between performance on a forced-choice emotion recognition task and trait emotional intelligence in bilinguals (but not in monolinguals). While they used the same measure of trait emotional intelligence (TEIQue), their emotion stimuli were dynamic rather than static (i.e., video rather than photographs), and included additional context. It is possible that the role of trait emotional intelligence can only be revealed in a more complex emotion perception task. Given that our stimuli were static photographs, it is possible that the task involved simple category matching which did not tap into emotional intelligence.
Another possibility is that the difference was due to Alqarni and Dewaele’s bilingual sample, which also had higher TEIQue scores than their monolingual participants. However, it is worth noting that the TEIQue scores obtained in our study were similar to Alqarni and Dewaele’s monolingual scores, and as such were lower than the population averages reported in the normative sample (Petrides, 2009). Considering that Dewaele (2021) also failed to detect a relationship between L2 proficiency and trait EI in three multilingual samples, it is likely that any bilingual advantage in the trait is an effect of sampling.
Is the relationship consistent across tasks?
We speculated that any potential within-subject differences in task performance could offer an insight into the nature of the link between second-language proficiency and emotion recognition. Specifically, if higher language proficiency affects emotion recognition due to increased granularity of emotion concepts, then we would expect to see a difference in performance between a task that rewards fewer, rigid categories (forced choice), and a task that allows for nuanced responses (free labelling). Indeed, there was a difference between the two FER tasks, such that (a) participants performed significantly better on the MCQ task, and (b) only the MCQ task, but not the FL task, appeared to be related to the LexTALE score.
Nonetheless, qualitatively, the tasks appeared to be comparable, in that performance was correlated across tasks within subjects, and error patterns were also similar in both. Happiness was consistently recognised with high accuracy, which is a common finding across studies (Russell, 1994). This may well be because it is often the only emotion with positive valence, as was the case in our study. Since the difference between positive and negative valence is the most basic emotional distinction, it is expected that a single happy expression will be easily distinguishable from five negative ones. Conversely, Fear was the least accurately recognised of all the emotions in both tasks. The low hit rate confirms the findings reported in other research using stimuli from the same database, where Fear was the only emotion recognised with below 50% accuracy (Goeleven et al., 2008). As is common in FER research, Fear was overwhelmingly mis-categorised as Surprise (cf. Russell, 1994).
While Happiness, Sadness, and Surprise had similar, high recognition rates across tasks, Anger and Disgust were recognised with high accuracy in the MCQ task but significantly less so in the free-labelling task. In general, data from multiple studies suggest that Fear, Anger, and Disgust are the least recognisable and show the highest variation across cultures (Russell, 1994). This is further confirmed by our data, where these three emotions showed the lowest recognition rates in the free-labelling task. Our findings also suggest that this may be because they can be confused with one another, as Anger and Disgust were the most commonly confused emotions in both tasks.
Therefore, the general patterns of FER across tasks did not appear to be due to across-subject differences, but rather reflected commonly identified patterns in emotion perception research. While performance on the MCQ task was related to the LexTALE scores, it is likely that this did not reflect a genuine causal relationship. We also failed to detect any low-proficiency advantage in the Free-Label task of the kind that might arise from higher rigidity of emotion categories: the scores were not related to proficiency, regardless of whether only exact matches or also synonymous responses were counted as correct. It is possible that such differences could be seen at lower levels of proficiency and would therefore require a more heterogeneous sample, whereas almost all of our participants had at least a good command of English.
General discussion and conclusion
Data from our diverse, multicultural sample of unbalanced bilinguals with different L1s and varied levels of L2 English proficiency did not lend support to the hypothesis that higher language proficiency per se leads to improved FER and/or an increase in trait emotional intelligence. While we did find an effect of proficiency in an L2 lexical decision task on performance in a forced-choice FER task, the nature of that relationship is unclear, and it may be confounded by the fact that reading proficiency is related to native script reading, which in turn tends to overlap with cultural distance.
This complex pattern of results coming from a complex population highlights the challenges of evaluating the relationship between two multimodal skills, such as language and emotion recognition. The long and controversial history of FER research has made it clear that different tasks and stimuli can produce dramatically different recognition rates, and language scores in different domains are also known to vary – sometimes widely – within learners. Since both skills involve the integration of information from different sources (visual, auditory, contextual), and therefore both can be measured using different tasks, it is quite possible that many of the facilitatory or inhibitory effects reported in previous research reflect task demands rather than an interaction between the two skills as a whole. Such an interpretation seems particularly likely given the high disparity of results across studies.
Perhaps this complexity is one reason why the constructivist claim that language shapes emotion perception still lacks direct experimental support. On the one hand, the hypothesis cannot be evaluated unless the specific components of language proficiency and FER which are expected to be linked are clearly and precisely defined. On the other hand, once such definitions are in place, there needs to be evidence that there is not already overlap in the skills they require (e.g., visual processing, emotional acculturation), which could explain the transfer. Finally, the exact nature of the mechanism must also be specified and measured. For example, while it is intuitively plausible that increased second-language proficiency could affect emotion perception through increasing the granularity of emotion categories, our data from the Free-Label task failed to support the notion that this increased category flexibility leads to different performance in a FER task. Similarly, while it is also plausible that the effects could be moderated by Emotional Intelligence, when compared across studies, it is clear that trait EI is also quite sensitive to sampling differences, and its effect was absent in our data.
Our study had clear limitations, in that it failed to control for differences in cultural and linguistic backgrounds, which were later revealed as potentially important through post hoc analyses. Future research should continue to investigate diverse bilingual samples, while utilising measures of emotional acculturation, so that the effects of language and culture can be disentangled. Furthermore, in order not to overload our participants, our study was limited in the number of FER trials. Future research should compare a wider range of tasks and stimuli. Finally, even though our sample was varied in terms of English proficiency, it is possible that the overall levels were too high to reveal true differences in performance across different proficiency levels. In the future, more lower-proficiency participants could be included to address this issue.
In conclusion, while the current study did not find evidence to support the initial hypotheses about the relationship between language proficiency, trait emotional intelligence, and FER, it revealed a complex array of factors which may contribute to understanding why previous research has often found mixed results. Our results illustrate that relying on homogeneous samples and across-subject designs may lead to significant overestimation of any potential effects of language proficiency on FER. As with other research on potential cognitive effects of bilingualism, this is a reminder that bilinguals form a majority of the world’s population, and are therefore a tremendously diverse group in terms of culture, education, and language context, all of which are likely to further affect emotional competence and experience. It is hoped that future studies will elaborate on the precise nature, structure, and interaction of these factors in emotion recognition.
Appendix 1
Items in LexTALE and correct response (y/n)
platery (n), denial (y), generic (y), mensible (n), scornful (y), stoutly (y), ablaze (y), kermshaw (n), moonlit (y), lofty (y), hurricane (y), flaw (y), alberation (n), unkempt (y), breeding (y), festivity (y), screech (y), savoury (y), plaudate (n), shin (y), fluid (y), spaunch (n), allied (y), slain (y), recipient (y), exprate (n), eloquence (y), cleanliness (y), dispatch (y), rebondicate (n), ingenious (y), bewitch (y), skave (n), plaintively (y), kilp (n), interfate (n), hasty (y), lengthy (y), fray (y), crumper (n), upkeep (y), majestic (y), magrity (n), nourishment (y), abergy (n), proom (n), turmoil (y), carbohydrate (y), scholar (y), turtle (y), fellick (n), destription (n), cylinder (y), censorship (y), celestial (y), rascal (y), purrage (n), pulsh (n), muddy (y), quirty (n), pudour (n), listless (y), wrought (y).
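A minimal sketch of how yes/no responses to such items can be turned into a proficiency score is given below. It assumes the averaged-percentage scoring described by Lemhöfer and Broersma (2012), that is, the mean of the percentage of words correctly accepted and the percentage of nonwords correctly rejected; whether exactly this scoring was applied in the present study is an assumption. The function name and the six-item key (a subset of the items listed above) are for illustration only.

```python
def lextale_score(answer_key, responses):
    """Averaged % correct: mean of word accuracy and nonword accuracy.

    answer_key maps each item to True (real word) or False (nonword);
    responses maps each item to the participant's yes (True) / no (False).
    """
    words = [item for item, is_word in answer_key.items() if is_word]
    nonwords = [item for item, is_word in answer_key.items() if not is_word]
    # A word is correct when accepted; a nonword is correct when rejected.
    word_pct = 100 * sum(responses[item] for item in words) / len(words)
    nonword_pct = 100 * sum(not responses[item] for item in nonwords) / len(nonwords)
    return (word_pct + nonword_pct) / 2


# Illustrative subset of the items above (not the full answer key).
key = {
    "scornful": True, "stoutly": True, "ablaze": True, "moonlit": True,
    "mensible": False, "kermshaw": False,
}

# A hypothetical participant who answers "yes" to every item.
always_yes = {item: True for item in key}
print(lextale_score(key, always_yes))
```

Averaging the two percentages, rather than pooling all items, means that a participant who accepts every item scores at chance even though real words outnumber nonwords.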
Appendix 2
Self-report English proficiency questions.
Appendix 3
Labels obtained in the FL FER task judged as incompatible with any emotion category by native speakers.
| adsa | don’t know | pain | unpleasant |
| agree | don’t mind about sth | pensive | unsatisfied |
| attentiveness | doubt | petulance | was |
| befuddled | doubtful | Please can we get this passport picture taking over with already | weirdness |
| betrayed | embarrassed | powerlessness | wronged |
| bored | embarrassment | resentment | yawning |
| boredom | emotion | sadly amazed | anguish |
| concentration | fake happy | scepticism | awe |
| confused | fake smile | scream | awed |
| confusing | feat | selfish | concern |
| confusion | focus | serious | desperation |
| constipation | focusing | skepticism | disapproval |
| curious | frustrated | sprays | disapproving |
| day | frustration | stern | distraught |
| dear | grim | strange | distress |
| demonserate | grumpiness | stress | hate |
| determination | grumpy | stressed | nostalgic |
| determined | guilt | stubbornness | scandalised |
| disagreement | hanger | supposed | wonder |
| discomfort | heat | suspicion | worried |
| discomfrot | hope | suspiciously | worry |
| discuss | horrible | tense | wow |
| disguise | interrogation | tension | |
| disguised | kind | thinking | |
| dismissal | mean | uncertainty | |
| disturbed | neutral | uncomfortable | |
Acknowledgements
The author would like to thank Andriy Myachykov, whose tips were invaluable in designing this project, as well as Lynne Duncan, who offered support and inspiration during the writing up stage. Thanks are also due to the many anonymous participants who freely donated their precious Internet surfing time to completing the tasks.
Ethical considerations
This study and its protocol have received full ethical approval from Northumbria University College of Reviewers. Respondents gave written consent before starting interviews.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Data availability statement
The full data set is available from the author on request.
