Abstract
The present study investigated whether adult learners of a second language (L2) can activate emotional connotations automatically during emotional word recognition, as compared to native (L1) users, and whether L2 use plays a significant role in this process. The automaticity of activation was measured with the emotional Stroop task. In this task, emotional and neutral words were displayed in two different colors, and the participants were asked to indicate the color by button press. Results showed a delay in L2 learners’ responses to emotional words (the emotional Stroop effect) that did not differ significantly from that of L1 users, indicating comparable automaticity in activating emotional connotations during the task. Further analyses of the effect of L2 use revealed its significant role in increasing the emotional Stroop effect. Specifically, L2 learners with a higher amount of L2 use in daily life produced a significant emotional Stroop effect comparable to that of L1 users, while L2 learners with lower L2 use did not. We discuss the importance of L2 use in actual contexts for the automatic processing of L2 emotional words, especially among adult learners who began L2 learning in adulthood and whose L2 is an underrepresented language.
I Introduction
People use emotion words to describe (she is sad) or express (I feel bad) what they experience, and automatically associate emotion-laden words (e.g. breakup) with certain emotions. Automatic activation of emotional word meanings is essential in human communication. As individuals increasingly communicate emotion in diverse additional languages (second language; L2) in the era of globalization, the question of whether L2 users are able to understand emotional words as automatically as first language (L1) users is drawing much attention.
This issue has been actively investigated via behavioral (e.g. Anooshian and Hertel, 1994), physiological (e.g. Harris et al., 2003), and electrophysiological (e.g. Conrad et al., 2011) responses, particularly in the past two decades. However, the comparison of emotional word processing in L1 and L2 has produced conflicting results (for reviews, see Caldwell-Harris, 2014; Pavlenko, 2008). For instance, in terms of bilinguals’ autonomic reactivity to emotional words, Harris and colleagues (2003) measured skin conductance responses from proficient Turkish–English bilinguals while they listened to emotional expressions in L1 and L2, and found larger skin conductance amplitudes in L1 than in L2. This finding supports the commonly held notion that explicitly knowing the meaning of an emotional word in L2 does not necessarily result in sensing it as intensely as in L1, despite high L2 proficiency. Similarly, stronger pupillary responses to emotional stimuli appeared in L1 than in L2 during word recognition (Toivo and Scheepers, 2019) and sentence reading (Iacozza et al., 2017).
On the other hand, other studies reported comparable emotional effects in L1 and L2. For example, Ayçiçegi-Dinn and Caldwell-Harris (2009) examined highly proficient L1-Turkish–L2-English speakers on a recall task and reported that their memory of emotion words was superior to that of neutral words both in L1 and L2 with a comparable magnitude of effects.
The same inconsistency also exists in L2 studies that employed the emotional Stroop effect, a method that has been used for many decades to examine automatic activation of word meanings (for review, see MacLeod, 1991). Stroop paradigms are based on the conflict between two dimensions, word meaning activation and color judgment: participants are asked to name or respond to the color of a stimulus word while ignoring its meaning. ‘More automatic processing interferes with less automatic processing, but not vice versa’ (MacLeod, 1991: 189). In an emotional Stroop task, when emotional word meanings are activated more automatically, color judgment slows down; this slowdown is referred to as the emotional Stroop effect. The effect is believed to reflect automatic and early lexical processing. The underlying mechanism is thought to be a generic interrupt system that acts early and automatically: stimuli carrying information vital to survival (e.g. threatening, negative emotions) or reproduction (e.g. infant, sex) prevent the disengagement of attention and thus delay task-relevant processing, color naming in this case (Algom et al., 2004; Fox et al., 2001; Larsen et al., 2008; Quan et al., 2020). Therefore, the more automatically and strongly the word meanings are activated, the harder it becomes to suppress them, resulting in more interference. In this sense, the comparison of emotional Stroop effects between L1 and L2 users can provide more direct clues regarding how automatically the semantic information of emotional words is activated in L1 and L2 processing. By contrast, in studies using tasks such as rating stimuli for pleasantness (e.g. Harris et al., 2003), the linguistic material is likely to be processed to a greater extent at a conscious level (Eilola et al., 2007).
While the emotional Stroop effect appears fairly consistently in L1 processing across previous studies, it does not in L2 processing. For example, slower responses elicited by negative emotion words were significant in L1 but not in L2 in Winskel’s study (2013), while the reverse pattern (i.e. significant interference in L2 but not in L1) appeared in Sutton et al.’s study (2007). In other studies, however, bilingual speakers produced a comparable emotional Stroop effect in their L1 and L2 (Eilola and Havelka, 2010; Eilola et al., 2007).
Multiple factors may have contributed to these contradictory findings. One of them is the participants’ language profile, particularly the extent of L2 use in actual contexts. For example, smaller emotional Stroop effects in L2 than in L1 usually came from participants learning the L2 as a foreign language in a classroom setting (e.g. Winskel, 2013), while a comparable emotional Stroop effect in L1 and L2 was found among those using their L2 extensively in daily life. The participants in Eilola and Havelka’s study (2010) resided in the L2 environment and reported using L2 English more often than L1 Greek in daily life. Likewise, the participants in Eilola and colleagues’ (2007) study, who resided in the L1 environment, reported using their L2 English daily, e.g. listening to music (88.2%), watching TV (64.7%), and browsing web pages (67.6%). Taken together, whether or not a bilingual speaker resides in an L2-speaking country, the automatic activation of emotional word meanings seems to be mediated by how often they use the L2, particularly in meaningful communication outside of the classroom. Such a significant role of L2 use in the growth of L2 emotionality was found in previous studies using other paradigms (e.g. Degner et al., 2012; Puntoni et al., 2009; Tenderini et al., 2022). Yet more detailed estimates of L2 use, beyond frequency of use or length of residence in L2 environments, are needed to capture actual use (Pavlenko, 2017; Tenderini et al., 2022).
To this end, the present study proceeded to examine the role of L2 use, which was calculated in hours, in the emotional Stroop effect among adult learners of L2 Korean, a population that had not been examined in research on this topic. Our two research questions with hypotheses are as follows:
Research question 1: Are adult learners able to produce an emotional Stroop effect in L2 Korean comparable to that of L1 Korean users?
Hypothesis 1: They are expected to produce an emotional Stroop effect similar in magnitude to that of L1 users based on previous findings obtained with the same paradigm among younger L2 starters (e.g. Eilola et al., 2007).
Research question 2: Does L2 Korean use influence the magnitude of the emotional Stroop effect in adult learners?
Hypothesis 2: The magnitude of the emotional Stroop effect produced by adult L2 learners is expected to change as a function of the amount of L2 use such that adult L2 learners with more L2 use would show a stronger emotional Stroop effect than those with less L2 use.
II Methods
1 Participants
First, an a priori power analysis for sample size estimation was conducted using the pwr package in R (R Development Core Team, 2013). For our multiple linear regression models with up to six coefficients apart from the intercept (accounting for participant group, word condition, and L2 use predictors along with all possible interactions among them), with 95% statistical power, an alpha level of .05, and a medium-size model effect (f² = .4), the minimum number of participants required was approximately 52. Accordingly, 60 participants (female = 47) took part in this study; these were the participants who passed our L2 Korean proficiency test (Lee-Ellis, 2009) out of 94 initially recruited in Korea and the US. They completed a language background questionnaire adapted from Lee-Ellis (2012). Their native and dominant languages were Chinese (n = 32), English (n = 23), French (n = 1), German (n = 1), Japanese (n = 1), Mongolian (n = 1), and Thai (n = 1), while the second languages they reported in addition to Korean were Chinese (n = 2), English (n = 35), French (n = 1), and Spanish (n = 22). Importantly, they reported beginning to learn Korean at age 18 years or older (range: 18–30 years) and were not heritage speakers. Therefore, they were all sequential, unbalanced late multilinguals with Korean as a weaker language compared to their L1s.
Given the purpose of the study, the participants’ L2 Korean use outside of the classroom was measured in as much detail as possible through a questionnaire in Excel format. Participants provided an estimate of (1) weekly hours of spontaneous L2 use, covering both exposure and expression, for each of various daily situations such as online chatting, face-to-face interaction, watching television or movies, and reading for pleasure, and (2) the length of time (in months) over which that rate of L2 use was sustained. The total amount of cumulative L2 use was then calculated for each participant as shown in Figure 1. The cumulative amount of L2 use outside of the classroom varied from 116 to 24,848 hours with a mean of 4,168 hours (SD = 5,043.8) from the time of beginning to study Korean to the time of testing. Table 1 summarizes the L2 Korean users’ language backgrounds.
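The cumulative estimate described above amounts to summing, over situations, the weekly hours multiplied by the number of weeks for which that rate was sustained. The following Python sketch is illustrative only: the class and function names, the example figures, and the months-to-weeks conversion (52/12) are assumptions, not details taken from the questionnaire.

```python
from dataclasses import dataclass

# Assumed conversion factor; the study does not state how months were converted to weeks.
WEEKS_PER_MONTH = 52 / 12


@dataclass
class UsageEntry:
    situation: str       # e.g. "online chatting" (hypothetical label)
    weekly_hours: float  # self-reported hours per week
    months: float        # how long that weekly rate was sustained


def cumulative_l2_hours(entries):
    """Sum weekly_hours * (months converted to weeks) over all reported situations."""
    return sum(e.weekly_hours * e.months * WEEKS_PER_MONTH for e in entries)


# Hypothetical participant, not real data.
example = [
    UsageEntry("online chatting", 3.0, 12),
    UsageEntry("face-to-face interaction", 5.0, 6),
    UsageEntry("watching television or movies", 4.0, 24),
]
total_hours = cumulative_l2_hours(example)
```

For this hypothetical participant the three situations contribute roughly 156, 130, and 416 hours, for a total of about 702 hours.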

Figure 1. Example of the estimation of second language (L2) use.
Table 1. Second language (L2) Korean learners’ language backgrounds.
Note. Accuracy on Korean C-Test ranges from 60% to 94%.
In addition, 36 participants (female = 16) whose native and dominant language was Korean served as the control group. Their mean age at testing was 30.2 years. All had normal or corrected-to-normal vision, gave informed consent before the study, and were paid upon completion.
2 Materials
Fifty emotion and 50 neutral words were first selected, mainly from Korean textbooks (e.g. Cho et al., 2000a, 2000b). They were given to five Korean instructors, who rated their difficulty, familiarity, and valence using a Likert scale (e.g. for difficulty, 1 = for beginners to 5 = for natives). Then, 40 words in each category rated as 1 or 2 (for advanced beginners) were used in a pilot test with 34 L1 Korean users. A total of 30 Korean negative emotional words (e.g. 싫어하다 dislike) and 30 neutral words (e.g. 마시다 drink), which most reliably produced the emotional Stroop effect in the pilot study, were chosen as the final stimuli (Appendix 1). The words in the emotional and neutral conditions were matched for word frequency, word length (all ps > .10), and part of speech. In the emotional Stroop task, these 60 words appeared in each of the two colors (blue and green), resulting in a total of 120 trials.
3 Procedure
Participants were tested individually. They were instructed to respond as quickly and accurately as possible, pressing a blue-colored button (the right shift key on the keyboard) with their right index finger if the word appeared in blue and a green-colored button (the left shift key) with their left index finger if the word appeared in green.
Each experimental trial proceeded as follows: a fixation mark (*) appeared for 1,000 ms in the center of the screen at the beginning of the trial, followed by a word. The word remained on the screen until the participant responded or until a timeout after 2,000 ms (Figure 2). Participants first completed 10 trials in a practice session. Once they reached 80% accuracy in the practice session, they moved on to the main test session. All participants passed the practice session on their first attempt.

Figure 2. The stimulus sequence within a trial of the emotional Stroop task.
The emotional and neutral words were presented in separate blocks in order to prevent the confounding lingering effects of emotional words on neutral words that can arise in a mixed-block design (e.g. Sutton et al., 2007). The order of word blocks was counterbalanced across participants. Thus, half of the participants viewed the emotional word block first and then the neutral word block, whereas the other half did the task in the reverse order. Within a block, each word appeared once in each of the two colors on a white background in random order. DMDX (Forster and Forster, 2003) was used for the display of stimuli and the collection of data.
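The blocked, counterbalanced design described above can be sketched in a few lines of Python. This is only an illustration of the logic: the study itself used DMDX, and the function names, the placeholder word lists, the fixed random seed, and the assignment of block order by alternating participant index are all assumptions.

```python
import random

COLORS = ("blue", "green")


def build_block(words, rng):
    """Each word appears once in each color, in random order within the block."""
    trials = [(word, color) for word in words for color in COLORS]
    rng.shuffle(trials)
    return trials


def block_order(participant_index):
    """Alternate block order so that half of the participants start with each block."""
    if participant_index % 2 == 0:
        return ("emotional", "neutral")
    return ("neutral", "emotional")


# Placeholder labels standing in for the 30 emotional and 30 neutral stimuli.
emotional_words = [f"emo_{i}" for i in range(30)]
neutral_words = [f"neu_{i}" for i in range(30)]

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
blocks = {
    "emotional": build_block(emotional_words, rng),
    "neutral": build_block(neutral_words, rng),
}
# Full 120-trial session for a participant who starts with the emotional block.
session = [trial for name in block_order(0) for trial in blocks[name]]
```

Keeping the two word types in separate shuffled blocks, rather than shuffling all 120 trials together, is what prevents carry-over from an emotional trial to the neutral trial that follows it.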
After the experiment, all participants indicated the words whose meanings they did not know on the list of words presented in the task. Data for unknown words were removed individually from analyses. All of the participants were unaware of the purpose of this experiment.
III Results
A total of 96 participants (36 L1 and 60 L2 Korean users) responded to 120 trials in the emotional Stroop task, yielding a total of 11,520 data points. The data were then cleaned by removing outliers (based on the 2 SD criterion) and data for incorrect responses or unknown words. This accounted for a total data loss of 10%. Table 2 displays the means with standard deviations (SDs) of the reaction time (RT) data and the accuracy rate in each condition and group.
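The cleaning steps described above (dropping incorrect responses and unknown words, then trimming outliers by a 2 SD criterion) might look like the following Python sketch. It makes several assumptions: the trial dictionary layout and function name are hypothetical, and the cutoff is computed per participant around that participant's mean, whereas the text does not say whether the criterion was applied per participant, per condition, or over the whole data set.

```python
from statistics import mean, stdev

TOTAL_TRIALS = 96 * 120  # 11,520 data points before cleaning, as reported


def clean_trials(trials, unknown_words, n_sd=2.0):
    """Drop incorrect or unknown-word trials, then trim RTs beyond n_sd SDs.

    Assumption: `trials` holds one participant's data, so the cutoff is
    computed relative to that participant's own mean and SD.
    """
    valid = [t for t in trials if t["correct"] and t["word"] not in unknown_words]
    if len(valid) < 2:
        return valid  # SD is undefined for fewer than two observations
    rts = [t["rt"] for t in valid]
    m, sd = mean(rts), stdev(rts)
    return [t for t in valid if abs(t["rt"] - m) <= n_sd * sd]


# Hypothetical trials for one participant (not real data).
raw = [{"word": f"w{i}", "rt": rt, "correct": True}
       for i, rt in enumerate([500, 510, 490, 505, 495, 520, 480, 1500])]
raw.append({"word": "w_err", "rt": 450, "correct": False})       # incorrect response
raw.append({"word": "w_unknown", "rt": 530, "correct": True})    # word reported as unknown

cleaned = clean_trials(raw, unknown_words={"w_unknown"})
```

In this made-up example the 1,500 ms response lies more than 2 SDs from the participant's mean and is removed along with the incorrect and unknown-word trials, leaving seven trials.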
Table 2. Mean reaction time (RT) with standard deviation (SD) in parentheses and accuracy rates for the neutral and emotional conditions for the first language (L1) and second language (L2) groups.
Note. * significance at α = .05.
In the data analysis, we focused on the RT data, as accuracy rates were very high in all conditions (> 98%). Reaction times were log-transformed (base 10) to approximate normality (Baayen, 2008), and mixed-effects regressions were then run with the lme4 package in R (R Development Core Team, 2013). Three sets of analyses were conducted: first to compare the L1 and L2 groups, then to examine the L2 use variable within the L2 group, and lastly to compare the low and high L2 use groups with the L1 group in terms of the emotional Stroop effect. In all cases, we ran and compared two models in which a by-participant random slope for condition was either included or not. The inclusion of a by-participant random slope for condition significantly improved the model fit, so the results reported below are based on the best-fit models. The results of the analyses can be found in Table 3 (a |t| value of 2.0 or higher indicates a significant effect at the α = .05 level).
Table 3. Results of regression analyses of the participants’ reaction time (RT) data.
Note. * significance at α = .05. ᵃ Akaike Information Criterion (AIC) and ᵇ Bayesian Information Criterion (BIC) are both criteria for model fit; the smaller these values are, the better the model fits. For more information about AIC and BIC, see Field et al. (2012).
1 Comparison of L1 and L2 groups
First, we ran mixed-effects models on the L1 and L2 groups’ response times to determine whether the groups differed in terms of emotional Stroop effects. The omnibus model, Model 1, had fixed effects of Word condition (a dummy variable with reference level ‘neutral condition’), Group (a dummy variable with reference level ‘L1 group’), and the interaction between them. The random effects consisted of by-participant and by-item intercepts. The best-fit model [χ²(2) = 141.06, p < .001] showed a significant main effect of word condition [t = 2.2]: the participants responded to emotion words more slowly than to neutral words. However, there was no main effect of group and no significant interaction of the two variables [|t|s < 1.2] (see Part 1 of Table 3).
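For readers who find a formula helpful, Model 1 as described above can be written out roughly as follows. The notation is ours and is only a sketch under stated assumptions: i indexes participants and j indexes words; Cond is 0 for neutral and 1 for emotional words; Group is 0 for L1 and 1 for L2 participants; and the by-participant slope term u_{1i} is included on the assumption that the best-fit model is the one with the random slope for condition.

```latex
\log_{10}(\mathrm{RT}_{ij}) = \beta_0
  + \beta_1\,\mathrm{Cond}_j
  + \beta_2\,\mathrm{Group}_i
  + \beta_3\,(\mathrm{Cond}_j \times \mathrm{Group}_i)
  + u_{0i} + u_{1i}\,\mathrm{Cond}_j + w_{0j} + \varepsilon_{ij}
```

Under this dummy coding, \beta_1 is the emotional Stroop effect in the L1 reference group (on the log scale), and \beta_3 is the amount by which that effect differs in the L2 group; u_{0i} and w_{0j} are the by-participant and by-item random intercepts.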
2 The role of L2 use in emotional Stroop effects
In order to determine whether the magnitude of the emotional Stroop effect was influenced by L2 use, we further examined the L2 group alone. To assess this predictor’s contribution to the emotional Stroop effect, Model 2 had fixed effects of Word condition (a dummy variable with reference level ‘neutral condition’), L2 use (a continuous variable; centered), and their interaction, and random effects consisting of by-participant and by-item intercepts. Results indicated that the interaction between L2 use and word condition was significant and positive [t = 2.7] (Part 2 of Table 3); that is, the emotional Stroop effect increased with the amount of L2 use.
3 High and low L2 use groups compared to L1 group
To confirm the role of L2 use in the emotional Stroop effect, we identified two groups of L2 participants that differed significantly in the amount of L2 use. The high group comprised the 15 participants in the top 25% of L2 use, and the low group comprised the 15 participants in the bottom 25%. To compare the high and low L2 use groups to the L1 group in terms of the emotional Stroop effect, we ran Model 3 with fixed effects of Word condition (a dummy variable with reference level ‘neutral condition’), Group (a dummy variable with reference level ‘L1 group’), and their interaction, and with random effects of by-participant and by-item intercepts. The high group showed a significant emotional Stroop effect of 21 ms that did not differ significantly from that of the L1 group [t = 0.7]. In contrast, the low group responded to the emotion words 12 ms faster than to neutral words; this 12 ms difference was not itself significant, but the low group’s reversed pattern differed significantly from that of the L1 group [t = −2.4]. These distinct patterns of the high and low L2 use groups as compared to the L1 group are visualized in Figure 3.
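The quartile split and the effect measure used in this comparison can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function names and all numbers are hypothetical, and the effect is computed here as a raw difference of mean RTs, whereas the reported tests come from mixed-effects models on log-transformed RTs.

```python
from statistics import mean


def quartile_groups(l2_hours):
    """Return (low_ids, high_ids): the bottom and top 25% of participants by L2 use."""
    ranked = sorted(l2_hours, key=l2_hours.get)  # participant ids, ascending by hours
    k = len(ranked) // 4
    if k == 0:
        return [], []
    return ranked[:k], ranked[-k:]


def stroop_effect(rts_emotional, rts_neutral):
    """Emotional Stroop effect in ms: mean emotional RT minus mean neutral RT."""
    return mean(rts_emotional) - mean(rts_neutral)


# Hypothetical cumulative-use values for 60 participants (not the study's data).
l2_hours = {f"P{i:02d}": 100 + 400 * i for i in range(60)}
low_ids, high_ids = quartile_groups(l2_hours)

# Hypothetical RTs: a positive value means slower responses to emotional words.
effect = stroop_effect([520, 540, 530], [505, 515, 510])
```

With 60 L2 participants, each quartile group contains 15 people, matching the group sizes reported above; a negative value from `stroop_effect` would correspond to the reversed pattern seen in the low-use group.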

Figure 3. The emotional Stroop effect among high and low second language (L2) use groups compared to the first language (L1) group.
IV Discussion
This study examined the extent to which adult learners, who started learning an L2 during adulthood, automatically activate L2 emotional word meanings as compared to L1 users, using an emotional Stroop paradigm. In addition, the role of L2 use in the emotional Stroop effect in L2 was examined. Consistent with our hypotheses, the results showed, first, that the emotional Stroop effect was not significantly different between adult L2 learners as a whole group and L1 users. Second, adult learners with more L2 use in daily life showed an emotional Stroop effect similar to that of L1 users, while those with less L2 use did not. The first finding supports previous studies (Eilola and Havelka, 2010; Eilola et al., 2007), which showed that bilinguals appear to access the semantics of emotional words quickly and automatically even when they started learning the second language after early childhood. The present study extends this finding to adult multilingual learners’ acquisition of L2 emotional words, whose conceptual links are thought to be even harder to strengthen than those of young L2 starters. An early-learned language, which develops together with the emotion regulation system, can be more tightly attached to emotions than languages learned later in life (Bloom and Beckwith, 1989). In this light, our findings offer a promising indication that even adult L2 learners may be able to understand emotional word meanings in L2 as automatically as L1 users do. This is also supported by electrophysiological evidence showing effects of similar magnitude for emotional words in L1 and L2 among late bilinguals, despite later peaks in L2 than in L1, as reported by Conrad et al. (2011) and Opitz and Degner (2012). These authors suggested that processing emotional words in L2 may be qualitatively equivalent to that in L1, with some delay due to the processing cost.
To determine the role of L2 use in developing automaticity in L2 emotional word processing, we calculated accumulated amounts of L2 use in real-life contexts by itemizing L2 use in specific situations. The results showed that the high L2 use group was associated with L1-like patterns in processing emotional word meanings, whereas the low L2 use group was not. Although this result should be interpreted with caution, as the calculation is a rough approximation based solely on participants’ memory rather than on verified data, it can serve as a reference for adult learners setting goals in learning an L2.
We are aware of the seemingly conflicting finding reported by Ponari et al. (2015). They showed that participants’ country of residence (L2 immersion or non-immersion) and frequency of L2 use did not affect their performance in emotional word processing. The discrepancy may have multiple causes. First, it can be attributed to how L2 use was measured in the two studies: a dichotomous distinction between immersion and non-immersion contexts in Ponari et al. vs. an estimate of the number of hours of actual L2 use in real-life situations in the present study. Second, they reported the effect of the frequency of L2 use on reaction times rather than on the magnitude of the emotional word effect. Finally, the processing of emotional words was assessed differently in the two studies: as a facilitative effect for emotion words in a lexical decision task in Ponari et al. vs. as an inhibitory effect in a color judgment task in the present study. How experimental tasks, as well as other variables such as age of acquisition of L2 (e.g. older adults), may moderate the role of L2 use or emotional word processing is a worthwhile issue to pursue in future research.
The findings of the study also raise the issue of whether L2 use is correlated with other factors such as L2 proficiency and length of residence (LoR) in an L2-speaking country. We therefore examined differences between the high- and low-L2 use groups in L2 proficiency and LoR, and our exploratory analyses showed no significant group differences in either. Thus, at least as far as these participants were concerned, the amount of L2 use is not necessarily associated with L2 proficiency or LoR in an L2-speaking country. Rather, it can vary independently of these variables. Moreover, when we tested possible effects of L2 proficiency or LoR on the emotional Stroop effect, neither their main effects nor their interaction effects with the emotional Stroop effect were significant (|t|s < 2.0). Therefore, the significantly different performance on the emotional Stroop task between the high- and low-L2 use groups reported in this study should be attributed to L2 use rather than to other confounding factors, such as L2 proficiency and LoR. To the best of our knowledge, this is the first study to demonstrate that the total number of hours of L2 use outside of the classroom plays a crucial role in developing automaticity in adult L2 learners’ emotional word processing. This finding extends to adult learners the importance of the linguistic environment (e.g., social pressure and opportunity to use language) for children’s emotion word acquisition found in Ahn and Chang (in press).
The lack of an L2 proficiency effect suggests that, despite their advanced L2 proficiency, adult L2 learners may not initially establish connections between emotional words and their connotations that are as strong as those of L1 users. However, with increased L2 use in actual contexts outside of the classroom, such connections become stronger, up to the magnitude of those in L1 users, at least when assessed by the emotional Stroop effect. This finding corroborates previous studies suggesting a positive role of increased L2 exposure in emotional word processing through different paradigms (e.g. Degner et al., 2012; El-Dakhs and Altarriba, 2019; Iacozza et al., 2017; Sachs and Coley, 2006).
The importance of L2 use over L2 proficiency in the automatic processing of L2 emotional words can be accounted for by the language-specific episodic trace theory of language emotionality. Its central idea is that ‘words in a language are stored together with emotional content in episodic memory traces such that perceiving words in a language activates records of emotional experiences featuring those words in that language’ (Puntoni et al., 2009: 1019). That is, more frequent L2 use creates more episodic traces associated with L2 words in memory, strengthening the bond between emotional words and the corresponding states of mind, which in turn allows the emotional content related to the L2 words to be activated more automatically. This account is supported by the empirical finding that emotionality ratings of L2 words could be enhanced by frequently associating those words with memories of emotional experiences in L2 (Puntoni et al., 2009). In contrast, merely knowing L2 emotional words cannot guarantee a strong bond between these word forms and an individual’s emotional state. When L2 emotional words are used in real-life, as opposed to instructional, contexts, they can become sufficiently embodied to be automatically connected to the corresponding emotional states of mind (Pavlenko, 2012). Thus, the importance of L2 use in the growth of emotionality among L2 users, including adult learners of less commonly taught languages, should be highlighted for both education and research.
Appendix
Word stimuli for the emotional Stroop task.
| Emotional words (Korean) | Translation | Neutral words (Korean) | Translation |
|---|---|---|---|
| 걱정 | worry | 가족 | family |
| 교통사고 | car accident | 건물 | building |
| 무서움 | fear | 날씨 | weather |
| 불편 | inconvenience | 설거지* | dishwashing* |
| 불행 | unhappiness | 아침 | morning |
| 슬픔 | sadness | 요리 | cooking |
| 실망 | disappointment | 은행 | bank |
| 실수 | mistake | 음식점 | restaurant |
| 피로 | fatigue | 이야기 | story |
| 나쁘다 | to be bad | 주말 | weekend |
| 바쁘다 | to be busy | 가깝다 | to be close |
| 비싸다 | to be expensive | 가볍다 | to be light |
| 시끄럽다 | to be noisy | 같다 | to be the same |
| 심심하다 | to be bored | 깊다 | to be deep |
| 아프다 | to be sick | 다르다 | to be different |
| 어렵다 | to be difficult | 비슷하다 | to be similar |
| 외롭다 | to be lonely | 빠르다 | to be fast |
| 죄송하다 | to be sorry | 작다 | to be small |
| 지치다* | to be tired* | 조용하다 | to be quiet |
| 틀리다 | to be wrong | 걷다 | to walk |
| 힘들다 | to be tough | 공부하다 | to study |
| 거짓말하다 | to lie | 기다리다 | to wait |
| 때리다 | to hit | 노래하다 | to sing |
| 떨어지다 | to go down | 마시다 | to drink |
| 모르다 | to be unaware | 빨래하다 | to do laundry |
| 미워하다 | to hate | 인사하다 | to greet |
| 싫어하다 | to dislike | 여행하다 | to travel |
| 울다 | to cry | 운동하다 | to do exercise |
| 잃어버리다 | to lose | 운전하다 | to drive |
| 헤어지다 | to break up | 전화하다 | to make a phone call |
Note. * These two items were deleted from analyses, given that L2 speakers identified them as unknown in the post-vocabulary check list.
Acknowledgements
We thank Dr Caldwell-Harris for her insightful comments on the earlier versions. We also thank our editor and the anonymous reviewers for their helpful comments throughout. With their help, we could substantially improve our typescript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Ann G. Wylie Dissertation Fellowship and the Second Language Acquisition program at the University of Maryland. The funding sources were not involved in any part of the study design, data collection and analysis, interpretation of data, the writing of the manuscript, or the decision to submit the manuscript for publication.
