Abstract
Aims and objectives:
When bilinguals use their weaker language, they tend to apply different rationalities and have less immediate emotional reactions (the so-called foreign-language effect). The aim of this study is to test whether bilingual politicians differ between statements they make in their first (L1) or in their second language (L2). More specifically, we hypothesize that using L2 might be associated with less polarizing political language.
Method:
We analyze a corpus of more than 2,000 interactional turns uttered by 10 Swiss politicians in national TV debates in French and in German. All politicians appear both in French and in German, but one of the two languages is their L1, respectively. The politicians’ statements are assessed regarding their polarizing potential by a large language model (LLM) and by human raters.
Data and analysis:
Descriptive and inferential statistical analyses are then carried out to unveil individual and collective patterns of difference across L1 and L2 in the data.
Findings:
The results suggest a robust foreign effect in the analyzed data. It appears that engaging in political debates in non-dominant languages coincides with more factual and less emotional and less polarizing language.
Originality:
This study is the first one to investigate the foreign effect in politics, the first one to analyze not lab-elicited but natural conversational data, and the first one to present an operationalization of the classification of political polarizing speech for the use with LLMs.
Implications:
The study suggests that a foreign language effect is not confined to lab settings but extends to socially relevant contexts such as authentic political debates. There is a moderate positive correspondence between human and LLM ratings, which implies that LLMs may cautiously be used to classify large corpora of texts.
Limitations:
Further research is necessary to test and increase the validity and reliability of LLM-based text classifications.
Introduction
We are currently witnessing a wave of autocratization and democratic backlashes in many countries worldwide (Nord et al., 2025). This crisis of democracy (Schedler, 2023) correlates with increasing societal and political polarization. Recent examples include the heated debates over vaccination during the COVID-19 pandemic, which in many countries put considerable strain on social and family ties. While diversity of viewpoints on social and political issues is the backbone of liberal democracy, democratic governance is endangered when healthy pluralism degenerates into deep ideological polarization and when opponents are habitually described in hostile or hateful terms. Democracy requires a collaborative mind-set that relies on civility (Bejan, 2017), consensus seeking, and a minimum of trust in institutions. The language used in political debates reflects and shapes the state of democracy.
In many countries, political debates within and outside the branches of government are monolingual, usually in the official language of the state or institution. But there are also contexts where politics is at least in part performed bilingually and multilingually in that some actors use languages other than their L1 both orally and in writing. This holds true for supra-national institutions such as the European Union (Ringe, 2022) or the United Nations, but also for multilingual countries such as for example Belgium, Luxembourg or Switzerland. In these institutions and countries, some speakers routinely make use of both their dominant language and other languages in their multilingual repertoire.
Psycholinguistic evidence (Keysar et al., 2012) suggests that L2 use may foster a foreign-language effect, that is, for instance, lower degrees of emotional involvement and different types of rationality tend to be applied when taking decisions in a second or foreign language (see Purpuri et al., 2024 for a recent overview). Based on these findings of an L2 effect and amid increasing polarization in traditionally consensus-democratic multilingual Switzerland, this paper explores possible effects of Swiss politicians’ L1 or L2 use on their way of speaking when debating as guests in popular political TV talk shows (Arena and InfraRouge, respectively).
The main goal of the paper is to test the hypothesis that L2 use has an influence on the degree of polarizing language produced by the politicians. The second goal is to put to the test a newly developed method that uses artificial intelligence (LLM) to analyze a large quantity of data.
Literature review
For the analyses discussed in this paper, we first draw on the literature on bilingualism/multilingualism and the cognitive effects of the use of the non-dominant language. This effect is commonly referred to as the foreign language effect (Purpuri et al., 2024). Even though all the individuals analyzed in this contribution are multilinguals in the sense that they regularly use more than two languages or dialects, we focus on their use of two specific languages, German and French. For each of the ten politicians whose utterances were analyzed in this study, either German or French is the dominant language (henceforth L1), while the other is a fluently used but non-dominant language (henceforth L2). Given that both languages are national and official languages of Switzerland, we prefer not to use the term foreign language but will stick to L2.
Foreign-language effect
There is ample research showing that participants who are asked to solve tasks in their L2 tend to behave, on average, differently than participants doing the tasks in L1. Keysar et al. (2012) have shown that L2 use diminishes the loss-aversion bias, Costa, Foucart, Arnon, et al. (2014) have shown that other heuristic biases related to risks and uncertainty are equally affected. In a seminal study, participants’ decisions in the face of ficticious moral dilemmas catering to the deontological («You shall not kill») vs. utilitarian (kill one person to save five people) rationalities have been shown to be affected by L2 use (Costa, Foucart, Hayakawa, et al., 2014): In L2, there is a tendency towards more utilitarian choices (see also Corey et al., 2017; Hayakawa et al., 2016; Maschio et al., 2022). Even though some studies do not find the same effects for specific variants of the tasks (cf. Muda et al., 2024 for variants of moral dilemmas that involve self-sacrifice), the evidence overall on the foreign language effect seems robust.
There are different explanations for the underlying mental mechanisms that yield the effect found in the L2 data. The most convincing explanation that is also in line with work on the strength of emotional responses and associations in L1 vs. L2/Lx (Dewaele et al., 2025) is that our embodied responses to words and propositions in an L2 are less immediate than to equivalent stimuli in L1. Processing input in L2 thus unavoidably puts higher demands on the cognitive system, which in turn entails that processing is slower but also more systematic (the literature also refers to Kahneman [2013] and the notion of fast vs. slow thinking), hence also less influenced by the notorious cognitive biases known from the much faster L1 processing.
While this evidence seems robust, it is interesting that the studies cited above overwhelmingly focus on reception and do not involve the analysis of L2 production. There are a few exceptions, for example, Kyriakou et al. (2023), who show that when justifying moral decisions, bilinguals are more emotional in their L1 compared to their L2. However, this and the two similar other studies (Kyriakou & Mavrou, 2023; Kyriakou et al., 2024) rely on specifically elicited data for the sake of the experiments. Our present study focuses on verbal production data that was not specifically elicited in a lab situation but consists of real-life data from bilinguals doing politics.
Exolingual mode
Verbal interaction among multilinguals does not only influence the specific constructs discussed above. When people with asymmetrical multilingual repertoires interact, even if the interaction takes place in only one language, they are in what de Pietro (1988) calls the exolingual mode. In this mode, experienced multilinguals automatically use interactional strategies of reciprocal adjustments (p. 71), they use non-verbal back-channelling strategies to make sure comprehension was achieved, they ask clarification questions, repeat and rephrase utterances, and translate words or larger chunks. Example 1 from the corpus analyzed in our contribution illustrates such behaviour: Here, the L1 German TV anchor and a L2 German (L1 French) member of the federal government interact in a German-language TV debate, and the anchor uses a colloquial expression in German (Wischiwaschi), which he also translates into French (c’est n’importe quoi) to make sure the interlocutor understands it.
: Mit Verlaub Frau Bundesrätin, das ist doch Wischiwaschi, c’est n’importe quoi
: Euh das habe ich verstanden auf Deutsch, Wischiwaschi ja ja. Nein, absolut nicht. Es ist die Verantwortung, und es ist auch klar, es bringt Transparenz. (Arena, SRF, 31.5.2024)
: With all due respect, Mrs. Federal Councilor, is that not wishy-washy, that’s nonsense
: Er I understood that in German, wishy-washy, yes, yes. No, absolutely not. It is the responsibility, and it is clear, it provides transparency
Provided mutual comprehension and cooperation are the participants’ shared goals, the interacting partners in the exolingual mode typically adjust their way of speaking to make sure comprehension is possible despite the different bilingual or multilingual dominance configurations in the repertoires.
The language of polarization in politics
Politics is done largely via the verbal canal. Also, the outcome of political activities are typically textual products (bills, executive orders, party programmes, state of the union speeches, etc.). In the face of increasing political polarization in different regions (McCoy et al., 2018; Pierson & Schickler, 2020; Wendler, 2016, for the Swiss context cf. Freiburghaus & Vatter, 2019; Wyss et al., 2015), scholars have attempted to capture the linguistic manifestations of political discourse that is potentially widening the gaps between political, social, ethnic or other fractions of modern societies. Based on their review of several case studies, Donohue and Hamilton (2022) propose a framework for the analysis of polarizing language. They mention, among other things, (mostly negative) emotional, strong language, provocative metaphorical framing (e.g., immigration as a war, invasion), the absence of nuanced argumentation and the appeal to group identity (and by consequence to inter-group differences).
Wyss et al. (2015) focus on the dimension of cognitive complexity (CC) in political discourse, that is, the quantity and quality of perspectives and problem dimensions considered, or ‘the degree of perception multidimensionality of a given problem’ (p. 641). They find a decrease in cognitive complexity in immigration debates associated with the rise of the Swiss People’s Party (SVP) in the 1990s, but resilience to this communication style in the language used by members of other parties, and higher degrees of CC in the smaller of the two chambers of the federal parliament (‘chambre de réflexion’).
Other studies that aim to uncover different political attitudes in increasingly polarized Western democracies have also looked at emotional language. Wicke and Bolognesi (2025) use various NLP tools, among which sentiment and subjectivity analyses, to reveal differences in the Trump-Harris TV debate during the 2024 presidential campaign. They also associate negative sentiment with more polarizing stances, as does the study by Bor et al. (2023). Synthesizing the studies from political science and linguistics discussed in this section (Bor et al., 2023; Donohue & Hamilton, 2022; Wicke & Bolognesi, 2025; Wyss et al., 2015), we propose an operationalization of politically polarizing language along five distinct but related dimensions commonly identified in the literature. Statements, speeches or other texts in political discourse may be analyzed regarding whether they are
factual vs. person-oriented
consensus-oriented and constructive vs. emphasizing disagreement
nuanced vs. advocating a single perspective
not appealing primarily to emotions vs. appealing to (mainly negative) emotions
looking for common ground vs. emphasizing group differences.
Evidently, in polarizing utterances or texts, we expect to find more of the elements on the right of each of the five dimensions (e.g., person-oriented, emphasizing group differences).
Politics and institutional multilingualism
If increasing political polarization is a generalized tendency and if polarization is both induced by and manifests itself in political verbal discourse, the question arises whether doing politics in a L2 might have an attenuating effect. As we discussed earlier, the literature on the foreign language effect (Keysar et al., 2012) suggests that people are less emotional in their L2 and apply different, slower, less gut-related rationalities when doing tasks in their L2.
As far as our knowledge of the literature goes, there is no empirical study on this question. However, we are aware of at least two authors who, drawing on the psycholinguistic evidence from the foreign language effect, argue that having political and/or social debates in what we have called exolingual mode, that is, involving participants using their L2s, could have an interesting effect on the quality of the debates. Ringe (2022) argues, based on interview and observational data gathered within the EU administration, that different actors think that the widespread use of English as a lingua franca leads to a higher quality of deliberation (p. 115), to less emotionally charged interactions. Moreover, if an actor indeed uses polarizing language, it seems that the potentially escalating effect is mitigated by what the author considers lenience that is partially due to multilingualism – the actors simply tend to «disregard politically charged language» (p. 114). The author also acknowledges that using English as an L2 comes at the price of fluency and elaboration: [M]ost people are unable to express themselves in a foreign language with the same competence, ease, and spontaneity as would be the case in their mother tongue. (p. 113)
In his discussion, Ringe (2022) explicitly refers to the literature on the foreign language effect as a possible explanation of the hypothesized effect (p. 115).
A second author who mentions the possibility of a (largely positive, useful) foreign language effect on how people interact in conflictual situations is Detey (2023). Again, based on evidence from second language learning and specifically on the foreign language effect, the author hypothesizes that in international, lingual franca–based interactions, superfluous conflicts tend to be avoided, and the discussions and debates tend to be more goal-oriented: [I]nternational communication in a lingua franca [‘langue véhiculaire’] can sometimes be much simpler (and more effective) than national communication” (Detey, 2023, p. chapter 7 [ebook], translation RB)
Detey makes a distinction between inter- or supra-national communication and national communication, which is due to his construal of a monolingual national context. In multilingual national contexts, however, there is no reason to assume that an analogous effect of exolingual communication might not also be detectable.
The Swiss context: institutional multilingualism and political polarization
Switzerland is an officially quadrilingual country (German, French, Italian, and Romansh are national languages) and de facto much more multilingual, with a high proportion of immigrants speaking other languages. 1 Moreover, the country’s political system is shaped not only by a high degree of federalism which gives considerable autonomy to the federal states (cantons), but also by the frequent use of direct democratic action. As the subtitle of Cormon’s (2015) book suggests, the Swiss vote on «almost everything». In fact, several times each year, there are popular votes either in the form of referenda against legal parliamentary projects on both the federal and the cantonal levels or as so-called initiatives on the federal level to add, delete, or modify constitutional articles. Moreover, the national government is always composed of members of the four largest parties. The seven members of government are expected to continuously find ad hoc compromises despite their different political views and to defend the majority view of the government even if they are known to disagree with it (see Vatter, 2008 on the dynamics within the Swiss consensus democracy).
As regards the national parliament, in the large chamber (conseil national), simultaneous translation from and into the three working languages German (spoken by about 61% of the population), French (about 23%) and Italian (about 8%) is provided, whereas the Romansh speakers, who are a tiny national minority (about 0.5%), usually are fluent in German. In the small chamber (conseil des états), no simultaneous translation is provided and MPs are expected to understand each other’s respective languages (see also Müller, 2019). In the numerous debates accompanying direct democratic processes on the national level (cf. Schröter, 2022), MPs and members of the government are often invited to discuss the current political issues, and sometimes they debate in their L2. For example, Swiss German politicians are invited to the public Francophone TV and radio network RTS to answer questions and participate in debates, and Francophones appear in German-language programmes in the German-language network SRF. The same applies to the smaller Italian-speaking minority, but given the large demographic asymmetry, the likelihood of having an Italian speaker speaking French or German is much higher than the likelihood of having French and German speakers express themselves in Italian (see Berthele, 2024 for more on demographics of the languages and tensions between the communities).
Thus, the official quadrilingualism of Switzerland allows for ample interactions among speakers of different languages who use, at least occasionally, not only their dominant language but also other languages of their multilingual repertoires. Politically, recent studies on Swiss politics suggest a strong polarization of both the party system and the national parliament. In fact, the social-democratic left-wing and the national-conservative right-wing parties of Switzerland’s tripolar party system are considered more politically polarized than equivalent parties in neighbouring European countries (Vatter, 2008; Freiburghaus & Vatter, 2019; Wyss et al., 2015).
However, even among prevailing polarizing tendencies, when compared to neighbouring countries, Swiss politics, in particular on the level of parliamentary debates (Bochsler et al., 2015), remains highly consensus-oriented. As Schröter’s (2024) corpus-analysis shows, notions such as ‘consensus’ and ‘compromise’ are positively connotated in both the French- and German-speaking part of the country. In fact, Switzerland has traditionally been labelled a prime example of a consociational democracy (Vatter, 2008) with its elements of multi-party government, federalism, minority votes, power sharing and elite collaboration, among others. Thus, the Swiss context is an interesting example to study from both a political and multilingual viewpoint – and an obvious choice to investigate a potential L2 effect in politics.
Research questions
As argued above, the Swiss context seems a privileged site for the exploration of the interplay of politics, polarization and multilingualism. Since we are not aware of any studies that have explored this link so far, this paper aims to answer both a thematical and a methodological research question:
Based on the literature, we will put a directed hypothesis to the test:
Method
To answer these questions, we compiled a corpus of turns of selected multilingual politicians in TV debates both in French and German and submitted this corpus to a LLM with a prompt asking about the polarizing potential. To answer RQ2, we had a subset of the statements also rated along the same dimensions of polarization by a panel of human raters.
Corpus
We collected subtitle files of TV broadcasts of political debates in French and in German (RTS: Radio-Télévision Suisse Romande, French-language public television; SRF: Schweizer Radio und Fernsehen, German-language public television). The participants in these debates, except for scripted introductory remarks by the anchors, interact spontaneously and off the cuff and do not read out prepared statements. Due to a facilitated access to many subtitles from the French-speaking programme InfraRouge, we have more subtitles from the largest Swiss minority language French and fewer from the majority language German (see Table 1 for details).
Number of turns by programme and language status.
We concentrated on the two most popular political debate programmes, Arena aired by the German-speaking public TV network and InfraRouge aired by the French-speaking public TV network. We manually identified ten politicians who regularly appear in both programmes and thus express themselves in both French and German. We then compiled a corpus consisting of about 2,100 turns or utterances (we use the terms synonymously) produced by these ten Swiss politicians, who are all active on the federal level, in one of the two chambers or as members of the government. Individuals and their verbal contributions were included in the corpus if they met the following criteria: the speaker took the floor at least five times in a given broadcast, had a clearly identifiable first language (either French or German), demonstrated sufficient proficiency in their second language (German or French) to participate in broadcasts in both languages, and appeared repeatedly in both national TV political talk shows – SRF Arena (German) and RTS Infrarouge (French).
The corpus is based on professionally produced subtitle files. These subtitles do not constitute exact transcriptions of the spoken discourse; rather, they represent condensed and linguistically simplified versions of the original utterances. Trained professionals listen to the live discussion and dictate the subtitles into a speech recognition system calibrated to their voice. The resulting text is then briefly reviewed by the subtitler and, if necessary, corrected before being displayed. Examples 1 and 2 illustrate the original spoken turn alongside its corresponding subtitle.
(2) [transcription of turn] Ja ich glaube man hat verstanden, Herr Rösti ist nicht aus der Grünen-Fraktion, aber das ist gut so, weil wir haben einen SVP-Bundesrat gewählt. Und unser System funktioniert genau so: Er wird seine Meinung im Bundesratszimmer im Bundesratszimmer. Er wird dort kämpfen, das ist kein Jassklub, wo man eine schöne und nette Diskussion hat. Es hat harte Diskussionen im Bundesratszimmer, die kämpfen für Ideen. Und dann schauen sie, dass sie zusammen eine Lösung finden. Und dann vertreten alle sieben Bundesräte die Politik des Bundesrates. (3) [subtitle equivalent] Albert Rösti gehört nicht zur Grünen-Fraktion, aber das ist gut so. Wir haben einen SVP-Bundesrat gewählt. Unser System funktioniert genau so: Er wird seine Meinung im Bundesratszimmer äussern. Es ist kein Jassklub, wo man eine nette Diskussion hat. Es gibt harte Diskussionen, alle kämpfen für ihre Ideen. Dann schauen sie zusammen für eine Lösung. Dann vertreten alle sieben Bundesräte die Politik des Bundesrates.
Translation of the subtitles: Albert Rösti is not part of the Green parliamentary group, but that’s a good thing. We have elected an SVP Federal Councillor. Our system works exactly like this: he will express his opinion in the Federal Council chamber. It is not a card-playing club where you have a nice discussion. There are tough discussions, everyone fights for their ideas. Then they work together to find a solution. Then all seven Federal Councillors represent the Federal Council’s policy.
The example shows that the content words and the gist remain unchanged, while certain elements, in particular repeated elements (such as ‘im Bundesratszimmer’), are eliminated. Using these subtitles comes at the price of a small loss of fidelity with respect to the exact wording, but it has the advantage of making it possible to process a much larger corpus than if the TV debates were to be transcribed manually. As discussed below in the section on reliability, we compared the machine-based evaluation of a small subsample of subtitles to verbatim transcriptions and the result confirms that the outcome is very similar.
These conversational turns stem from a total of 119 TV broadcasts aired between December 2009 and November 2024. We limited the minimal length to 5 words as shorter utterances are usually conversational management turns such as ‘Guten Tag, Herr Brotz’ (‘Hello, Mr. Brotz’). The mean length of a turn is 77 words (maximum 361 words, sd: 61).
OpenAI prompt
The debate turns described above were assessed in two ways. The total of the 1961 turns was sent to OpenAI via its API. A subset of the texts was assessed by human raters. Both assessments used the five dimensions of polarization discussed above.
OpenAI’s model gpt4o was used twice, all turns were submitted for evaluation once on 24 January 2025 and once on 15 February 2025. The temperature was set to 0 to make the model maximally deterministic. The internal consistency/reliability within the two sets of answers of the LLM are thus expected to be very high (see below).
The prompt used for each text was the following:
You are a helpful assistant. The following text is in French [or German, depending on the text]. Please rate this text for the following dimensions:
Factuality: Is the text factual (i.e., topic-oriented) rather than person-oriented?
Consensuality: Is the text constructive, that is consensus- and/or solution-oriented rather than seeking to emphasize disagreement?
Nuance: Does the text deal with different aspects in a differentiated way (e.g., arguments, views, perspectives and/or positions) rather than impose a single perspective?
Negativity: Does the text appeal to the audience’s negative emotions (e.g., feelings such as indignation, uncertainty, doubt)?
Divisiveness: Does the text emphasize and/or exacerbate differences between groups (e.g., political parties, professional groups, urban/rural, young/old, etc.)?
Please first answer for each dimension with a single word YES, NO, or with NA if you think the dimension cannot be assessed based on the text. Separate each answer with a <|> symbol. After this, please write the word JUSTIFICATION in upper case, and then state your reasons.
The model returns its evaluation in text form which could easily be transformed into scores (1, 0 or NA for each dimension). As an example, here is the first evaluation for the example text above (example 3):
(4) YES| YES| NO| NO| NO
JUSTIFICATION:
Factuality: YES. The text is factual as it describes the political process and the role of Albert Rösti within the Swiss Federal Council, focusing on the system rather than personal attributes.
Consensuality: YES. The text explains how the Swiss political system works in a constructive manner, emphasizing the process of reaching a solution through discussion and collaboration among the Federal Council members.
Nuance: NO. The text presents a single perspective on the political process, focusing on the functionality of the system without exploring different aspects or perspectives.
Negativity: NO. The text does not appeal to negative emotions, it is neutral and descriptive in nature.
Divisiveness: NO. The text does not emphasize or exacerbate differences between groups, it describes the political process in a neutral manner.
The scores for each of the five dimensions were stored in five columns containing either 0,1 or NA. The three dimensions factuality, consensuality and nuance were recoded so that the value 1 corresponds to the answer NO. In this way, the value of 1 for all five dimensions stands for the more polarizing option.
Evaluation of turns by human raters
In addition to the text classification task done by the OpenAI gpt4o model, a selection of 508 texts were rated by human raters. The human raters were first year law students at the University of Fribourg, either in a Francophone or in a German-speaking programme. Each rater got a personal selection of five texts in their respective study language. A sixth text was included at the end: Either this text was clearly polarizing on all dimensions or clearly not polarizing (as assessed by OpenAI and our team). Half of the human raters got the polarizing sixth text, and the other half, the non-polarizing. Figure 1 shows the task (French variant).

French variant of the task for the human raters.
This sixth text was added to select only answers of students who carefully read the texts and took the evaluation seriously and filter out those who just quickly ticked boxes to have a longer coffee break.
These filtered human ratings were used to calculate inter-rater reliability between OpenAI and humans (see below).
Results
Reliability
Reliability of the nominal response data was assessed using Cohen’s kappa (Cohen, 1960). First, the two sets of evaluations by OpenAI gpt4o of the same corpus across the five dimensions are tested. The coefficients are in the upper range of what is considered substantial (0.76, factuality) to almost perfect (0.88 group, 0.90 emotionality, 0.90 differentiatedness, 0.93 consensuality; see online Supplemental Material, section 2.1. for details). High coefficients are expected as the same model was used twice with the lowest possible temperature of 0. The LLM, with a few weeks of temporal distance, assesses the texts in a coherent fashion.
The reliability between the OpenAI evaluations and those of the human raters is larger than 0 for all five dimensions, which means there is throughout some positive association between humans’ and the machine’s responses. For one dimension, factuality, the reliability is low (Cohen’s kappa 0.14, which is conventionally considered slight agreement). The four remaining dimensions scores are fair (0.33 for group) to moderate (differentiated 0.43, emotional 0.49, consensus 0.51; see Supplemental Material, section 2.2 for details).
Any conclusions on differences regarding the factuality dimension should thus be drawn with caution due to the weak correspondence between the LLM-based and the human evaluations.
Finally, to assess the influence of the use of subtitles instead of verbatim transcriptions on our outcome variables, we also assessed 12 randomly selected subtitles and their verbatim transcriptions via the same procedure via OpenAI as described above. The output of the LLM was identical for the transcripts and their corresponding subtitles except for two instances of the dimension ‘factuality’, which is the dimension that has also rather low reliability when comparing human ratings to machine-based ratings. This is another reason why any difference for this dimension should thus be considered with much caution.
Polarization in L1 vs. in L2: descriptives
Table 2 shows the proportions of statements categorized as polarizing by the LLM as a function of whether they were uttered in the politicians’ L1 or L2.
Comparison of the proportions of polarizing statements, OpenAI evaluation.
The tendency goes in the expected direction, as all values for L1 are higher. Figure 2 shows the same difference between L1 and L2 but for each of the 10 politicians separately.

Proportion of polarizing statements per dimension and per politician (LLM ratings).
Figure 2 suggests that the expected tendency applies to most politicians, with one exception: Lisa Mazzone, a L1 French member of the Green party, seems to produce, in three out of the five dimensions, more statements rated as polarizing in L2 than in L1. All others show the expected pattern, while the L1–L2 differences in proportions within the individuals range from negligible to rather large.
As discussed above, the Swiss political system on the federal level involves a multi-party government that is expected to speak with one voice representing the majority’s point of view, regardless of the individuals’ political stances. Thus, the level of polarization might be affected by the status of a politician at the time of the broadcast (part of the government or not), which could also interfere with a possible foreign language effect. Five out of our 10 individuals, at some point in time, appear in their role as government members in our data (Elisabeth Baume-Schneider, Karin Keller-Sutter, Doris Leuthard, Alain Berset, Albert Rösti). We checked whether the L2 effect is cancelled for the subset of data produced whenever the individual was a member of the government (see Figure 4.4 in the Supplemental Material). As expected, the mean level of polarization is lower for this subset, which can be read as empirical evidence for the Swiss consensus model of government. The descriptive difference between L1 and L2 seems somewhat smaller for two dimensions (group emphasis, emotionalality), but also larger for at least one (consensus) in this subset. This does therefore not indicate that the effect completely disappears whenever a bilingual becomes part of the government.
As the reliability metrics suggest, the LLM and the human raters do not agree perfectly well on the ratings. Figure 3 is a plot of the proportions across L1 and L2 statements in the subsample analyzed by the human raters only. This figure does not distinguish between the individuals as we do not have enough texts per person in this subsample to make valid statements on the individual level.

Proportion of polarizing statements in L1 and L2 across the five dimensions, human ratings.
The Figure shows the expected tendency for four out of the five dimensions: When using L1, politicians are globally considered to be more polarizing. The dimension of group divisions does not follow the global tendency.
Inferential statistics
The descriptive analysis shows a tendency confirming H1 spelled out above: When politicians use their L2, their way of speaking is less polarizing. The differences are not equally large for all individuals. Moreover, other factors need to be considered: A topic of a specific broadcast might particularly foster polarizing language – in Switzerland, this would typically be discussions about relations with the European Union or about immigration.
To test the foreign language effect with the type of data we have at hand, different modelling strategies can be adopted. Either each of the five dimensions is tested separately (which presupposes that we test five independent hypotheses for each dimension), a global model can be fitted that accounts for the different dimensions, or the scores are summed up and an ordinal model involving such an overall polarization score representing all five dimensions is fitted to the data. In the Supplemental Material (section 5) all three strategies are documented. For the sake of brevity, we only discuss the second strategy here. The two other approaches do not show anything that would fundamentally contradict the following insights.
We fitted a logistic linear mixed effects model to the data with the lme4 library (Bates et al., 2014) in R using the following formula:
The variable n.InL1 is coded as −0.5 for no and +0.5 for yes. The random effects represent adjustments per individual (random intercepts) and random intercepts and random slopes for n.inL1 across the different broadcasts. While the former account for differences in degrees of polarization for each individual, the latter accounts – via the different broadcasts that are always dedicated to a specific political debate or topic – for the influence of debate topics on levels of polarization. We first tried to fit a model with the maximal structure including random slopes for both individuals and broadcasts, but due to singular fit we eliminated the random slopes per name.
In the fixed effects part, the model estimates first a separate intercept for each of the five dimensions. The model then estimates the average odds ratio to get a polarized statement for each dimension. To test H1, we first compare a null model without the predictor n.inL1 to the model specified above. The comparison confirms that adding this predictor that represents our foreign language effect leads to a significantly better model fit (Table 5.1 in the Supplemental Material). Furthermore, we can also examine the estimates of the interaction terms of our selected model, as they represent the effect of the use of L1 on the dimension (a slope parameter). If the odds ratios are above 1, this means that the likelihood of getting a polarizing statement is increased, if the estimate is around 1 or below, no increase or a decrease is predicted.
For the sake of readability of the article, we give only the model estimates that are relevant for our research question in Table 3. The full model output can be seen in the Supplemental Material (section 5.2).
Confidence intervals and the effect on the different dimensions of polarization if L1 is used; expressed in odds ratios.
The table shows that all estimates go in the expected direction of more polarization when L1 is used. The estimates of four out of the five dimensions are likely to be larger than 1 when considering the confidence intervals. We can thus conclude that overall, there is a foreign language effect in the expected direction, which attains what is conventionally called significance with a p level of around 0.01. The alternative models (see Supplemental Materials for full details on all models) suggest the same conclusion. If modelled separately, differentiatedness also attains significance (see section 5.1.3. in the Supplementary Materials).
Discussion
In this study, we analyzed a corpus of conversational turns by multilingual politicians in TV debates to test whether their use of L1 or L2 coincides with different levels of polarization. We identified five dimensions that, according to the literature, are related to the language of political polarization. We hypothesized, based on the literature discussed above, that on top of the foreign language effect detected in receptive or productive research via elicited responses in a lab environment, there could also be a foreign language effect in the spontaneous production of multilingual individuals. Asking human raters and prompting an LLM to assess these five dimensions of polarization yields a result that confirms H1, that is, the 10 politicians indeed produce less polarizing language when using their L2.
Whether or not the answer to RQ2 is positive depends on the threshold one wants to apply to the correspondence between human raters and the LLM output. This correspondence is not 0 or negative, that is the ratings from both sources seem to capture at least modestly something similar. Thus, we could conclude that RQ2 can cautiously be answered with yes. However, for more enthusiastic recommendations for the use of LLM prompts for the type of text categorization we are interested in here, higher kappa coefficients between the two data series would have been helpful.
Theoretical implications
Studies using production data are an important extension of the research on the foreign language effect. The production-based studies discussed in the literature review use data elicited in a controlled, usually lab setting. Our study shows that more naturally occurring production data may also be used to investigate L1 vs. L2 differences, and it suggests that the assumptions made underlying the foreign language effect also hold for spoken language data.
Ringe (2022) and Detey (2023) both hypothesize that multilingual verbal communication in a conflict-prone (political) context might be different, less conflictual and more consensus-oriented than in L1 settings. Our data overall confirm this assumption. It seems, therefore, that not only our ‘morals depend on language’ (Costa, Foucart, Hayakawa, et al., 2014) but also our politics and political culture do so.
Limitations
There is at least one individual in our sample that does not confirm this general tendency. Moreover, other factors such as institutional role (e.g., government membership, but also being the president of a party or a regular MP), or the specific topic of the debate undoubtedly have an impact on the degree of polarization. These factors could not be studied systematically with the data available to us but would certainly deserve the attention of political scientists. Moreover, the time of the debate, for example, whether there is an election coming up or not, might have an impact on degrees of the use of polarizing language.
Our interest, however, was on the difference within speakers across two of their languages. For bilingualism scholars it would be interesting to be able to measure the politicians’ language skills independently, as skills and degrees of dominance of the languages could be associated with the size of the foreign language effect. Even though such a proficiency and dominance effect is generally not confirmed by the literature (see Literature review), it could well manifest itself in production.
Future directions
In the light of the limitations of the sample and predictors that could be included in our analyses, more research is needed to bolster our assumption that the use of L2s has a de-polarizing effect on political discourse. Other contexts (national, international) where people use two (or more) languages should be explored, also to rule out that the effect found in our study is somehow specifically related to the Swiss context.
Moreover, in view of the relatively modest agreement between humans and the LLM used, the results are but a preliminary first attempt to investigate our main research question with automatic analyses. More research is needed on assessing the polarizing potential with both LLMs and humans. Instead of relying on BA students with no specific training for the task, the raters could be trained based on the evidence from linguistic analyses of political language. Moreover, based on valid human ratings, an LLM could be trained on a training set of data and then used to evaluate a test set. It is likely that the performance of the model (and the agreement with human ratings) will increase. However, given the results obtained based on the same corpus with pre-trained multilingual LLMs on the dimensions of subjectivity and sentiment (see Berthele, 2025 for details), we have reasons to believe that the tendencies that emerge in the present analyses will be borne out by more sophisticated approaches.
The multilingual repertoires and, in particular, the dominance relationship between L1 and L2 could be controlled more carefully in future studies, which was impossible in this study as these relatively prominent MPs and members of the government were not likely to come to the lab for psycholinguistic tests of their individual multilingualism or of their L2 competencies. A possible avenue to get a better grip on the repertoires could be metrics based on the prominent people’s L2 performance (e.g., again in the media) that do not require individual testing, such as for example fluency-metrics based on oral L2 use.
Conclusion
Being a public figure in polarized times requires a thick skin. Doing politics in an L2 means discussing (relatively) hot topics in a potentially conflictual setting. This is undoubtedly stressful and potentially face-threatening for politicians, in particular if one lacks experience. The data and analyses presented in this contribution suggest that the increased effort of engaging in debates in multilingual contexts might have interesting implications: Due to the psycholinguistic factors underlying the foreign language effect, and maybe also more generally thanks to specific communication strategies used by multilingually aware speakers in exolingual settings, political debate contributions in L2 tend to be less polarizing than if the same individuals would speak in their L1. Further research will have to show whether this finding from a specific Swiss political TV debate setting may be generalized to other multilingual contexts with different linguistic and political configurations. Should it indeed be as simple as that – using an L2 in political discussions to decrease polarization between ideological opponents – it might as well be worth trying.
Footnotes
Acknowledgements
We thank Marie de Piante (RTS) and Nicolas Félix (UNIFR) for helping us getting hold of and understanding the subtitles; and thanks also to Christian Wolf (UNIFR) who has been extremely efficient extracting the relevant text from the subtitle corpus. We thank our MA students Alina Ahafonova, Leda Bonzanini, Clélia Bertolini, and Bianca-Georgiana Nita (all UNIFR) for their help collecting the human ratings. Finally, we would like to thank our UNIFR colleagues Eva Maria Belser, Camilla Jaquemoud and Simon Mazidi from the law faculty for helping us getting the human ratings from their students.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We gratefully acknowledge funding from the Mobiliary Cluster for Resilience and from the Institute of Multilingualism at the University of Fribourg, Switzerland.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
We will publish the data set and scripts openly available via the project’s Open Science Framework page once all publications on this corpus are accepted.
