Abstract
Scholarly literature has demonstrated that hybridity transforms both legacy and new media, but that this change is not even. We treat social media platforms as arenas of remediation, where users share and add their own context to information produced by both media subtypes, and we compare social media conversations about migration in six European languages that include links to either traditional or new media during 2015–2019. We use a mix of computational and statistical methods to analyze 3.5 million (re)tweets and 500,000 links shared within them. We identify the main differences in agenda-setting power, function, and tone present within tweets that include links to legacy or new media. Our results show that discourses are similar across languages but clearly different when remediating legacy and new media. Trust in legacy media is correlated with a higher proportion of shared links from legacy media and inversely related to the proportion of shared links from new media sources. Considering the volume and timing of the remediated content, we conclude that legacy media retains its agenda-setting power. New media linked content tends to cover migration in association with subjects such as Islam or terrorism and to express strong critical opinions against migrants/refugees. The language used is more toxic than in legacy media linked content. The tweets remediating legacy media articles covered topics like domestic or European politics, causes of refugee arrivals, and procedures to give them protection. Thus, legacy and new media remediated content differs in both tone and function: toxicity is low and factuality high for content linking to legacy media, with the reverse being true for new media remediations.
Introduction
Social media platforms emerge as arenas of hybridity, where legacy and new media compete for the attention of users, who remediate content from both media subtypes by sharing links and adding their own context. We explore whether the remediated content reflects the differences between legacy and new media by comparing social media conversations that refer to either traditional or new media covering the topic of migration and refugees during 2015–2019. To do so, we select three characteristics that set the two media types apart: agenda-setting power, tone, and function. We then compare, along these three lines, the attributes of tweets that were linked to either legacy or new media across six European languages using computational and statistical methods. The languages under scrutiny (Danish, Dutch, French, German, Italian, and Swedish) represent diverse cultural and political landscapes, each with its own nuances in media consumption and production. The thematic unity of the corpus is given by a focus on migration in the European Union (EU) context. Being circumscribed to the same EU framework increases the cases’ comparability while preserving the variation in media systems and governmental orientation towards migration, granting our paper a wider generalizability.
We believe that a multi-lingual approach to this analysis unveils the ways in which hybridity may manifest across different sociocultural contexts, transcending national and linguistic boundaries. First, we isolate tweets containing links that we manually classify as belonging to legacy or new media in each language. Second, we explore the Twitter content from the two types of outlets via topic modeling. Third, we analyze the toxicity level of the same content to evaluate the writing style defining each media subtype. Fourth, we perform statistical analyses that connect the content and tone of the migration coverage on Twitter to legacy and new media, respectively, to determine feature patterns distinctive of each. Finally, we compare our six language contexts to parse out commonalities and differences. Our study contributes to the theoretical understanding of the process of media hybridization as well as to the study of European migration coverage.
Hybrid Media
Hybridity is born at the intersection of the old and the new. Chadwick distinguishes between diluted and particulate subtypes (Chadwick, 2013, p. 18). Looking at the construction of political news, he sees the latter type as being more prevalent, with elements of both traditional and new media recombined to produce a new political news cycle. This recombination is not without friction: the two ways of informing about politics compete with one another for readers’ or users’ attention and trust (Colombo & Mascheroni, 2022).
The media landscape has evolved into a hybrid environment due to digitalization and widespread internet use, merging traditional media like newspapers, TV, and radio with digital formats accessible through tablets, websites, and social media platforms like Twitter and Facebook. This integration allows traditional journalism to coexist with and adapt to digital platforms, where journalists engage with audiences directly as online commentators with significant followings. Simultaneously, social media platforms facilitate the emergence of non-traditional news actors such as digital-born media outlets, blogs, and YouTube channels, creating a space for blended content. This process, known as remediation (Bolter & Grusin, 2000), enables the convergence, interaction, and user-driven reinterpretation of media content from both legacy and new sources, thereby equalizing their presence and impact within the digital media ecosystem. The structures and rewards of social media affect legacy and new media equally, which increases their comparability.
The Three Dimensions of Comparison
We identify three dimensions for comparing the use of legacy and new media on Twitter: agenda-setting power, tone, and function, detailed below.
Agenda Setting Power
Agenda-setting power refers to the presence and/or intensity of media content devoted to a particular topic (McCombs & Shaw, 1972). Legacy media traditionally sets the news agenda, determining which stories receive prominent coverage. In contrast, and based on the audience logic outlined above, new media often relies on bottom-up agenda-setting, where user-generated content influences the news agenda. The scholarly literature, while initially contending that social media intermediaries may reduce the role of mainstream media in favor of other sources, has now shown evidence that legacy media maintains its agenda-setting power even in social media settings. In experimental studies done in the US, Feezell shows that the social media accounts of traditional media channels were able to raise the salience of political news for users otherwise low in political interest (Feezell, 2018). And, while it appeared that social media was first in breaking news and cued broadcast and print media, a more fine-tuned analysis revealed that it was the legacy media accounts (and the politicians’ and parties’ accounts as well) that provided fast and relevant information (Harder et al., 2017). Moreover, in the agenda-setting process of a British political scandal connected to migration, legacy media played a crucial role in revealing the problem and in creating what the authors called a “media storm” leading to policy decisions. Multiple legacy media outlets set the political agenda by “initiating, amplifying, and sustaining attention to the issue” (Langer & Gruber, 2021). Even in the context of Hong Kong, where the journalistic reputation of traditional media is declining, Lo et al. (2021) find that legacy media cue and influence new media coverage but remain uninfluenced by it in return.
We thus formulate hypothesis 1, that tweets containing legacy media links are more numerous than those linking to new media and that the legacy media links are shared mostly at the beginning of the period covered, initiating information flows on refugees’ and migrants’ arrival.
Trust in media is known to be important for the agenda-setting process, in particular at the individual level. People actively learn and build an opinion about which issues matter by consuming trusted media (Miller & Krosnick, 2000). Moreover, higher trust in legacy media is correlated with lower trust in digital-native media (Andersen et al., 2023). Thus, trust in media is an appropriate way to account for the differences between our language contexts. We formulate hypothesis 1a: the volume of legacy media links shared in a specific language correlates with public trust in legacy media in the countries where that language is spoken.
Tone
Due to their audience orientation (Blassnig & Esser, 2022), and to their need to carve a place for themselves in relation to the established media channels, new media often pursue editorial innovation or uniqueness through the selection of varied stories, perspectives, or themes, the adoption of diverse tones in news presentation, and the introduction or foregrounding of voices less frequently encountered in traditional media (Nicholls et al., 2016). New media channels focus on narrower publics, defined demographically (e.g., youth, targeted by BuzzFeed, the now defunct Vice, etc.), or ideologically (e.g., groups that feel disenchanted with legacy media, such as those holding populist attitudes; Mitchell et al., 2018; Stier et al., 2020). Since populism is correlated with anti-outgroup feelings (Fawzi, 2019), including towards migrants, it is more likely that digital-native news media would display a more negative tone. Finally, hotly contested issues like migration attract toxicity online (Salminen et al., 2020). Thus, we formulate hypothesis 2, that the level of negativity in tweets discussing the European migration crisis is higher when new media links are shared. Due to the methodological problems in measuring comparable values of negativity across languages, we abstain from a cross-language hypothesis.
Function
Legacy media, characterized by professional journalism’s commitment to accuracy, truthfulness, impartiality, and independence, has established a trusted reputation through adherence to journalistic standards and a focus on newsworthiness, striking a balance between ethical considerations and economic motives (Bednarek & Caple, 2017). One of its primary functions is to provide verifiable and reliable information.
In contrast to traditional news organizations, new media outlets have had to forge new connections with their audience in their efforts to establish themselves on the news media market. New media had a clearer incentive to take advantage of digital technology’s opportunities to engage individual users and cater to their requirements (Loosen & Schmidt, 2012).
Moreover, new media outlets tend to have fewer financial resources than legacy media and may resort to alternative funding strategies. Whereas legacy media typically relies on advertising and subscriptions (or state financing, if public service), new media is more motivated to explore diverse revenue models, such as online advertising, paywalls, crowdfunding, and affiliate marketing (Nicholls et al., 2016). These alternative models may come into conflict with the traditional journalistic values of impartiality and independence. In turn, this may dissuade professionally trained journalists from joining such new outlets and lead new media to rely on citizens or activists (depending on the profile of the channel). Online communities, blogs, and specialized news websites tend to offer news content tailored to specific interests and demographics, even though over time there has been a tendency, for some outlets, towards professionalization and embracing more legacy media standards and values (Nicholls et al., 2018; Thomas & Cushion, 2019). Because of its financing models and its dependency on digital-only distribution channels and on the attention economy, new media’s function may be less to inform and more to provide opinion and commentary for a more specialized public. We therefore want to explore whether Twitter users reflect the two functions in their own recontextualization of legacy and new media content.
Thus, we formulate hypothesis 3, that legacy media links are shared in the context of more factual topics, primarily providing factual information, while new media links are most shared in the context of controversial topics, where opinions are expressed. To capture a possible variation across languages, we formulate hypothesis 3a, that factual topics are more common in countries where migration rates are low.
Media Discourses on Migration
We chose to compare Twitter remediation patterns on migration-related content. Migration has been at the top of news agendas across the European Union, especially in connection with what has come to be known as the European “refugee crisis.” Moreover, migration has been at the core of political agendas of European parties, especially for those with a nationalist orientation.
The study of media reporting on migration-related issues, although extensive within different strands of communication research, is mostly anchored in specific national contexts. Comparative analyses that examine the discourse on (im)migration in different European countries are scarce (Eberl et al., 2018).
Research on digital discourses of migration is also rare. A recent review of 119 articles revealed that newspapers were the primary source in most studies on migration discourses, whereas TV and social media were seldom included (Seo & Kavakli, 2022). An investigation of how media within hybridized media systems interrelate in shaping polarization and/or opinion alignment in attitudes towards immigrants found that traditional media exert a greater impact on extreme and consistent positions than social media news (Iannelli, Biaggi & Meleddu, 2021). Conrad (2021) performs a frame analysis on a data set that combines traditional and social media from three countries but focuses only on the Global Compacts and uses a small-N approach. Moreover, he is only interested in the frames employed by populist and right-wing actors.
Existing research showcases a series of common topics (or frames) related to migration, and in particular to the “refugee crisis,” across media types. Greussing and Boomgaarden (2017) find that in six Austrian mainstream newspapers, the most frequent frames in which the “refugee crisis” was portrayed were the settlement/redistribution of incoming asylum seekers, the criminality risk posed by them, the economy (or the economic burden posed by the newly arrived), and humanitarianism (the desire to assist, especially on the part of civil society). Also present were topics of what the authors called background/victimization (the difficulties encountered by the refugees on their way to Europe), securitization (national security and border control), and labor market integration. Other common frames identified are the emphasis on the otherness of (im)migrants, for example, in economic or cultural terms, security threats and exploitation of social programs (Lawlor & Tolley, 2017), frames of (im)migrants which criminalize and victimize, victimization of migrants and/or construction of a threat (Famulari & Major, 2022), or anti-immigrant hate speech, in particular on social media (Nortio et al., 2021).
Going beyond one national case study, Heidenreich et al. (2019) gather print and online articles from several news outlets in five European countries (2015–2016) and perform an automated frame analysis. Their findings reveal many similarities with the Austrian case, with the most common lens through which migration was seen across the media outlets being the economy, followed by welfare, accommodation of refugees, and international humanitarian aid. Refugee camps, borders, as well as national and EU politics were also present.
To our knowledge, no research has performed a similar type of analysis on social media data. Several studies included social media posts in their analyses. However, they all focus on a specific type of actor within social media. Among those studies that include a social media component, Ademmer and Stöhr (2019) look at comments left on the Facebook pages of local and regional newspapers in Germany. They identify 100 topics, which they group in three cleavages: GAL/TAN, left-right, and dealignment (cf. Hooghe et al., 2002). They find that the first cleavage, characterized by an emphasis on culture and identity, clearly dominates the comments studied. The more traditional left-right cleavage between progressive and conservative politics is much less frequent, whereas dealignment, or the lack of a political leaning, is the least prevalent. While providing relevant insights, this study focuses on the microlevel of migration politics and is thus rather limited in scope compared to our undertaking.
In another article related to migration on social media, Heidenreich et al. (2020) analyze visibility and sentiment towards migration in the Facebook accounts of political actors across six European countries, between mid-2015 and the end of 2017. They find significant differences in the volume of migration discussions between national political spheres, as well as differences in the tone of extreme left- and extreme right-wing parties. However, the article does not include legacy or new media accounts.
Research Design
Our study takes a multi-lingual approach to investigating the relationship between legacy and new media sources on Twitter. The advantages of such a comparative approach are numerous; in fact, we subscribe to Esser and Hanitzsch’s (2013, p. 4) entire list of reasons. Indeed, comparisons increase the generalizability and validity of our findings, while reducing the risk of overgeneralization and ethnocentrism.
While comparisons are good for all the reasons above, scientists need to be careful to bring together comparable phenomena. We argue that the topic of migration and its manifestation on social media are doubly fitting for comparisons. For one, migration is in itself about more than one setting, as it is defined as mobility across borders. In our specific European context, migration is even more transnational than elsewhere, since the European Union does not have, for the most part, internal border controls and the European migration and asylum policy allows refugees and migrants to move from their entry point country to the rest of the EU members. Thus, all EU member states are touched by the issue of migration, and especially so during periods of high influx of incoming people, such as the period we study here.
Second, social media is a digital domain where information flows are unhindered by state controls, especially in democratic settings. Social media is, in fact, an ideal site for performing comparative work because it keeps constant some of the technological attributes that otherwise would vary more widely across cases (Bossetta et al., 2017). This harmonization of the communication infrastructure is especially pronounced in the European Union, whose legal and regulatory frameworks apply evenly across the language spheres we compare here.
Not all social media are the same, however. Twitter’s open network structure and public visibility allowed it to attract journalists and politicians and to build a reputation as a breaking news platform, in contrast to Facebook, which is dedicated to more personal content targeted at friends and family. For example, Kalsnes and Larsson (2018) find that Norwegian news about politics, migration, or foreign affairs was shared more on Twitter, whereas stories about children, health, and education were shared more on Facebook.
Case Selection
The languages we include here, Danish, Dutch, French, German, Italian, and Swedish, are not the same as national twitterspheres, because (apart from Danish and Swedish) they are vernaculars in more than one European state. For example, German is the official language in both Austria and Germany; French has the same status in both Belgium and France; and so on. Thus, we cannot draw direct parallels between the digital language spheres and the properties of party and media systems, or the orientation of national governments.
That said, it is still useful for the comparison to present the landscape of national politics and media in the countries that correspond to some extent to the languages selected, since the criterion for including these specific cases was to obtain as broad a coverage as possible of the migration discourse in Western Europe. Using the classification of Humprecht et al. (2022), we observe that the media systems covered by our Nordic and Germanic languages (Denmark, Sweden, Austria, Germany, and the Netherlands) all fall under the democratic-corporatist model. The remaining three countries where our selected languages are spoken (Belgium, France, and Italy) belong to a middle cluster, between the democratic-corporatist and the polarized-pluralist model typical of South/eastern Europe.
We consider how trust varies across the contexts corresponding to our chosen languages. Italy and France are countries where trust in public TV, radio stations, and the written press is relatively low, while Swedes, Danes, and Germans have relatively high trust in these. Moreover, trust in new and alternative media is inversely related to trust in legacy media, as evidence from Sweden demonstrates (Andersen et al., 2023).
We also group the same countries by migration statistics, in line with Heidenreich et al. (2020). Sweden and Germany received the most migrants, Belgium, France, and Italy were somewhere in the middle, and Austria and Denmark had the fewest migrants due to their very restrictive policies. This distribution ensures that we have enough variation in our language spheres to capture any relationship between political context and remediated content.
Operationalization
We translate the theoretical literature into specific operationalizations for our test domain, the Twitter messages that contained links to either legacy or new media. First, we consider the appropriate measure for the agenda-setting power of legacy and new media on Twitter to be shareability. The social media era is defined by an economy where popularity and attention are the hardest currency, especially difficult to attain because of short attention spans and fragmented audiences (Gillespie, 2018). Thus, looking at the proportion of links shared between legacy and new media is a useful approximation of the amount of attention and popularity each media type commands on Twitter about migration. Based on the reviewed literature, we approximate the power of intermedia agenda setting by looking at the overall volume of tweets that distributed links from one or the other media type as well as their timing, to capture a possible cueing effect.
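As an illustration of this operationalization, the following minimal sketch computes the share of tweets linking to each media type per year, which covers both the volume and the timing aspect. The record layout and the toy values are hypothetical placeholders and not data from our corpus.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (year, media_type) for each tweet with a classified link.
toy_records = [
    (2015, "legacy"), (2015, "legacy"), (2015, "new"),
    (2016, "legacy"), (2016, "new"), (2016, "new"),
]

def link_shares_by_year(records):
    """Return {year: {media_type: share}}; shares sum to 1 within each year."""
    counts = defaultdict(Counter)
    for year, media_type in records:
        counts[year][media_type] += 1
    shares = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        shares[year] = {m: n / total for m, n in counts[year].items()}
    return shares

shares = link_shares_by_year(toy_records)
```

Comparing the shares across years gives a first impression of whether one media type dominates early in the period, which is what a cueing effect would suggest.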
The second dimension we explore is tone, more specifically negative tone as expressed in the content of tweets containing links from legacy and new media, respectively. We measure tone using a toxicity detection algorithm that picks up intensely negative speech in each of the six languages we collected. The third characteristic concerns the different functions of the two media subtypes. Since this aspect is related to the remediated content in which links appear, we capture this dimension by detecting the topics contained in Twitter messages that link to either traditional or digital-native media.
Data
We extracted tweets with migration-related content through the Twitter API, focusing on tweets referencing European actors during the time of the EU’s “migration crisis” (2014–2019). Tweets were selected based on two criteria. First, they needed to include direct references to the EU or its affiliate institutions, such as the Commission, Council, Frontex, Europol, or LISA. These references were identifiable if the tweet was a reply to or mentioned an EU actor, or included links to official EU websites or social media accounts. Second, tweets or retweets had to include linguistic variations of terms like “(im)migrant,” “refugee,” or “asylum.” The search query was constructed in six languages, considering language peculiarities and organizations related to migration and its associated terminologies (all queries available through our online repository).
We acknowledge that this query might have excluded tweets that did not explicitly mention the EU and its institutions. However, given the limitations of the API, we had to design the query such that it (1) limits the overall number of tweets returned while preserving the EU context of the migration discourse and (2) includes search terms that are as similar as possible across languages. To enable topic modeling, and in line with studies on tweet analyses (Egger & Yu, 2022), hashtags and links included in the tweets were removed from the text but stored separately for later analysis. This was necessary to prevent topic models from being built around specific hashtags or links, allowing them to focus on the actual content of the tweets. Tweets with fewer than 4 words were deleted and all text was lowercased for the topic modeling. In line with the recommendations for BERTopic, no further pre-processing was conducted.
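The pre-processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact pipeline: the regular expressions, the function name `preprocess`, and the whitespace-based word count are our own simplifications.

```python
import re

# Simplified patterns (assumptions): links start with http(s), hashtags with '#'.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")

def preprocess(tweet, min_words=4):
    """Separate links and hashtags from the text, lowercase the remainder.

    Returns None when fewer than `min_words` words remain, so the tweet
    can be dropped before topic modeling.
    """
    links = URL_RE.findall(tweet)
    without_links = URL_RE.sub(" ", tweet)
    # Hashtags are collected after link removal so URL fragments are not counted.
    hashtags = HASHTAG_RE.findall(without_links)
    text = HASHTAG_RE.sub(" ", without_links)
    text = " ".join(text.lower().split())
    if len(text.split()) < min_words:
        return None
    return {"text": text, "hashtags": hashtags, "links": links}
```

Keeping the hashtags and links in separate fields means they remain available for the link analysis while staying out of the text passed to the topic model.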
For our operationalization of remediation, it is important to note that we only included tweets that contained at least three words other than a link or hashtag. Furthermore, we checked that the text provided by the API at the time did not automatically contain text coming from the link source (e.g., the headline of an article).
Data Per Language Group.
Methods
Our primary objective is to identify how the characteristics of tweets differ when they contain links to either legacy or new media, especially in the context of the European migration crisis. To test our three hypotheses, we analyzed tweets across three key aspects: the type of media link included, the tweet’s toxicity level, and its topic. Each language group was analyzed separately, using a Bayesian Network to quantify associations between these tweet characteristics. Bayesian Networks were chosen for their ability to handle uncertainty and complexity in multivariate data. We later synthesized these language-specific networks into one overarching Bayesian Network, highlighting commonalities across languages (Figure 1 outlines this structure). A separate subsection is devoted to discussing the multi-lingual nature of our data.
Figure 1. Structure of the analysis.
Link Analysis
All embedded links in tweets were extracted, and their final destination websites identified. Websites that appeared at least 10 times were classified by two independent coders as legacy or new media. We used a kappa score to assess the reliability of the coding. Hosts were categorized only if both coders agreed; otherwise, they were labeled “other.” The Appendix (Table A1) provides an overview of the most common hosts within each language and their media type.
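The coding logic can be sketched as below. This is an illustrative sketch only: it assumes Cohen’s kappa as the agreement statistic (the specific kappa variant is an assumption here), it does not resolve shortened or redirected links, and all function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

def host_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def frequent_hosts(urls, min_count=10):
    """Hosts that appear at least `min_count` times among the shared links."""
    counts = Counter(host_of(u) for u in urls)
    return {h for h, n in counts.items() if n >= min_count}

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:  # both coders used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

def consensus_label(label_a, label_b):
    """Keep a label only when both coders agree; otherwise fall back to 'other'."""
    return label_a if label_a == label_b else "other"
```

For example, `consensus_label("legacy", "new")` yields `"other"`, mirroring the rule that a host is categorized only when both coders agree.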
Toxicity Classification
Using the Google Translate API, we converted all (re)tweets to English, establishing a uniform base for toxicity classification. Toxicity scores were obtained through the open-source Detoxify classifier (Hanu, 2020), which was specifically trained on social media data. We recognize the inherent theoretical challenges in delineating toxicity from other concepts, such as hate speech, vulgarity, and incivility, especially given that what constitutes “toxicity” is subjective. However, our choice for this toxicity classifier is of a pragmatic nature in that Detoxify is widely used and has been trained on contemporary datasets, rendering it proficient in discerning modern internet slang and idiosyncratic Twitter expressions. Furthermore, benchmarking evaluations illustrate that Detoxify boasts high accuracy levels, balancing classification sensitivity and specificity, often surpassing other toxicity classifiers. Although the classifier also differentiates between various types of toxicity (e.g., sexual or identity-based), our analysis uses only the overall toxicity score for reasons of comprehensibility. We are thus not interested in toxicity per se but see it as a proxy for all kinds of hateful and counter-productive language within a discourse (as in Zhang et al., 2018).
Given that toxicity scores varied by language, we standardized them within each language group, enabling a more equitable cross-language comparison. This ensures that each toxicity score is relative to the mean toxicity level within that language group.
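A minimal sketch of this within-language standardization is given below. It assumes a z-score based on the population standard deviation; whether the population or the sample standard deviation is used is an assumption of this sketch, as are the function name and the input layout.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_by_language(rows):
    """rows: list of (language, raw_toxicity). Returns z-scores in input order.

    Each score is centred on its language mean and divided by the language
    standard deviation, so values are comparable across language groups.
    """
    by_lang = defaultdict(list)
    for lang, score in rows:
        by_lang[lang].append(score)
    stats = {lang: (mean(v), pstdev(v)) for lang, v in by_lang.items()}
    z_scores = []
    for lang, score in rows:
        mu, sigma = stats[lang]
        # A group with no variation carries no relative information: use 0.
        z_scores.append((score - mu) / sigma if sigma > 0 else 0.0)
    return z_scores
```

A standardized score of 1 then means that a tweet is one standard deviation more toxic than the average tweet in its own language group, regardless of that group’s baseline.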
Topic Analysis
We employed BERTopic, an algorithm designed, among other uses, for tweets (Grootendorst, 2022), to identify topics separately within each language group. We chose BERTopic due to its capability to handle multi-lingual text data effectively. To prevent frequently retweeted, viral content from biasing the topic model, we trained it only on original tweets. Furthermore, we set the minimum number of documents that the algorithm had to assign to a potential topic for it to be considered an actual topic to 50. This led to ∼100 topics for the smallest language group in our data (Swedish) and ∼200 for the largest (German).
To ensure comparability of topics across languages, two coders applied an inductive approach to consolidate language-specific topics into broader “themes,” which serve as higher-order topic categories. For instance, German tweets discussing political parties and Italian tweets discussing the rise of a populist party were both coded under the theme “domestic politics.” Once these themes were agreed upon, both coders independently assigned topics to these themes based on the topic description provided by BERTopic, which contains the ten words most uniquely associated with the tweets under the topic and the five tweets that were most representative of the topic. A topic was only considered as belonging to a theme if both coders assigned it to the same theme. This implies that many topics were not assigned to a theme, either because the topic was very specific to a language group, because the coders did not agree, or because BERTopic identified very generic topics that did not contain one clearly identifiable theme. Across languages, about 50% of all topics were assigned to a theme through this process.
Bayesian Network
To explore the relationships between tweet topics, toxicity levels, and the types of links shared, we employed Bayesian Networks (BNs) as our analytical framework (Koller & Friedman, 2009). From a statistical standpoint, a BN can be likened to a system of regressions, represented as a directed acyclic graph. In this graph, each node has input variables, akin to predictors in regression models. However, unlike traditional regression models that typically focus on a single dependent variable, BNs allow for the simultaneous study of multiple dependent variables. In our case, these variables are the presence of a link and the specific type of link shared (legacy or new media). A key advantage of BNs is their ability to avoid some problematic assumptions of frequentist statistics when estimating coefficients (mainly those linked to p-values). This characteristic is particularly crucial in our case, where dozens of coefficients are estimated for each language without specific ex-ante hypotheses for each topic or language. In sum, BNs lend themselves better to the exploratory approach of our analysis than standard statistical modeling.
After generating the language-specific BNs, we aggregated all networks into one unifying BN. We followed the concept of a consensus model (Del Sagrado & Moral, 2003; Squazzoni et al., 2021), where the consensus BN only includes those associations between variables that were consistent in at least five of the six language-specific BNs. Since language specificities are not the focus of this paper, the language-specific BNs are only presented in the Appendix (Figures A2-A7).
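The aggregation rule can be illustrated with a short sketch. It assumes that each language-specific BN is reduced to a set of directed associations with a sign, and that an association counts as consistent when the same signed edge appears in a network; both the data layout and the function name are assumptions of this sketch rather than a description of the exact procedure.

```python
from collections import Counter

def consensus_edges(networks, min_support=5):
    """Keep edges that occur in at least `min_support` language-specific networks.

    networks: {language: set of (parent, child, sign) tuples}
    """
    support = Counter(edge for edges in networks.values() for edge in set(edges))
    return {edge for edge, n in support.items() if n >= min_support}

# Hypothetical toy input with placeholder variable names (not our results).
languages = ["da", "nl", "fr", "de", "it", "sv"]
common = ("x", "y", "+")
rare = ("u", "v", "-")
toy_networks = {lang: {common} for lang in languages[:5]}
toy_networks["sv"] = {rare}
toy_networks["da"] = toy_networks["da"] | {rare}

consensus = consensus_edges(toy_networks)
```

In this toy input the edge `common` is supported by five networks and is retained, whereas `rare` appears in only two and is dropped.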
Cross-Language Comparisons
In this subsection, we detail our approach to tackling the specific challenges when dealing with multi-lingual Twitter data. Our approach aims to identify cross-language patterns while respecting the unique characteristics inherent to each linguistic group.
For topic modeling, we took advantage of BERTopic’s capability to include pre-trained, state-of-the-art transformer-based language models. This enables the algorithm to consider the unique vocabulary and nuances of each language group, specifically within the discourse on migration and associated political topics. Using a standardized translation process for all tweets could lead to a loss of these linguistic nuances. To prevent bias toward larger language groups in our topic model, we kept our topic modeling language-specific. This ensured that topics from smaller language groups were not overshadowed by those from larger groups. Furthermore, we wanted to prevent BERTopic from clustering topics primarily around individual language groups and, instead, to identify topics dealing with themes that can be compared across the various language groups.
Since our focus is on a European issue, we made a distinction between domestic and European topics in our analysis. For example, a Dutch tweet discussing the German Bundestag needs to be interpreted differently than a German tweet discussing the same topic. In other words, the language in which a tweet is written provides vital context for the interpretation of its subject matter.
Upon identifying language-specific topics, our next challenge was the integration of these topics into a cross-language analysis. Rather than merging topics directly, we embarked on an inductive thematic analysis, associating language-specific topics with overarching themes discernible across all languages.
Our approach to dealing with the multi-lingual data differed when it came to toxicity. Initial qualitative assessments showed that baseline levels of toxicity varied significantly across languages. To address this, we standardized toxicity scores within language groups. This approach allows us to understand how a particular tweet’s toxicity level deviates from the average in its respective language. However, the Detoxify classifier was not trained on all languages in our data and we, therefore, chose to use the Google Translate API to translate all tweets into one common language (i.e., English). Previous research has consistently shown that translation of multi-lingual corpora to one common root language is a valid approach to achieve consistent classifications across those languages (Balahur & Turchi, 2014; Lind et al., 2021). We also abstain from looking at sub-dimensions of toxicity since certain kinds of profanity (e.g., sexual) are more common in certain languages and analyzing these sub-categories may therefore be problematic in a multi-lingual corpus.
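Standardization within language groups can be sketched as a per-language z-score. The scores below are invented, the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify the exact variant.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import List, Tuple


def standardize_within_language(scores: List[Tuple[str, float]]) -> List[float]:
    """Return z-scores computed separately for each language group."""
    by_lang = defaultdict(list)
    for lang, score in scores:
        by_lang[lang].append(score)
    # Per-language mean and (population) standard deviation.
    stats = {lang: (mean(v), pstdev(v)) for lang, v in by_lang.items()}
    z = []
    for lang, score in scores:
        mu, sigma = stats[lang]
        z.append((score - mu) / sigma if sigma > 0 else 0.0)
    return z


# Hypothetical raw toxicity scores with very different baselines per language.
raw = [("de", 0.2), ("de", 0.4), ("de", 0.6),
       ("sv", 0.01), ("sv", 0.03), ("sv", 0.05)]
z_scores = standardize_within_language(raw)
```

In this toy input the least toxic German and Swedish tweets receive the same z-score even though their raw scores differ by an order of magnitude, which is the point of standardizing within languages: a score expresses deviation from the language's own baseline.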
To ensure the validity and robustness of our toxicity classifications, we employed two types of checks: Firstly, we randomly selected 100 tweets from each language group and had them assessed for toxicity by human experts. The correlation between the human-coded rankings and the classifier’s rankings (after translation into English) exceeded 0.85 across all six languages. This high correlation suggests a strong alignment between machine and human assessments, bolstering the validity of our method. Secondly, we sought to further validate the quality of translation and the reliability of the classifier. We translated the same 100-tweet sample from each language into German and utilized a German-specific toxicity classifier for assessment (ML6 Team, 2022). The correlation between toxicity rankings derived from both English and German-translated tweets exceeded 0.9, reinforcing the robustness of our classification approach.
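A comparison of rankings of this kind is commonly done with a rank correlation. The sketch below computes Spearman's rho in plain Python. It is an assumption that a rank correlation of this type was used; the text only states that rankings were correlated. The ratings and scores shown are invented.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Hypothetical human ratings and classifier scores for five tweets.
human = [1, 2, 3, 4, 5]
classifier = [0.05, 0.10, 0.40, 0.30, 0.90]
rho = spearman(human, classifier)
```

A rank-based measure fits the argument made in the text, because the validation concerns the relative order of tweets rather than the absolute toxicity values.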
Despite the validation steps taken, we recognize that using Google Translate for translating tweets may have introduced inaccuracies, particularly with idiomatic and culturally nuanced expressions. However, since our validation with human coding suggests that translation did not bias the relative order of tweets’ toxicity score, we argue that, in our case, relying on automated translation is a reasonable compromise between accuracy and the efficient processing of large volume, multi-lingual data. While we conducted validation procedures for translations and toxicity classifications within languages, cross-language validation would require a significantly different approach with human coders fluent in all six languages. Additionally, we recognize that perceptions of toxicity can vary culturally, complicating direct language comparisons. Our study, therefore, abstains from any direct comparison of toxicity scores across languages and, instead, focuses on identifying common patterns in discourse within the languages.
As discussed in the previous subsection, we use Bayesian Networks to synthesize our results into one coherent narrative. Here we followed a two-step process, where BNs were first created on a per-language level, and then all BNs were combined into an across-language BN. For the across-language BN, we conceptually treat our six language groups as a sample of languages and consider an association between two variables only relevant for the across-language BN if it is consistently found in at least five out of the six per-language BNs. This leaves us with an interpretable number of associations and results that are robust across language groups and found in the EU-wide discourse on migration.
Results
Before testing our hypotheses, we provide an overview of migration tweets over time in Figure 2. The data display two prominent peaks of Twitter activity: one in 2015 and another in 2018. The surge in 2015 corresponds with the year in which the EU first experienced a significant influx of migrants. The 2018 spike, on the other hand, could largely be attributed to the widespread discussions about the Aquarius. This rescue ship, carrying over 600 migrants, was denied docking privileges by both Italy and Malta. The ensuing controversy reignited the migration debate in Europe, and particularly in Italy, drawing substantial media attention.
Figure 2. Distribution of tweets over time, within each language.
Legacy media as agenda-setters
In line with hypothesis 1, Figure 3 reveals that legacy media links were shared approximately three times more frequently than new media links and that legacy media links peaked during the early phase of the migration crisis. Notably, the prominence of new media links gradually increased across language groups over the period, suggesting that new media were primarily shared during the evaluative phase of the discourse.
Figure 3. Percentage of all shared links during each year pointing to legacy (left) and new media (middle). The right panel shows the overall distribution of media types within the 40 most shared hosts in our data (see Table A1 for more details).
EU Countries With Languages Included in the Twitter Sample. Per Country, Its Rank Among the EU-27 in Terms of Trust in Various Media Types is Shown. Source: Eurobarometer (2022).
Together, new and legacy media accounted for 30–40% of all shared links. Another 30% of links pointed to other Twitter content and 10% to other social media sites. There was also no consistent time trend across language groups regarding social media links, and we therefore do not discuss them further in our analysis.
Toxicity surrounding new media links
The shift towards more new media link sharing shown in Figure 3 coincides with an increase in tweet toxicity during the same period (Figure 4, left panel), suggesting that the link between new media and toxicity also holds during remediation. To see whether toxicity is generally higher in tweets that contain links to new media, we compared the mean toxicity levels in tweets with new and legacy media links. We find clear evidence for hypothesis 2, that is, across all languages, toxicity is significantly higher whenever new media links were included (Figure 4, right panel). While Figure 4 suggests variations in toxicity across languages, our analysis was not designed to facilitate direct comparisons across languages due to potential biases in translation quality and cultural variations in the perception of toxicity, as mentioned in the Method section. Our approach prioritized identifying common patterns in the discourse. While cross-language comparisons might offer intriguing insights, they should be approached with caution, as the classifications might not uniformly reflect toxicity due to the aforementioned factors.
Figure 4. Left panel, toxicity in tweets per language. Right panel, average toxicity and 95% confidence intervals of tweets with new or legacy media links.
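The comparison of mean toxicity between the two link types can be sketched as follows. The text does not state how the confidence intervals were computed, so the normal approximation (mean plus or minus 1.96 standard errors) is an assumption that is reasonable only for large samples. The function name and the standardized scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_ci95(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a normal-approximation 95% interval."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


# Hypothetical standardized toxicity scores for tweets with each link type.
legacy_scores = [-0.3, -0.1, -0.2, 0.0, -0.4]
new_scores = [0.4, 0.6, 0.5, 0.3, 0.7]

legacy_ci = mean_ci95(legacy_scores)
new_ci = mean_ci95(new_scores)
intervals_overlap = legacy_ci[2] >= new_ci[1]
```

In this toy input the two intervals do not overlap. With real data, a sample this small would call for a t-based interval rather than the 1.96 multiplier.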
Thematic context of link sharing
Overview of the 12 Most Common Themes.
Distribution of the 12 Themes, Ordered by Average Prevalence Across Language Groups. Blue Bars Indicate Percentage of Themes Within Language Group. Red Bars Indicate Toxicity of Tweets in the Themes (Scores are Standardized Within Languages).
The two dominant themes across the language clusters are domestic and EU politics, dealing with political actors and institutions. The least frequently discussed themes are those connecting migrants/refugees with Islam or with criminal activities. The overall distribution of the themes corresponds to a factual and descriptive group (the first seven on our list) and to a more opinionated, normative group (themes eight to 12). This is our first indication that Hypothesis 3, about legacy media having an informative function and new media having a commentary function, may be true. We present a more thorough test in the Bayesian model below, but before that we want to address some variations within (but not across, as our method does not allow it) language clusters.
While Danish and Swedish tweets had a strong focus on domestic politics, they had minimal emphasis on EU politics, indicating an inward focus in their discourse. Conversely, French tweets leaned heavily towards EU politics, signifying a broader European perspective. Themes such as “Crime/Terror,” “Contra/Protest,” and “Islam/Muslim,” often considered polarizing, constitute only a small portion of the tweets within each language group. This suggests that while there are concerns associated with migration, the overarching discourse on social media might be more nuanced and multifaceted than commonly perceived. However, Table 4 also shows that these “fringe” themes included the highest levels of toxicity. German-language tweets displayed the highest level of toxicity relative to the non-toxic content, signaling that migration was a divisive subject in Germany and Austria. In Italian, too, toxicity was more widespread, extending even to factual topics. Danish, Dutch, and Swedish had the lowest internal toxicity, with opinionated topics being the only subjects covered in toxic language.
Table 4 helps us answer hypothesis 3a, that contexts where migration is lower contain more factual-related themes than contexts where migration is more of a social challenge, which we approximate by the presence of a large increase in the migrant/refugee population in the period studied. According to Heidenreich et al. (2020), migration numbers were highest in Sweden and Germany, lowest in Austria and Denmark, and moderate in Belgium, France, and Italy. The number of incoming migrants/refugees does not seem to correlate with the proportion of factual topics. The most factual topics appear in the language contexts with an average migration/refugee volume: France and Italy. Sweden and Germany, the countries with the highest refugee volume, display rather different priorities (Swedish-language content reflecting a concern with migration quotas and redistribution of refugees, German-language content without a clear dominant topic). This may be because German is spoken in two countries with widely different positions vis-à-vis receiving refugees (Germany: initially welcoming; Austria: very restrictive). Perhaps one commonality is that in both high refugee volume cases, critical, opinionated topics are slightly more frequent than in the lower volume countries. Beyond these traits, there is little to support hypothesis 3a, that factuality is higher in low-volume countries.
The complex interplay between tweet toxicity, theme, and link sharing is captured by a Bayesian Network (Figure 5). The relationships between variables are represented by directed arrows, with green indicating a positive association found across languages and red indicating a negative one. The width of these arrows represents the average strength of association across the language groups. For instance, the broad green arrow from the theme “Islam” to “toxicity” indicates that toxicity was particularly high in tweets with that theme. Furthermore, we see that a tweet from that theme was relatively likely to include a link to new media, but relatively unlikely to include a link to legacy media. Since a Bayesian Network also models indirect relationships, we could capture the moderating effect that toxicity plays in the likelihood to share certain link types. Concretely, we see that the theme “Islam/Muslim” has a negative association with legacy media sharing and that this already existing tendency is amplified through toxicity. That is, high toxicity in tweets from that theme further decreases the likelihood of those tweets including legacy media links.
Figure 5. Bayesian Network (consensus model across all languages). Themes in rectangles, link-types in circles. Green arrows indicate positive, red negative, associations between two variables. Width indicates average strength of association across the language groups.
In line with our previous bivariate analysis, the network shows that tweets with legacy media links are generally associated with lower toxicity. On the other hand, new media links are prominently featured in tweets that revolve around themes with the highest levels of toxicity like “Islam/Muslim,” “Crime/Terror,” and “Causes of Migration.” Our analysis thus provides evidence supporting Hypothesis 3, namely, that the two link types serve unique functions, where legacy media seem to be common around relatively sober discussions of events, while new media seem to be shared primarily in the context of polarizing debates that also only represent a minority of the overall migration discourse.
The results from our consensus Bayesian Network can be further understood and summarized through Figure 6. For the x-axis, we computed the media sharing tendency, that is, the probability of a tweet within a theme including a new media link minus the probability of including legacy media links. The figure shows, per theme, the average toxicity levels and the likelihood to include links across all language groups. We see a clear pattern where toxicity levels negatively correlate with the propensity to include legacy media links.
Figure 6. Scatterplot of themes’ average (across all language groups) toxicity and media sharing tendencies.
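The media sharing tendency can be illustrated with a small sketch. It is a simplification: it uses raw frequencies in a toy list of tweets rather than probabilities derived from the Bayesian Network and averaged across languages. The record layout, theme labels, and function name are hypothetical.

```python
from typing import Dict, List, Optional


def sharing_tendency(tweets: List[Dict[str, Optional[str]]]) -> Dict[str, float]:
    """Per theme: P(new media link) minus P(legacy media link)."""
    counts: Dict[str, Dict[str, int]] = {}
    for tweet in tweets:
        c = counts.setdefault(tweet["theme"], {"n": 0, "new": 0, "legacy": 0})
        c["n"] += 1
        if tweet["link"] in ("new", "legacy"):
            c[tweet["link"]] += 1
    return {theme: (c["new"] - c["legacy"]) / c["n"] for theme, c in counts.items()}


# Hypothetical tweets; link is "new", "legacy", or None (no media link).
tweets = [
    {"theme": "Islam/Muslim", "link": "new"},
    {"theme": "Islam/Muslim", "link": "new"},
    {"theme": "Islam/Muslim", "link": "legacy"},
    {"theme": "Islam/Muslim", "link": None},
    {"theme": "EU politics", "link": "legacy"},
    {"theme": "EU politics", "link": "legacy"},
    {"theme": "EU politics", "link": None},
    {"theme": "EU politics", "link": "new"},
]
tendency = sharing_tendency(tweets)
```

Positive values indicate a theme whose tweets lean towards new media links, negative values a lean towards legacy media links, and zero an even balance.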
Discussion and Conclusion
Legacy media was linked to far more often than new media, at a ratio of three to one, and this phenomenon occurred even more intensely at the beginning of the period of high influx of refugees and migrants. We interpret this to mean that legacy media was considered a more relevant reference for Twitter users faced with the uncertainty of an unexpected event, and thus consider that legacy media retained its agenda-setting power in comparison to new media. As in Langer and Gruber (2021), legacy media appeared to be used to initiate the social media discussion about migration and to sustain it over time.
The toxicity of remediated content is significantly correlated with the presence of new media links. Moreover, as the proportion of new media links shared increases during the period studied, so does the toxicity level, fitting with the picture emerging from Fawzi (2019). Since legacy news sources do not publish different content on different platforms (Hase et al., 2023), this longitudinal trend cannot be attributed to the particulars of Twitter, which leaves as the remaining explanation that remediated content originating in digital-native sources harbors more negativity.
There was a significant difference also in the type of subjects brought up in the remediated content referring to traditional versus new media links. Tweets that linked to legacy media covered subjects such as information about the EU politics, the distribution of refugees in the EU according to the Dublin Agreement, the EU agreement with Turkey to control/stave off the incoming refugees or migrants and the reception conditions in the ad hoc camps set up for them. These topics are similar to those found in legacy media by Heidenreich et al. (2019). Tweets containing new media covered some overlapping topics, such as national politics or the reasons why people were made to move, but primarily and exclusively covered two subjects: one connecting refugees and migrants with criminality and terrorism, the other problematizing their religious identities and seeing them as Muslims. A third topic, signaling protest and disagreement, fell in between legacy and new media remediated content, and since we could not detect the stances of the speakers, we cannot ascertain if the users raising a critical voice did so to criticize the insufficient help the migrants were given or the too permissive attitude of public authorities towards the incoming people on the move. This warrants us to say that legacy media appears to function as a provider of facts, of event-related information, whereas new media appears to focus on opinionated commentary regarding the events. Thus, their functions for the Twitter users remained distinct.
Our results need to be interpreted holistically. One of the most interesting take-aways is that remediation by Twitter users reproduces the difference in attributes and functions between legacy and new media. One potential explanation for this phenomenon may be that Twitter users take cues from the media they link to, exhibiting clear parallelism in terms of subject matter and format (Winter et al., 2016). The professional logic followed primarily by legacy media translates into more fact-oriented remediation in a moderate tone that does not contain acute negativity. New media that produced migration-related content appeared to cater to populist audiences (Stier et al., 2020) and to be embedded in negativity laden tweets that covered issues such as anti-Muslim sentiment, a topic that attracts toxicity (Salminen et al., 2020; Sandberg et al., 2023).
We did not find significant instances of negative remediation of legacy media content. Thus, those who share traditional media links do not criticize their content. Moreover, the presence of content leaning towards more factual topics and more moderate tone shows that Twitter is not an arena for vitriolic commentary and anti-immigration or anti-Muslim content. A potential future area of research would be to test via surveys whether the remediation reflected in Twitter content is a cue effect from the media or an expression of a performed public identity (e.g., users linking to a new media source would do so to demonstrate their political stance vis-à-vis migration and their belonging to a group of like-minded anti-immigration users).
The comparison across language spheres did not form a coherent picture. Some trends are similar across all six language clusters, such as the overall volume and peaks of tweets about migration (volume peaking in 2015 and 2018), the overall trends for toxicity (upwards), as well as traditional media sharing. Sharing legacy media was most frequent in German and Danish (at peak constituting almost 30% of all remediated tweets for Danish, and 29% for German), and least frequent in Italian, which could be explained by the level of trust in traditional media in the countries where these languages are spoken.
The sharing trends for new media were incoherent across our data, except for being overall a low-frequency trend (at most 11% of the tweets containing links, for the Dutch language). For Dutch and French, new media was remediated frequently at the beginning of the period but attenuated over time. For Danish, the pattern was entirely the opposite: sharing was very low in 2014–15 but climbed throughout the period. German new media was hardly shared at all in 2015, but peaked in 2018 and afterwards declined, similarly to Swedish (where the volume peaked in 2017 instead). Italian stayed rather low throughout but had a peak in 2018. The difference in the new media ecosystems in each of these language spheres may be the most plausible explanation for the very different trends observed, something that future research may explore further.
Twitter is a platform where “hard” news about politics, economy, and foreign affairs tend to be discussed (Kalsnes & Larsson, 2018). News about migration, a “hard” topic, were thus present on this social medium for the entire period studied. The tone and content of remediated media about migration were not particularly toxic or opinionated. This could be explained by the Twitter demographics (professional journalists, politicians, civil society activists, and organizations) and by the public visibility setting of messages on the platform. The same topic of migration might be remediated differently on social media platforms that are structured in private groups (such as Facebook) or on encrypted messaging apps such as WhatsApp, where conversations tend to propagate non-factual information (e.g., conspiracy theories as per Theocharis et al., 2023; Reuters Institute for the Study of Journalism, 2022; Straub-Cook, 2018).
The clearest cross-language trend, valid across all six cases, is that toxicity is correlated with certain topics: Islam/Muslim, Criminality/Terrorism, and to some extent Protest and critique. In turn, these topics are the domain of remediated new media, which can be interpreted to mean that new media serves the function of partisan commentary for anti-immigration voices and that the tone reflects their hostility towards refugees and migrants (especially those associated with Islam). Future work may investigate the types of accounts that remediate new media, the content of the linked sources, as well as the possible relationship with real-world events such as elections.
In conclusion, our extensive six language comparison reveals Twitter’s role not as a hub for negative commentary but as a space where factual discussions on migration prevail (at least in the presence of shared links), challenging stereotypes of social media discourse. The distinct patterns of media sharing, especially the higher sharing of legacy media in most languages and the fluctuating attention to new media, underscore the complex interplay between media trust, user identity, and the platform’s role in shaping public discourse. As we look towards future research, the exploration of user motivations for media sharing, the impact of Twitter on public opinion, and the differential remediation of migration topics across various social media platforms stand out as promising avenues for deepening our understanding of the digital public sphere’s influence on societal issues. This study lays the groundwork for further investigations into how social media platforms mediate discussions on critical topics like migration, offering insights into the dynamics of digital discourse about politics.
Supplemental Material
Supplemental Material for The Re-mediation of Legacy and New Media on Twitter: A Six-Language Comparison of the European Social Media Discourse on Migration by Mike Farjam and Anamaria Dutceac Segesten in Social Science Computer Review
Footnotes
Author Contributions
MF provided the analysis and ADS the theoretical framing. The rest of the work was shared equally among the authors.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article is part of PROTECT The Right to International Protection: A Pendulum between Globalization and Nativization?, a research and innovation project funded by the European Union’s Horizon 2020 Framework Programme and coordinated by the University of Bergen (Grant Agreement No 870761).
Data Availability Statement
Supplemental Material
Supplemental material for this article is available online.
