Abstract
An evolving body of research generally referred to as visual politics has brought the heavy research focus on linguistic modalities of political communication closer to parity with visual emphasis. The study reported here transcends this schism by joining momentum toward multimodality as an ontological departure point for research. We expanded an existing visual instrument into a multimodal one and provided evidence that it reliably captures character framing of political candidates (stateliness, compassion, mass appeal, ordinariness, and sure loser) in German, Polish, and United States commercial online news. We focused on election coverage in these countries because they represent three distinct political and media systems (democratic-corporatist, polarized-pluralist, and hybrid) of the Global North. The quantitative content analysis sample we used spans 2,688 online news stories with seven political candidates identified in 6,560 cases across six modalities (still images, moving images, frozen video images, text, audio, and superimposed text). We found support for the hypothesis that political and media system latencies affect how news media frame the character traits of political candidates in both visual and linguistic modalities. Specifically, the competitive tendencies of majoritarian democracies manifested more clearly as candidate-centered, simplistic, and polarizing character framing in US media content than in journalistic output of multiparty consensus democracies. For example, US news media were more consistent in their portrayal of election winners and losers than German and Polish news media, emphasizing stateliness, compassion, and ordinariness in the winner while unambiguously assigning the negative sure loser frame to the election loser.
Keywords
In contemporary multimodal online news environments, users move seamlessly between text, still images, audio files, and video snippets. Despite the ubiquity of multimodal news, framing research has overwhelmingly focused on either linguistic (Matthes 2009) or visual (Grabe and Bucy 2009) modalities, particularly in the context of election campaigns. These studies have developed reliable and validated instruments for measuring candidate framing, but they have mostly examined a single country and modality. Despite the valuable contributions of such unimodal framing analyses, they fail to capture the biformity of visual and linguistic modalities in multimodal news (Geise and Baden 2015) that shape evaluations of political candidates and electoral outcomes globally (Boomgaarden et al. 2016). Specifically, character trait framing has been shown to influence voter impressions of political candidates and how they vote (Ha and Lau 2015).
The contribution of the study reported here is twofold. First, we extended the visual character frame instrument of Grabe and Bucy (2009) to measure linguistic modalities and tested the co-occurrence of visual (still images, moving images, and frozen video images) and linguistic (text, audio, and superimposed text on frozen images) character framing of candidates who ran for the chief executive political offices in Germany, Poland, and the United States during recent elections (2019–2021). Second, we conceptually fitted the multimodal coding system for character frames to structural indicators of political (Lijphart 1999) and media system (Humprecht et al. 2022) influences on election campaign coverage. The countries we selected represent three distinct political and media systems (democratic-corporatist, polarized-pluralist, and hybrid), affording the comparative logic of a most different system design (Przeworski and Teune 1970). Comparison of the most used commercial news websites (newspapers, television, and native online) in these democracies can be expected to bear the traces of systemic differences. We chose online coverage because of the inherent multimodality of content on these platforms and the fact that voters in the three countries primarily use online sources for news information (Newman et al. 2023).
This work offers both methodological and theoretical contributions. First, we extended a validated visual character frame instrument for multimodal use and provided statistical evidence of its internal visual-linguistic consistency and robustness to capture framing incidents across modalities, story formats, and character frames. Second, employing this instrument in election coverage across three countries produced a point of evidence that macro-level systemic differences (Hallin and Mancini 2004; Humprecht et al. 2022; Lijphart 1999) are observable in media coverage of elections. We open with a discussion of why multimodal character framing of political candidates matters for the accumulation of knowledge about contemporary elections. From there, we explicate how multimodal character framing might be dependent on political and media system classifications and generate hypotheses at the intersection of these two theoretical positions. In the method section, we elaborate on the multimodal instrument we developed and report statistical evaluations of its robustness. Based on the reported results, we draw conclusions about the implications for political communication research.
The Case for Multimodal Studies of Elections
For more than four decades, political scientists have offered growing and layered evidence that the electorate cast their votes based at least as much on perceived character traits of political candidates as on where the candidates stand on policy issues (Ha and Lau 2015; Miller and Wattenberg 1985; Pfiffner 1994). In response, political communication scholars have built instruments to monitor the news media’s construction of candidate character and personality traits and identified commonly occurring frames (Gans 1979; Miller et al. 1998). These frames manifest as “interpretative packages” (Gamson and Modigliani 1989: 3) through selection of and emphasis on character traits that have mostly been measured in linguistic modalities. Honesty, competence, experience, dedication, and patriotism count among the most enduring character frames in news coverage (Holian and Prysby 2014; Jamieson and Waldman 2003). Yet, as a number of experiments have indicated, political leadership is entangled in visual displays of emotion and symbolic imagery during campaigns. For example, Willis and Todorov (2006) reported that voting decisions are shaped during 100 ms of exposure to video snippets of political hopefuls. In response to such evidence of visuals shaping voting decisions, a study on candidate character framing explicated, operationalized, and tested three major visual frames in US presidential election campaigns (Grabe and Bucy 2009). These are the ideal candidate (subdimensions of stateliness and compassion), the populist campaigner (subdimensions of mass appeal and ordinariness), and the sure loser. The study reported here re-fitted this instrument for linguistic framing (see Supplemental material).
The ideal candidate frame refers to a duo of winning qualities, including the allure of stateliness that signals readiness for political command on the world stage and authentic emissions of compassion in love for god, family, country, and voters. This frame has been associated with well-documented leadership behavior of alpha males who display strength and power in addition to compassion and protective rituals to secure dominance (Schubert and Masters 1991). The populist campaigner frame is rooted in democratic rather than authoritarian strands of populism (Bugarič 2019). Built on the idea of government of the people, by the people, and for the people, this frame captures candidates as one of the people (ordinary) while also having extraordinary support among them (mass appeal). This is a limited take on populism given the global rise of authoritarian brands of populism over the last two decades. Yet, as Bugarič (2019) and Canovan (1999) remind us, populism is notoriously difficult to pinpoint because of its dynamic adaptation to environmental circumstances. It is important to note that the instrument designed to capture the populist campaigner character frame was not calibrated for detecting the anti-pluralist and institutional distrust leitmotifs that dominate authoritarian populism of politicians such as Orbán, Kaczyński, and Trump (Bugarič 2019). Finally, the sure loser frame captures candidate behavior that is distinctly unleaderlike, including weakness, anger, inappropriateness, and encountering disapproving crowds or small turnout of supporters on the campaign trail (Grabe and Bucy 2009).
This visual character frame instrument has been validated in election coverage in other countries, mostly in the Global North, in terms of journalistic and self-presentation of candidates (Lee 2016; Steffan 2020; Uluçay and Melek 2024). It exemplifies the evolution of visual emphasis in political communication scholarship that was seeded in a period (1980s to mid-1990s) of scholarly emphasis on linguistic manifestations of framing and stubborn dismissal of images as carriers of political information. Graber (2001) called this research sensibility the Gutenberg legacy and led concerted efforts (late 1990s to 2000s) to take images seriously as conduits of political information, including candidate character framing (Coleman and Banning 2006; Messaris and Abraham 2001; Moriarty and Popovich 1991).
Inspired by the visual turn in political communication research, a new phase is fomenting around multimodal approaches that comprehensively account for the ways in which images, sound, and text co-occur in emerging contemporary media formats. For example, newspaper stories typically include written text with still images; television news is per se audiovisual; and online news features both legacy media types in addition to new hybrid formats such as superimposing text on frozen images with ambient sound or music. Thus, news users who frequent online environments are exposed to multimodal information (Newman et al. 2023) in a growing array of legacy and emerging formats. Harcup and O’Neill (2017) offered evidence that multimodal news is more likely to be selected and evaluated as newsworthy than monomodal stories. We join the momentum toward multimodal research to understand how candidate character contests play out in three different political and media systems.
Multinational Comparisons of Multimodal Character Framing
Existing research has shown that macro-level systemic factors influence how the news media frame election campaigns (Dimitrova and Strömbäck 2012). We expect candidate character frames will be conduits of these systemic differences. First, the candidate frames we are measuring here are grounded in an ethological approach to studying leadership. From this perspective, dominance of a winner is at the core of apex leadership contests, which stand in sharp contrast to the egalitarian and coalition-prone sensibilities of democracies (Somit and Peterson 2001). How the candidates participating in these competitive political matches are framed by media operations and reconciled with democratic ideals can be expected to reveal the nuances of systemic differences. Second, political scientists often treat leadership contests as rituals that reveal widely held hopes, aspirations, and values of a society (Edelman 1985), which can be expected to manifest in election coverage and to be imprinted on the character traits of candidates competing for the highest leadership position of that system. Third, the character frames we are working with are conceptually compatible with political (Lijphart 1992, 1999) and media system (Hallin and Mancini 2004; Humprecht et al. 2022) classifications and thereby support the generation of hypotheses, as we will demonstrate later in this section.
Political and Media System Classifications
Lijphart’s (1999) typology distinguishes between majoritarian (concentrated power) and consensus (power sharing) democracies. Inspired by Lijphart, Hallin and Mancini (2004) proposed a typology for classifying media systems. It was recently updated by Humprecht et al. (2022), who clustered thirty countries into three groups (democratic-corporatist, hybrid, and polarized-pluralist) based on four dimensions. First, inclusiveness of the media market refers to the reach of news information to traditionally under-served groups, including women, ethnic minorities, and people at lower levels of the social hierarchy. Second, political parallelism accounts for the political neutrality of a country’s news media, which influences political polarization in media content and use. Third, journalistic professionalism reflects the level of journalistic autonomy, public service orientation, credibility, and professional norms of news industries. Fourth, the role of the state indicates the degree and direction in which the state intervenes in the media sector. Germany, Poland, and the United States belong to different political and media system clusters, allowing for a most different system design (Przeworski and Teune 1970) for testing hypotheses that online news media in these countries will produce differences in candidate character framing. Table A1 in the Supplemental material offers a summary classification of the three countries we studied, as a supporting map for the following discussion.
Germany (DE)
Lijphart’s (1999) idea of a consensus democracy is exemplified by DE’s multiparty parliamentary system, mixed-member proportional electoral system, and party-based politics. In terms of the Humprecht et al. (2022) typology, DE belongs to the democratic-corporatist cluster, characterized by an inclusive media market, low levels of political parallelism, high levels of journalistic professionalism, and a strong role of the state to facilitate pluralism in the media sector. The 2021 Bundestag election marked the end of the 16-year Merkel era, with media coverage fixated on which of the three candidates (Olaf Scholz, Armin Laschet, and Annalena Baerbock) could adequately replace Merkel and represent DE on the world stage. Scholz and his Social Democratic Party won 25.7 percent of the vote, compared to 24.1 percent for Laschet and Merkel’s Christian Democratic Union and 14.8 percent for Baerbock’s Greens. Scholz formed the so-called traffic light coalition, named for the colors of the respective parties: Social Democrats (red), the Liberals (yellow), and the Greens (green).
Poland (PL)
With a semi-presidential and multiparty parliamentary system that ensures party-list proportional representation, PL exhibits characteristics of consensus democracies (Lijphart 1992, 1999). Humprecht et al. (2022) classified PL as polarized-pluralist. Compared to DE, it has a smaller media market reach, higher political parallelism, lower journalistic professionalization, and less state support. PL’s 2019 parliamentary elections had the highest voter turnout (61.7 percent) since the fall of communism in 1989 (Markowski 2020). This rising tide of political participation was attributed to the polarizing policies of the ruling Law and Justice Party (PiS) that mobilized disengaged voters on both sides of the ideological divide. The PiS formed the United Alliance, a coalition with Sovereign Poland and the Agreement Party led by incumbent Prime Minister Mateusz Morawiecki. They won 43.6 percent of the vote, the largest winning margin in the history of the country’s parliamentary elections. The rival Civic Coalition, led by Małgorzata Kidawa-Błońska, received 27.4 percent of the vote and was criticized for the late announcement of her candidacy and poor fit for the coalition (Markowski 2020).
United States (US)
Lijphart (1999) classified the US as a majoritarian democracy with a presidential system, two major parties, majoritarian electoral outcomes, and candidate-centered politics. High concentration of power gives congressional majorities and the president license to shape policies. The hybrid media system cluster (Humprecht et al. 2022), to which the US belongs, is positioned between democratic-corporatist and polarized-pluralist on all media system dimensions. The 2020 US presidential election campaign took place at a time of high political polarization during the COVID-19 pandemic between Republican incumbent Donald Trump and Democratic challenger Joe Biden. Biden followed social distancing and masking guidelines and spent much of the campaign in his basement connecting with the electorate via social media. Trump ignored pandemic rules and held mass rallies in COVID-19 hotspots around the country. With more than 80 million votes, Biden received the most ballots cast for a candidate in US presidential election history.
At the Intersection of System Dimensions and Candidate Character Framing
Lijphart (1999), Hallin and Mancini (2004), and Humprecht et al. (2022) encouraged researchers to employ the typologies they created to trace the contours of systemic differences between countries and test if they manifest in media content. Here we follow their lead and in doing so turn a descriptive pursuit of content patterns into an assessment of systemic influences on media content. To test if specific character frames have roots in systemic factors requires theoretically grounded expectations. Even when collected data confirm expectations, triangulation through studying different countries, news outlets, and candidates over time would be needed to make substantive logical inferences about the influence of political and media systems on media content. In this sense, the study reported here can, at best, serve as one evidentiary point in this process.
A logical starting point for this study’s hypothesis testing is a comparison of how much emphasis commercial online news media placed on character framing of politicians (H1) and how polarized these framing occurrences were across media outlets (H2). First, it is reasonable to argue that character framing is a trademark of candidate-centered majoritarian politics associated with the US two-party system (Holian and Prysby 2014; Wattenberg 2009). One would expect less media emphasis on character framing in coalition multiparty political systems represented by DE and PL.
H1: The visual and linguistic character frame counts for US online news coverage of candidates will be higher than DE and PL frame counts.
Second, the boom in online news outlets has been linked to global increases in media polarization (Van Aelst et al. 2017). Among the three countries we analyzed, DE is expected to be the least polarized. Consensus democracies with coalition governance tend to avoid negative campaigning and media preoccupation with it, compared to majoritarian democracies. Moreover, DE has an inclusive media market, low levels of audience polarization (Fletcher et al. 2020), low levels of political parallelism, and high journalistic professionalism. In character framing terms, that means DE media outlets would be least likely and US media most likely to diverge in coverage. As Humprecht et al. (2022) point out, the US media landscape was transformed over the past two decades into a polarized environment with media outlets aligning with political parties (see also Ad Fontes Media 2020). Although our study did not include media outlets at the extremes of the media landscape (see Supplemental Table A2), it is reasonable to expect media polarization differences between DE and US. Given our focus on commercial online media, we also expect more polarization in US than PL media, despite the higher political parallelism ranking of PL (Humprecht et al. 2022). In fact, Steppat et al. (2022) provide compelling evidence of higher levels of media polarization in the US than PL.
H2: US visual and linguistic character framing will be more polarized than DE or PL coverage.
The next step in the study’s hypothesis testing is to compare ideal candidate, populist, and loser framing.
The ideal candidate
At the heart of the stateliness frame is the focus on the pomp and ceremony of US campaign spectacles (Grabe and Bucy 2009). The historical shift from party to individual-centered political campaigning in the US transformed leadership contests into media-orchestrated celebrity performances (Bennett 1980; Kellner 2021) complete with confetti showers, gratuitous patriotic fanfare, and public hobnobbing with the ruling elite around the world. These displays of power and stateliness remind national and international audiences of the aggregation of power in the US presidency. Commercialization of news and political parallelism afford US journalists the degrees of freedom to frame the candidates they favor (and often formally endorse) in line with this character trait that counts as the winning one (Grabe and Bucy 2009). By comparison, both DE and PL are collaborative in democratic practice, take more issue and party-oriented approaches to campaigns, and lack the cultural appetite for the campaign spectacles that the US is known for. In particular, DE news media with low levels of political parallelism and high journalistic professionalism can be expected to show greater restraint in bestowing stateliness on political hopefuls.
H3a: The stateliness frame will be used more often in visual and linguistic modalities of US than in DE and PL coverage.
The diffusion of power built into proportional representation and coalition governance of consensus democracies arguably diminishes the journalistic preoccupation with candidate capacities for compassion (Vos and Van Aelst 2018). Candidate-centered, winner-takes-it-all, majoritarian US politics is therefore expected to foster a media logic that would be more likely to examine the compassion of candidates. Campaigns are known for carefully orchestrating opportunities to showcase the private lives of candidates and their capacity for emotional intelligence and connection to the electorate (Balmas and Sheafer 2013). The candidate’s family and religious affiliation evoke the idea of a surrogate father of the nation: “. . .protector, provider, and moral compass. . .” as Grabe and Bucy (2009: 104) put it. Moreover, affection displays along the campaign rope lines are standard procedures in US campaigns. In particular, compassion shown to children has become a cliched way of signaling benevolence in leadership (Murphy 2016). Highly developed professionalism of DE news media compared to the US and PL might temper the tendency of favoring personalized scrutiny of a candidate over issue-oriented news coverage (see Stanyer 2013).
H3b: The compassion frame will be used more often in visual and linguistic modalities of US than in DE and PL coverage.
Populist campaigner
The populist campaigner frame manifests in the subdimensions of mass appeal and ordinariness that present the candidate as deriving widespread support as a down-to-earth person who champions the causes of the people. It is reasonable to expect that these frames would surface most prominently in commercial news media coverage of consensus democracies with an inclusive media market, which DE most clearly exemplifies. First, DE is characterized by the distribution of executive power between coalitions and parties—the antithesis to majoritarian power concentration (Lijphart 1999). Second, the inclusiveness of the DE media system facilitates news accessibility beyond elite news consumers (Humprecht et al. 2022). Thus, in line with democratic populist sentiments, DE media are expected to use more populist campaigner framing than PL or US media in both mass appeal and ordinariness dimensions.
H4a: The mass appeal frame will be used more often in visual and linguistic modalities of DE than PL or US coverage.
H4b: The ordinariness frame will be used more often in visual and linguistic modalities of DE than PL or US coverage.
Sure loser
This frame comprises a gotcha approach to campaign coverage, featuring candidates in unflattering and unguarded moments. We expect that loser framing will be most prevalent in US online news coverage for the following reasons. First, weak state control with high media commercialization incentivizes sensational reportage. Loser framing adds drama to news storytelling and thereby draws media user attention. Second, loser framing is arguably a step toward identifying the loser of an election contest. Political parallelism in a free US media system provides impetus for journalists to indulge in power-brokering. Third, majoritarian two-party systems foster news framing of election campaigns as a strategic game of winning and losing (Dimitrova and Strömbäck 2012). The consensual coalition-forming democratic styles of multiparty and proportional representation in DE and PL are expected to divert the media logic of those countries away from loser framing. We therefore hypothesize that:
H5: The sure loser frame will be used more often in visual and linguistic modalities of US than DE and PL coverage.
Horse race election coverage is a dominant political news frame in US election coverage (Farnsworth and Lichter 2011), which likely contributed to socializing US news users into expecting simplistic winner-and-loser reporting of elections (Schmuck et al. 2017). This media logic is grounded in market-driven journalism in a majoritarian democracy with lower journalistic professionalism than in, for example, DE. US online news media are therefore expected to be more consistent across visual and linguistic modalities in the way they frame winning and losing candidates. DE and PL news coverage is expected to be more nuanced in presenting candidates in a mix of character frames across modalities, leading to the next hypothesis:
H6: US online news media will have higher consistency across modalities in frame probabilities for winning and losing candidates than DE and PL news media.
Method
Sample
Using the Reuters Digital News Report (Newman et al. 2023), six popular online news outlets were sampled for each country. Public service news channels play an increasingly peripheral role in the US and PL (Steppat et al. 2022), which directed our focus to commercial news outlets, including two quality daily newspapers, one popular daily newspaper, two TV channels, and one web native outlet (see Supplemental Table A2). The length of campaigns differed between countries, prompting a decision to sample the last six weeks of each election for cross-national comparability. For DE this was August 16 to September 26, 2021; for PL September 2 to October 13, 2019; and for the US September 30 to November 3, 2020. Stories were retrieved using the names of top candidates who ran for the chief executive office in the respective countries. For DE these were Scholz, Laschet, and Baerbock; for PL Morawiecki and Kidawa-Błońska; and for the US Trump and Biden. News items centrally focused on at least one candidate were sampled, resulting in 2,688 stories with 6,560 candidate cases across six modalities. Still images (N = 1,831) were defined as photographs, typically appearing in the body of articles. Moving images (N = 1,013) are embedded videos that require a user to activate play mode. Frozen video images (N = 161), used with either voice-over narration or superimposed text, are a relatively new format of online news. Other forms of visualization in online news coverage, such as infographics, were coded only if they referred to a candidate. The text modality (N = 2,487) refers to written material, whereas audio (N = 994) comprises voice-over narration of video stories. Finally, superimposed text (N = 74) accompanied narratives in video stories, often without audio narration. This is a format often used by Fox News (US) and TVN24 (PL).
Measures
The unit of analysis was the individual candidate shown or mentioned in a campaign story. Although there might be hundreds of potential frames to test for, the coders of this study recorded the presence (1) or absence (0) of 27 categories measuring visual character frames (Grabe and Bucy 2009) in each case. The instrument was also operationalized for linguistic measurement. The ideal candidate frame’s stateliness dimension was measured using seven categories whereas compassion was assessed with six categories. The populist campaigner frame’s mass appeal dimension was measured using four categories whereas ordinariness was operationalized in terms of five categories. The sure loser frame was measured through five categories.
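As an illustration of how presence/absence codes of this kind can be summed into frame counts, the following minimal Python sketch assumes a hypothetical layout in which each coded case is a list of 27 binary values ordered by dimension. The ordering, the key names, and the choice to sum subdimensions into the two composite frames are assumptions made for the example, not details taken from the codebooks.

```python
# Hypothetical layout: 27 binary codes per case, ordered by dimension.
# Category counts per dimension follow the description in the text;
# the ordering and names are assumptions for illustration only.
DIMENSION_SIZES = {
    "stateliness": 7,
    "compassion": 6,
    "mass_appeal": 4,
    "ordinariness": 5,
    "sure_loser": 5,
}


def frame_counts(codes):
    """Sum presence (1) / absence (0) codes into one count per dimension."""
    if len(codes) != sum(DIMENSION_SIZES.values()):
        raise ValueError("expected 27 binary codes")
    counts = {}
    start = 0
    for name, size in DIMENSION_SIZES.items():
        counts[name] = sum(codes[start:start + size])
        start += size
    # Assumed composites: each major frame is the sum of its subdimensions.
    counts["ideal_candidate"] = counts["stateliness"] + counts["compassion"]
    counts["populist_campaigner"] = counts["mass_appeal"] + counts["ordinariness"]
    return counts


# Invented example case, not data from the study.
example_case = [1, 0, 1, 0, 0, 0, 0,   # stateliness
                0, 1, 0, 0, 0, 0,      # compassion
                0, 0, 0, 0,            # mass appeal
                1, 0, 0, 0, 0,         # ordinariness
                0, 0, 0, 0, 0]         # sure loser
example_counts = frame_counts(example_case)
```

In this layout a case can score between 0 and the number of categories in each dimension, which yields count data suitable for comparing group means.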
Codebooks were developed for each of the three countries to accommodate cultural nuances and can be found in the Supplemental material. For example, the US campaign coincided with COVID-19 social distancing rules. Small crowds presented as following social distancing practices were coded as COVID-19 crowds instead of small crowds that are indicative of a lack of support. We also coded for differences in ordinary food, considering traditional national dishes in the three countries, such as beer and sausage in Germany, pierogi and kotlet schabowy in Poland, and fried chicken in the United States.
Coding Procedure
The study followed a project language approach (Roessler 2012), using English as the lingua franca for the coding instrument, training, and reliability testing, as is commonly done in cross-national comparative content analyses (e.g., Strömbäck and Dimitrova 2011; Wettstein et al. 2018). Each country had a primary coder with the appropriate language proficiency and political knowledge to reliably collect the data. In addition, all three coders had English proficiency and political knowledge of the US. Testing the categories and training the coders extended over the course of five months. During this time, minor refinements were made to the codebooks. In line with the project language procedure, a random selection of 10 percent of the US sample was coded by all three coders. The Krippendorff alpha coefficient for the visual character frames was 0.95 and for the linguistic character frames 0.91 (see Supplemental Table A3). The data that support the findings of this study are available in the Open Science Framework repository at https://osf.io/dysr6.
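For readers unfamiliar with the reliability coefficient, the sketch below computes Krippendorff's alpha for nominal codes (such as presence/absence) using the standard coincidence-matrix formulation. It is a generic illustration and not the software used in the study. The function name and input layout (one list of coder values per coded unit, no missing values) are assumptions, and the two-coder example data are a textbook-style illustration rather than data from this project.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the codes that the
    coders assigned to one unit (no missing values assumed).
    """
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit coded by one coder cannot be paired
        # Every ordered pair of codes from different coders counts 1/(m-1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # no variation in codes: alpha is undefined
    return 1.0 - (n - 1) * observed / expected


# Illustrative binary codes from two hypothetical coders on ten units.
coder_a = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
coder_b = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha_example = krippendorff_alpha_nominal([list(p) for p in zip(coder_a, coder_b)])
```

With three coders per unit, as in the reliability subsample described above, each inner list would simply hold three values.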
Evaluation of the Coding Instrument
The viability of the multimodal instrument we adapted and applied here was assessed in a number of ways. First, our instrument was validated by coder reliability assessments that produced Krippendorff alphas comparable in size to those reported earlier.
Second, we assessed the instrument’s success in capturing linguistic and visual framing incidents at comparable rates across modalities, story formats, and individual frames. There were 3,005 visual framing instances compared to 3,555 linguistic instances, offering evidence that the linguistic instrument was as efficient in capturing character framing as its visual counterpart. We also examined modal formats individually for frame counts. All six modal formats generated frame counts, albeit at different rates. In terms of visual modalities, full motion video (M = 3.77) generated more framing incidents per candidate appearance than frozen video images (M = 2.79) and still images (M = 2.29). Linguistically, textual content (M = 2.37) was more likely to generate frames than audio (M = 2.16) or superimposed text (M = 1.92). Importantly, character frames were captured in all modal formats, which indicates the instrument’s multimodal pliability. Moreover, individual character frame occurrences were tested for modal consistency. As Figure 1 indicates, visual categories were about twice as likely as linguistic categories to capture the ideal candidate frame and its two subframes, stateliness and compassion. Ideal candidate framing occurred twice as often visually as linguistically in DE and US content and three times as often in PL, indicating modal imparity for this character frame in all countries studied. Populist framing in DE had modal parity, whereas US and PL news content carried populist framing more prominently in visual modalities. The loser frame, which appeared infrequently, was about seven times more likely to be detected by linguistic than visual categories for DE and US news media coverage. For PL, loser framing was about four times more likely to be detected in linguistic modalities (see Supplemental Figure A1). From this one could conclude that there is more variance in modal imparity across character frames than across countries.
The ideal candidate frame finds more expression visually than linguistically, whereas populism surfaces with more modal parity, and the loser frame is associated with linguistic dominance.

Figure 1. Visual and Linguistic Counts per Subframe.
Third, we tested the visual-linguistic consistency of character frames by calculating the odds ratio that a given frame would occur visually, given that it had occurred linguistically. For each frame f, this is the ratio p(f_v | f_l) / p(f_v | ¬f_l), where p(f_v | f_l) is the probability that a story containing a linguistic representation of f also contains a visual representation of f, and p(f_v | ¬f_l) is the probability that a story contains a visual representation of f given that it contains no linguistic representation of f. An odds ratio greater than 1 indicates that the occurrence of a linguistic representation of a particular frame increases the likelihood of a visual representation of that frame in the same story. We also calculated the converse odds ratios, p(f_l | f_v) / p(f_l | ¬f_v). From Table 1 it is clear that the odds ratios are all larger than 1, indicating that all frames have congruency in both directions. Mass appeal has the strongest congruency (4.895), with the likelihood of its visual representation especially high if linguistic framing is present. This analysis offers evidence that the multimodal instrument we developed has internal consistency in detecting character frames across modalities. It also demonstrates the modal redundancy of character framing in our data set.
Table 1. Odds Ratios for Character Frame Congruency.
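The congruency ratio defined above can be illustrated with a minimal sketch. It assumes that each story has been reduced to a pair of presence flags for one frame (linguistic, visual). The function name `congruency_ratio` and the toy counts are hypothetical and are not taken from the study's data or code. The sketch computes the ratio exactly as worded in the text, that is, as a ratio of two conditional probabilities.

```python
from typing import Iterable, Tuple


def congruency_ratio(stories: Iterable[Tuple[bool, bool]]) -> float:
    """Return p(second | first) / p(second | not first).

    Each story is a (first_present, second_present) pair of flags for a
    single frame, e.g. (linguistic, visual).
    """
    with_first = with_first_and_second = 0
    without_first = without_first_and_second = 0
    for first, second in stories:
        if first:
            with_first += 1
            with_first_and_second += second  # True counts as 1
        else:
            without_first += 1
            without_first_and_second += second
    if with_first == 0 or without_first == 0 or without_first_and_second == 0:
        raise ValueError("ratio is undefined for this sample")
    p_given_first = with_first_and_second / with_first
    p_given_not_first = without_first_and_second / without_first
    return p_given_first / p_given_not_first


# Hypothetical toy sample of (linguistic, visual) flags for one frame.
toy = (
    [(True, True)] * 5
    + [(True, False)] * 5
    + [(False, True)] * 2
    + [(False, False)] * 6
)

# p(f_v | f_l) / p(f_v | not f_l)
visual_given_linguistic = congruency_ratio(toy)
# Converse ratio: swap the roles of the two modalities.
linguistic_given_visual = congruency_ratio([(v, l) for l, v in toy])
```

A value above 1 in both directions would correspond to the two-way congruency described for Table 1.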
Results
Additive indices for visual and linguistic character frames were created separately to produce continuous data (frame counts) for the three major frames and two subframes under investigation. One-way ANOVA procedures with Games-Howell post hoc assessments were used to test for modal and cross-national patterns in character framing. A number of other procedures were used to test polarization clustering and frame probabilities.
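For readers unfamiliar with the Welch correction reported alongside these ANOVA results, the following is a minimal sketch of how Welch's F and its degrees of freedom can be computed from raw group scores. It is an illustration under stated assumptions, not the authors' analysis script: the function name `welch_anova` and the toy frame counts are hypothetical, only the Python standard library is used, and the p-value and the Games-Howell post hoc step are omitted.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def welch_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, float]:
    """Return (F, df1, df2) for Welch's one-way ANOVA.

    Every group needs at least two observations and a non-zero variance,
    otherwise the weights below are undefined.
    """
    k = len(groups)
    sizes: List[int] = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]  # sample variance (n - 1)

    # Precision weights: larger, less variable groups count more.
    weights = [n / v for n, v in zip(sizes, variances)]
    total_weight = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    between = sum(
        w * (m - grand_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum(
        (1 - w / total_weight) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )

    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# Hypothetical frame counts per candidate appearance in three countries.
toy_counts = [
    [2, 3, 3, 4, 2, 3],  # country A
    [1, 2, 2, 3, 1, 2],  # country B
    [3, 2, 4, 3, 2, 5],  # country C
]
f_stat, df1, df2 = welch_anova(toy_counts)
```

The fractional second degree of freedom in reports such as F(2, 3536.40) arises from the `df2` term, which shrinks when group variances and sizes are unequal.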
Prevalence and Polarization
The first hypothesis predicted that US journalists would employ character framing more often than their DE and PL counterparts. This hypothesis found strong support in the dataset, Welch’s F(2, 3536.40) = 63.05, p < .001. Games-Howell post hoc tests show significant differences between the US (M = 2.69) and PL (M = 2.13; p < .001), the US and DE (M = 2.59; p = .05), and DE and PL (p < .001).
We also examined modal variance across countries, where significant differences were detected. US (M = 3.06) media used visual framing of candidates the most, followed by PL (M = 2.82) and DE (M = 2.61), Welch’s F(2, 1614.35) = 22.12, p < .001. Comparisons (N = 3,005) revealed significant differences between all three pairs: US and DE (p < .001); US and PL (p = .016); and PL and DE (p = .039). The analysis of linguistic character frames (N = 3,555) also produced significant differences, Welch’s F(2, 2113.13) = 257.74. DE (M = 2.57) news media employed linguistic character framing more often than their US (M = 2.44) and PL (M = 1.51) peers. Pairwise comparisons showed significant differences between DE and the US (p = .041), DE and PL (p < .001), and US and PL (p < .001). The modal differences between the US and DE, in particular, offer tentative evidence that the candidate-centered majoritarian politics of the US is associated with more visual character framing, whereas linguistic framing manifests more clearly in media coverage of coalition multiparty political systems, such as DE.
H2 predicted more polarized coverage in the US than in DE or PL. To test this hypothesis, we measured the differences in frame probabilities for the top two candidates in every country. Given that we have only six media outlets per country, statistical tests are not appropriate. Yet, the visualization of the data offers apparent confirmation of H2 (see Figure 2). Overall, US media have a notably wider spread in covering the two leading candidates than media in the other two countries, indicating higher polarization. Among the US outlets, Foxnews.com is a polar point on three of the five frames. Stateliness and sure loser frames drew the most notable polarization in US media, which also varied most noticeably from the other two countries.

Figure 2. Polarization of Media Outlets.
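The descriptive comparison behind H2 can be sketched as follows. This is an illustration only, assuming that for each outlet and frame we hold the frame probability of the two leading candidates. The summary used here (signed gap per outlet, then the range of gaps across outlets) is our assumption about one reasonable way to express "spread". The outlet labels and probabilities are hypothetical and do not come from the study.

```python
from typing import Dict, Tuple


def outlet_gaps(probs: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Signed gap per outlet: p(frame | candidate A) - p(frame | candidate B)."""
    return {outlet: a - b for outlet, (a, b) in probs.items()}


def spread(gaps: Dict[str, float]) -> float:
    """Range of the signed gaps across outlets (max minus min)."""
    values = list(gaps.values())
    return max(values) - min(values)


# Hypothetical probabilities of one frame for two leading candidates,
# keyed by outlet: (candidate A, candidate B).
toy_probs = {
    "outlet_1": (0.40, 0.35),
    "outlet_2": (0.20, 0.55),
    "outlet_3": (0.30, 0.30),
}
gaps = outlet_gaps(toy_probs)
country_spread = spread(gaps)
```

A wider range of gaps across a country's outlets would correspond to the wider visual spread described for the US in Figure 2. With only six outlets per country, such a figure is descriptive rather than inferential.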
The Ideal Candidate
Hypotheses 3a and 3b predicted that US media would be more likely than DE or PL to use the frame’s two subdimensions. Pairwise comparisons of countries are summarized in Table 2.
Table 2. Pairwise Comparisons of Countries by Character Frame.
Note. Entries are mean differences and lower and upper limits of the 95% confidence interval for each mean difference. Visual instances: DE = 1,271; PL = 635; US = 1,099; Linguistic instances: DE = 1,257; PL = 713; US = 1,585; *p < .05, **p < .01, ***p < .001.
H3a was partially supported, with US dominance in linguistic coverage of the stateliness frame. PL data produced simultaneously the highest visual (although not statistically higher than US) and lowest linguistic stateliness frame scores. DE visual stateliness coverage was significantly lower than that of the other two countries, but DE linguistic frame counts were second highest. H3b was also partially supported in US visual coverage of compassion. DE and PL had significantly lower frame counts. Yet linguistically, DE media led in compassion frame counts, followed by the US and PL.
Populist Campaigner
The fourth set of hypotheses postulated that mass appeal (H4a) and ordinariness (H4b) would be used more often in DE than in PL and US election coverage. H4a was not supported. Counter to prediction, DE news media were significantly less likely than US and PL outlets to invoke the mass appeal frame visually. DE had levels of linguistic mass appeal framing similar to those of the US, which had the highest counts. PL trailed the US and DE in linguistic mass appeal framing. Unlike H4a, H4b was supported. DE news media used the ordinariness frame in both visual and linguistic modalities more often than PL and US news media.
Sure Loser
H5 predicted more sure loser framing in the US than in DE or PL media but did not find support in this dataset. In fact, the US and DE media employed the frame at near-equal levels in both modalities. PL data did not produce a single case of visual loser framing and produced the lowest linguistic frame counts for sure loser. H6 predicted that US online news media would be more likely than DE and PL news media to be consistent in visual and linguistic framing of candidates as winners and losers. To test this hypothesis, we calculated the probability that each frame would occur for a candidate in a story. As shown in Table 3, our data offer support for this hypothesis. Biden had the highest probability of appearing in the stateliness, compassion, and ordinariness frames both visually and linguistically, while Trump had the highest probability of appearing in the mass appeal and sure loser frames both visually and linguistically. By contrast, DE and PL data did not produce this pattern. For DE, Laschet was the only candidate with the highest probability of appearing both visually and linguistically in a frame. In this case, it was stateliness. PL data showed no candidate with the highest probability of being featured visually and linguistically in any frame.
Table 3. Frame Probabilities for the Seven Candidates.
Note. Bold numbers indicate the candidate with the highest frame probability in each country’s election, per frame.
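The consistency check used for H6 can be expressed as a short sketch. It assumes that frame probabilities per candidate are available separately for visual and linguistic modalities, and it flags a frame as multimodally consistent when the same candidate has the highest probability in both. The function names, the candidate labels "A" and "B", and all probabilities are hypothetical and are not the values in Table 3.

```python
from typing import Dict

FrameProbs = Dict[str, Dict[str, float]]  # frame -> candidate -> probability


def top_candidate(candidate_probs: Dict[str, float]) -> str:
    """Candidate with the highest probability (first one wins on ties)."""
    return max(candidate_probs, key=candidate_probs.get)


def consistent_frames(visual: FrameProbs, linguistic: FrameProbs) -> Dict[str, str]:
    """Map each frame to its top candidate when both modalities agree."""
    agreed = {}
    for frame, visual_probs in visual.items():
        visual_top = top_candidate(visual_probs)
        linguistic_top = top_candidate(linguistic[frame])
        if visual_top == linguistic_top:
            agreed[frame] = visual_top
    return agreed


# Hypothetical probabilities for two candidates in one country's election.
toy_visual = {
    "stateliness": {"A": 0.30, "B": 0.20},
    "mass_appeal": {"A": 0.10, "B": 0.25},
    "sure_loser": {"A": 0.02, "B": 0.05},
}
toy_linguistic = {
    "stateliness": {"A": 0.25, "B": 0.15},
    "mass_appeal": {"A": 0.20, "B": 0.10},
    "sure_loser": {"A": 0.03, "B": 0.12},
}
agreement = consistent_frames(toy_visual, toy_linguistic)
```

In this toy example the two modalities agree on stateliness and sure loser but not on mass appeal. A country in which most frames appear in the result, split cleanly between two candidates, would resemble the US pattern described above.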
Discussion
The study reported here was inspired by the momentum of a multimodal turn in political communication research and contributes a viable instrument for capturing candidate character frames multimodally. As reported, there is evidence for optimism in recommending this multimodal research instrument for future studies. It showed the capacity for capturing character frames across six different storytelling modalities, produced acceptable coder reliability, and revealed the visual-linguistic redundancy of character framing using sampled material from three different countries. However, there are notable limitations to this instrument. First, it is effective at measuring co-occurrences of linguistic and visual character frames, but it does not capture the process whereby linguistic and visual modalities interact to produce meaning. In this sense, our instrument accomplishes what Dan et al. (2020) call redundancy, which they distinguish from concurrency (different modalities of different frames interact with each other) assessed through association rules learning procedures. The Dan et al. (2020) study was conducted on video news framing. Our sample contains a mix of static and full motion material, making a congruency assessment inapplicable to a large part of our sample. Second, our coding instrument is calibrated for detecting democratic populism in character framing, which is vastly different from authoritarian populism, which is on the rise in democracies around the world. This anti-pluralist brand of populism, sometimes referred to as illiberalism, indulges rhetoric that offers simplistic answers to complex economic and political problems. It thrives on constructing enemies, targeting political and intellectual elites and fellow citizens at the margins of society, including immigrants, underrepresented identity groups, and scientists.
By aggrandizing in-group nationalism, authoritarian populists corrode social tolerance and justice and fuel division and polarization. An instrument calibrated for capturing multimodal character framing of authoritarian populist leadership would require substantial conceptual and operational work, which appears urgently necessary for monitoring self-representation of candidates on social media.
Beyond the methodological contribution of this study, the results show that political and media system characteristics are associated with the multimodal character framing of candidates in commercial online news coverage. What we observed gives credence to the encouragement of Lijphart (1999) and Humprecht et al. (2022) to investigate how political and media system differences manifest in media content. In particular, it appears that majoritarian, two-party, and candidate-centered political systems such as the US inspire more multimodal character framing of candidates than consensus, coalition-oriented, multiparty systems do. In this sense, our findings confirm that the evolution toward candidate-centered politics in the US finds expression in the country’s media coverage of elections as more candidate-centered than in two other democracies. Moreover, the results suggest that candidate character framing is more polarized in the US than in German and Polish coverage. Specifically, stateliness and loser framing yielded the most visible polarization of commercial online US news media. DE and PL distributions offer a stark counter to US media polarization. With more data points, the distribution between media outlets could be tested statistically to produce more conclusive certainty. Yet, what we found aligns with expectations and existing evidence of historically high levels of political polarization in the US. Ironically, it seems that embedded in the majoritarian two-party structures of the almost 250-year-old US democracy are the seeds of polarization that, cultivated by media system dynamics, threaten the country’s contemporary social cohesion. By contrast, younger and predominantly consensus-oriented democracies such as DE and PL produced media systems that might be more robust against polarization.
Regarding ideal candidate framing, this study found that majoritarian politics combined with medium levels of political parallelism and journalistic professionalism likely accounts for the high stateliness and compassion framing in the US sample. By contrast, consensus-oriented systems with an inclusive media market like DE appear to be associated with ordinariness framing. These findings were predicted (H3a and H3b) and surfaced robustly in our dataset. It would not be unreasonable to chalk these divergences up to the medley of contextual and immanent forces that shape the characteristic ways in which societies perform public rituals around leadership contests. The competitive undertones of US majoritarianism combined with cultural proclivities for spectacle have channeled media focus toward personalized character contests in which the winner is publicly scrutinized for both dominance and compassionate leadership qualities. In this way, US political and media systems exemplify an ethological course in leadership duels. By contrast, media coverage of political matches in DE and PL featured less ideal candidate character framing and echoed the egalitarian and coalition-prone sensibilities that mark political processes in these two young democracies—which in Lijphart’s (1999) view, are more advanced democratic orders than the majoritarian habit.
Loser framing predictions based on political and media system dimensions did not clearly manifest in predicted ways. We expected US media to be the most prolific loser frame users, yet DE news media employed this character frame at near-equal levels. This unexpectedly high level of loser framing in Germany might be due to three highly covered public mistakes candidates made during campaigning. Baerbock inflated her resume and was accused of plagiarism. Laschet was caught on camera laughing inappropriately while the German federal president spoke at the scene of a flood disaster in West Germany. Scholz was involved in several financial scandals.
The loser frame did produce a number of other noteworthy results, despite the relative infrequency with which it surfaced in content. First, DE and PL framing is marked by multimodal nuance in covering election winners and losers. Yet, the competitive tendencies of a majoritarian two-party system such as the US manifested in the casting of candidates into neatly compartmentalized character frames with unambiguous multimodal emphasis on stateliness, compassion, and ordinariness for the winner (Biden) and multimodal consistency in bestowing the sure loser frame and mass appeal on the election loser (Trump). There is also a contextual explanation for Trump’s high probability of being covered as a candidate with mass appeal. With blatant disregard for COVID-19 pandemic precautions, Trump frequently held mass rallies to satisfy his hankering for large and approving crowds. By contrast, Biden took the rules of social distancing seriously and largely avoided mass rallies. A follow-up chi-square test confirms this interpretation: Biden was more likely than Trump to be depicted with small crowds that followed COVID safety measures in both visual, χ²(1) = 20.91, p < .001, and linguistic modalities, χ²(1) = 35.62, p < .001. Among COVID-19 deniers, Trump’s mass appeal frame might have carried a positive connotation. Yet, among voters who took COVID-19 seriously, these mass rallies might have appeared inappropriate and in line with loser frame qualities.
Second, we demonstrated here that the loser frame manifests more often linguistically than visually while the ideal candidate frame is more likely to show up visually than linguistically. This might indicate that candidates and their campaigns are more successful in controlling the news media’s visual framing than shaping what journalists write or say about candidates. Indeed, this imparity perhaps indicates the parallel visual success of image handlers (1) training candidates for nonverbal leadership behavior while avoiding the public blunders that drive visual loser framing and (2) orchestrating the visual symbolism of love for god, family, country, and voters. These findings align with recent studies about the changing ways in which political operatives aim to wield image control, including producing and distributing high-quality visual material directly to news media (Jungblut and Haim 2023). Yet, journalists seem to exercise more stewardship in framing candidates in the winning ideal candidate frame. Moreover, as watchdogs, journalists are more inclined to highlight the negative aspects of a candidate’s leadership potential. Campaign gossip is known to be part of the symbiotic relationship between journalists and campaign insiders and is routinely reported linguistically.
Third, visual loser framing was absent in PL coverage and linguistically lower than in DE and US news. This might indicate a chilling effect on negative framing given the widely reported attempts of the PiS to devitalize free media operations (Markowski 2020). The Reporters Without Borders annual World Press Freedom Index shows a declining trajectory in PL’s ranking from 18th in 2015, when the PiS-led government came to power, to 59th in the last report before the 2019 election (Santora and Berendt 2019). This drop in ranking is due to PiS attempts to impose control over the public broadcaster (TVP) and punitive action against independent media that criticize the government (Santora and Berendt 2019).
At least two noteworthy limitations of our study provide impetus for future work. First, we base our observations on one election in three countries of the Global North. Machine learning procedures, based on what we have established with human coding, would automate data collection and liberate researchers to pursue large election samples drawn over time and from more candidates and countries. This approach combined with daily tracking poll data of political support for candidates would revolutionize the ability of political communication scholars to monitor, model, and predict election coverage patterns and perhaps track how they influence voting outcomes. It would also provide statistical power for monitoring media outlet polarization on specific character frames. Second, our study provides a notable confirmatory data point in the triangulation process needed to make logical inferences about the influence of political and media systems on election content. In particular, we cannot exclude that contextual factors may have an influence on multimodal character framing of political candidates that interact with system variables. We are therefore cautious not to offer our results as conclusive evidence of causal relationships and call for cross-national and longitudinal analyses of candidate character framing to determine the influence of political and media systems. Computational methodologies would expedite this process of evidence gathering toward building the theory about how structural factors like political system dimensions reverberate through microlevel framing outcomes, observable in media content.
What we report here supports momentum on at least two emerging frontiers of political communication research. We contribute evidence that macro-level structural variance in political and media systems ripples through micro levels of media content—here detected in the character framing of political hopefuls running for office. In addition, this study crosses the Rubicon on the historical tug between visual and linguistic approaches to studying the content and effects of political messages. We argue for multimodality to become the ontological default for media politics research in the future. In an AI age where chatbots and deepfakes are projected to be employed in election contests, multimodal dexterity in media content and effects research could be instrumental to informing detection tool development, regulatory considerations, and media literacy efforts.
Supplemental Material
sj-pdf-1-hij-10.1177_19401612241285665 – Supplemental material for Multinational and Multimodal Character Framing of Political Candidates in Online News: Do Political and Media System Classifications Matter?
Supplemental material, sj-pdf-1-hij-10.1177_19401612241285665 for Multinational and Multimodal Character Framing of Political Candidates in Online News: Do Political and Media System Classifications Matter? by Dennis Steffan, Maria Elizabeth Grabe and Umberto Famulari in The International Journal of Press/Politics
Footnotes
Acknowledgements
The authors would like to thank Etienne Barnard for his support with data analysis.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
