Abstract
This study tests how a prominent artificial intelligence (AI) product failure influences public attitudes, focusing on Google Gemini’s generation of controversial images. Drawing on the AI alignment literature, we distinguish three moderation goals that differ in how far they depart from data-driven outputs: safety, bias mitigation, and aspirational imaginaries. We use focusing events research to explain how controversies make governance questions salient. In a preregistered experiment with 1756 participants, we tested responses to two image sets: American Founding Fathers (T1) and German soldiers from 1943 (T2). T1 significantly reduced support for bias-related and aspirational moderation and lowered trust in the company, but did not affect safety-based justifications or perceived political alignment. T2 showed the same directional pattern but did not reach significance; pooled results confirmed the main pattern. These findings show that visible product failures can affect public views on AI governance along dimensions most directly implicated by the controversy.
Keywords
Recent successes of artificial intelligence (AI)-enabled systems like ChatGPT, DALL-E, or Midjourney have raised public awareness of generative AI (Rauchfleisch et al., 2025b; Vogels, 2023). These systems, and others like them, have made using generative models easy and user-friendly. Increasingly, AI-enabled systems are used for search, access to the public arena, and as an interface to digital devices (Jungherr and Schroeder, 2023). This has raised the stakes of how these systems are governed, especially regarding how model outputs are moderated and adjusted. Unsurprisingly, concerns about company-driven AI moderation and suspected political or cultural motives have recently sparked controversy (Robertson, 2024). Concerns about the political impact of AI moderation are amplified by research showing substantial ideological differences in the political leanings of the outputs by different models (Buyl et al., 2024). Accordingly, AI moderation and its justifications are developing into an important political and cultural issue for negotiating AI use in society (Jungherr, 2023).
There is broad awareness that public-facing AI-enabled systems need moderation. Unmoderated model outputs might compromise safety or perpetuate social and political injustice (Bommasani et al., 2022), which has led to calls for audits and mitigation measures (Raji et al., 2020). Various technical approaches, including fine-tuning on curated data (Hu et al., 2021), reinforcement learning from human feedback (Ouyang et al., 2022), and system prompts (Touvron et al., 2023), allow developers to adjust model outputs. These adjustments are attempts to align AI outputs with developers’ and users’ goals and values, raising non-trivial questions about whose goals and values are prioritized (Gabriel, 2020; Korinek and Balwit, 2024). AI moderation thus carries an inherently political dimension. In public debate, the question of whose values AI-enabled systems follow is also gaining prominence (Douthat, 2024; Mowshowitz, 2024).
In this debate, cases in which AI-enabled systems fail by producing outputs and decisions glaringly at odds with fact, common sense, or moral expectations figure as focusing events (Birkland, 1998) in which commentators and pundits potentially align along political–cultural battle lines. The popularity and broad use of these systems have made their capabilities and failures widely visible. Controversial model outputs, prompts for their generation, and follow-up conversations are broadly shared and frequently lead to critical media coverage. Some public failures are driven by biases in models and training data, leading to outputs that reproduce inequalities and biases in society, including racism and sexism (Bianchi et al., 2023; Tao et al., 2024). Others are driven by decisions in the moderation of AI outputs by companies. One recent example is the February 2024 controversy over Google’s Gemini chatbot generating racially diverse images of historically white figures, including the American Founding Fathers (Robertson, 2024).
This study (n = 1756) tests the effects of two controversial and obviously factually wrong model outputs from that episode: sets of visual responses to prompts asking the model to produce images of the American Founding Fathers (Treatment 1) and German soldiers from 1943 (Treatment 2). We consider effects on three sets of outcomes: respondents’ attitudes toward three justifications of AI moderation (safety, bias reduction, and promotion of aspirational imaginaries of a better world); trust in the company providing the model (Google); and the degree to which the company is perceived as aligned with specific political or cultural goals. Using these real-world outputs, we find that exposure to Treatment 1 (Founding Fathers) significantly reduced people’s support for model adjustments aimed at reducing bias or providing aspirational imaginaries of a better world, compared with responses from a control group that received no information. The treatment did not affect support for model adjustment due to safety concerns. Exposure to the treatment reduced trust in Google as a company but did not affect people’s assessments of Google’s alignment with cultural values or political groups. Treatment 2 (German soldiers) did not have a significant effect.
Our findings contribute to the growing literature that foregrounds public attitudes and preferences as a key dimension of digital governance studies (Jungherr and Rauchfleisch, 2025; Kozyreva et al., 2023; Pradel et al., 2024; Rauchfleisch et al., 2025a; Rauchfleisch and Jungherr, 2024; Riedl et al., 2021, 2022). We show that public product failures, like the Google Gemini case, can serve as focusing events that shape perceptions and preferences toward digital governance. Prior work on technology controversies, such as the Cambridge Analytica episode (Weiss-Blatt, 2021), suggests that salient failures can come to dominate public perception and fuel demand for regulation (Jungherr et al., 2026; Jungherr and Rauchfleisch, 2024), a dynamic likely to extend to AI governance as public awareness grows.
AI alignment through content moderation in generative models
With AI’s move from computer science departments and research labs into the general public space, the question of AI alignment has risen in importance (Amodei et al., 2016; Bommasani et al., 2022). AI alignment is a research field that examines whether models follow the intent of their developers and providers or, for different reasons, deviate from that intent (Gabriel, 2020).
In the discussion of AI alignment for generative models, two important areas feature strongly: safety and fairness (Askell et al., 2021; Friedrich et al., 2025; Hao et al., 2023). This discussion focuses on how results provided by generative AI models might be unsafe or unfair and how model adjustments and the moderation of model outputs can mitigate these harms.
Safety and fairness are inherently political concepts (Barocas et al., 2023; Gabriel, 2020). This turns alignment, and the motives underlying model adjustments and result moderation, into issues of public contestation once adjustment practices, moderation results, and underlying corporate, cultural, or political motives capture broad public attention. Public controversies over product failures stemming from overly lax or overly invasive model moderation can serve as focusing events, crystallizing inherent political tensions.
AI moderation for safety, bias mitigation, and aspirational imaginaries
Drawing on the AI alignment literature (Jungherr and Rauchfleisch, 2025), we distinguish three goals of AI moderation that differ in how far they require outputs to deviate from purely data-driven results: safety, bias mitigation, and the promotion of aspirational imaginaries. Because these goals depart from data-driven outputs to different degrees, they also differ in their potential for public contestation. We develop our first three hypotheses from this framework. We expect that widely discussed product failures function as focusing events (Birkland, 1998) that make these otherwise abstract governance questions concrete and psychologically salient, prompting people to update their attitudes toward the moderation goals most directly implicated by the controversy. The product failure literature provides a theoretical basis for this selective updating. Expectancy disconfirmation theory suggests that when product performance falls short of user expectations, the resulting negative disconfirmation can generate dissatisfaction and broader distrust (Darke et al., 2010). Research on scandal spillover further suggests that such negative information does not affect all related evaluations equally, but spills over more strongly to judgments that are more closely related to the controversy (Roehm and Tybout, 2006). Applied to our case, a controversy centered on demographic overcorrection is likely to matter more for moderation goals such as bias mitigation and aspirational imaginaries, which are directly tied to representation, than for safety-oriented moderation.
The broad deployment of AI-enabled services to diverse audiences with varying usage motives, interests, and values underscores the importance of AI moderation. But the multiplicity of use cases, usage motives, and contexts provides challenges to one-size-fits-all model adjustments (Achintalwar et al., 2024; Bakker et al., 2022). The decision about whose values, motives, or contexts AI models should align with makes AI moderation not merely an engineering challenge but also a political one.
There is broad agreement that AI outputs should be grounded in fact; ungrounded outputs can cause significant harm, particularly in high-stakes contexts such as search or education (Maynez et al., 2020; Weidinger et al., 2022). However, there are cases where we may want to intentionally adjust or moderate AI models so that their outputs deviate from purely data-driven results. Such interventions may be motivated by concerns about safety, bias mitigation, or the desire to promote alternative imaginaries (Jungherr and Rauchfleisch, 2025). These motivations vary in the degree to which they are likely to receive public support or provoke contestation.
Safety-oriented moderation
At a basic level, concerns about safety and harm can lead us to want to deviate from purely data-driven outputs (Anwar et al., 2024; Phuong et al., 2024). This includes outputs that threaten public safety, for example, by generating instructions for terrorist attacks or pathogens. It also includes outputs that harm individuals by violating privacy or intellectual property rights. In these cases, moderating model outputs to deviate from the facts – as established by data-driven, model-based learning – is necessary. These safety-oriented adjustments are often seen as less contentious than other forms of moderation, though their boundaries and implementation can still be debated. Because safety-oriented moderation is broadly accepted and not directly implicated by the kind of controversy we study, which centers on demographic representation rather than harmful content (Heath, 2024; Robertson, 2024), it serves as a useful discriminant case. Research on product failures suggests that negative expectancy disconfirmation can generate generalized distrust with spillover effects beyond the focal failure, undermining evaluations of other, even unrelated, offerings (Darke et al., 2010). If such generalized backlash extends to AI moderation, we would expect support even for safety-oriented moderation to decline after exposure to a prominent failure. Alternatively, if the backlash is selective, targeting only the moderation goals most directly implicated by the controversy, safety-oriented moderation should remain unaffected. We test the generalized-backlash account:
Moderation for bias mitigation
Another goal of AI moderation is to mitigate bias for fairness (Barocas et al., 2023). AI models have access only to facts as represented in the data, not to facts as such (Smith, 2019). As a consequence, AI outputs are at risk of replicating biases present in datasets (Bianchi et al., 2023; Tao et al., 2024). Having AI trained on biased datasets can be harmful. One could think of bias here as systematic distortions in training data or model outputs that can produce unequal, stereotyped, or misleading representations. Moderating model outputs to mitigate bias would mean reducing such distortions in outputs rather than simply reproducing skewed patterns found in data. Moderating AI output to mitigate bias is likely to be widely seen as a sensible goal. Still, interventions to mitigate bias can create controversy about what counts as “bias” and what constitutes an appropriate correction when training data reflect selective coverage and existing social inequalities.
Because the Gemini controversy directly involved an apparent overcorrection of demographic representation (Heath, 2024; Robertson, 2024), a core concern of bias mitigation, the failure is especially relevant to this moderation goal. We therefore expect that exposure to the failure case will reduce support for this type of moderation:
Promoting aspirational imaginaries
A third position goes further, arguing that generative AI should be moderated in ways that do not present the world as it is but as it should be. In this case, AI should not be bound by facts but by how facts should be. There are long-standing arguments on social justice, social change, and solidarity that culture and discourse should not simply reproduce historical injustices, inequalities, and discrimination but instead should present aspirational imaginaries of society and the future (Benjamin, 2025; Rorty, 1989). Through narratives and images, culture and discourse can expand a sense of “we” to include groups formerly identified as “others” and sensitize people to injustices and discrimination, thereby extending solidarity. Along those lines, moderating model outputs could mean adjusting the distribution of a given variable, even if that distribution mirrors the distribution found in the world, if one sees that distribution as an expression of structural inequality, injustice, or discrimination subject to change. In contrast to bias mitigation, which aims to correct outputs so they reflect the world as it actually is, aspirational imaginary moderation would show the world as it should be in the eyes of the moderators. Outputs from generative AI might thus inspire a more inclusive and just society by being less bound by facts as they are and instead showing a world as it could or should be. At the same time, these interventions are likely most vulnerable to contestation, as they clearly use AI as a tool for social and political change, raising questions about the direction and legitimization of the change pursued.
Aspirational moderation is the most clearly value-laden of the three goals and the most directly implicated by the controversy, in which historically inaccurate outputs appeared to reflect a deliberate effort to present diversity beyond historical fact. Because the outputs can be read as an attempt to show the world not as it is but as it should be, the controversy is even more directly relevant to this goal than to bias mitigation. We therefore expect:
Product failures as focusing events and the effect on opinion formation
The discussion of AI moderation is nascent and often remains abstract. Still, public discussion of controversial model failures – outputs that obviously contradict fact, common sense, or moral sensibilities – can serve as focusing events that lead to public awareness and negotiation of underlying principles. Focusing events are sudden and unexpected, have large-scale impact, lead to strong media attention, and, in turn, inspire public debate and contestation that leads to political salience and potentially even government action (Birkland, 1998). Events like these, such as natural or industrial disasters, are important in shaping media, public, and political agendas. They serve as exemplars contributing to the public resonance of positions and frames within public discourse and can contribute to agenda building and agenda setting on issues of digital and platform governance (Baumgartner and Jones, 2009; Gamson and Modigliani, 1989; Walgrave et al., 2017). More specifically, work on platforms and platform governance shows that regulatory interventions follow public controversies and scandals (Liebig et al., 2024; Marchal et al., 2025). Clearly, events and subsequent media coverage matter in digital governance broadly. We expect similar dynamics for AI governance. In discussions of AI alignment, controversial and publicly discussed model failures can serve a similar function in discourse.
In line with research on focusing events and scandal-driven opinion formation, we treat widely discussed, concrete failure cases as cues that make otherwise abstract governance questions psychologically salient. In the real world, such failures often become widely discussed; in our experiment, we test whether exposure to the failure exemplar alone, without a debate context, is sufficient to shift attitudes. We expect people to form evaluations by attributing responsibility to the provider and by updating their beliefs about which moderation goals are legitimate or risky in practice. The expectancy disconfirmation framework (Darke et al., 2010) motivates our predictions about this updating process: when AI outputs visibly fall short of users’ expectations of factual accuracy, the resulting negative disconfirmation motivates reassessment of both the moderation practices that produced the failure and the company responsible.
Effects of AI moderation on company evaluation
Furthermore, we assume that beyond attitudes directly related to moderation justifications, there may be additional negative downstream effects on the perception of companies. Clearly, attitudes toward companies consist of many considerations. But marketing research has shown that severe product failures, dishonest behavior, or brand scandals can lead to attitudinal punishments for products (Ahluwalia et al., 2000), companies (John et al., 1998), and even companies providing similar products (Roehm and Tybout, 2006). Darke et al. (2010) show that negative expectancy disconfirmation, which occurs when product performance falls short of the expectations set by the provider, can generate distrust that generalizes beyond the focal product to the company responsible for the product. This spillover is especially likely when the product is central to the company’s portfolio. Given Google’s prominent positioning of its AI capabilities as a core part of its product strategy, a failure in Gemini is unlikely to be compartmentalized. This makes it likely that people experiencing a scandal related to AI moderation will express lower trust toward the company responsible. We thus assume:
Public debates in the adjacent field of content moderation on social media also show that company interventions can be perceived as politically biased. For example, several Republican senators have in the past attacked Facebook, Google, and Twitter for allegedly censoring conservative voices on social media platforms (Romm, 2019). Furthermore, in reaction to these claims, President Trump issued an executive order to prevent online censorship (Trump, 2020), even though prior research could not identify a specific bias in content moderation (Jiang et al., 2019). Transferring these findings to AI indicates that interventions in AI outputs by companies might be perceived as politically biased as well. Moreover, research shows that some AI models have political biases (Buyl et al., 2024), which potentially adds to concerns about political interference in their outputs. In addition, research on attitudes toward content moderation in general (Rauchfleisch and Jungherr, 2024; Riedl et al., 2021) but also specifically on AI-supported content moderation (Wang, 2023) shows that political ideology plays a role in how moderation is perceived. We thus assume that, in the context of AI moderation, companies might be perceived, depending on the direction of those adjustments, as aligned with progressive values or with political actors:
Beyond perceptions of the specific company, moderation controversies may also feed into broader narratives about the relationship between the technology sector and political power. We thus expect:
The Google Gemini case
In February 2024, Alphabet/Google started to offer image-generation capabilities with their AI chatbot Gemini. Quickly, a public controversy arose about the internal adjustments of the model. When querying the model for examples of historical figures, such as the American Founding Fathers or German soldiers from 1943, the model returned obviously historically inaccurate sets of racially diverse people (see Figure 1). Contemporary reporting suggests that this was related to system-level instructions that were not visible to users, instructing Gemini to return diverse sets of people to corresponding user queries (Robertson, 2024). This was likely a response to widespread coverage of AI-generated images tending to return sets of Caucasian people while excluding others (Tiku et al., 2023). Nevertheless, the developers’ intervention against this bias overcorrected and produced model outputs that were obviously inaccurate and misrepresented history. Among technologists, this led to ridicule of the company (Thompson, 2024). Among political pundits, it opened a new battle line in the political culture war, with the company cast as holding progressive values and pursuing a progressive political agenda in an attempt to shape reality according to its preferences (Crimmins, 2024). This makes Gemini a promising case to test our hypotheses in a preregistered survey experiment.

Figure 1. (1) Treatment 1: Founding Fathers; (2) Treatment 2: German soldiers from 1943. Images generated using Google Gemini.
Methods
We test our hypotheses in a between-subjects survey experiment, which was approved by the institutional review board (IRB) of the University of Bamberg and preregistered at Open Science Framework (OSF). 1 We used two treatment groups (T1, n = 587; T2, n = 583) that were confronted with obviously historically inaccurate representations of the world generated by AI and a pure control group (n = 586) in which respondents were not exposed to any information but were surveyed for the different outcome variables.
In both treatment groups, we exposed respondents to an example from Google Gemini’s new image-generation feature (see Figure 1). After its launch in February 2024, many examples of biased image output were shared on social media. Even Alphabet’s CEO, Sundar Pichai, admitted that the images created by this new tool had shown bias and offended users (Heath, 2024). Most examples involved prompts in which users asked for historical pictures. For our experiment, we selected two of these examples. In the first treatment group (n = 587), we exposed respondents to a screenshot showing a user requesting an image of the Founding Fathers of America. In response, the tool provides four images that include a Native American chief, an African American person, and an Asian person.
In the second treatment group (n = 583), we exposed respondents to a screenshot showing a user requesting an image of a German soldier in 1943. The images generated by the AI tool show four soldiers wearing clothes and helmets similar to those worn by German soldiers in 1943. One of the four soldiers is an Asian female, and one is an African male.
For both treatments, we used exactly the same interface, which is a screenshot from a mobile phone with the same prompt. The screenshots were introduced as “a case where AI is utilized in a new product launched by Google.” Thus, the only difference between the treatments is the context and the four pictures (for a replica of the treatments, see Figure 1 and Supplementary Information B.4).
Both treatments represent the same phenomenon but in different manifestations. Both examples are real, and thus, the treatments feature no deception. We use two stimuli as a form of topic sampling to assess whether effects generalize beyond a single exemplar rather than being tied to the idiosyncratic content of one case. Accordingly, we expected similar effects in the general tendency but did not expect identical effect strengths, as the two cases differ in cultural proximity and identity salience for a US sample. Importantly, the stimulus presents only the model output and does not provide information about public reactions, media coverage, or the cause of the output.
In a pre-study (n = 99), we tested whether the treatments participants had seen were perceived as realistic depictions of historical fact (1 – completely unrealistic; 7 – completely realistic). While both treatments were perceived as highly unrealistic, the Founding Fathers image (M = 2.02, SD = 1.39) was perceived as slightly less realistic than the German soldier image (M = 2.49, SD = 1.71). However, this difference was not significant, Welch’s t(92.43) = 1.50, p = .14.
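As a plausibility check on the reported pre-study comparison, Welch's t statistic and the Welch–Satterthwaite degrees of freedom can be recomputed from the summary statistics. The sketch below uses the reported means and standard deviations; the split of the 99 participants into groups of 50 and 49 is an assumption made for illustration and is not stated in the text, so small deviations from the reported values are to be expected.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations, and sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of the group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of the group 2 mean
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Reported M and SD for the two images. The 50/49 group sizes are an
# ASSUMPTION for illustration; the text only reports n = 99 in total.
t, df = welch_t(2.02, 1.39, 50, 2.49, 1.71, 49)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Under this assumed split, the recomputed statistic lies close to the reported t(92.43) = 1.50; the remaining gap in the degrees of freedom is consistent with rounding of the published means and standard deviations.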
In total, 1800 participants were recruited from the survey research company Prolific. We used US quota sampling on sex, age, and political affiliation (see Supplementary Information A.2). Participants had to be US-based and aged 18 or older to participate in the study. Participants were paid £0.75 (an hourly rate of £9; we ran the survey through Prolific’s European platform) for their study participation, which took around 5 minutes to complete. We decided to use a sample of 1800 participants as this would give us sufficient statistical power (power > .9) to test our hypotheses and research questions (see Supplementary Information A.1 for the simulation-based power analysis). Overall, the sample size allows us to identify small effects. As defined in the preregistration, 44 participants who failed a simple attention check at the beginning of the study were excluded, resulting in a sample of 1756. On the start page, we informed participants about their rights (e.g. that they could withdraw from the study at any time by simply closing the browser) and asked for their consent. None of the questions asked for personally identifiable information (see also the questionnaire in Supplementary Information B.1 for more details). At the end of the questionnaire, we debriefed all participants and explained the purpose of the experiment.
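The logic of a simulation-based power analysis can be illustrated with a minimal sketch. It is not the analysis reported in Supplementary Information A.1: the standardized effect size of 0.2, the normally distributed outcome, the two-group comparison, and the large-sample z-test are all assumptions chosen for illustration. The group size of 600 corresponds to one third of the planned sample of 1800.

```python
import math
import random
from statistics import fmean

def sample_variance(xs):
    """Unbiased sample variance."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulate_power(n_per_group, effect_size, n_sims=500, z_crit=1.96, seed=1):
    """Share of simulated two-group experiments in which a two-sided
    large-sample z-test on the difference in means rejects at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        diff = fmean(treated) - fmean(control)
        se = math.sqrt(sample_variance(control) / n_per_group
                       + sample_variance(treated) / n_per_group)
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_sims

# ASSUMED effect size (d = 0.2) and 600 respondents per group.
power = simulate_power(n_per_group=600, effect_size=0.2)
print(f"simulated power: {power:.2f}")
```

Under these assumptions, the normal approximation puts power at roughly .93 for a single treatment-versus-control contrast, which is in line with the stated target of power > .9 for small effects.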
We randomly assigned participants to either the control group or one of the two treatment groups. First, participants were asked a few general questions about their experience with AI tools, and we showed them a general definition of AI. Participants in the control group continued directly with the questionnaire, whereas participants in the treatment groups saw the treatment and could click after 10 seconds to continue with the questionnaire. The 10-second delay imposed a minimum exposure; respondents could remain on the stimulus page longer before proceeding. 2 Comprehension checks administered immediately afterward indicate that participants attended to the treatments (see Supplementary Information B.3 for the exact treatment screens). We asked two multiple-choice questions with three answer options and an “I don’t know” option as a treatment check (correct answer for content of treatment: German soldiers = 99.3%, Founding Fathers = 98.7%; correct answer for the company launching the AI tool in treatment: German soldiers = 82.7%, Founding Fathers = 84.0%). Afterward, the questions measuring the dependent variables and sociodemographic questions followed. Participants were debriefed at the end of the questionnaire and could then directly return to Prolific.
We tested whether the randomization worked. We found no significant differences between the three conditions in age, F(2, 1753) = 1.608, p = .20; education, χ²(2, n = 1756) = 1.63, p = .44; race, χ²(20, n = 1756) = 16.51, p = .68; income, χ²(24, n = 1756) = 28.90, p = .22; or gender, χ²(4, n = 1756) = 0.49, p = .97.
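The p-values of the chi-square balance tests can be reproduced from the reported statistics without statistical software, because all four tests have even degrees of freedom. For df = 2m, the upper-tail probability has the closed form exp(−x/2) · Σ_{k=0}^{m−1} (x/2)^k / k!. The sketch below applies this formula to the reported values; the function name is a hypothetical helper, and the closed form does not apply to odd degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with EVEN df,
    using exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires positive even df")
    half = x / 2.0
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total

# (statistic, df) pairs as reported for the randomization checks.
reported = {
    "education": (1.63, 2),
    "race": (16.51, 20),
    "income": (28.90, 24),
    "gender": (0.49, 4),
}
p_values = {name: chi2_sf_even_df(stat, df) for name, (stat, df) in reported.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.2f}")
```

The recomputed values agree with the reported p-values of .44, .68, .22, and .97 after rounding.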
We measured all items for our outcome variables on a 7-point scale (1 = strongly disagree to 7 = strongly agree). Each outcome variable was measured with two or three items that were then combined into a mean index (see Table 1 for an overview and Supplementary Information B.1).
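Constructing a mean index amounts to averaging each respondent's answers across the items of a scale. A minimal sketch follows; the respondent data and scale labels are hypothetical, and the handling of missing responses is not described in the text and is therefore not modeled here.

```python
from statistics import fmean

def mean_index(item_scores):
    """Average one respondent's answers to the items of a single scale.
    Responses are expected on the 7-point agreement scale (1 to 7)."""
    if not all(1 <= score <= 7 for score in item_scores):
        raise ValueError("responses must lie on the 1-7 scale")
    return fmean(item_scores)

# HYPOTHETICAL respondent: two items each for safety and bias,
# three items for trust in the company.
respondent = {"safety": [7, 6], "bias": [5, 4], "trust": [4, 3, 5]}
indices = {scale: mean_index(items) for scale, items in respondent.items()}
print(indices)
```

Because every index is a mean rather than a sum, two-item and three-item scales remain on the same 1 to 7 metric and can be compared directly.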
Table 1. Overview of variables used for analysis.
We developed three new indices, each consisting of two items, to measure preferences for AI model adjustments (see Supplementary Information B.3 for further validation of the scales via complete exploratory and confirmatory factor analysis). Items were designed to capture the three moderation goals distinguished in the AI alignment literature and refined in a separate pretest (n = 150). To identify respondents’ preferences for AI model adjustments for safety reasons, we asked the following questions: “AI developers should implement ethical safeguards within their models to prevent the generation of content that poses a threat to public safety” and “AI technologies should be designed to respect and protect individual rights and to prevent outputs that infringe on privacy, consent, or intellectual property.”
To measure preferences regarding adjustments with the goal of bias reduction, we used the following questions: “AI companies have an obligation to actively modify their models to ensure outputs promote equity and representation for all” and “AI companies should adjust the outputs of their models to safeguard marginalized groups from bias and discrimination.”
To measure preferences on adjustments with the goal of presenting aspirational imaginaries of the world, we used the following questions: “AI companies should ethically adjust the output of their models to present aspirational visions of the future” and “AI companies should ethically adjust the output of their models to promote universal human rights and values.”
To identify trust in Alphabet/Google, we used a slightly modified question battery from Ingenhoff and Sommer (2010) covering different aspects of trust in companies: “Google manages its digital products safely, avoiding unexpected problems”; “Google always puts ethics first and considers what’s right and wrong before making decisions”; and “Google is open about how its products work, especially when things go wrong.”
To measure respondents’ perception of Alphabet/Google’s supposed alignment with progressive values, we used two items specifically developed for our study: “Google is a company that is strongly aligned with progressive values” and “Google’s products are designed and moderated in ways that bias results according to progressive values.”
Similarly, we used a new index to measure perceptions of technology companies’ supposed alignment with political actors consisting of three items specifically developed for this study: “AI companies currently shape public opinion in a way that aligns with specific social values,” “There is a concerning level of collaboration between AI companies and governments to influence public opinion to benefit those in power,” and “Political actors and AI companies collaborate to influence public opinion in their favor.”
To measure support for free speech, we used two items from Riedl et al. (2021), which were adopted from Rojas et al. (1996). Prior experience with AI tools was measured using two items we specifically created for this study, which ask how often people have used AI tools in their jobs, leisure time, and private lives. We recoded education items as specified in the preregistration: master’s degree or higher to 1, and all others to 0. For an overview of all variables and the items’ wording, see Supplementary Information B.1.
As described in our preregistration, we use covariate adjustment for all our regression models (Lin, 2013) and include support for free speech, prior AI use, political orientation, age, gender, and education as covariates. We included these covariates because prior research found that they help explain support for content moderation (Riedl et al., 2021; Wang, 2023).
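The logic of this adjustment can be illustrated with a minimal sketch. It assumes a single binary treatment and regresses the outcome on the treatment indicator, the mean-centered covariates, and their interactions with the treatment, which is the specification commonly associated with Lin (2013). The function names, the toy data, and the plain Gauss-Jordan solver are illustrative assumptions and not the authors' code; the sketch also omits the robust standard errors that an actual analysis would report.

```python
from statistics import mean


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def lin_adjusted_effect(y, treat, covariates):
    """Return the OLS coefficient on `treat` from the interacted regression.

    y: outcomes; treat: 0/1 indicators; covariates: one list of values per respondent.
    Hypothetical helper for illustration only.
    """
    k = len(covariates[0])
    col_means = [mean(row[j] for row in covariates) for j in range(k)]
    design = []
    for t, row in zip(treat, covariates):
        centered = [row[j] - col_means[j] for j in range(k)]
        # Columns: intercept, treatment, centered covariates, treatment x centered covariates.
        design.append([1.0, float(t)] + centered + [t * c for c in centered])
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return beta[1]


# Toy, noise-free data (not the study's data): the true treatment coefficient is 0.5.
toy_x = [[float(i % 5)] for i in range(40)]
toy_t = [1 if i % 2 == 0 else 0 for i in range(40)]
toy_y = [2 + 0.5 * t + 1.0 * (x[0] - 2) + 0.3 * t * (x[0] - 2)
         for t, x in zip(toy_t, toy_x)]
toy_effect = lin_adjusted_effect(toy_y, toy_t, toy_x)
```

Because the covariates are centered before they are interacted with the treatment, the coefficient on the treatment indicator can be read as the adjusted average treatment effect. With two treatment arms, as in this study, one indicator and one set of interactions per arm would be needed.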
Data collection ran from 1 to 4 March 2024. As we use real examples that were also discussed publicly before the study, prior exposure within the control group might weaken the observed effects in our experiment. Thus, at the end of the questionnaire, we asked if people had already heard about the examples used as treatments in the experiment. Only 18.8% of respondents in the control group had heard of either case before participating in this study. We used that information for an additional analysis with an instrumental variable approach (also preregistered) that showed the same effects as reported in the other models (see Supplementary Information C).
Results
To better understand the baseline for the dependent variables, we first discuss the descriptive statistics for the variables measured in the control group (n = 586; see Figure 2). Overall, participants strongly agree on the need for AI safety moderation (M = 6.12, SD = 1.11). However, support for AI bias reduction (M = 5.36, SD = 1.64) and support for AI presenting an aspirational version of the world (M = 4.87, SD = 1.63) received lower agreement scores.

Figure 2. Attitudes toward AI moderation principles in the control group (n = 586) as density plots.
Google received a mixed evaluation. Its trust mean score was around the middle of the scale (M = 3.94, SD = 1.45), and the company was perceived as slightly aligned with progressive values (M = 4.66, SD = 1.48). Technology companies, in general, were perceived as slightly aligned with political actors (M = 4.36, SD = 1.49). Descriptive statistics for all dependent variables broken down by experimental condition are reported in the Supplementary Information B.2.
We test our hypotheses using preregistered regression models with Lin’s (2013) covariate adjustment (see Figure 3). For this part, we use our full sample, which includes the control group and the two treatment groups (see Supplementary Information for the complete models). Regarding support for AI safety moderation, the data did not support H1. Neither the Founding Fathers treatment (b = −0.03, p = .58, 95% confidence interval (CI): [−0.16, 0.09]) nor the German soldiers treatment (b = 0.03, p = .59, 95% CI: [−0.09, 0.16]) affected support for AI safety moderation. However, people show lower support for AI bias reduction (H2) after exposure to the Founding Fathers example (b = −0.32, p < .001, 95% CI: [−0.50, −0.13]). Exposure to the German soldiers treatment (b = −0.17, p = .07, 95% CI: [−0.36, 0.02]) did not significantly affect support for AI bias reduction. Regarding support for AI presenting an aspirational version of the world (H3), we observe a pattern similar to that for H2. The Founding Fathers treatment led to lower support (b = −0.21, p = .02, 95% CI: [−0.39, −0.03]), whereas the German soldiers treatment (b = −0.04, p = .65, 95% CI: [−0.23, 0.14]) had no significant effect.

Figure 3. Treatment effects on attitudes toward AI moderation principles.
The Founding Fathers treatment also affected trust in Google. People exposed to the treatment indicated lower trust in Google (b = −0.19, p = .02, 95% CI: [−0.36, −0.03]). While trust was affected, neither treatment changed the perception of Google as aligned with progressive values (Founding Fathers: b = 0.02, p = .83, 95% CI: [−0.15, 0.18]; German soldiers: b = −0.10, p = .25, 95% CI: [−0.26, 0.07]) or the perception of technology companies in general as aligned with political actors (Founding Fathers: b = 0.01, p = .90, 95% CI: [−0.15, 0.17]; German soldiers: b = 0.03, p = .71, 95% CI: [−0.13, 0.19]). An additional specification curve analysis (Simonsohn et al., 2020) considering all possible covariate combinations showed that our findings are robust (see Supplementary Information C.4).
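To give a sense of the size of such a specification curve, the following sketch enumerates every subset of the six covariates named in the methods section. The variable names are hypothetical labels, and whether the reported analysis counted the covariate-free model as one specification is an assumption made here for illustration.

```python
from itertools import combinations

# Hypothetical labels for the six covariates named in the methods section.
COVARIATES = [
    "free_speech_support",
    "prior_ai_use",
    "political_orientation",
    "age",
    "gender",
    "education",
]


def covariate_specifications(covariates):
    """Return every subset of the covariates, from the empty set to the full set."""
    specs = []
    for size in range(len(covariates) + 1):
        specs.extend(combinations(covariates, size))
    return specs


specs = covariate_specifications(COVARIATES)
# Each tuple in `specs` would define the covariate set of one regression model.
```

Under this assumption, six covariates yield 2^6 = 64 candidate models per outcome and treatment contrast, each of which would be estimated and its treatment coefficient plotted.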
We conducted two additional analyses that were not explicitly preregistered. First, as a robustness check, we pooled both treatment groups. The pooled analysis mirrors the main pattern: no effect on support for safety moderation, but lower support for bias reduction and lower trust in Google (see Supplementary Information C.5). Second, to formally test the selective-backlash alternative to H1, we used an equivalence test for support for safety moderation. For all tests, we used Cohen’s d of 0.195 from the preregistration as the smallest effect size of interest for the upper and lower bounds of the test (ΔL = −0.195, ΔU = 0.195). We used Welch’s t-tests for the equivalence test. Both the German soldiers treatment, ΔL, t(1166.86) = 2.76, p = .003, and the Founding Fathers treatment, ΔU, t(1165.9) = −2.32, p = .010, show a significant equivalence test (two one-sided tests). Thus, we can treat the effect on support for AI safety moderation as negligible relative to a smallest effect size of interest of Cohen’s d = ±0.195.
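The two one-sided tests procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, the bound in Cohen's d is converted to raw units with the root of the average group variance, and p-values use a normal approximation to the t distribution, which is close at the roughly 1,166 degrees of freedom reported here but would be too liberal in small samples. The example data are synthetic.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_tost(treated, control, d_bound=0.195, alpha=0.05):
    """Two one-sided tests for equivalence based on Welch-type statistics."""
    n1, n2 = len(treated), len(control)
    v1, v2 = variance(treated), variance(control)
    diff = mean(treated) - mean(control)
    se = sqrt(v1 / n1 + v2 / n2)
    # Welch-Satterthwaite degrees of freedom (reported for reference).
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    # Assumption: translate the standardized bound into raw scale units.
    raw_bound = d_bound * sqrt((v1 + v2) / 2)
    t_lower = (diff + raw_bound) / se  # tests H0: diff <= -raw_bound
    t_upper = (diff - raw_bound) / se  # tests H0: diff >= +raw_bound
    # Assumption: normal approximation instead of the exact t distribution.
    z = NormalDist()
    p_lower = 1 - z.cdf(t_lower)
    p_upper = z.cdf(t_upper)
    return {
        "diff": diff,
        "df": df,
        "t_lower": t_lower,
        "t_upper": t_upper,
        "p_lower": p_lower,
        "p_upper": p_upper,
        # Equivalence is claimed only if both one-sided tests reject.
        "equivalent": max(p_lower, p_upper) < alpha,
    }


# Synthetic 7-point ratings (not the study's data): identical groups, then a shifted group.
ratings = [i % 7 + 1 for i in range(700)]
same_groups = welch_tost(ratings, ratings)
shifted_groups = welch_tost([r + 1 for r in ratings], ratings)
```

Equivalence is declared only when both one-sided tests reject, which is why a single statistic per treatment, the one for the bound closer to the observed difference, is sufficient to report.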
Discussion
This study introduces a new perspective on AI moderation and, more broadly, AI alignment. We show that people hold varying attitudes on the different motivations that drive AI moderation decisions, and that these attitudes shift to different degrees in response to external events, in our case a public scandal about an AI chatbot producing obviously wrong and unrealistic results. For AI alignment research, our findings show that alignment is not only a technical challenge but also a matter of public perception. Support for moderation differs across goals such as safety, bias mitigation, and aspirational imaginaries, and some goals are more sensitive to controversy than others. This suggests that alignment research should pay attention not only to the values built into model outputs but also to how people perceive and judge these choices, especially after visible failures. Discussions on AI moderation and, more broadly, AI alignment should therefore systematically include attitudinal, psychological, and discursive perspectives. We see that people hold attitudes on these topics, adjust some of them in response to focusing events, and update some of their beliefs about related actors. These attitudes likely matter for the acceptance of decisions in AI moderation and AI alignment, especially as AI-enabled systems figure more broadly in people’s lives.
Specifically, we see that people hold distinct preferences regarding the different motives behind AI moderation. The broadest support is for moderation for safety reasons, followed by moderation to reduce bias. AI moderation in pursuit of aspirational imaginaries receives less support. Importantly, exposure to controversial model outputs does not reduce support for safety-driven moderation; our equivalence test indicates that any effect is negligible. This speaks against a generalized-backlash account in which any moderation failure undermines support for AI interventions across the board. Instead, the backlash is selective: support for bias reduction and aspirational imaginaries is negatively impacted, while safety-oriented moderation remains stable. This selective pattern is consistent with the informational content of the controversy: the stimuli highlight a perceived tradeoff between demographic representation and historical accuracy, which is most directly relevant to bias mitigation and aspirational goals, but provide little evidence about the system’s capacity to prevent harmful content.
We find that focusing events matter in attitude formation on AI governance. Clearly, publicly discussed controversies over model adjustments can shape public opinion on legitimate reasons for AI moderation and on attitudes toward companies that provide and adjust AI models. Thus, as personalized model alignments (Kirk et al., 2024) become more widely deployed and gain greater public awareness, people will encounter more model failures across settings. The perceived legitimacy of and public support for alignment decisions, and the (suspected) motives behind them, will therefore face significant challenges when alignment failures in one scenario carry over to others. Importantly, here, our findings indicate that not every instance of historical inaccuracy necessarily shifts attitudes, consistent with the non-significant results we observe for the German-soldiers stimulus. A plausible interpretation is that backlash may be more likely when a failure case is culturally and politically salient and widely framed as evidence of value-laden overcorrection, whereas less salient exemplars may not trigger the same updating. Because we do not directly measure perceived identity relevance or respondents’ interpretations of the underlying cause of the images, we treat this explanation as speculative and as a direction for future work.
Our findings also show that controversial model outputs impact trust in the respective AI company. People consider many aspects when assessing companies, but our experiment shows that moderation decisions reflect back on them. This finding aligns with prior research on the effects of product failures (Darke et al., 2010). Notably, Darke et al. (2010) show that distrust can generalize even to unrelated companies; our case, in which a failure in one of the company’s core products reflects back on its parent company, represents a comparatively direct link and thus a conservative test of this mechanism. The connection to this literature, especially regarding spillover dynamics, points to fruitful areas for future research. Temporal dynamics matter here. We are still early in the public discussion of AI moderation and, more broadly, AI alignment. The degree to which people become aware of the underlying processes and company policies might come to matter for the impact of specific moderation controversies on company assessments. Going further, we only tested the effects on the principles of AI moderation and company assessments. However, future research could also test the larger spillover effects of AI-related product failures on AI-enabled systems, AI companies, and AI governance more broadly. Early studies already point to experiences with AI-enabled disinformation leading to support for stricter AI regulation overall (Jungherr et al., 2026), raising the possibility that experiences with product failures have similar broad effects.
Our study has several limitations. One limitation is that we cannot directly observe how respondents interpreted the treatment (e.g. as an alignment/moderation decision vs a generic model error), which may moderate treatment effects. At the same time, contemporaneous public discourse around the Gemini controversy widely framed the incident as stemming from system-level alignment and content moderation choices (Heath, 2024; Robertson, 2024); this makes such an interpretation plausible, but we cannot verify it at the individual level in our data. We note that either interpretation, deliberate moderation choice or generic model error, is consistent with the focusing-events framework, as both make otherwise abstract governance questions concrete and psychologically salient. More broadly, while we draw on the expectancy disconfirmation framework to motivate our hypotheses, future research could directly test the mechanisms by which exposure to moderation failures leads to selective attitude change.
Importantly, we tested only two specific examples in the United States, each with a single case. Our findings are therefore specific along several dimensions that future research should expand. First, we test two specific, politicized examples of AI moderation. In these cases, it is plausible to expect the prompt to produce a realistic output that accurately represents historical facts. However, for different usage expectations, such as artistic or imaginative expression, AI moderation might not yield the same negative effects. The contextual dependence of responses to AI moderation should therefore be systematically tested in future research.
Second, our case focuses on one specific company at a given point in time. This introduces a set of contingencies that should be systematically varied in future research. For one, the debate about AI and its expected societal effects is evolving quickly. Currently, AI-related product failures might have stronger effects because the debate has not yet matured and people are still forming and stabilizing their opinions. Accordingly, product failures might have less of an effect at a later stage when AI uses and AI moderation are normalized in society. Similarly, perceptions of tech firms’ political alignment may shift with changing political contexts and elite cues. Accordingly, spillover effects of product failures might be contingent on these larger trends and, therefore, vary over time. Moreover, future research should test whether our findings reflect a novelty effect rather than durable preferences.
Third, we only tested the effects in the US. Attitudes toward technology and American technology companies are likely to vary between countries. Accordingly, these effects should also be tested in varying international contexts, especially at a time of rising geopolitical competition, if not conflict. Furthermore, the salience of historical accuracy is likely to vary cross-culturally, and what is considered sensitive in the US context may be seen differently elsewhere.
Fourth, we focused on general attitudes toward moderation, not governance preferences. People may agree on practices but differ on whether enforcement should rest with companies or regulators (Rauchfleisch and Jungherr, 2024; Riedl et al., 2021), a question future research should address more directly.
In combination, our findings show that there is a cost to (invisible) AI moderation, especially if it leads to recognizably biased results. Clearly, the answer cannot be to do away with AI moderation. Instead, it is important for companies, commentators, and academics to actively document and discuss principles, procedures, and techniques that allow justified AI moderation. AI-enabled systems and their governance need to be assessable by their users (O’Neill, 2022). Opaque moderation practices risk weakening public trust in AI-enabled systems and opening the door for pundits to use obviously misleading model outputs as evidence that AI companies are pursuing cultural and political agendas. This, in turn, can be expected to lead to a general loss of trust in companies, products, and AI safety procedures. Realizing the beneficial potential of AI use for society thus demands a broad debate about the necessity of AI moderation, legitimate approaches to it, and opportunities for contesting it. Without this debate, instances of moderation failures risk compounding and, over time, contributing to a broader climate of skepticism toward the technology and the companies providing it.
Supplemental Material
Supplemental material, sj-pdf-1-nms-10.1177_14614448261449271 for The politics of artificial intelligence alignment: Public reactions to AI moderation in the case of Google’s Gemini by Adrian Rauchfleisch and Andreas Jungherr in New Media & Society
Footnotes
Acknowledgements
A. Rauchfleisch and A. Jungherr contributed equally to the project.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Adrian Rauchfleisch’s work was supported by the National Science and Technology Council, Taiwan (R.O.C.) (grant no. 113-2628-H-002-018- and 114-2628-H-002-007-) and by the Taiwan Social Resilience Research Center (Grant No. 114L9003 and 115L9003) from the Higher Education Sprout Project by the Ministry of Education in Taiwan. Andreas Jungherr’s work was supported by a grant from the Bavarian State Ministry of Science and the Arts, coordinated by the Bavarian Research Institute for Digital Transformation (bidt).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Supplemental material
Supplemental material for this article is available online.