Abstract
While online hate speech has become a serious problem in multimedia environments, most studies in this area have examined text-based hateful content, with less attention paid to its visual aspects. From a multimodal perspective, we conducted an online experiment (N = 799) to investigate how multimodal hate speech (i.e., text and images presented together to convey hateful meanings) on social media affected users’ prejudicial attitudes and prosocial behavioral intentions. The results showed that participants in the text-plus-image (vs. text-only) condition felt more sympathy, which led to less implicit prejudice toward the target group and more prosocial behavioral intentions. In addition, exposure to text-plus-image hate speech had an indirect effect on prosocial behavioral intentions through sympathy and implicit prejudice. The findings contribute to scholarship on online hate speech and provide insights into the affect heuristics that individuals rely on when processing multimodal information.
The prevalence of hate speech has been considered one of the biggest challenges facing social media platforms (Walther, 2022). Across the globe, the proportion of people who say they have been exposed to online hate speech ranges from 28% (adults in New Zealand; Pacheco & Melhuish, 2018) to 39% (15- to 30-year-olds in the United Kingdom; Hawdon et al., 2017) and up to 57% (14- to 19-year-olds in the United States; Harriman et al., 2020). Hate speech refers to any communication that disparages a person or group based on ethnicity, gender, religion, sexuality, political orientation, or other characteristics (Tontodimamma et al., 2021). When it prevails on social media, it not only damages victims’ physical and psychological well-being but also potentially triggers offline violence, such as mass shootings and ethnic cleansing (Laub, 2019; Saha et al., 2019).
Exposure to online hate speech influences observers’ attitudes and behaviors toward the individuals or groups being insulted and humiliated. There is growing evidence that online users exposed to hateful content exhibit more prejudice and fewer prosocial behaviors toward the target group (e.g., Soral et al., 2018; Weber et al., 2020; Ziegele et al., 2018). Despite these illuminating results, most of these studies have focused on text-based hate speech, leaving the role of images in this phenomenon largely unexplored. Given the multimedia nature of social media platforms, hateful content online is not limited to text but also includes images. Some hate speech employs visual elements strategically, as images are usually seen as a more direct representation of reality and therefore more persuasive than abstract forms of communication such as words (Messaris & Abraham, 2001). For example, memes, a form of multimedia message usually based on an image with some embedded caption text, have been increasingly used to generate and spread hate speech on social media (Sabat et al., 2019). However, there is currently limited understanding of how multimodal hate speech on social media affects users’ attitudes and behavioral intentions toward a target group. Does hateful content that combines different modalities, such as text and images, shape prejudice against the target group? Does multimodal hate speech influence prosocial behavioral intentions more than text alone? If so, what are the psychological mechanisms that could explain these effects?
To address these questions, the current study examined the effects of multimodal hate speech (i.e., text and images presented together to convey hateful meanings) on prejudicial attitudes and prosocial behavioral intentions. Given that attitudes may be explicit or implicit (Dovidio et al., 2001), we investigated prejudice using both explicit and implicit measures to gain a more complete understanding of the multimodal power of hate speech. Guided by the affect heuristic framework (Slovic et al., 2007), we explored the role of sympathy (i.e., an emotional response that results from comprehending the emotional state of another person) in mediating the effects of multimodal hate speech on attitudinal and behavioral responses. We focused specifically on sympathy because it plays an important role in explaining the psychological processes activated by exposure to hate speech and is closely related to prosocial behaviors (Batson, 1991; Eisenberg et al., 2014; Soral et al., 2018). This study fills a significant gap in the existing literature on the impact of multimodal hate speech and contributes to a deeper understanding of how affect heuristics operate in processing multimodal information.
Literature Review
The Influence of Hate Speech
Hate speech is generally understood as derogatory expressions directed at individuals and groups based on characteristics such as gender, race, religion, political ideology, and sexual orientation (Tontodimamma et al., 2021). It usually contains negative stereotypes; dehumanizes members of social groups by labeling them as inanimate objects (e.g., “scum”), animals (e.g., “rats”), or non-humans (e.g., “parasites”); and expresses violence or killing (Schäfer et al., 2023). While many forms of hate speech are relatively explicit and direct, hatred can also be expressed in more implicit or indirect ways. Implicit hate speech often goes beyond the meanings of the words themselves and uses figurative language (e.g., metaphors, irony, and sarcasm) to intentionally convey hatred toward a particular social group (Ocampo et al., 2023). As suggested by social identity theory (Tajfel & Turner, 1979), people aspire to maintain a positive social identity, and much of this positive identity derives from favorable comparisons among ingroups and related outgroups. This process may lead to intergroup discrimination in the form of favoritism toward an ingroup and negative orientations toward an outgroup, perhaps involving hatred and derogation (Haas, 2012). Several studies support the notion that individuals who engage in hate speech to denigrate an outgroup seek to promote positive self-identities and enhance their ingroup status (Costello et al., 2019; Leets, 2001).
The online environment can make individuals more susceptible to the influence of hate speech on attitudes and behaviors. As visual anonymity in most computer-mediated communication contexts leads users to see themselves and others as part of social groups rather than idiosyncratic individuals, they tend to adjust their opinions and behaviors in line with group norms, even if these norms encourage prejudice and antisocial behaviors (Postmes et al., 2000; Reicher et al., 1995). For example, Hsueh et al. (2015) found that participants exposed to prejudiced comments against Asian Americans not only showed more implicit and explicit prejudice toward Asians but also used more prejudiced expressions in their own comments. Ziegele et al. (2018) showed that exposure to hateful user comments against refugees on news websites reduced participants’ attitudes toward donating money to a refugee aid organization, which subsequently hindered their intentions to donate and the amount they actually donated. Likewise, Weber et al. (2020) found that the presence of hateful language in user comments negatively affected implicit attitudes toward refugees and thus impeded the users’ prosocial behaviors toward a refugee relief organization. Overall, these observations suggest that online hate speech can form a normative influence that consciously or unconsciously shapes audiences’ prejudicial attitudes and behaviors toward a target group.
Although enlightening, these studies focus primarily on text-based hateful content, while overlooking the role of multimodality in hate speech processing and its subsequent consequences. Indeed, only a few recent studies have examined the impact of multimodal hate speech. For instance, prior work suggests that visual forms of hate speech attract more attention and have a greater impact on audiences than textual forms (Schmid et al., 2022). However, this preliminary evidence does not address the influence of multimodal hate speech on observers’ attitudes and behaviors toward a target group. Therefore, the present study attempts to bridge this research gap by investigating whether adding visual elements to textual hate speech makes it more powerful in shaping individuals’ attitudes and behaviors and, if so, why.
The Power of Images in Hate Speech
The idea that visual information has a stronger impact on cognitive responses than textual information has been well documented in the picture superiority effect (Nelson et al., 1976; Paivio & Csapo, 1973). This effect posits that visual elements of messages, such as images, enhance the recall and recognition of information more than text (Paivio & Csapo, 1973). The superiority of visual elements in memory can be explained by the dual-encoding hypothesis, which proposes two separate but interrelated mental subsystems specialized in processing different forms of stimuli: the verbal system for language and the visual system for nonlinguistic objects and events (Paivio, 1971). From this perspective, it is believed that text and images are processed independently by these two subsystems, and their combination may have additive effects on memory performance (Lee & Kim, 2016; Paivio, 1986). When messages contain both text and images, verbal and visual systems may interact and enhance memory, thereby facilitating message recall and recognition.
Beyond memory, a plethora of studies have extended the picture superiority effect to the persuasiveness of multimodal messages (Lee & Shin, 2022; Powell et al., 2015; Sparks et al., 1998). According to the modality–agency–interactivity–navigability (MAIN) model (Sundar, 2008), richer modalities may elicit a higher perceived realism of conveyed messages than leaner modalities, potentially leading to higher credibility judgments. According to dual-process theories in social psychology (e.g., the heuristic-systematic model of persuasion), individuals process information in two distinct manners: one involves effortful thinking and systematic reasoning about message content, while the other is characterized by automatic, superficial, and thoughtless processing (Eagly & Chaiken, 1984). Given the limited attentional and cognitive resources of the human mind (Simon, 1956), most people try to reach conclusions with minimal mental effort. As a result, they resort to systematic processing only when they have sufficient ability to analyze information and strong motivation to reach an accurate conclusion (Chaiken & Ledgerwood, 2012). When they lack the motivation or ability to do so, they rely heavily on peripheral cues in messages (e.g., argument length) and use cognitive heuristics (i.e., simple rules of thumb, such as if a message is conveyed by an expert, then it should be credible) to make judgments on content.
Previous research suggests that richer modalities tend to inhibit the systematic processing of message content, as information presented in multiple modalities requires more cognitive resources to encode and leaves fewer resources for other subprocesses of information processing, such as storage and retrieval (Lang, 2000; Sundar et al., 2021). In this sense, individuals typically process messages with richer modalities less systematically and rely more on heuristics to assess content. A prominent heuristic related to modality is the realism heuristic, which predicts that information with richer modalities is viewed as more credible because its content resembles the real world more closely (Sundar, 2008). The realism heuristic, as triggered by a multimodal presentation, may create an illusion of truthfulness by making information appear more authentic and convincing. Nevertheless, this contention has received mixed support from prior research, particularly when it comes to fake news and disinformation. Some studies found that false information presented in a richer format (e.g., video) leads to higher perceived content credibility and sharing intentions than a textual counterpart (Lee & Shin, 2022; Sundar et al., 2021), whereas other studies showed no significant difference in credibility perceptions between multimodal (i.e., deepfake videos) and unimodal (e.g., textual or auditory) presentations of disinformation (Barari et al., 2021; Hameleers et al., 2022). Such inconsistent results suggest the complexity of how online users process and respond to multimodal information. Given the contradictory findings about the persuasiveness of multimodal information and the relative lack of empirical evidence on hate speech, we explored the impact of multimodal hate speech on prejudice and prosocial behavioral intentions as a research question:
Research Question 1 (RQ1). What are the effects of exposure to text-plus-image hate speech (vs. exposure to text-only hate speech) on (a) explicit prejudice, (b) implicit prejudice, and (c) prosocial behavioral intentions?
The Mediating Role of Sympathy
Sympathy may play a crucial role in explaining the effect of multimodal hate speech on attitudes and behavioral intentions toward the target group. Sympathy is broadly defined as the “heightened awareness of another’s plight as something to be alleviated” (Wispé, 1986, p. 314). It is a prevailing affective reaction to the misfortune or suffering of others (Harth et al., 2008) and is often characterized by feelings of concern or compassion for others in need (Eisenberg et al., 2014). Unlike empathy, which involves the vicarious experience of another person’s emotions (feeling with others), sympathy focuses on feelings of concern for another person (feeling for others). Sympathy reflects an automatic response to another’s distress and a subsequent urge to relieve it (Vossen et al., 2017). It is particularly important in promoting prosocial behaviors, such as helping and sharing (Batson, 1991).
Sympathy for victims is an important emotional reaction to violent media content. When frequently exposed to a stimulus, individuals tend to show a decreased psychological and emotional response to the stimulus—a process known as desensitization (Wolpe, 1990). Prior research on media violence has demonstrated that individuals who repeatedly encounter violent media content become desensitized to violence, as evidenced by their decreased sympathy for victims of violence in other contexts (Linz et al., 1988, 1989). This may be because repeated exposure to media violence leads audiences to become callous to violence and reduces their responsiveness to subsequent depictions of violence. As a result, audiences may perceive violent content as less threatening and may feel less sympathy for victims than they initially did. Likewise, a similar desensitization process may be observed among people who are frequently exposed to verbal violence, such as hate speech. For example, Leets (2001) found that individuals who frequently encountered racist slurs displayed fewer emotional responses to such utterances and viewed them as less harmful and offensive. Soral et al. (2018) indicated that prolonged exposure to anti-refugee hate speech could reduce individuals’ sensitivity to such language and their sympathy for targets of hate speech. These findings suggest that exposure to hate speech may affect the ability to sympathize with targets through desensitization.
The potential impact of hate speech on sympathy may be particularly pronounced when individuals are exposed to textual hate speech accompanied by visual elements, such as images and photos. Due to their vividness and resemblance to reality, visuals are often perceived as more salient and processed more quickly and intuitively than text (Geise & Baden, 2015). Visual information appears to exert a persuasive effect through more emotion-driven heuristic processing than textual information (Powell et al., 2015; Sparks et al., 1998). In other words, the addition of an image to text may lead individuals to rely on emotions associated with that image to make judgments without scrutinizing the arguments in the text, a tendency known as the affect heuristic (Slovic et al., 2005, 2007). The term affect used here refers to “the specific quality of ‘goodness’ or ‘badness’ (a) experienced as a feeling state (with or without consciousness) and (b) demarcating a positive or negative quality of a stimulus” (Slovic et al., 2005, p. 35). The premise of using affect or emotions as a mental shortcut for decision-making is that people’s mental representations of objects and events can be associated with positive or negative emotions to varying degrees. When making a judgment, people may refer to a mental pool of these emotions, called an “affect pool,” either consciously or unconsciously. Emotions may act as cues that help people simplify judgmental tasks, especially in situations where judgments are complicated or cognitive resources are scarce (Finucane et al., 2000; Slovic et al., 2007). A growing body of research has linked multimodality with the affect heuristic to examine the effects of messages of different modalities on emotions and the resulting persuasive outcomes. For example, Powell et al. (2015) found that when images and text were presented together, textual and visual framing induced anger and fear, leading to stronger behavioral intentions. Lee et al. (2023) indicated that video-plus-text disinformation about the side effects of the flu vaccine elicited higher levels of anxiety, which, in turn, increased misperceptions about the flu vaccine. These findings suggest that textual hate speech alongside an accompanying image may be more effective in shaping sympathetic responses to the target group than text-only hate speech.
Sympathy elicited by multimodal hate speech may, in turn, affect individuals’ prejudice and prosocial behaviors. According to Frijda (1986), emotions produce a mental spotlight whose intense focus quickly induces a desire to behave in a certain way consistent with the particular emotion being felt. Sympathy, especially when triggered by witnessing the distress or plight of others, has been shown to improve attitudes and reduce prejudice toward stigmatized groups (Batson et al., 2002; Vescio et al., 2003). Given that sympathy is an other-oriented emotional response associated with the desire to reduce another’s suffering or need, it is likely to promote altruistic behaviors (Batson, 1991). Research has found that sympathy is positively related to a willingness to help others in need (Eisenberg et al., 1989) and donate money to charitable organizations (Baberini et al., 2015). On this basis, the feelings of sympathy that individuals may experience when viewing multimodal hate speech may, in turn, influence prejudice toward the target group and prosocial behavioral intentions. Therefore, we tested this possibility by asking the following research question:
Research Question 2 (RQ2). Does sympathy mediate the effect of hate speech modality (text-only vs. text-plus-image) on (a) explicit prejudice, (b) implicit prejudice, and (c) prosocial behavioral intentions?
Furthermore, the intention to engage in prosocial behaviors may be shaped by both explicit and implicit prejudice. Individuals who hold prejudicial attitudes toward a target (e.g., person or group) may automatically or unconsciously align their behaviors with such cognitions because negative associations about that target are well integrated into the mental system and easily activated by specific cues in context (Devos, 2008). In support of this rationale, Weber et al. (2020) found that exposure to hateful user comments against refugees increased negative implicit attitudes and ultimately led to a decrease in prosocial behaviors. In the current study, we integrated sympathy and (explicit and implicit) prejudice as antecedents of prosocial behavioral intentions and examined whether these variables mediated multimodal effects on prosocial behavioral intentions in a serial manner. Thus, the following research question was raised:
Research Question 3 (RQ3). Is there a serial mediation effect of hate speech modality (text-only vs. text-plus-image) (a) through sympathy and explicit prejudice on prosocial behavioral intentions and (b) through sympathy and implicit prejudice on prosocial behavioral intentions?
Method
Study Design
To increase the generalizability of the findings, we examined the research questions in the context of two groups that are frequently targeted by online hate speech: Muslims and homosexuals. An online experiment with a 2 (modality of hate speech: text-only vs. text-plus-image) × 2 (target group of hate speech: Muslims vs. homosexuals) between-subjects design was conducted. The study protocol received approval from the ethical review committee of the first author’s university.
Participants
We conducted an a priori power analysis estimating a small effect size (f = .10) with 80% power (α = .05), which indicated that at least 787 participants would be required to detect the predicted effects. Therefore, we recruited 803 adults (aged 18 years old and above) residing in the United States from Amazon Mechanical Turk (MTurk). To ensure the quality and reliability of responses, we restricted participation to MTurk workers who had an approval rate of human intelligence tasks (HITs) of over 97% and had at least 1,000 HITs approved (Peer et al., 2014). Participants were also limited to Twitter users and those who did not belong to any of the groups targeted by the hate speech in the experimental stimuli. After excluding those individuals who gave identical responses to all the items (n = 4), the final sample size was 799 (55.2% male, 43.9% female, and 0.9% other; mean age = 39.50 years, SD = 12.70).
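As a rough check on the sample size reported above, the sketch below uses a normal approximation for a test with one numerator degree of freedom. This is an assumption made for illustration: the text does not say which software or exact test family produced the figure of 787, and the function name `approx_total_n` is hypothetical. The approximation ignores the finite-sample correction of the F distribution, so it lands slightly below the reported value.

```python
import math
from statistics import NormalDist


def approx_total_n(f: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate total N for a 1-df effect of size Cohen's f.

    Uses the normal approximation: required noncentrality
    lambda = (z_{1-alpha/2} + z_{power})^2, with lambda = f^2 * N.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = z.inv_cdf(power)           # quantile for the target power
    noncentrality = (z_alpha + z_power) ** 2
    return math.ceil(noncentrality / f ** 2)


# Inputs taken from the text: f = .10, alpha = .05, power = .80
n_required = approx_total_n(f=0.10)
print(n_required)
```

The approximate result falls within a few participants of the 787 stated in the text, which is what one would expect if the reported figure came from an exact noncentral F calculation.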
Stimulus Material
Participants were presented with a screenshot of a fictitious Twitter post that incited hatred against the target group. To minimize the confounding effects that might result from the source of the post, a gender-neutral name (i.e., Robin Smith) and profile picture (i.e., landscape photo) were used (Figure 1). The number of likes was held constant across posts.

Figure 1. Example stimuli for the homosexual-related, text-only hate speech condition.
To manipulate the type of hate speech, we varied the social groups targeted by it in the research. Because Muslims and homosexuals are the social groups most commonly targeted by online hate speech in the United States (Anti-Defamation League, 2021), these two groups were selected as targets in the stimuli to enhance the ecological validity of the experiment. Participants in the [Muslims/homosexuals] conditions viewed a post containing hateful content against the corresponding group.
To manipulate the modality of the hate speech, we varied the presentation format of the posts. In text-only conditions, the post comprised only a textual message expressing hatred toward homosexuals (Figure 1) or Muslims (Figure 2). In text-plus-image conditions, the post consisted of images along with a caption to convey the hateful content (Figures 3 and 4). The meaning conveyed by the image essentially matched the content of the caption to enhance congruence between the image and the text.

Figure 2. Example stimuli for the Muslim-related, text-only hate speech condition.

Figure 3. Example stimuli for the homosexual-related, text-plus-image hate speech condition.

Figure 4. Example stimuli for the Muslim-related, text-plus-image hate speech condition.
A pretest was conducted to identify appropriate stimulus messages and captioned images. For each target group, we developed three hateful messages and selected several relevant images from previous research (Oboler, 2013) and from the website memes.com. We cropped images and added text to create the fictitious multimodal hate speech. A total of 64 participants were recruited from MTurk for the pretest study. Participants rated each message and captioned image on three dimensions: perceived threat, perceived realism, and perceived visual–verbal congruence (for captioned images only). Based on the pretest results, two messages and two captioned images (one for each target group) were selected for the main study. The chosen messages and captioned images were matched on all three dimensions within and across the target groups (see Appendix A in the Supplemental Material).
Procedure
Because the Implicit Association Test (IAT) was incompatible with mobile or tablet devices, participants were asked to participate in the research with a desktop computer or laptop with a keyboard. Prior to the formal survey, the participants answered three screening questions about their social media use, religious beliefs, and sexual orientation. Participants who identified themselves as Twitter users, non-Muslims, and non-homosexuals were directed to the formal survey.
In the formal survey, participants were randomly assigned to one of four conditions. Participants in each treatment condition viewed a screenshot of a post and were instructed to imagine a situation in which they were browsing the internet from the comfort of their homes and saw the post in their Twitter feed. They were required to view the post for at least 10 seconds before proceeding to the next page. Afterward, they completed the IAT and the self-report questionnaire to assess the variables of interest. Finally, they were debriefed and thanked for their participation.
Measures
Sympathy
Participants indicated the extent to which they felt “sympathetic,” “compassionate,” and “empathetic” when viewing the post on a 7-point scale (1 = not at all, 7 = extremely). The three items were adapted from the study by Powell et al. (2015) to assess sympathy (α = .90, M = 4.12, SD = 1.82).
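The reliability coefficient (α) reported for this and the following scales is Cronbach's alpha. The sketch below shows the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). The responses are hypothetical values invented for illustration, not data from the study, and `cronbach_alpha` is an illustrative helper name.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; items[i][j] is participant j's rating on item i."""
    k = len(items)
    # Summed scale score for each participant
    totals = [sum(ratings) for ratings in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 7-point ratings from five participants (NOT study data)
sympathetic = [2, 4, 5, 6, 7]
compassionate = [3, 4, 5, 6, 6]
empathetic = [2, 5, 4, 6, 7]

alpha = cronbach_alpha([sympathetic, compassionate, empathetic])
print(round(alpha, 2))
```

Alpha approaches 1 as the items covary more strongly, which is why three near-synonymous adjectives tend to yield a high coefficient.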
Explicit Prejudice
Explicit prejudice was measured in two ways. First, consistent with previous research (Hsueh et al., 2015), we used a 7-point feeling thermometer rating scale (1 = most warm, 7 = least warm) to ask participants to rate their feelings of warmth toward [Muslims/homosexuals], with higher scores representing more prejudicial responses (M = 2.98, SD = 1.44). In addition, five 7-point semantic differential items from the study by Eagly et al. (1991) were included to assess explicit attitudes toward the target group. The items included “good/bad” and “nice/awful.” The five items were averaged into a composite measure of explicit prejudice, with higher scores indicating more prejudice against the target group (α = .95, M = 2.68, SD = 1.45).
Implicit Prejudice
Implicit prejudice was assessed using Arab–Muslim and sexuality IATs (Greenwald et al., 1998). The IAT measures the relative strength of mental associations between target pairs (e.g., homosexuals vs. heterosexuals) and positive (e.g., lovely) versus negative (e.g., awful) attributes. The assumption underlying the IAT is that when participants are asked to categorize a target concept (e.g., heterosexuals) with an attribute dimension (e.g., lovely), they will do so more quickly if the concept is consistent with their implicit mental associations (i.e., if they implicitly perceive heterosexuals more positively than homosexuals). Conversely, they will respond more slowly if the concept is inconsistent with their implicit mental associations. According to Greenwald et al. (2009), the IAT sometimes performs better in predicting behaviors in socially sensitive domains (e.g., racial discrimination) than explicit self-report measures.
Following the procedure described by Greenwald et al. (2003), the IAT included seven blocks of trials in which participants were asked to categorize target concepts (e.g., homosexuals) and attributes (e.g., attractive) as quickly and accurately as possible. In each block, participants saw a series of words about target concepts or attributes at the center of the screen. They sorted the words as quickly as possible by pressing the correct response key on the keyboard. The “E” key represented the categories in the upper left corner of the screen, and the “I” key represented the categories in the upper right corner of the screen. Whenever participants responded incorrectly, they were shown an error message and required to press the correct response key to proceed.
Using an advanced IAT scoring algorithm developed by Greenwald et al. (2003), D-scores were calculated to represent the level of implicit prejudice (M = .14, SD = .44). Positive D-scores indicate stronger associations with others-positive/Muslims-negative (or heterosexuals-positive/homosexuals-negative) relative to Muslims-positive/others-negative (or homosexuals-positive/heterosexuals-negative), thus reflecting more implicit prejudice against Muslims or homosexuals. Negative D-scores indicate stronger associations with Muslims-positive/others-negative (or homosexuals-positive/heterosexuals-negative) relative to others-positive/Muslims-negative (or heterosexuals-positive/homosexuals-negative), representing less implicit prejudice against Muslims or homosexuals. A D-score of zero indicates no implicit prejudice.
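The core of the D-score calculation can be sketched as follows. This is a simplified illustration of the improved scoring procedure described by Greenwald et al. (2003), not the authors' actual scoring script: it assumes latencies are recorded up to the corrected response (so errors carry a built-in penalty), it drops trials slower than 10,000 ms, and it omits the participant-level exclusion for excessive fast responses. The function name, argument names, and latencies are all hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence

MAX_LATENCY_MS = 10_000  # trials slower than this are discarded


def d_score(
    compat_practice: Sequence[float],
    incompat_practice: Sequence[float],
    compat_test: Sequence[float],
    incompat_test: Sequence[float],
) -> float:
    """Average of two standardized latency differences (practice and test).

    'Compatible' here means the pairing that is easier for a prejudiced
    respondent (e.g., heterosexuals-positive/homosexuals-negative), so a
    positive D indicates more implicit prejudice.
    """

    def keep(trials: Sequence[float]) -> list:
        return [t for t in trials if t <= MAX_LATENCY_MS]

    def quotient(compat: Sequence[float], incompat: Sequence[float]) -> float:
        compat, incompat = keep(compat), keep(incompat)
        inclusive_sd = stdev(compat + incompat)  # SD over both blocks pooled
        return (mean(incompat) - mean(compat)) / inclusive_sd

    return (
        quotient(compat_practice, incompat_practice)
        + quotient(compat_test, incompat_test)
    ) / 2


# Hypothetical latencies in milliseconds (NOT study data)
d = d_score(
    compat_practice=[650, 700, 720, 810],
    incompat_practice=[820, 900, 760, 950],
    compat_test=[600, 640, 700, 750, 690],
    incompat_test=[780, 830, 700, 910, 860],
)
print(round(d, 2))
```

Because each difference is divided by the participant's own latency variability, the score is less sensitive to overall response speed than a raw millisecond difference.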
Prosocial Behavioral Intentions
Participants indicated the extent to which they were likely to engage in prosocial behaviors toward the target group in the next 12 months on a 7-point scale (1 = not likely at all, 7 = extremely likely). The eight items were adapted from previous research (Hopkins et al., 2007; Oliver et al., 2012) to assess prosocial behavioral intentions, including “donating money to groups and organizations that provide services to help [Muslims/homosexuals]” and “forwarding a message supporting [Muslims/homosexuals] to others” (α = .96, M = 4.54, SD = 1.63).
Issue Involvement
Participants indicated the extent to which they felt the issues about [Muslims/homosexuals] as personally relevant on five 7-point semantic differential items from Zaichkowsky (1994), including “unimportant/important” and “irrelevant/relevant” (α = .94, M = 5.31, SD = 1.42).
Prior Knowledge
Prior knowledge was measured by asking participants how much they agreed with four items from the study by Flynn and Goldsmith (1999) on a 7-point scale (1 = strongly disagree, 7 = strongly agree). The items included “I know pretty much about [Islam/homosexuality]” and “I feel very knowledgeable about [Islam/homosexuality]” (α = .94, M = 4.68, SD = 1.54).
Political Ideology
Participants indicated their political views on a 5-point scale (1 = very conservative, 5 = very liberal; M = 2.82, SD = 1.34).
The full questionnaire for the main study is available in Appendix B of the Supplemental Material. Bivariate correlations between the variables of interest are presented in Appendix C of the Supplemental Material.
Results
Preliminary Analyses
A multivariate analysis of variance (MANOVA) was conducted to examine whether the effects of modality on the mediators and dependent variables differed between the target groups. The results indicated no significant interactions between modality and target group in any of the tests (all ps > .05). Therefore, the data for the two target groups were combined in subsequent analyses to increase statistical power.
There were also no significant differences across different modality conditions regarding age, t(797) = 1.30, p = .20; gender, χ²(2) = 3.67, p = .16; education, χ²(5) = 4.20, p = .52; ethnicity, χ²(6) = 5.52, p = .48; and political ideology, t(797) = .66, p = .51. This indicated that the participants in the text-only and text-plus-image conditions were relatively comparable in terms of these demographic characteristics.
Hypothesis Testing
To test RQ1, a series of analyses of covariance (ANCOVAs) were conducted to examine the main effect of modality on explicit prejudice, implicit prejudice, and prosocial behavioral intentions. Issue involvement, prior knowledge, political ideology, and age were entered as covariates in the analyses. The results did not reveal a significant main effect of modality on explicit prejudice [feeling thermometer measure: F(1, 793) = .31, p = .58, partial η² = 0; semantic differential measure: F(1, 793) = .46, p = .50, partial η² = .001], implicit prejudice [F(1, 793) = .001, p = .97, partial η² = 0], or prosocial behavioral intentions [F(1, 793) = .65, p = .42, partial η² = .001].
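The effect sizes above can be recovered from the F statistics alone, because partial η² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity to the reported F values; the function name and dictionary labels are illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values as reported in the text, each with df = (1, 793)
reported_f = {
    "explicit prejudice (feeling thermometer)": 0.31,
    "explicit prejudice (semantic differential)": 0.46,
    "implicit prejudice": 0.001,
    "prosocial behavioral intentions": 0.65,
}

effect_sizes = {
    outcome: round(partial_eta_squared(f_value, 1, 793), 3)
    for outcome, f_value in reported_f.items()
}
for outcome, eta_sq in effect_sizes.items():
    print(f"{outcome}: partial eta squared = {eta_sq}")
```

Rounded to three decimals, each value agrees with the effect size reported alongside its F statistic.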
RQ2 and RQ3 were tested using Models 4 and 6, respectively, of the PROCESS macro (Hayes, 2018) with bias-corrected 95% confidence intervals (CIs) based on 5,000 bootstrap samples. Modality was dummy coded, with the text-only condition as the reference group (0 = text-only, 1 = text-plus-image). After controlling for issue involvement, prior knowledge, political ideology, and age, the results showed that sympathy had no significant mediating effect on explicit prejudice (feeling thermometer measure: B = −.01, SE = .01, 95% CI [−.04, .001]; semantic differential measure: B = −.01, SE = .01, 95% CI [−.03, .01]). However, the indirect effect of modality on implicit prejudice via sympathy was statistically significant (B = −.01, SE = .01, 95% CI [−.02, −.0004]). Participants in the text-plus-image condition displayed more sympathy when viewing the post than those in the text-only condition (b = .26, t = 2.22, p < .05). Such sympathy, in turn, negatively predicted implicit prejudice (b = −.02, t = −2.61, p < .01). Likewise, sympathy served as a significant mediator in the effect of modality on prosocial behavioral intentions (B = .06, SE = .03, 95% CI [.01, .11]). Sympathy elicited in the text-plus-image condition significantly predicted the intention to engage in prosocial behaviors (b = .23, t = 10.54, p < .001).
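To illustrate the logic of a bootstrapped indirect effect, the sketch below estimates the product of the two paths (modality → mediator, mediator → outcome) and resamples cases to obtain a confidence interval. It is a simplified illustration under several assumptions, not a reimplementation of the PROCESS macro: the data are synthetic, covariates are omitted, the interval is a plain percentile interval rather than the bias-corrected interval reported above, and all function and variable names are hypothetical.

```python
import numpy as np


def ols_coefs(design, outcome):
    """Least-squares coefficients; `design` must already include an intercept column."""
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta


def indirect_effect(x, m, y):
    """Indirect effect a*b of x on y through m (no covariates in this sketch)."""
    ones = np.ones(len(x))
    a = ols_coefs(np.column_stack([ones, x]), m)[1]     # path a: x -> m
    b = ols_coefs(np.column_stack([ones, x, m]), y)[2]  # path b: m -> y, controlling for x
    return a * b


def percentile_bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the indirect effect (assumption: not bias-corrected)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    low, high = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return low, high


# Synthetic data for illustration only (NOT the study's data).
rng = np.random.default_rng(42)
n = 800
x = rng.integers(0, 2, size=n).astype(float)  # 0 = text-only, 1 = text-plus-image
m = 0.5 * x + rng.normal(size=n)              # hypothetical mediator
y = 0.4 * m + rng.normal(size=n)              # hypothetical outcome

point = indirect_effect(x, m, y)
ci_low, ci_high = percentile_bootstrap_ci(x, m, y)
print(f"indirect effect = {point:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")

# Rough consistency check using the rounded path coefficients reported in the text.
approx_implicit = round(0.26 * -0.02, 2)   # sympathy path to implicit prejudice
approx_prosocial = round(0.26 * 0.23, 2)   # sympathy path to prosocial intentions
print(approx_implicit, approx_prosocial)
```

Because the reported coefficients are rounded, the products only approximate the reported indirect effects (B = −.01 and B = .06); an indirect effect is typically deemed significant when its bootstrap interval excludes zero.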
Finally, the serial indirect effect of modality on prosocial behavioral intentions via sympathy and explicit prejudice was not statistically significant (feeling thermometer measure: B = .003, SE = .002, 95% CI [−.0002, .009]; semantic differential measure: B = .001, SE = .002, 95% CI [−.002, .006]). However, modality had a significant and positive indirect effect on prosocial behavioral intentions via sympathy and implicit prejudice (B = .002, SE = .001, 95% CI [.0001, .005]). As shown in Figure 5, exposure to the text-plus-image post resulted in more sympathy (b = .26, t = 2.22, p < .05), which in turn negatively predicted implicit prejudice (b = −.02, t = −2.61, p < .01). Such implicit prejudice negatively predicted the intention to engage in prosocial behaviors (b = −.32, t = −3.99, p < .001).

Figure 5. Path coefficients for the serial mediation model of prosocial behavioral intentions.
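The serial indirect effect described above is the product of three paths: modality → sympathy, sympathy → prejudice, and prejudice → prosocial behavioral intentions. The sketch below shows how such a product can be estimated with three regressions. It is an illustration under stated assumptions rather than the PROCESS Model 6 output: the data are synthetic, covariates are omitted, and the function and variable names are hypothetical.

```python
import numpy as np


def slope(predictors, outcome, which):
    """OLS slope of predictor number `which` (0-based) in outcome ~ intercept + predictors."""
    design = np.column_stack([np.ones(len(outcome)), *predictors])
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta[which + 1]  # +1 skips the intercept


def serial_indirect_effect(x, m1, m2, y):
    """Serial indirect effect a1 * d21 * b2 for the chain x -> m1 -> m2 -> y."""
    a1 = slope([x], m1, 0)          # x -> m1
    d21 = slope([x, m1], m2, 1)     # m1 -> m2, controlling for x
    b2 = slope([x, m1, m2], y, 2)   # m2 -> y, controlling for x and m1
    return a1 * d21 * b2


# Synthetic data for illustration only (NOT the study's data).
rng = np.random.default_rng(7)
n = 800
x = rng.integers(0, 2, size=n).astype(float)  # 0 = text-only, 1 = text-plus-image
m1 = 1.0 * x + rng.normal(size=n)             # hypothetical first mediator
m2 = -0.5 * m1 + rng.normal(size=n)           # hypothetical second mediator
y = -0.5 * m2 + rng.normal(size=n)            # hypothetical outcome

serial_estimate = serial_indirect_effect(x, m1, m2, y)
print(f"serial indirect effect = {serial_estimate:.3f}")

# Rough consistency check using the rounded path coefficients reported in the text.
approx_serial = round(0.26 * -0.02 * -0.32, 3)
print(approx_serial)
```

Two negative paths multiply to a positive product, which is why the serial indirect effect on prosocial behavioral intentions is positive; with the rounded coefficients, the product is approximately the reported B = .002.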
Discussion
The current study examined how multimodal hate speech affects prejudicial attitudes and prosocial behavioral intentions toward a target group. The results showed that participants felt more sympathy when images and text were presented in combination to convey hateful content. This sympathy, in turn, led to decreased implicit prejudice toward the target group and increased prosocial behavioral intentions. Moreover, exposure to multimodal hate speech had an indirect effect on prosocial behavioral intentions through sympathy and implicit prejudice.
In light of the current evidence, exposure to online hate speech seems to elicit consistent levels of prejudice and behavioral intentions, irrespective of its modality. One possible explanation is that exposure to a single post containing hateful content may not be sufficient to directly trigger changes in attitudes and behavioral intentions. According to the dosage effect, a higher dose of messages has a stronger impact on information recall and cognition (Banerjee et al., 2011). Most prior studies on the impact of textual hate speech on prejudicial attitudes and prosocial behaviors have been conducted in the context of online comment sections, where participants read several hateful user comments (e.g., Hsueh et al., 2015; Weber et al., 2020; Ziegele et al., 2018). Exposure to multiple hateful messages (compared to a single one) may create a stronger normative influence that makes users more susceptible to the stereotypes conveyed by their content. Therefore, future research could examine whether variations in the number of hateful messages users are exposed to influence attitudes and behaviors and, if so, through what processes.
One important finding is that sympathy plays a pivotal role in mediating the effect of multimodality on implicit attitudes and behavioral intentions. An increase in sympathy elicited by multimodal hate speech may be explained by the curvilinear pattern of changes in desensitization to negative or aversive stimuli over time. Previous research has found that individuals tend to show more sympathy for victims after initial exposure to violent scenes depicted in the media; with repeated exposure, the psychological impact of media violence weakens, and individuals’ sympathy for victims decreases (Fanti et al., 2009). Likewise, after initially observing hate speech, individuals may begin to feel more sympathetic toward the target group, and this tendency seems to be particularly likely to occur when hate speech is presented in richer modalities. Consistent with the affect heuristic (Slovic et al., 2007), our results suggest that multimodality has the potential to facilitate reliance on emotional states (i.e., sympathy) that drive the heuristic processing of hate speech, thereby shaping implicit attitudes and behavioral intentions toward a target group. In the presence of hate speech that combines images with textual claims, individuals tend to rely more on the sympathetic feelings they associate with the target groups to form attitudes and guide behavioral intentions. Previous studies have demonstrated the applicability of the affect heuristic as a framework for examining the multimodal information processing of disinformation (Lee et al., 2023) and news coverage (Powell et al., 2015). The current study extends this line of research to the context of hate speech by shedding light on sympathy as a key affective mechanism underlying the effects of multimodal hate speech on implicit prejudice and prosocial behavioral intentions.
Our results also resonate with the literature on the critical role of sympathy in prosocial actions (Batson et al., 2002; Eisenberg et al., 2014). Because sympathy often involves taking another’s perspective, the other-oriented focus inherent in sympathy is likely to promote behaviors intended to benefit others (Eisenberg et al., 2014). In the context of hate speech, the process of sympathizing with members of a target group may enhance understanding of their feelings, shift attention to their needs, and increase concern for their welfare, thereby motivating prosocial behaviors toward them. This highlights the importance of sympathy training for online users to reduce prejudice and foster prosocial tendencies. Developing intervention programs aimed at enhancing online users’ abilities to recognize and understand others’ feelings, take their perspectives, and value their well-being may be an effective way to promote situationally sympathetic responses to individuals or groups targeted by hate speech.
In addition, the significant serial mediation effect enriches the current understanding of the affect heuristic by uncovering the role of affective reactions in shaping judgments and subsequent decisions. In line with prior work showing the indirect effects of multimodal information on perceptions or behavioral intentions through the operation of the affect heuristic (e.g., Lee et al., 2023; Powell et al., 2015), our findings suggest that people base their judgments and intentions about a target group of hate speech not only on what they think about it but also on how they feel about it. If they feel sympathetic to the target group, they are inclined to show less prejudice and stronger prosocial intentions; if their feelings about it are callous, they tend to react in the opposite way, with more prejudice and weaker prosocial tendencies. This finding also supports the contention that affective reactions to stimuli are usually the very first reactions that occur automatically without extensive cognitive processing and can subsequently guide judgmental and decision-making processes (Zajonc, 1980). Given that reliance on emotions seems to be a faster, easier, and more efficient way to make decisions (Slovic et al., 2007), the sympathetic feelings evoked by multimodal hate speech might inhibit individuals from associating certain negative characteristics with members of the target group, thus undermining the basis of stereotypes and prejudice. It is also worth noting that the serial mediation effects of multimodal hate speech on prosocial behavioral intentions through sympathy and prejudice were observed only for implicit prejudice, not for explicit prejudice. The divergent pathways to multimodal effects shed light on the distinct roles of sympathy in predicting implicit and explicit attitudes toward stigmatized groups.
Despite the significance of the findings, several limitations should be noted. First, we operationalized multimodality as textual and visual modalities, but multimodality can also manifest in other formats, such as videos. The combination of varying modalities may create intricate and nuanced meanings and lead to different persuasive outcomes. This possibility is consistent with a recent study showing that exposure to fake news videos elicited greater perceptions of source vividness than exposure to text-plus-image fake news (Lee & Shin, 2022). Future scholarship should investigate the effects of varying multimodal formats of hate speech on users’ psychological and behavioral responses. Second, our study was conducted in the United States, so the findings might not be generalizable to other contexts. Given that the meaning and potential effect of hate speech largely depend on linguistic, societal, and cultural contexts, our results warrant further replication in other countries and regions. Third, the intention to engage in prosocial behaviors was measured using a self-report scale, which may be subject to social desirability bias. Participants might have overstated their intentions to present themselves in a socially desirable way. Future research should validate the current findings with more objective measures of prosocial behavioral intentions.
Fourth, sympathy is not the only emotion important for understanding the dynamics of multimodal hate speech. There are many other emotional responses to hate speech that may shape prejudicial attitudes and prosocial behavioral intentions, such as anger (Lee, 2021), contempt (Bilewicz & Soral, 2020), and fear (Oksanen et al., 2020). Exploring these emotions could be a promising direction for future work. It is also worth noting that the way we measured sympathy focused on overall feelings when viewing the post without specifying the object of sympathy. Future studies using measures that explicitly mention the target group for which sympathy is expressed may provide more nuanced insights into the role of sympathy in the effects of online hate speech.
Fifth, this study operationalized hate speech in a relatively explicit form, with posts containing overtly offensive language. However, in reality, hate speech sometimes strategically uses disguised means to express hatred and incite violence. For example, in some cases, text or images alone are harmless, but when they are combined, the meaning they convey becomes derogatory and hurtful. To gain a deeper understanding of multimodal hate speech, more research is needed to examine the impact of its implicit forms. It is also important to note that in this study, the images of the homosexual condition portrayed the target group in a relatively positive light, whereas the images of the Muslim condition tended to portray the target group negatively. One may argue that the image of pigeons in the Muslim condition not only adds a visual layer to the textual content but also involves the use of humor to dehumanize Muslims. The interplay of cues presented by these images may produce more complex meanings and elicit stronger reactions than the images of couples in the homosexual condition. Although different visual depictions of attacked groups were not the focus of this study, we encourage future research to take these nuances into account and systematically examine their potential effects on prejudicial attitudes and prosocial behavioral intentions.
In addition, we acknowledge that the effect sizes found in this study were small. One possible explanation is that the social media posts were presented for a short time. A longer presentation time for the experimental stimulus may increase the observed effect sizes. This suggests that future research is needed to replicate our results with longer exposure manipulations. Finally, the current results were limited to short-term, single exposure to hate speech and its immediate effects. Previous research suggests that prolonged exposure to hate speech may affect individuals’ perceptions of such language because they may view it as morally justifiable and legitimate (Bilewicz & Soral, 2020). Additional studies focusing more on multiple exposures and the long-term effects of multimodal hate speech are therefore recommended. We also urge future research to measure prior exposure to online hate speech and control for it in the research design or analyses used, which may mitigate its potential impact on attitudinal and behavioral outcomes.
Supplemental Material
Supplemental material, sj-docx-1-sms-10.1177_20563051241292990 for The Power of Images: How Multimodal Hate Speech Shapes Prejudice and Prosocial Behavioral Intentions by Sai Wang in Social Media + Society
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
