Concrete language enhances sharing of social media posts on Twitter, Reddit, and experimentally

Abstract

Concrete language, which has the property of readily evoking a mental image or sensory experience, has been extensively studied in language and is understood to facilitate processing speed, memory, and understanding. Previous research points to a preference for concreteness. Through a comprehensive approach that combines big-data analysis and experimental methods, we investigate the preference for concreteness in posts from two social media platforms—Twitter and Reddit—and in decisions made by participants in a two-alternative forced-choice experiment. In Study 1, we analysed data from 15 million Twitter posts. The overall results show that posts containing words that are more concrete are more likely to be retweeted. In Study 2, we scraped over 50,000 posts from Reddit across different subreddits and found that more concrete posts tend to have more upvotes (i.e., approval). Both studies also showed consistent effects for words acquired later in life and rated as more arousing. The magnitude of these effects varied across topics. Finally, to demonstrate the causal influence of concreteness, Study 3 is a pre-registered, controlled behavioural experiment where we asked participants to indicate which of two social media posts they would be more likely to share, with both posts constructed from a real post to have either higher or lower concreteness than the original. Participants preferred more-concrete statements, and the difference in concreteness between statements positively predicted choices. Our investigation sheds light on cognitive mechanisms underlying online information sharing and is consistent with an information competition theory, in which more easily processed information is preferred.

Keywords

concreteness, psycholinguistics, information-sharing, upvotes

Introduction

Contemporary information markets are highly competitive, with numerous individuals and industries vying for consumer attention. This has led to proposals that information competition drives the market for more attention-grabbing information (Davenport & Beck, 2001; Hansen & Haas, 2001; Hills, 2019; Pilgrim et al., 2024). This competition for attention has been conceptualized as a form of “information overload,” where the sheer volume of content creates a noisy environment in which only certain types of messages succeed (Pothos et al., 2021). Here, “noise” refers not merely to literal interference but to the presence of irrelevant or competing content that dilutes attention and reduces the signal-to-noise ratio for any given message (Hills, 2019; Shannon, 1949). As modern information environments grow denser, “competition” refers to the growing number of information producers and the finite attentional resources of receivers (Evans, 2020; Pilgrim et al., 2024; Simon, 1996; Terranova, 2012). This imbalance forces communicators to adapt their messages in ways that lower cognitive load and maximize impact (Falkinger, 2008; Hefti & Lareida, 2021; Picard & McEwen, 2018). For example, it has been found that information with greater processing ease is better able to capture and hold attention (Berger et al., 2023). One of the proposed adaptations of communicators is the increased use of concrete language, that is, language that is specific, imageable, and easily processed (Connell & Lynott, 2012; Winkielman & Cacioppo, 2001).

As the number of information producers increases, the kinds of messages they communicate need to change in order to successfully capture the attention of information receivers. Early research on the relationship between efficient information transmission and noise showed that as noise increases messages should alter their form to maximize information for the receiver (Shannon, 1949). If we take noise to include all information unrelated to the message of interest—either as intended by the producer or as preferred by the receiver—then modern information markets are rife with noise. Moreover, this noise is likely to be increasing, as there is ample evidence of an accelerating competition for collective attention even in the last decade (Lorenz-Spreen et al., 2019). Savvy information producers should therefore adapt their messages accordingly. Given this simple observation, one might ask how such theories can be extended to help us better understand what kinds of messages are more likely to be successful.

One way to approach this question is to ask what kinds of language are more readily processed, remembered, and reproduced. Experimental studies in this area have found that concrete language has an advantage in studies of perception, memory and speed of recall (Walker & Hulme, 1999). Concreteness refers to how easy it is to imagine or see the referent of a word in one’s mind’s eye (Brysbaert, Warriner, et al., 2014). It describes the extent to which a concept can be directly experienced through the senses and mentally visualized. Words that refer to tangible, physically perceptible entities are considered concrete, whereas abstract words typically represent ideas or states that require linguistic explanation rather than sensory perception (Brysbaert, Warriner, et al., 2014). For instance, “laughter” is more concrete than “humour,” and “running” is more concrete than “movement.” Although abstract words can convey broader aspects and subtleties of information, concrete messages are more precise and more readily come to mind. To measure concreteness, researchers ask people to rate words with respect to their concreteness using a definition like the one above (Brysbaert, Stevens, et al., 2014; Paivio et al., 1968). People’s ratings tend to be remarkably reliable. Average ratings are also excellent predictors of processing, memory, and speed of recall, three features that are critical to information sharing and reproduction (Hills, 2019).

The effectiveness of concrete language can potentially be explained by dual-coding theory (Paivio et al., 1994), which posits that people process information through both verbal and non-verbal (imagery-based) systems. Concrete language benefits from activating both systems simultaneously, enhancing comprehension and recall. In contrast, abstract language typically activates only the verbal system, which may be less effective in demanding or uncertain environments. Relatedly, concreteness has been shown to improve memory, reduce ambiguity, and increase the perceived truthfulness and psychological proximity of information (Hansen & Wänke, 2010; Sadoski et al., 1997). These properties suggest that concrete messages are particularly well-suited for capturing attention and promoting information sharing in competitive, high-noise settings such as social media. Indeed, concreteness has been found to positively affect sustained attention, measured by how far users scroll down articles (Berger et al., 2023). Therefore, concreteness may influence not just how information is processed, but also whether it is retained and retransmitted, both key features in the dynamics of online communication.

In line with this, a growing body of research has shown that concreteness enhances not only comprehension and recall but also the likelihood of information being shared. For example, Li et al. (2024) found that concrete words were more likely to survive both short-term retellings and long-term cultural transmission. In experimental studies, participants are more likely to recall concrete messages (Hanley et al., 2013), process them more quickly (Barber et al., 2013), and rate them as more vivid, familiar, and truthful (Hansen & Wänke, 2010). Beyond memory, concreteness appears to enhance the persuasive power of messages, especially under uncertainty. For instance, research on construal-level theory shows that people prefer concrete over abstract descriptions when events feel psychologically near or personally relevant (Snefjella & Kuperman, 2015; Trope & Liberman, 2010). In the context of social media, where messages compete for attention in rapid, low-effort processing environments, such features may give concrete content a significant advantage. Concreteness also interacts with other known drivers of virality—such as novelty, arousal, and belief-consistency (Berger & Milkman, 2012; Goebel et al., 2024)—by reducing cognitive effort and increasing clarity. Prior exposure to content (Vellani et al., 2023), emotional intensity (Berger, 2011), and perceived usefulness (Epstein et al., 2021) all contribute to sharing behaviour, and concrete messages may be more likely to check these boxes. From this perspective, concreteness is not merely a linguistic feature but a cognitive shortcut: one that helps information break through the noise, stick in memory, and spread.

Applying concreteness and information competition to modern information markets, Hills and Adelman (2015) evaluated the statistical properties of 355 billion words of American English over the last two centuries and found a dramatic and nearly monotonic rise in average levels of concreteness. Hills et al. (2016) further developed a model of this rise based on Shannon’s information theory, showing that increasing “noise” (i.e., competition) enhances the need for information that is processed more quickly by its receiver. In more recent work, Li et al. (2024) found that not only were concrete words more likely to survive over the past two centuries, but they are also more likely to be preserved in story retellings, which were measured using a large dataset of more than 12,000 story retellings by online participants.

Linguistic features could also be one of the drivers for success of fake news, with concreteness directly related to the mechanisms by which people share or believe in fake news: Judgement of truth can be affected by the processing fluency of the information, and concrete information can be recognized faster and have an advantage on comprehensibility (Dechêne et al., 2010). Concrete information also tends to be easier to imagine and recall and therefore might leave the impression of having been previously encountered (Doest & Semin, 2005). This feeling of familiarity tends to contribute to the truth advantage of concrete information, a phenomenon known as the illusory truth effect (Dechêne et al., 2010; Fazio et al., 2015). Furthermore, concreteness helps with imaginability, which itself contributes to perceived truth (Koehler, 1991; Sherman et al., 1985).

According to construal level theory (Liberman & Trope, 2008; Trope & Liberman, 2010), there is an association between likelihood of an event and how concrete the construal is. More likely events tend to be represented more concretely and related to direct experience (Borgida & Nisbett, 1977). Research on reality monitoring identified the richness of perceptual, semantic, and contextual details, and the vividness level of the memory as the most important cues by which people judge if a memory is true or imagined (Johnson, 2006; Johnson et al., 1993; Johnson & Raye, 1981). People also tend to think information with more details is true (DePaulo et al., 2003): the description of concrete feelings and the extent of details are used by the police to judge the accuracy of a testimony (Akehurst et al., 1996). Furthermore, vividly described information tends to have more impact on judgement and decision-making: situations with more details tend to be regarded as more representative and more likely to happen compared with situations with fewer details (Borgida & Nisbett, 1977). Notably, it has been found that people are more likely to share information they approve of and concrete false statements are more likely to be rated as true (Hansen & Wänke, 2010).

Beyond concreteness, words acquired early in life are processed more fluently and are more deeply embedded in semantic memory (Brysbaert & Ellis, 2016). Fake news that is phrased in familiar, early-acquired vocabulary may therefore “feel” more true, enhancing its plausibility and shareability, even in the absence of factual accuracy. Also, humorous information tends to attract more attention regardless of the veracity of the information: it was found in an eye-tracking study that humour could direct people’s attention to both fake news and its correction (Kim et al., 2021). Humour has been shown to increase the likelihood of sharing while reducing cognitive scrutiny, and research suggests that humorous content is more likely to be forwarded (Berger & Milkman, 2012). While deliberation makes people less likely to believe or share fake news (Pennycook et al., 2020), humorous elements/funny content can lower people’s resistance to persuasion (Nabi et al., 2007).

Our goal here is to ask whether more concrete language influences sharing behaviour in online social media. By analysing posts from Twitter and Reddit across different topics and subreddits, we investigate whether more concrete information is more likely to be shared and upvoted. We do this in two ways. First, we examine posts scraped from Twitter and Reddit and relate their concreteness to the extent to which these posts are retweeted or upvoted. Second, we present a pre-registered, controlled experimental study that manipulates the concreteness of posts and then asks people to select which of two posts they would be more likely to share. Besides concreteness, we also explore the roles of other psycholinguistic properties, including age of acquisition, arousal, dominance, humour and valence, which have been found to be useful covariates in past work (Li et al., 2024).

Study 1

At the time of writing, Twitter (now X) has 368 million users (Twitter logistics, 2024). Quantitative research on Twitter has revealed a wide variety of real-world insights. This includes relationships between tweets and hate crime (Müller & Schwartz, 2023), the sharing of anti-vaccine sentiment (Bonnevie et al., 2021), political polarization (Conover et al., 2011), rumour spreading (Zubiaga et al., 2016), and their potential impacts on democracy (Lorenz-Spreen et al., 2023). In each of these cases, the spread of information is a critical component of the long-term impacts of that information. To address what makes Twitter posts more or less likely to be successfully spread, we evaluated the statistical properties of more than 15 million posts.

Methods

The Twitter dataset used in this study was collected by Zhao et al. (2023), who examined political polarization in online discussions surrounding the COVID-19 pandemic. The dataset consists of publicly available tweets posted between February 2020 and January 2021. For the present study, we focus solely on the linguistic features of tweets, independent of any political classification. The tweetsbotornot2 package in R (Kearney, 2018) was used to assess the likelihood of an account being automated, and any account with a bot probability score above .5 was excluded from the dataset. Given that language use plays a critical role in our analyses, we restricted our dataset to English-language tweets. Non-English tweets were identified and removed using Twitter’s metadata language tag, ensuring consistency in linguistic processing and measurement of language features.

For each post, we know whether it was retweeted. To investigate whether concrete posts are more likely to be shared, we computed a concreteness score for each post using recent large-scale concreteness norms covering more than 40,000 words, each rated on a 5-point Likert scale by multiple online participants (Brysbaert, Warriner, et al., 2014). Not all words in our dataset had an assigned concreteness value in these norms. When a word lacked a concreteness score, it was excluded from the averaging process rather than being imputed or assigned a default value. Approximately 12.43% of words in the Twitter dataset lacked a rating, so our concreteness computations are based on the remaining 87.57% of words. This approach ensures that our computed post-level concreteness scores reflect only words with established ratings. For each psycholinguistic feature (e.g., concreteness, age of acquisition), we matched words in each post to established norms and computed the post-level score by averaging the values of all matched words.
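The averaging rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (which was run in R); the function name post_score and the toy norm values are hypothetical, and coverage is computed here per token, which is only one possible reading of the percentages reported above.

```python
from statistics import mean

# Hypothetical miniature norms table with made-up values on a 1-5 scale.
# The published norms cover roughly 40,000 words.
TOY_NORMS = {"dog": 4.9, "table": 4.9, "idea": 1.6, "hope": 1.2}


def post_score(tokens, norms):
    """Return (mean rating of matched tokens, share of tokens matched).

    Tokens without a rating are skipped rather than imputed.
    Returns (None, 0.0) when no token has a rating.
    """
    matched = [norms[t] for t in tokens if t in norms]
    if not matched:
        return None, 0.0
    return mean(matched), len(matched) / len(tokens)


# "zzyzx" has no rating, so it is ignored in the average.
score, coverage = post_score(["dog", "idea", "zzyzx", "table"], TOY_NORMS)
print(score, coverage)
```

The same function could be reused for the other features (age of acquisition, valence, arousal, dominance, humour) by passing a different norms dictionary.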

We also computed age of acquisition (Kuperman et al., 2012), valence, arousal, dominance (Warriner et al., 2013), and humour (Engelthaler & Hills, 2018) for each post. Age of acquisition refers to the age when people learn a particular item (Elsherif et al., 2023). Concreteness refers to specific and tangible language (Williams & Bizup, 2017). Valence is a characteristic of emotions. For example, “negative” emotions like anger and fear have a negative valence but positive emotions like joy have a positive valence (National Institute of Mental Health (NIMH), 2023).

For age of acquisition, we used word-level ratings from Kuperman et al. (2012), as this was the most comprehensive age of acquisition resource available, and the ratings contribute substantially to predicting performance in the lexical-decision task of the English Lexicon Project (Kuperman et al., 2012). For valence, arousal, and dominance, we used the norm ratings from Warriner et al. (2013), which correlate with previous affective norm datasets while covering many more words, and which are more commonly used in existing studies. For humour, we used norm ratings from Engelthaler and Hills (2018), which are among the few well-validated norm ratings for humour.

To ensure data quality and consistency, we applied a series of preprocessing steps to the dataset before analysis. First, we removed duplicate entries to retain only unique posts. Next, all text was converted to lowercase to standardise formatting and prevent mismatches due to capitalisation. To eliminate extraneous elements, we removed URLs and hashtags. Additionally, we filtered out numbers and punctuation to focus on meaningful words. Since social media posts often contain emojis and other non-standard symbols, we used Unicode-based regular expressions to exclude such characters. Finally, we removed stopwords (e.g., the, and, is) using a predefined list of English stopwords (R Core Team, 2024) to ensure that non-informative, high-frequency words did not bias the analysis. After these preprocessing steps, the cleaned text was used for subsequent linguistic feature extraction and engagement analysis.
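The preprocessing steps above can be sketched in Python as follows. This is an illustrative simplification rather than the code used in the study: the stopword set is a tiny hypothetical stand-in for the predefined English list, and keeping only the letters a-z is a blunt substitute for the Unicode-based expressions mentioned above (it also splits contractions such as "don't" and drops accented letters).

```python
import re

# Tiny illustrative stopword set (an assumption, not the list used in the paper).
STOPWORDS = {"the", "and", "is", "a", "of", "to", "in"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
# Anything that is not a lowercase ASCII letter or whitespace:
# digits, punctuation, emojis, and other symbols.
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def deduplicate(posts):
    """Keep the first occurrence of each post, preserving order."""
    return list(dict.fromkeys(posts))


def clean_post(text):
    """Lowercase, strip URLs, hashtags, and non-letters, then drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return [w for w in text.split() if w not in STOPWORDS]


posts = deduplicate([
    "The DOG chased the ball! https://example.com #cute 😀 123",
    "The DOG chased the ball! https://example.com #cute 😀 123",
])
tokens = [clean_post(p) for p in posts]
print(tokens)
```

URLs and hashtags are removed before punctuation so that their fragments do not survive as spurious words.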

For a post, we took each word and found its value in the associated norms (if the word appeared in the norms) and then calculated the average value of the words in the post. We ran both a single logistic regression model for all variables and a logistic regression for each topic with all variables using whether the post had been retweeted as the dependent variable, with concreteness, age of acquisition, valence, arousal, dominance, and humour as independent variables. We also applied a Bonferroni correction for multiple hypothesis testing. The dataset was categorised by Zhao et al. (2023) into 35 topics using Latent Dirichlet Allocation (LDA) (McCallum, 2002). The 35 topics cover a wide range of issues, including sports, religion, show business, cooking, the 2020 U.S. election, diplomacy, and national security (Zhao et al., 2023). We used the scale() function in R to standardise the scores of each psycholinguistic feature.
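Two of the steps above, standardising predictors and applying a Bonferroni correction, can be sketched in Python as below. The sketch assumes the default behaviour of scale() in R (centre on the mean, divide by the sample standard deviation); the alpha level of 0.05 and the example p-values are assumptions for illustration, since the number of tests entering the correction is not stated here.

```python
from statistics import mean, stdev


def zscore(values):
    """Centre on the mean and divide by the sample SD (n - 1 denominator)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that fall below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


z = zscore([1.0, 2.0, 3.0])
flags = bonferroni_significant([0.001, 0.02, 0.04])
print(z, flags)
```

After standardisation, each regression coefficient describes the change in the outcome associated with a one-standard-deviation change in that feature, which makes coefficients comparable across features measured on different scales.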

Results

For the entire dataset with all topics combined in a regression, all predictors are significant (p<.001), with age of acquisition, concreteness, valence, and arousal positively predicting the likelihood of retweet, and humour and dominance negatively predicting it (see Figure 1 and Table S1 for the full regression table). This supports the primary claim that concreteness independently influences the appeal of information. In a regression with topics and concreteness as independent variables and “whether a tweet was retweeted” as the dependent variable, concreteness remains predictive (β=.014, SE=0.00059, CI=[0.012, 0.015], p<.001), which indicates that the main effect of concreteness is not driven by some topics being more concrete than others.

Figure 1.

Standardized effect size (coefficient estimates) of psycholinguistic features for all Twitter data (15,565,582 Tweets). The effect size represents the coefficient estimate of a logistic regression model to predict whether a tweet was retweeted. The error bars represent 95% confidence intervals. Some of the confidence intervals are too small to be seen.

Figure 2 shows that within topics, we see the same effect of concreteness, with more concrete posts being more likely to be retweeted. For example, the concreteness effect is particularly strong for the topics of “racism,” “2020 election,” “criticism of politicians” and “ideological conflict,” and weaker for topics such as “shows,” “traffic,” “showbiz,” and “religion.” The influence of concreteness in political language may be a general effect, as also suggested by Bhatia and Walasek’s (2016) discovery that the New York Times used increasingly concrete language as election dates approached (from 1987 to 2007). In the Supplemental Materials (Table S1 and Figures S1–S5), we show the effects for additional psycholinguistic features within topics. As above, age of acquisition, concreteness, arousal, and valence positively predict whether the tweets have been retweeted across all the topics, while the direction of the dominance effect differs across topics. For example, less dominant tweets under the topic “enjoyment” tend to be retweeted, and more dominant tweets under the topic “2020 election” tend to be retweeted. Humour negatively predicts retweeting, with more humorous tweets being less likely to be retweeted.

Figure 2.

Standardised effect size (coefficient estimates) of concreteness across Twitter topics from separate regression models for data from each topic. The effect size represents the coefficient estimate of a logistic regression model to predict whether a tweet was retweeted. The error bars represent 95% confidence intervals.

Study 2

In Study 2, we ask whether we observe similar effects for Reddit posts. Reddit posts represent a different format from Tweets and offer a way to communicate ideas to a target community with respect to a specific topic (or subreddit).

Methods

Reddit is a social network with elements of news aggregation, content rating, and forums. We scraped Reddit posts using Python. The scraping started on 28 December 2024 and ended on 8 January 2024. We scraped the front page, which includes posts from a range of subreddits. We also scraped posts directly from popular subreddits, including “askreddit,” “explainlikeimfive,” “futurology,” “gaming,” “getmotivated,” “lifeprotips,” “news,” “nottheonion,” “science,” “showerthoughts,” “space,” “sports,” “todayilearned,” “upliftingnews,” and “worldnews.” These 15 subreddits do not focus on gifs, music, pictures or videos and are among the top 30 popular subreddit communities (as listed at https://www.reddit.com/best/communities/1/). Overall, we collected 90,347 distinct Reddit posts. Because the front page draws on many different subreddits, we removed front-page posts that came from subreddits focused on music, gifs, videos, or pictures. These removed subreddits included “aww,” “creepy,” “earthporn,” “food,” “listentothis,” “photoshopbattles,” “pics,” “videos,” “funny,” “gifs,” “announcements,” “blog,” “DIY,” “documentaries,” “mildlyinteresting,” “music,” and “oldschoolcool.” For each post, we captured the text and the number of upvotes it received.

After exclusions, our dataset consisted of 50,517 Reddit posts. As in Study 1, we used word-level concreteness ratings from Brysbaert, Warriner, et al. (2014). Words without a concreteness score were excluded from the averaging process to ensure that our computed post-level concreteness scores reflect only words with established ratings. Approximately 11.44% of words in the Reddit dataset lacked a rating, so our concreteness computations are based on the remaining 88.56% of words. We applied the same preprocessing steps as in Study 1 and computed the concreteness, age of acquisition (Kuperman et al., 2012), valence, arousal, dominance (Warriner et al., 2013), and humour (Engelthaler & Hills, 2018) for each Reddit post. Then we ran a multiple linear regression using the number of upvotes as the dependent variable, with concreteness, age of acquisition, valence, arousal, dominance, and humour as the independent variables (Table S2). We also ran regressions for the data from each subreddit. We applied a Bonferroni correction for multiple hypothesis testing. As in Study 1, we standardised the score of each psycholinguistic feature using the scale() function in R.

Results

For the entire Reddit dataset with subreddits combined (Figure 3), we found that concreteness positively predicts the number of upvotes, together with age of acquisition and arousal. Valence and humour negatively predict the number of upvotes, and dominance is not predictive. More specifically, each one-unit increase in standardised concreteness (i.e., one standard deviation) corresponds to 216.23 more upvotes (Table S2).

Figure 3.

Standardized effect size (coefficient estimates) of psycholinguistic features for all Reddit data (50,517 posts). The effect size represents the coefficient estimate of a multiple linear regression model to predict the number of times each post is upvoted. The error bars represent 95% confidence intervals.

Within subreddits, we found that concreteness of content positively predicts the number of upvotes in 6 of 29 subreddits (“gaming,” “Futurology,” “movies,” “books,” “tifu,” and “twoxchromosomes”) and negatively predicts the number of upvotes in another 6 of 29 subreddits (“nottheonion,” “todayilearned,” “LifeProTips,” “worldnews,” “Jokes,” and “GetMotivated”). For the other subreddits, concreteness was non-significant as a predictor. In the case of the “nottheonion” subreddit, this may indicate that for news perceived as unrealistic or untruthful, greater concreteness in language reduces the likelihood of upvotes. In the Supplemental Material, we provide further information about the other psycholinguistic features (Figures S6–S10). As seen in Figure 4, there is notable variation in the direction of the concreteness effect across different topics: approximately equal numbers of topics show positive and negative relationships between concreteness and upvotes. This suggests that the overall positive main effect of concreteness on Reddit is moderated by topic. Some topics benefit more from concreteness than others, and the topics that do benefit also tend to be more popular. Although we tried to scrape the same number of posts for each subreddit, some subreddits yielded fewer distinct posts, and those subreddits tend to show negative effects of concreteness. Therefore, it is difficult to disentangle whether popularity is driving the concreteness effect, or vice versa. This motivates the experiment we present below.

Figure 4.

Standardized effect size (coefficient estimates) of concreteness across subreddits from separate regression models for data from each subreddit. The effect size represents the coefficient estimate of a linear regression model to predict the number of times each post is upvoted.

Study 3

The above two studies indicate that the likelihood of sharing or upvoting Twitter and Reddit posts is positively correlated with their concreteness. However, as the other variables indicate, this could be for a variety of reasons, such as content alignment with other news sources and complex interactions between different variables. A causal demonstration of concreteness requires an experimental manipulation. Additionally, because averaging concreteness scores may obscure meaningful within-post variation, we need a more controlled approach to manipulate concreteness. In order to achieve this, we ran a controlled experiment that independently manipulated concreteness in an ecologically valid context using social media posts. We used ChatGPT to slightly alter posts to make two new posts that were more and less concrete. Participants then indicated which post they would be more likely to share. This directly tests the impact of concreteness and allows us to additionally control for other features of language variation as described below.

Methods

The experiment and analysis were pre-registered (https://osf.io/qp8ms). We received ethics approval from the University of Warwick HSSREC. The experiment had a within-subjects design. Each participant was shown a sequence of 55 questions in a random order. In each question, participants were presented with a pair of posts that differed slightly in their language and were asked to select the one they would be more likely to upvote (like). Participants were presented with the prompt “Among the following two Reddit posts, please select the one you are more inclined to upvote or share.” A one-sample t-test on the proportion of more-concrete options chosen was used to test for a preference for concreteness. We also fit a logistic regression with the choice of the more-concrete option as the dependent variable and the differences in psycholinguistic features between pairs of statements as independent variables.

For example, one pair of posts was “Twitter profiles displaying explicit gender pronouns (such as she/her, he/him, and they/them) have seen a marked rise in prevalence since the last decade” (more concrete) and “Twitter user bios denoting personal pronouns have become notably widespread in the past decade” (less concrete).

For the stimuli selection procedure, we began by collecting posts from Reddit’s front page on 23 December, forming the basis of our dataset. We filtered out posts from subreddits focused on visual, auditory, or animated content, ensuring our stimuli were text-based. Posts were then ranked by concreteness, and the median five posts from each relevant category were selected. Further exclusions were made for excessively long, irrelevant, or unpopular posts. The selected posts were then edited by ChatGPT to create more and less concrete versions, followed by manual quality checks. Length and other linguistic features were controlled between pairs of statements. Finally, we validated the effectiveness of these manipulations by using concreteness ratings from Brysbaert, Warriner, et al. (2014) to compute an averaged concreteness score across words for each post, ensuring that the two versions of each post differed in concreteness (see the more detailed stimuli selection procedure in the pre-registration: https://osf.io/qp8ms).

Two attention checks were embedded within the questions, and participants who failed either check were excluded from the analysis. For example, in one of the attention checks, the statements were: “The Pentagon is using smart technology to make tough choices about weapons that can act on their own. Many experts believe the U.S. will have fully self-operating lethal weapons in a few years. [This is an attention check. These two options are the same. Please choose the other option].” and “The Pentagon is using smart technology to make tough choices about weapons that can act on their own. Many experts believe the U.S. will have fully self-operating lethal weapons in a few years.”

The study was conducted online using the Qualtrics online survey platform and the Research Experience platform of the University of Warwick. One hundred and ninety-two students consented to take part in the study. Twenty-three participants failed the attention checks, and two participants did not finish the study. One hundred and sixty-seven participants remained in the dataset.
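The one-sample t-test described in the Methods compares each participant's proportion of more-concrete choices against the chance level of 0.5. A minimal Python sketch of the test statistic follows; the function name and the three example proportions are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0=0.5):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    return (mean(values) - mu0) / standard_error, n - 1


# Hypothetical per-participant proportions of more-concrete choices.
t_stat, df = one_sample_t([0.6, 0.7, 0.8])
print(t_stat, df)
```

With 167 participants the degrees of freedom are 166, matching the form t(166) in which the result is reported.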

Results

A one-sample t-test showed that participants chose the more-concrete option significantly more often than the less-concrete option (t(166) = 15.36, p<.001). A non-parametric Wilcoxon signed-rank test supported this result (V=13249, p<.001).

A multiple logistic regression found that more-concrete posts were more likely to be chosen, with a strong effect size for the intercept. In addition, the differences in concreteness, age of acquisition, and arousal all significantly predicted the likelihood of choosing the more-concrete option: higher concreteness and arousal, and lower age of acquisition, each increased that likelihood (Figures 5 and 6). We also ran a linear regression with the percentage of selections for each more-concrete statement as the dependent variable and the difference in concreteness between each pair as the independent variable. The greater the difference in concreteness between the posts in a pair, the more likely participants were to choose the more-concrete option (Figure 7). The confidence interval of the intercept spans 50%, which is consistent with there being no preference when posts have the same concreteness (Table S3).
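The logic of the linear regression can be sketched with ordinary least squares on made-up data. The helper `fit_line` and the values below are assumptions for illustration only. The point of the example is the interpretation of the intercept: it is the predicted percentage of more-concrete choices when the two posts do not differ in concreteness, so a value near 50% corresponds to no preference.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data: one entry per hypothetical pair of posts.
concreteness_diff = [0.2, 0.4, 0.6, 0.8]   # more-concrete minus less-concrete
percent_chosen = [55.0, 60.0, 65.0, 70.0]  # % choosing the more-concrete post

intercept, slope = fit_line(concreteness_diff, percent_chosen)
print(f"intercept = {intercept:.1f}%, slope = {slope:.1f} points per unit difference")
```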

Figure 5.

Standardized effect size (coefficient estimates) of psycholinguistic features for experiment data (55 pairs of posts). The effect size represents the coefficient estimate of a logistic regression model to predict whether a more concrete option was chosen. The error bars represent 95% confidence intervals.

Figure 6.

Percentage of choice for experiment data (55 pairs of posts).

Figure 7.

Percentage of selections for each more-concrete statement against the difference in concreteness between each pair. Each point represents a pair of posts, plotted by the difference in concreteness between the posts (x-axis) and the number of participants who chose the more-concrete post (y-axis). The red line shows a linear regression fit.

Discussion

Concreteness is a property of language that shapes the way we communicate and the way language changes over time (Brysbaert, Warriner, et al., 2014; Hills & Adelman, 2015; Hills et al., 2017; Li et al., 2024). Here, we examined whether a preference for concreteness is observable in the likelihood of sharing social media posts. We used data from two different social media platforms to examine the connection between concreteness and the behaviours of retweeting and upvoting. In Study 1, we examined whether people tend to retweet more-concrete Twitter posts, using around 15 million Twitter posts spanning 35 different topics. In Study 2, we analysed whether concrete information is more likely to be upvoted, using around 50 thousand posts scraped from Reddit. Both studies converged on the same conclusion: concreteness positively predicts information sharing and upvoting. In Study 3, we verified these findings with a behavioural experiment showing that participants prefer more-concrete statements.

Overall, we found compelling evidence that concrete language is preferred by people when deciding between pieces of information. People are more likely to share, or propagate, concrete information. This is consistent with the observation that concrete language is more likely to survive story retellings in the laboratory and to persist in language over hundreds of years (Li et al., 2024). With the current study providing supporting evidence in the form of preferences for sharing more concrete information in modern media environments, people’s preferences for sharing concrete language are likely to be an important factor in the rise of concrete language more generally (see Hills & Adelman, 2015).

While our findings highlight the role of linguistic concreteness in shaping engagement on social media, it is important to acknowledge that engagement metrics (e.g., retweets, upvotes) are influenced by additional extrinsic factors that were not explicitly modelled in our analyses. For instance, the follower count of the original poster can significantly affect the visibility and spread of a post, as accounts with larger audiences are more likely to have their content amplified. Moreover, platform-specific algorithms play a critical role in content dissemination by prioritising certain types of posts based on factors such as recency, interaction history, or controversiality (Lorenz-Spreen et al., 2021). These algorithmic interventions may introduce biases that are independent of a post’s linguistic features. Although these factors were not controlled for in our models—primarily due to data availability constraints—it is important to consider whether they systematically interact with linguistic concreteness. If, for example, platform algorithms or high-profile users tend to favour concrete language, then our observed effects may be partially mediated by these extrinsic influences. Conversely, if algorithms prioritise more abstract or provocative content, our results might be an underestimate of the true effect of concreteness on engagement. These concerns are partly allayed by Study 3, which also found a preference for greater concreteness, despite the absence of such extrinsic factors.

Moreover, although our study highlights the role of concreteness in shaping engagement in online communication, our results also show that other linguistic features, such as arousal and valence, sometimes have stronger effects on engagement metrics. This suggests that while concreteness is an important driver of information propagation, it operates within a broader landscape of word-level properties that shape online discourse. From the perspective of information markets, this finding is consistent with prior research showing that emotionally charged content, particularly highly arousing or positively or negatively valenced language, can enhance engagement by capturing attention and eliciting stronger reactions (Berger & Milkman, 2012; Ito et al., 1998; Robertson et al., 2023). In fast-paced digital environments, where users are constantly exposed to competing information, high-arousal content may spread more rapidly simply because it is more attention-grabbing, whereas concreteness may play a stronger role in influencing credibility and comprehension.

Furthermore, while our study relies on Brysbaert, Stevens, et al.’s (2014) concreteness norms, these ratings are based on isolated words and static judgments, which may not fully capture how concreteness functions in dynamic, context-dependent online discourse. Words can shift in meaning depending on phrasing and usage (e.g., “cell” as a biological unit vs. a terrorist group), and multi-word expressions are not accounted for in single-word ratings. Additionally, the norms lack coverage for internet slang, technical jargon, and emerging words, leading to missing data that could bias engagement estimates. Future work could use context-based word embeddings to generate concreteness scores dynamically.

One of the limitations of our experiment is that we did not disentangle concreteness and specificity, mainly because we wanted to control the meaning of the statements being rated. The correlation between specificity and concreteness among the terms we used was .255. Future research could treat concreteness and specificity as two different dimensions to investigate their effects and interactions.

Despite these limitations, the robustness of our findings across platforms suggests that linguistic concreteness remains a meaningful predictor of engagement. Future studies could incorporate more nuanced, dynamic measures of concreteness and examine how it interacts with other linguistic and extrinsic engagement factors, such as author influence, content novelty, and platform algorithms.

Why is concrete language preferred? One potential explanation stems from dual-coding theory (Paivio et al., 1994), according to which memory of abstract content involves only verbal processing, whereas memory of concrete content involves both imaginal and verbal processing. The greater involvement of cognitive processing may make this information more stimulating or satisfying, and people may infer the impact of this on others when sharing information. Further work is needed to connect the preference for concreteness with underlying neural and psychological mechanisms.

Beyond its role in enhancing engagement and retention, concreteness may also interact with other cognitive and emotional mechanisms that influence the spread of fake news. One promising direction for future research lies in understanding how linguistic concreteness interacts with word-level features such as age of acquisition and emotional tone. Future work could examine how concreteness, in combination with humour and early-acquired language, contributes not only to the spread of information but also to the formation of beliefs—whether accurate or not.

Overall, our findings speak to the importance of word features, and specifically concreteness, in online communication. This fits into a broader literature on the role of concreteness in facilitating cognitive processing, including recognition, encoding, and later recall. These attributes are critical to the life cycle of information (Hills, 2019), and the present study suggests that we can see that influence even in short-form media. In addition to the rise in concreteness in language more generally (Hills & Adelman, 2015), short-form media is also on the rise (Pilgrim et al., 2024). Future work could look across content forms to evaluate whether short-form media is generally higher in concreteness than long-form media, and whether this may in part be facilitating its rise. If the history of communication is partly about communicating more information more rapidly into the heads of others, then concreteness and the properties it carries are deserving of more study and may be critical to understanding how to deliver better content and why some content is stickier than others (e.g., Giese et al., 2020; Jagiello & Hills, 2018).

Supplemental Material

Supplemental material, sj-docx-1-qjp-10.1177_17470218251392831 for Concrete language enhances sharing of social media posts on Twitter, Reddit, and experimentally by Danyang Hu, Charlie Pilgrim, Weize Zhao and Thomas T. Hills in Quarterly Journal of Experimental Psychology

Footnotes

ORCID iD

Charlie Pilgrim

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Accessibility Statement

The data from the present experiment are publicly available at the Open Science Framework website: ;

Supplementary Material

Supplementary Material is available at:

References

Akehurst, Köhnken, Vrij, & Bull (1996). Lay persons’ and police officers’ beliefs regarding deceptive behaviour. Applied Cognitive Psychology, 10(6), 461–471.

Barber, H. A., Otten, L. J., Kousta, S. T., & Vigliocco (2013). Concreteness in word processing: ERP and behavioral effects in a lexical decision task. Brain and Language, 125(1), 47–53.

Berger, Moe, W. W., & Schweidel, D. A. (2023). What holds attention? Linguistic drivers of engagement. Journal of Marketing, 87(5), 793–809.

Berger (2011). Arousal increases social transmission of information. Psychological Science, 22(7), 891–893.

Berger & Milkman, K. L. (2012). What makes online content viral? Journal of Marketing Research, 49(2), 192–205.

Bhatia & Walasek (2016). Event construal and temporal distance in natural language. Cognition, 152, 1–8.

Bonnevie, Gallegos-Jeffrey, Goldbarg, Byrd, & Smyser (2021). Quantifying the rise of vaccine opposition on Twitter during the Covid-19 pandemic. Journal of Communication in Healthcare, 14(1), 12–19.

Borgida & Nisbett, R. E. (1977). The differential impact of abstract vs. concrete information on decisions. Journal of Applied Social Psychology, 7(3), 258–271.

Brysbaert, Stevens, De Deyne, Voorspoels, & Storms (2014). Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychologica, 150, 80–84.

Brysbaert, Warriner, A. B., & Kuperman (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46, 904–911.

Brysbaert & Ellis, A. W. (2016). Aphasia and age of acquisition: Are early-learned words more resilient? Aphasiology, 30(11), 1240–1263.

Connell & Lynott (2012). Strength of perceptual experience predicts word processing performance better than concreteness or imageability. Cognition, 125(3), 452–465.

Conover, Ratkiewicz, Francisco, Gonçalves, Menczer, & Flammini (2011). Political polarization on Twitter. Proceedings of the International AAAI Conference on Web and Social Media, 5(1), 89–96.

Davenport, T. H., & Beck, J. C. (2001). The attention economy. Ubiquity, 2001(May), 1-es.

DePaulo, B. M., Lindsay, J. J., Malone, B. E., Muhlenbruck, Charlton, & Cooper (2003). Cues to deception. Psychological Bulletin, 129(1), 74.

Dechêne, Stahl, Hansen, & Wänke (2010). The truth about the truth: A meta-analytic review of the truth effect. Personality and Social Psychology Review, 14(2), 238–257.

Doest, L. T., & Semin (2005). Retrieval contexts and the concreteness effect: Dissociations in memory for concrete and abstract words. European Journal of Cognitive Psychology, 17(6), 859–881.

Epstein, Berinsky, A. J., Cole, Gully, Pennycook, & Rand, D. G. (2021). Developing an accuracy-prompt toolkit to reduce Covid-19 misinformation online.

Engelthaler & Hills, T. T. (2018). Humor norms for 4,997 English words. Behavior Research Methods, 50, 1116–1124.

Engelthaler & Hills, T. T. (2017). Feature biases in early word learning: Network distinctiveness predicts age of acquisition. Cognitive Science, 41, 120–140.

Elsherif, M. M., Preece, & Catling, J. C. (2023). Age-of-acquisition effects: A literature review. Journal of Experimental Psychology: Learning, Memory, and Cognition, 49(5), 812–847. https://doi.org/10.1037/xlm0001215

Evans, D. S. (2020). The economics of attention markets. Available at SSRN 3044858.

Falkinger (2008). Limited attention as a scarce resource in information-rich economies. The Economic Journal, 118(532), 1596–1620.

Fazio, L. K., Brashier, N. M., Payne, B. K., & Marsh, E. J. (2015). Knowledge does not protect against illusory truth. Journal of Experimental Psychology: General, 144(5), 993.

Giese, Neth, Moussaïd, Betsch, & Gaissmaier (2020). The echo in flu-vaccination echo chambers: Selective attention trumps social influence. Vaccine, 38(8), 2070–2076. https://doi.org/10.1016/j.vaccine.2019.11.038

Goebel, J. T., Susmann, M. W., Parthasarathy, El Gamal, Garrett, R. K., & Wegener, D. T. (2024). Belief-consistent information is most shared despite being the least surprising. Scientific Reports, 14(1), 6109.

Hanley, J. R., Hunt, R. P., Steed, D. A., & Jackman (2013). Concreteness and word production. Memory & Cognition, 41, 365–377.

Hansen & Wänke (2010). Truth from language and truth from fit: The impact of linguistic concreteness and level of construal on subjective truth. Personality and Social Psychology Bulletin, 36(11), 1576–1588.

Hansen, M. T., & Haas, M. R. (2001). Competing for attention in knowledge markets: Electronic document dissemination in a management consulting company. Administrative Science Quarterly, 46(1), 1–28.

Hefti & Lareida (2021). Competitive attention, superstars and the long tail. Working Paper (No. 383). University of Zurich, Department of Economics.

Hills, T. T. (2019). The dark side of information proliferation. Perspectives on Psychological Science, 14(3), 323–330.

Hills, T. T., Adelman, J. S., & Noguchi (2016). Attention economies, information crowding, and language change. In Big data in cognitive science (pp. 279–302). Psychology Press.

Hills, T. T., Adelman, J. S., & Noguchi (2017). Attention economies, information crowding, and language change. In Jones, M. N. (Ed.), Big data in cognitive science (pp. 270–293). Routledge.

Hills, T. T., & Adelman, J. S. (2015). Recent evolution of learnability in American English from 1800 to 2000. Cognition, 143, 87–92.

Ito, T. A., Larsen, J. T., Smith, N. K., & Cacioppo, J. T. (1998). Negative information weighs more heavily on the brain: The negativity bias in evaluative categorizations. Journal of Personality and Social Psychology, 75(4), 887.

Kim, S. C., Vraga, E. K., & Cook (2021). An eye tracking approach to understanding misinformation and correction strategies on social media: The mediating role of attention and credibility to reduce HPV vaccine misperceptions. Health Communication, 36(13), 1687–1696.

Jagiello, R. D., & Hills, T. T. (2018). Bad news has wings: Dread risk mediates social amplification in risk communication. Risk Analysis, 38(10), 2193–2207. https://doi.org/10.1111/risa.13117

Johnson, M. K. (2006). Memory and reality. American Psychologist, 61(8), 760.

Johnson, M. K., Hashtroudi, & Lindsay, D. S. (1993). Source monitoring. Psychological Bulletin, 114(1), 3.

Johnson, M. K., & Raye, C. L. (1981). Reality monitoring. Psychological Review, 88(1), 67.

Kearney, M. W. (2018). tweetbotornot: Classify Twitter users as bot or not (R package version 0.1.0) [Computer software]. https://github.com/mkearney/tweetbotornot

Kuperman, Stadthagen-Gonzalez, & Brysbaert (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44, 978–990.

Koehler, D. J. (1991). Explanation, imagination, and confidence in judgment. Psychological Bulletin, 110(3), 499.

Li, Breithaupt, Hills, Lin, Chen, Siew, C. S., & Hertwig (2024). How cognitive selection affects language change. Proceedings of the National Academy of Sciences, 121(1), e2220898120.

Liberman & Trope (2008). The psychology of transcending the here and now. Science, 322(5905), 1201–1205.

Lorenz-Spreen, Mønsted, B. M., Hövel, & Lehmann (2019). Accelerating dynamics of collective attention. Nature Communications, 10(1), 1759.

Lorenz-Spreen, Geers, Pachur, Hertwig, Lewandowsky, & Herzog, S. M. (2021). Boosting people’s ability to detect microtargeted advertising. Scientific Reports, 11(1), 15541.

Lorenz-Spreen, Oswald, Lewandowsky, & Hertwig (2023). A systematic review of worldwide causal and correlational evidence on digital media and democracy. Nature Human Behaviour, 7(1), 74–101.

McCallum, A. K. (2002). Mallet: A machine learning for language tool kit. http://mallet.cs.umass.edu

Müller & Schwarz (2023). From hashtag to hate crime: Twitter and antiminority sentiment. American Economic Journal: Applied Economics, 15(3), 270–312.

Nabi, R. L., Moyer-Gusé, & Byrne (2007). All joking aside: A serious investigation into the persuasive effect of funny social issue messages. Communication Monographs, 74(1), 29–54.

National Institute of Mental Health (NIMH). (2023). Negative valence systems.

Paivio, Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology, 76, 1–25.

Paivio, Walsh, & Bons (1994). Concreteness effects on memory: When and why? Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(5), 1196–1204.

Pennycook, McPhetres, Zhang, Lu, J. G., & Rand, D. G. (2020). Fighting Covid-19 misinformation on social media: Experimental evidence for a scalable accuracy-nudge intervention. Psychological Science, 31(7), 770–780.

Picard & McEwen, B. S. (2018). Psychological stress and mitochondria: A systematic review. Biopsychosocial Science and Medicine, 80(2), 141–153.

Pilgrim, Guo, & Hills, T. T. (2024). The rising entropy of English in the attention economy. Communications Psychology, 2(1), 70.

Pothos, E. M., Lewandowsky, Basieva, Barque-Duran, Tapper, & Khrennikov (2021). Information overload for (bounded) rational agents. Proceedings of the Royal Society B, 288(1944), 20202957.

R Core Team (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Robertson, C. E., Pröllochs, Schwarzenegger, Pärnamets, Van Bavel, J. J., & Feuerriegel (2023). Negativity drives online news consumption. Nature Human Behaviour, 7(5), 812–822.

Sadoski, Kealy, W. A., Goetz, E. T., & Paivio (1997). Concreteness and imagery effects in the written composition of definitions. Journal of Educational Psychology, 89(3), 518.

Shannon (1949). The mathematical theory of communication. In Shannon & Weaver (Eds.), The mathematical theory of communication (pp. 1–20). Urbana: University of Illinois Press.

Sherman, S. J., Cialdini, R. B., Schwartzman, D. F., & Reynolds, K. D. (1985). Imagining can heighten or lower the perceived likelihood of contracting a disease: The mediating effect of ease of imagery. Personality and Social Psychology Bulletin, 11(1), 118–127.

Simon, H. A. (1996). Designing organizations for an information-rich world. International Library of Critical Writings in Economics, 70, 187–202.

Snefjella & Kuperman (2015). Concreteness and psychological distance in natural language use. Psychological Science, 26(9), 1449–1460.

Terranova (2012). Attention, economy and the brain. Culture Machine, 13.

Trope & Liberman (2010). Construal-level theory of psychological distance. Psychological Review, 117(2), 440.

Twitter logistics (2024). Twitter user statistics. Retrieved June 5, 2024, from https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/#:~:text=Twitter%20has%20368%20million%20monthly,in%202022%20was%20%244.4%20billion.

Vellani, Zheng, Ercelik, & Sharot (2023). The illusory truth effect leads to the spread of misinformation. Cognition, 236, 105421.

Walker & Hulme (1999). Concrete words are easier to recall than abstract words: Evidence for a semantic contribution to short-term serial recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25(5), 1256–1271. https://doi.org/10.1037/0278-7393.25.5.1256

Warriner, A. B., Kuperman, & Brysbaert (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191–1207.

Williams, J. M., & Bizup (2017). Style: Lessons in clarity and grace (12th ed.). Pearson Education Inc.

Winkielman & Cacioppo, J. T. (2001). Mind at ease puts a smile on the face: Psychophysiological evidence that processing facilitation elicits positive affect. Journal of Personality and Social Psychology, 81(6), 989.

Zhao, Walasek, & Brown, G. D. (2023). The evolution of polarization in online conversation: Twitter users’ opinions about the Covid-19 pandemic become more politicized over time. Human Behavior and Emerging Technologies, 2023(1), 9094933.

Zubiaga, Liakata, Procter, Wong Sak Hoi, & Tolmie (2016). Analysing how people orient to and spread rumours in social media by looking at conversational threads. PloS One, 11(3), e0150989.
