Abstract
For understandable reasons, scholarly impact statistics have become a contentious issue for university faculty. They often look to their librarians to advise them on how best to monitor their performance, and what they could do to raise their profile. The present investigation seeks to equip those librarians with background and tools to provide useful perspective to their worried patrons. For over forty years the literature has been debating what characteristics of an article influence its later citation. While many suppose that outcome is determined solely by the quality and originality of the piece, one of the consistent findings has been that arguably irrelevant features appear to play an important role. The present discussion focuses on two of the most prominent such features, whether the article title includes a colon, and how long that title is. Both of these variables have been widely researched, but the outcomes are not typically offered in a form that will be useful to faculty patrons. Specifically, while both colons and shorter titles, for whatever reasons, reliably correlate with higher citations, these patterns vary by discipline and are not conveniently aggregated and reported. To fill this need, results have been extracted from seventy-four empirical investigations and presented by discipline. A wide range of disciplinary variance was found for these two variables which can be considered by an author. This collection of findings also has permitted correction of prior hypotheses about why such apparently irrelevant elements influence citation, which can improve understanding of the drivers of scholarly impact statistics.
Introduction
In discussions of scholarly impact, a traditional starting point has usually been to ask how content quality determines impact measures. An influential article will be one that presents provocative and novel ideas that are cleanly argued, for example. Over time, however, and with little fanfare, the focus has flipped from asking how to tailor content to improve impact, to questions about how the desire for impactful articles should influence content. Looking beyond subjective criteria such as article quality or originality, comparatively objective variables such as article length and journal placement are manipulated to drive subsequent citations. These correlations become so persuasive as to be invoked as prescriptive suggestions on how to structure articles so that they generate great impact (e.g. Ebrahim et al., 2013).
If once we asked what authors could do to increase their rankings, now we ask how rankings influence what goes into articles. For example, if article abstracts once sketched the results of the research, many today leave those details aside to force the initially interested reader to download the full paper to find these details, thereby plumping unnecessarily one of the significant impact numbers (Hyland and Zou, 2022). In this light it is not surprising that those added downloads do not lead to more citations: Braticevic et al. (2020) found that declarative titles (those that describe both the subject of the paper and the main conclusions, such as “Experimental Replication Shows Knives Made from Frozen Human Feces Do Not Work,” (Eren et al., 2019)) receive more citations than descriptive titles, those that describe only the topic of the paper.
The motivations to chase after impact statistics are clear: “Getting published is good, but ideally one will also be read. Better still, one’s writing will persuade and enlighten readers, influence their work, and stand as a lasting contribution to knowledge” (Haslam et al., 2008: 169). Writers within the academy exert time and energy tracking, measuring, and seeking to improve their numbers, which can then be cashed out for promotions, grants, and other acknowledgments of achievement. This prize has become increasingly difficult to achieve. Elsevier, a prominent scholarly publisher, reports in 2023 that its “2900 journals published more than 630,000 articles, from almost 3 million submitted” (RELX, 2023: 20). Perhaps even more distressing for authors has been the finding that the almost desperate effort to increase one’s citations plays out against research demonstrating that approximately 90% of published papers in academic journals are never cited by others, ever (Hamilton, 1990, 1991; Meho, 2007: 32).
The competition to be downloaded, read, and cited, is therefore increasingly fierce, and any advantageous edge is desired. Granting that quality content is an obvious first step, what other elements would help a published article stand out within the ocean of offerings competing for time and attention? Beleaguered scholars often turn to librarians to answer such questions. Because librarians commonly have responsibility to curate the collections of faculty publications, and are often delegated the task to collect and report faculty impact numbers, they are thought to also have deep insight into the broader research publication industry: What gets accepted, where, to what end? The present article attempts to extract from existing piecemeal findings the broader trends that will allow librarians to more effectively offer such guidance.
Despite the prevailing obsession with it, what is meant by “scholarly impact” is somewhat unclear. For most purposes it is treated as synonymous with the citations the work receives, most often in subsequent academic work but sometimes expanded to include ephemeral mentions in blogs and tweets. This paper works under the assumption that citation counts are the gold standard of a paper’s impact, despite empirical evidence that the automated tools used to extract those figures are often confounded by extraneous variables, such as the presence in titles of nonalphanumeric characters like hyphens (Zhou et al., 2021), leading to underreporting of the articles’ later citations. Tahamtan and Bornmann (2019) similarly review the complicated relationships between citing and cited documents, further undermining assumptions about any direct and simple significance of citation counts. An alternative statistic, downloads, appears to be sufficiently well correlated with citations, leading some faculty to prominently announce their latest numbers, despite the fact that many downloaded articles are never read, much less later cited (Jamali and Nikzad, 2011; Subotic and Mukherjee, 2014). At best, current downloads are a preliminary but uncertain indication of future citations.
For these and other reasons, a literature has developed that seeks to identify ways (beyond, of course, authoring substantively high quality and insightful articles) to improve citation rates. Surprisingly, many features with no obvious relationship to the quality of the article have been found to correlate with higher citation rates.
Beyond format (open access or subscription) or venue (rank of the outlet), structural elements of the paper have been determined to have an apparent influence. For example, one examination of papers in evolutionary psychology found that articles with more references, and those with more authors, were in turn cited more often (Webster et al., 2009). Recurring elements of interest have concerned the title, particularly its length and whether it includes a colon, two of the most studied article title characteristics (Milojević, 2017). The extensive research on this possibility has found that while these features of the title do relate to subsequent citations on the whole, the details vary by discipline.
While comprehensive reviews of this literature have appeared (e.g. Tahamtan et al., 2016), those works do not present the findings in an easily accessible manner. The first goal of this paper has been to make the published research results readily useful to answer questions from nervous faculty scholars. A generalized empirical observation, for example, that shorter titles consistently correlate to more citations is only helpful if one knows what the average title length is for the faculty member’s specific discipline. What is short for a medical journal, for instance, may be grossly oversized for a mathematics submission. Given those disciplinary differences, the second goal will be to analyze the patterns within the data to attempt to derive a working model that will account for both the general correlation and the observed disciplinary variation.
Colons
Social psychologists who study physical attractiveness have identified a “halo effect,” by which a single variable—the subject’s good looks—is used by onlookers to infer values for other features, such as trustworthiness or good character, despite there being no obvious understanding how physical attractiveness could relate to any such personality traits (Patzer, 1985: 8). This phenomenon has been particularly well documented in the context of jury trials (Wiley, 1995). Similarly, components of a paper are argued to influence interest and reactions of readers despite there being no discernible connection to its more substantive qualities. Research into how structural elements that are irrelevant to the paper’s subjective quality might significantly influence later citation was initiated by James T. Dillon, who focused on the appearance of a colon in article titles.
The first use of the titular colon—having evolved from earlier use of the dash, period, comma, and semicolon—can be attributed to an 1886 article by Henry Alfred Todd (Dillon, 1982). A century later, examining article titles in 30 then-current issues of leading journals in psychology, education, and literary criticism, Dillon (1981) found colons in 72% of cases, far exceeding the rate found in unpublished or nonresearch articles. This finding has been replicated: From 21,000 titles in ecology and aquatic sciences, Perry (1985) found that the more scholarly journals had a higher percent of colonicity than less scholarly publications, ranging from 22.5 to 15.4%. Dillon used this differential to hypothesize that the use of a titular colon signals the high scholarly quality of the work as measured by “publishability, productivity, complexity of thought, distinction of endeavor, and progress of the enterprise” (Dillon, 1981).
Dillon does not go further and construe the colon’s influence as paying out with increased citations, but that would appear to be the reasonable outcome of his argument. As Table 1 below demonstrates, there has been no lack of subsequent research inspired by this suggestion to test whether the relationship indeed exists. Despite such support, not everyone is convinced that deliberately increasing the number of colonic titles will improve the perceived quality of academic productions. Kerr (2014), for example, goes so far as to opine that, at least in law, Dillon has it exactly backward, because the colonic title is more common among student works than among faculty scholarship (Deahl and Eskandari, 2006). Empirical outcomes, we begin to see, vary across disciplines.
Table 1. Disciplinary variance of colonic titles.
N/A: No citation data; +: Positive relationship; -: Negative relationship; *: No relationship.
The practical utility of this line of investigation is presaged in Dillon’s own work. He applies his descriptive result to prescribe how scholars can improve the positive reception of their works, holding that to “achieve scholarly publication, a research title should be divided by a colon into shorter and longer pre-and postcolonic clauses, respectively, the whole not to fall below a threshold of 15-20 words minimum” (Dillon, 1981). Rather than counseling academics to produce better works, he provides the same advice to scholars as attorneys give to defendants, to polish the external appearance to better influence perceptions of presumed good character from judges and juries. By including the hallmark of exceptional scholarship, the titular colon, readers, Dillon argues, will assume the high quality of the offering and at least take the time to give it a further look.
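Dillon’s prescription can be read as three mechanical checks on a title: it contains a colon, the precolonic clause is shorter than the postcolonic clause, and the whole reaches a minimum word count. The Python sketch below only illustrates that reading and is not anything Dillon himself provided. The function name, the decision to split on the first colon, and the use of 15 words (the lower end of the quoted range) as the threshold are all assumptions made here.

```python
def meets_dillon_prescription(title: str, min_words: int = 15) -> bool:
    """Check a title against one reading of the quoted prescription.

    Assumptions of this sketch: the title is split on its first colon,
    words are whitespace-separated, and min_words=15 is the lower end
    of the quoted "15-20 words" threshold.
    """
    pre, sep, post = title.partition(":")
    if not sep:
        return False  # no colon at all
    pre_words = len(pre.split())
    post_words = len(post.split())
    # Shorter precolonic clause, longer postcolonic clause, and enough words overall.
    return pre_words < post_words and (pre_words + post_words) >= min_words


# Illustrative titles only; the long one is made up for this example.
short_title = "Frankenstein: The modern Prometheus"
long_title = (
    "Colons and citations: a made-up example title that is long enough "
    "to pass the stated fifteen word threshold"
)
print(meets_dillon_prescription(short_title))
print(meets_dillon_prescription(long_title))
```

Under these assumptions the four-word Frankenstein title fails on length alone, while the made-up eighteen-word title passes all three checks.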
Title length
As subsequent workers pushed Dillon’s research agenda, the length of the article title has been especially scrutinized. This possibility was baked into Dillon’s original suggestion describing the positive signaling of colons. As he noted, the appearance of a colon often influences the structure of the title, tending to make it longer, which led him to identify what he thought would be the optimal length of article titles: not less than fifteen or twenty words. The question then becomes whether any such correlation is a function of the presence of a colon, or whether length exerts an effect of its own.
To begin, there are independent reasons to entertain the possibility that the title per se wields great influence on the subsequent treatment of the article after publication. “The title of a research article is an abstract of an abstract. Titles play a decisive role in convincing readers at first sight whether articles are worth reading or not” (Diao, 2012). Titles alert the reader to the content of the paper, and stimulate curiosity to go further. In the ideal sequence, attention results in reading, which culminates in citation. Initial appraisal of an article, therefore, can be determined less by what it contains than how it is packaged and labeled; to falter on the latter renders the former potentially bootless.
Given the fundamental importance of the title, various possibilities can be offered to arrive at its optimal length. Guo et al. (2018), for example, thought that short titles would be more appealing prior to the computerization of research, but that thereafter longer titles would allow for more metadata, increasing an article’s appearance in search results. Within any such general principles, Busch-Lauer (2000: 92) reasoned that the disparate intellectual disciplines, each having its own tradition of scholarly writing style, would be expected to favor varying average title lengths. This suggestion is borne out by the summary of empirical reports below in Table 2.
Table 2. Disciplinary variance of average title length.
N/A: No citation data; +: Positive relationship; –: Negative relationship; *: No relationship.
Review method and results
Traditional bibliographic tools such as Google Scholar and EBSCO’s “Library Literature & Information Science” database were searched with keywords such as “scholarly impact,” “title,” “colon,” and “length.” From the results, an initial pool of articles was pulled that both dealt with the general topic and provided data broken out by academic discipline. The references of these papers were then examined to identify additional contributions to the literature. The gathering of further literature ceased when the results reported appeared simply to verify the relationships already recorded, without altering the overall patterns.
This process yielded 74 research articles investigating relationships between structural variables, such as colon use and title length, and later citation. Although this collection does not represent the complete universe of published scholarship, it captures data from as many academic disciplines as was possible. The results from these reports were extracted and inserted into either Table 1: Disciplinary Variance of Colonic Titles, or Table 2: Disciplinary Variance of Average Title Length. Because many authors were focused on Dillon’s hypothesis of the use of colons as a correlate of high scholarship, relating the colon directly to citations was not always a primary question. Consequently, some reports were descriptive only on the primary variable, without looking into any possible effect on citations as the dependent variable, and thus “N/A” appears frequently. These have been included in the tables, however, in the belief that the raw rates of colonic titles and title length are of intrinsic interest for librarians and faculty even if not always correlated with citation rates, in order to discern the discipline’s typical values on that measure.
Table entries are rank-ordered based on the primary values in column 4. Where title length is recorded in characters rather than the more common unit of words, a conversion factor of five characters to one word was applied. When several articles report on the same discipline, they are listed in reverse chronological order. The most recent report is in bold, while earlier investigations are in parentheses for more ready comprehension. A brief summary of the sample used by each report has been included to allow a sense of the representativeness of its results. Generally, results from wider and more recent samples are weighted more heavily during interpretation of disciplinary trends when intradisciplinary conclusions disagree across studies.
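The two mechanical steps in this paragraph, converting character counts to words and listing reports on the same discipline with the most recent first, can be sketched in a few lines of Python. The sketch is illustrative only: the `Report` record, its field names, and the sample values are hypothetical and are not drawn from Table 2; only the five-characters-per-word factor comes from the text.

```python
from dataclasses import dataclass

CHARS_PER_WORD = 5  # conversion factor stated in the text: five characters = one word


def chars_to_words(n_chars: float) -> float:
    """Convert a title length in characters to an approximate length in words."""
    return n_chars / CHARS_PER_WORD


@dataclass
class Report:
    """Hypothetical record for one empirical report (field names are assumptions)."""
    discipline: str
    year: int       # publication year of the report
    length: float   # average title length as reported
    unit: str       # "words" or "chars"

    def length_in_words(self) -> float:
        # Only character-based lengths need converting.
        return chars_to_words(self.length) if self.unit == "chars" else self.length


def order_within_discipline(reports):
    """Return reports with the most recent first (reverse chronological order)."""
    return sorted(reports, key=lambda r: r.year, reverse=True)


# Made-up illustrative values, not taken from Table 2.
reports = [
    Report("Discipline A", 2005, 60.0, "chars"),
    Report("Discipline A", 2018, 11.0, "words"),
]
ordered = order_within_discipline(reports)
print([(r.year, r.length_in_words()) for r in ordered])
```

Here the 2005 report’s 60-character average becomes 12 words, which puts it on the same scale as the 2018 report before the two are compared.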
Summary and discussion
Colonic titles
Of the 31 disciplines represented within the summarized reports, seventeen reported only average occurrences of colons without measuring how that use may have impacted citation rates. Of the remaining fourteen disciplines, nine found that the presence of a titular colon correlated with higher article citations, while only one area found a negative outcome: Biomedical Research. Law stands out as an ambiguous category, with two results of comparable quality reaching opposite conclusions, although the review of selected law reviews by Deahl and Eskandari (2006: 15) landed on the conclusion that “the worse the article is, the more likely it is to have a colon in its title.” Finally, three disciplines found no relationship between colons and citations, including Earth Sciences and Computer Science. Psychology falls into this category as well despite having some earlier results showing the predicted positive correlation, because its most recent study, with the largest sample, produced a null finding.
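For readers who want to check the tally, the counts in the preceding paragraph can be summed directly. The short Python sketch below uses only the numbers stated there; the dictionary keys are labels chosen for this illustration.

```python
# Discipline counts as stated in the paragraph above (labels are illustrative).
colon_outcomes = {
    "no citation data": 17,
    "positive": 9,
    "negative": 1,         # Biomedical Research
    "ambiguous": 1,        # Law
    "no relationship": 3,  # Earth Sciences, Computer Science, Psychology
}

total = sum(colon_outcomes.values())
with_citation_data = total - colon_outcomes["no citation data"]
# Share of disciplines with citation data that report a positive relationship.
positive_share = colon_outcomes["positive"] / with_citation_data

print(total, with_citation_data, round(positive_share, 2))
```

On these figures, nine of the fourteen disciplines with citation data, a little under two-thirds, report a positive relationship.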
All told, the weight of the global evidence appears to favor Dillon’s broad suggestion that the presence of a colon in the title correlates with more frequent citations in later publications. However, a more fine-grained analysis tells a different story. The average disciplinary frequency of colonic titles ranges from single digits in Philosophy and Mathematics, to a high of over 81% in History. Although an imperfect pattern, Hartley (2007) suggests the data points toward an increase of titles with colons as one moves “from the natural sciences to the arts and humanities.” The pattern of the data in Table 1 appears to support this generalization. This trend suggests that the preference for colons in titles is grounded in the traditional forms of academic writing of each discipline, which determine what makes a submission “look right” to other experts, rather than any universal intellectualist factor such as Dillon proposed.
We are then left with two observations that require some explanation. First, why do disciplines systemically vary in terms of the frequency with which colons are employed in titles, and second, regardless of the rate at which colons appear, why might the use of colons increase an article’s citation?
On the first question, it is fair to note that the varied intellectual disciplines each have a favored default style when it comes to their typical contributions. For example, a law review submission where most pages are not half footnotes and half text would look odd to most members of the bar, with corresponding negative assumptions about the quality of research on offer. While articles in the humanities and social sciences typically run from thirty to fifty double-spaced manuscript pages, law submissions editors attempt to impose (often unsuccessfully) a 30,000-word ceiling, or approximately 100 manuscript pages. In the opposite direction, published pieces in the technical and hard sciences are ordinarily between 4,000 and 5,000 words, and are further distinguished by often having far more authors. The longest section of the paper in these disciplines is that which describes the methodology, something that rarely appears in the “soft” disciplines like philosophy or anthropology (Choueiry, 2024). Against such programmatic differences, the suggestion that the signaling impact of a title colon would also vary across disciplines seems reasonable, thus raising the question as to the nature of that signal.
While the disciplines diverge in terms of the degree to which the colonic title is used, where they do not differ greatly is in regard to the impact of that use when it appears. As described, with only few exceptions does the use of a colon in the title not correlate with more citations when compared with similar articles that lack that punctuation. Dillon defended a suitably uniform explanation, that colons are used by better scholars, perhaps because using one correctly is not always self-evident especially when it comes to appropriately balancing the two sides of the partitioned title.
There are several reasons to suspect that Dillon has got this wrong. First, if that were the case, semicolons, which are especially difficult to use correctly, would surely be expected to be the better signal of higher scholarly quality. What little evidence exists on this possibility points in the opposite direction, however, as Dillon claimed that colonic titles evolved out of an earlier stage using semicolons. The difficulty with this position is that as a mark of punctuation colons preceded the appearance of the semicolon (Rhodes, 2019), making it unlikely that use of the colon can be explained by the prior use of its successor. Second, if the colon signals heightened intellectual sophistication, that does not explain why, in at least some disciplines, it occurs more often in student writings than in those of experts (Deahl and Eskandari, 2006), or why colon use is not consistent across all disciplines, if we assume that practitioners across the academy are presumptively of equal “intellectual sophistication.”
Imitation of earlier important works within the specific discipline perhaps provides a more feasible explanation. In this vein, the mid-twentieth century rise in article title colons perhaps echoed what had previously appeared in book titles. The emerging popularity of the titular colon can be seen in treatment of Mary Shelley’s 1818 Frankenstein, which had as its original full title Frankenstein; or, the modern Prometheus, with semicolon and comma. Most later editions, however, have either dropped the subtitle, or converted it to Frankenstein: The modern Prometheus. By the 1950s, books were tending to use the colon rather than the old-fashioned semicolon to allow more descriptive details to attract readers. While this is true of nonfiction volumes especially, even today a post-colon “A Novel” appears on many works of fiction.
A possible criticism of this suggestion is that the cover of the book does not typically explicitly include the colon, instead using varying fonts to mark the divisions. Even without an obvious colon, though, its presence can be detected in the way such titles are read. The colon signals a longer pause than the semicolon, which is longer than the comma. For example, when one reads aloud the two versions of the Frankenstein title, most readers will read the original as a single breath with a slight pause, but the later versions will receive a fuller break (more than semicolon, less than a period). So even when a colon is not present on the page which has relied upon typographical variations, the reader typically treats it as present. Library catalogs, lacking the option of varying fonts, make literal within the MARC catalog record the colon that on book covers has been otherwise communicated. The same lack of font variation generally pertains to academic article subtitles, meaning the article title follows the template of the library catalog, or the explicit colon.
Unfortunately, no studies on the use of colons in book titles comparable to those done for articles could be identified. Nonetheless, if colons in book titles did become more established, either explicitly or implicitly, in leading nonfiction works appearing in the latter half of the twentieth century, a reasonable hypothesis is that this exerted a stylistic influence on the titles of academic articles that were often targeting the same educated audience. Further, if disciplinary differences existed in the rate at which influential books include colons, that might also explain what appears in research articles in the same field. Once the pattern was established, the expectation of colons in titles would generate its own inertia.
Title length
Looking at the top 20,000 articles in SCOPUS for each of the years 2007–2013, Letchford et al. (2015) found “that papers with shorter titles do receive greater numbers of citations.” Similarly, over 32 million SCOPUS records from 1996 to September 2021 were used to train an AI program on fourteen variables, including length of title. Longer titles did not contribute positively to citations (Ha, 2022); that is, shorter titles can contribute to later citation.
These general conclusions, not broken out into the different disciplines, necessarily mask a lot of noise. “Short” and “long” are relative to the mean of each discipline, meaning that a short chemistry title (below the 19–20 word average) would still be very long when compared to a title in literature (9–10 words). Still, the broad within-discipline patterns are persuasive.
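The point that “short” and “long” only make sense against a discipline’s own average can be illustrated with a minimal Python sketch. The two averages are the ones quoted above; taking the midpoint of each quoted range, counting words by splitting on whitespace, and the function names are assumptions made for the illustration.

```python
# Approximate discipline averages quoted in the text. Using the midpoint of
# each stated range (19-20 and 9-10 words) is an assumption of this sketch.
DISCIPLINE_MEAN_WORDS = {"chemistry": 19.5, "literature": 9.5}


def word_count(title: str) -> int:
    """Count whitespace-separated words in a title."""
    return len(title.split())


def relative_length(title: str, discipline: str) -> str:
    """Label a title 'short' or 'long' relative to its discipline's mean."""
    mean = DISCIPLINE_MEAN_WORDS[discipline]
    return "short" if word_count(title) < mean else "long"


# A made-up 15-word title used only for illustration.
title = " ".join(["word"] * 15)
print(relative_length(title, "chemistry"))
print(relative_length(title, "literature"))
```

The same fifteen-word title is labeled short against the chemistry average and long against the literature average, which is why advice to a patron needs the discipline’s own baseline.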
The collected data describes twenty-four disciplines, with all but nine drawing upon multiple investigations. Nine did not report how title length related to subsequent citations. Of the sixteen remaining, seven favored the conclusion that shorter titles receive more citations, while only two (Medicine and Astronomy) landed on the opposite. Seven, however, found there was no statistical relationship between title length and citations: Chemistry, Computer Science, Ecology, Marketing, Mathematics, and Scientometrics. So while a slim majority of disciplines do find some relationship, it is a more ambivalent outcome than was found for the presence of a titular colon.
Dillon had observed that the use of colons created longer titles, thereby implying that the “halo effect” of the colon would lead to an aesthetic preference for longer titles, arguably then resulting in more citations. The data in Table 2, however, has shown that on this point Dillon was mistaken: shorter, not longer, titles consistently get more citations, at least where any pattern has been discerned. As this prediction follows from his characterization of the use of colons, its failure in what he identified as the most immediate consequence of that description weakens the rationale he offered in its defense.
If not the fallout from the presence of colons, what might explain the marginally more likely expectation that shorter titles may lead to more citations? The literature offers two possibilities. First, while shorter titles may be thought more accessible and attractive, longer ones could “be seen as complex and/or boring” and thus not as often selected for closer reading (Rostami et al., 2014). From this view, articles with shorter titles are more likely to be read, which is the prerequisite for later citation.
An interesting and thoughtful literature has examined how the concept of “appeal” operates to stimulate interest that leads to consumption of books (Dali, 2014). This work presumably extends to what draws potential readers to engage with articles. The suggestion here is that these taste preferences, well studied in the context of books, can be discerned to exert influence from article title elements, especially length.
A different view suggests the muddled pattern of results is an artefact of inconsistent sampling. Considering the uneven empirical results on this variable relationship, Guo et al. (2018) helpfully suggest that there may be a temporal element involved. Earlier years relied on traditional means of research to find articles, such as printed indices, which presumably reported shorter titles that then received more cites. Later pieces (operationalized in the article as 2001–2012) favored longer titles because these benefited from more frequent discovery by electronic tools. The change in research method, in other words, altered the default for the more effective title length. Because most studies draw from both pools, these patterns have been obscured, weakening or washing out any patterns within each of the two classes. Further complicating the temporal influences, given that shorter titles were more common in earlier articles, they have had more time to collect citations on that account alone, regardless of title length.
While the first of these possibilities relies on the subjective appeal of titles, and would therefore be a challenge to operationalize, the second, temporal explanations, would involve possible influences which are more objectively controllable in future investigations. These should be explored before any final conclusion is offered on the influence of title length on citations.
Conclusion
Scholarly impact is an ongoing concern that falls within the expertise of many librarians, and their institutions often rely upon librarians to form the frontline of contact for whatever questions might arise. Librarians are a natural resource faculty consult to clarify the most confusing details of maximizing their scholarly impact.
While the broadly general trends are easy enough to communicate, the most useful advice will not describe the entirety of academia, but be tailored to the specific discipline of the patron. While the analyzed findings offered here are nonexhaustive, they capture the major patterns within most university departments. Librarians will prefer, in this case as in all others they encounter with patrons, to avoid claiming certitude about what actions the faculty member should take to improve scholarly impact, but they should not be shy to communicate the documented relationships. Any choices, of course, remain with the patron.
The tabulated research results will easily incorporate any additional local information arising from the specific disciplines represented by the library’s patrons. Beyond use as a compendium of specialized findings to use when advising faculty, however, the aggregated results permit a refinement of the foundational premises upon which the research project has been built. While Dillon’s initial intuition that colons correlate with well-received scholarship has been extensively supported, his explanation for that result—that colons signaled high scholarly standards—has not. Unsurprisingly then, the corollary of the argument—that titles with colons will be longer, and therefore those with longer titles will be better appreciated—has also not been substantiated. The significance of these results is not simply a corrective to Dillon, but reframes the relationship between impact and nonqualitative structural elements as not being wholly detached from the substance of the paper. If the impact of elements is not of obvious and manifest relevance to substance, the work should look to not simply describe the patterns but to map their latent significance.
Many academics may be disappointed to learn that their scholarly reputations do not appear entirely to depend on the quality or originality of the publications. Confounding influences such as the two explored here introduce from the faculty’s perspective apparently irrelevant subjective factors of which they may be unaware. The literature can provide guidance on these puzzles, with the librarian providing good counsel.
Footnotes
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Ethical considerations
Ethical approval was not required for this project.
