Abstract
Retractions of scientific articles are becoming the most relevant institution for making sense of scientific misconduct. An increasing number of retracted articles, mainly attributed to misconduct, is currently providing a new empirical basis for research on scientific misconduct. This article reviews the relevant research literature from an interdisciplinary perspective. Furthermore, the results of these studies are contextualized sociologically by asking how scientific misconduct is made visible through retractions. This study treats retractions as an emerging institution that renders scientific misconduct visible, thus following up on the sociology of deviance and its focus on visibility. The article shows that retractions, by highlighting individual cases of misconduct and general policies for preventing misconduct while obscuring the actors and processes through which retractions are effected, produce highly fragmented patterns of visibility. These patterns resemble the bifurcation in current justice systems.
Introduction
Misconduct in science has fascinated scientists and the public alike; however, the small number of publicly known cases has limited knowledge to case studies and anecdotal evidence. Over the past decade, retractions of scientific articles have developed into the main format through which scientific misconduct is made visible. Retractions in general can result from either misconduct or honest mistakes and are thus a plurivalent sign of flawed research. Still, because information on misconduct from other sources remains so scarce, retractions are becoming the most relevant institution for making sense of scientific misconduct. An increasing number of retracted articles, mainly attributed to misconduct, is providing a new empirical basis for research on scientific misconduct. As with most forms of deviance, discussions about scientific misconduct are controversial. A significant number of studies on retracted articles have attempted to provide an empirical basis for assessing the issue. We review this literature and contextualize the results sociologically by asking how scientific misconduct is made visible through retractions.
Whether an action conforms to or violates a social norm, visibility has proven to be a key concept for understanding it. The sociology of deviance 1 and criminology have been remarkably attentive to questions of visibility. Andrea Brighenti has even suggested visibility as ‘a general category for the social sciences’ (Brighenti, 2007: 323). Emphasizing the importance of visibility in the work of classical sociologists such as Georg Simmel or Erving Goffman, Brighenti argues that basic social processes, like recognition and control, are two opposing outcomes of visibility. His analysis is rooted in the premise that, especially in modern western societies, seeing entails a deep epistemology with ambivalent effects, in that control and power may be related to visibility as well as to invisibility. Visibility is conceptualized as an embodied form of perceiving the world that is fundamental for normativity (Breyer, 2015; Küpers, 2014) and thus for social processes in general. The distinction between a dark field and a light field, the labelling approach, and Foucault’s reference to the arrangement of visibility in the Panopticon (Foucault, 1977) bear witness to this deep connection between social processes of control, phenomena of power and visibility. 2
In this article, we treat retractions as an emerging institution that renders scientific misconduct visible, thus following up on the sociology of deviance. In line with the focus of the literature on retractions, we emphasize the actors and processes that produce retractions and are less concerned with their effects. The general ambivalence of retractions, which point both to misconduct and to error and strive to be both a corrective and a correctional institution, is a recurring theme in this literature. Since institutions that render something visible tend to be naturalized, i.e. the processes through which they produce (in)visibility are made invisible (Brighenti, 2007: 335), we concentrate on the processes through which retractions are naturalized. We show that retractions produce highly fragmented patterns of visibility by highlighting individual cases of misconduct and general policies for preventing misconduct while obscuring the actors and processes through which retractions are effected. These patterns resemble the bifurcation in current justice systems (Cavadino et al., 2013), which raises the question of whether retractions are an effective instrument for dealing with scientific misconduct.
Visibility
Distinguishing between two forms of visibility is crucial: physical and social visibility (Breyer, 2015). Physical visibility refers to our capacity to see and recognize the world with our eyes. Beyond the purely physical and biological aspects of this capacity, seeing and being seen are embodied processes through which human beings relate to each other (Merleau-Ponty, 1968). Being aware that others see us when we see them lays the foundation for human action to become social action. 3 However, visibility extends beyond the physical and the embodied, as many social interactions are mediated. The media through which we recognize the actions of others are multiple and growing through technical innovation (Thompson, 2005). This allows for many forms of visibility that remove the need for temporal and spatial co-presence in face-to-face interaction. Foucault’s reference to Bentham’s prison architecture is one example of such a mediated form of visibility: it rearranges the spatial setting to allow for one-way visibility and thus changes power relations dramatically. In the case of retractions, the visibility of scientific misconduct is established mainly through the medium of written text and is thus even less constrained by the temporal and spatial requirements of co-presence.
Extending the concept of visibility runs the risk of overextending it. Not everything we experience through vision, accompanied by our other senses, is socially relevant; thus, not everything that is physically visible is also socially visible. 4 At the same time, Küpers (2014) even suggests that visibility can refer to the ways we ‘see’ the world through institutions and organizations. Social visibility, then, refers to processes that make objects visible by rendering them relevant for social action. Edwin Lemert (1967), one of the founders of Labelling Theory, already stressed that social visibility is the key determinant for persons and acts to be labelled as deviant (Inciardi, 1972: 222). To establish visibility as a more general category for the social sciences, Brighenti suggests three types of visibility: the social-type, the media-type and the control-type. 5 ‘The social-type is a fundamentally enabling resource, linked to recognition’ (Brighenti, 2007: 339) and based on face-to-face interaction. ‘The media-type … tends to work according to a flash-halo mechanism, whereby subjects are isolated from their original context and projected into a different one endowed with its own logic and rules. Finally, the control-type transforms visibility into a strategic resource for regulation (as in Foucault’s surveillance model)’ (Brighenti, 2007: 339).
For reviewing the literature on retractions, media-type and control-type visibility are most relevant, as most of the studies deal with the processes through which misconduct is made visible in the medium of journal publications, i.e. retractions (media-type), and with how retractions are and should be used to control research and publishing practices that are seen as problematic (control-type). Prominent cases, such as that of Diederik Stapel, illustrate that face-to-face interaction between accuser, accused and representatives of organizations that handle scientific misconduct can be a key element in making misconduct visible. However, little is known about these processes related to social-type visibility, as the existing literature does not cover them. What the literature covers extensively is the quality and quantity of retractions (media-type visibility) as well as the practices and policies that journals have established to handle problematic publications (control-type visibility). Furthermore, guidelines and policies by universities, funding bodies, or organizations like the Office of Research Integrity (ORI) in the USA have been analysed, covering some aspects of how these organizations handle allegations of misconduct. We conclude this article with a review of the common themes of the literature on retractions, an identification of results, assumptions and omissions that require further research, and a discussion of the results that stresses visibility as a general category for the social sciences.
Handling of retractions and misconduct in publishing
Correcting the scientific literature
Retractions are the most visible sign of questionable research that might constitute misconduct. It is widely assumed that scientific journals retract individual articles to remove errors from the scientific literature and thereby prevent further use of problematic knowledge. Retractions usually consist of two separately published items: first, the original article, which is deemed problematic and then retracted; second, the notice announcing that the article was retracted. Besides retractions, journals also publish corrections or errata, supposedly to correct minor errors. Retractions have become more frequent, generating interest mostly from researchers concerned with scientific misconduct but, to date, almost none from sociologists of science. Nonetheless, the available literature should be of interest for a sociological perspective on retractions and on the scientific publishing system in general. The topics most frequently dealt with in the literature are the incidence and rate of retractions, the properties and content of retraction notices, and the consequences of retractions.
Prevalence and correlates of retractions
The prevalence of retractions has been studied frequently (see Table 1). Retractions are rare compared to other publication types: their share among all articles in the biomedical database PubMed is no higher than 0.02% (Amos, 2014; Wager and Williams, 2011) or considerably lower (Cokol et al., 2007, 2008; Redman et al., 2008). Comparing numbers or rates of retractions is cumbersome, since existing studies employ different search strategies and sample limitations. Time lags contribute to these differences, as retractions can take up to 35 months to be updated in the database (Decullier et al., 2014), so that the complete number of retractions for a given year might not be accessible until three years later. Even so, comparing the numbers of retractions identified in different studies reveals remarkable differences that are not easily accounted for. Additionally, many studies fail to specify exact time frames, search terms (e.g. retraction or retraction notice), further limitations (e.g. language), or all of the above. Given that these studies address the same phenomenon, it is especially unfortunate that their findings are hardly ever related to those of other existing studies.
Table 1. Overview of search strategies and results.
It is unanimously acknowledged that retraction rates have been rising steadily since the 1970s, with further acceleration after 2000 (Cokol et al., 2008; Fanelli, 2013; Fang et al., 2012; Gasparyan et al., 2014; Grieneisen and Zhang, 2012; He, 2013; Redman et al., 2008; Steen, 2011a; Wager and Williams, 2011). The increasing rate of retractions is attributed equally to misconduct and error (Steen, 2011c; Steen et al., 2013); however, Stretton et al. (2012) report that retractions for plagiarism have risen, while retractions for other forms of misconduct have remained stable since 1990. Fanelli (2013) ascribes the increase in retractions to more journals issuing retractions, which he takes as a sign of greater scrutiny rather than of an increase in the underlying incidence of misconduct.
Only a minority of studies use the Web of Science (Bilbrey et al., 2014; Fanelli, 2013; He, 2013; Trikalinos et al., 2008; Van Leeuwen and Luwel, 2014), which also covers disciplines other than Biomedicine. Grieneisen and Zhang (2012; see also Zhang and Grieneisen, 2013), in the most comprehensive study to date, include as many as 42 different databases and publisher websites. One study concentrates on the field of Management and Economics and searches four different economics databases, yielding only 37 retracted articles (Karabag and Berggren, 2012). Thus, so far, retractions are most common (and most researched) in Biomedicine, perhaps because oversight is greatest in this research area owing to concern for patient safety and the possibility of bodily harm caused by flawed research (Zuckerman, 1977).
Articles take about two years to be retracted, with reported mean times ranging from 21 months (Redman et al., 2008) to 28 months (Budd et al., 1998; see also Furman et al., 2012; Trikalinos et al., 2008). The time to retraction varies with different factors: it is longer for papers retracted for fraud (Fang et al., 2012; Steen, 2011b), possibly reflecting more complicated investigations, and shorter for journals with higher impact factors (Fang et al., 2012). It also varies by discipline, with retractions in the drug literature and in radiology taking slightly longer (Rosenkrantz, 2016; Samp et al., 2012). Furthermore, papers take especially long to be retracted if the last author was deemed responsible for the paper’s flaws (Trikalinos et al., 2008). Trikalinos et al. take this as a sign that more senior researchers are able to put up more resistance against a retraction. Steen (2011c) finds that the time to retraction has been increasing over the years, whereas the majority of studies report that it has decreased (Foo, 2011; Furman et al., 2012; Steen et al., 2013).
A number of studies address correlations between country (usually that of the corresponding author’s institution) and risk of retraction, discussing whether lower-income countries experience more retractions (Amos, 2014; He, 2013; Stretton et al., 2012). Comparability between these studies is limited, as both absolute numbers and shares relative to a country’s output are used. When absolute numbers are used, the USA ranks first as the country most prone to retractions (Amos, 2014; Casadevall et al., 2014; Grieneisen and Zhang, 2012; He, 2013; Van Leeuwen and Luwel, 2014; Trikalinos et al., 2008; Zhang and Grieneisen, 2013); when numbers are normalized by output, emerging science nations rank highest (Grieneisen and Zhang, 2012; He, 2013; Van Leeuwen and Luwel, 2014). Stretton et al. (2012) report that both lower-income countries and non-English-speaking countries have a higher risk of retraction for plagiarism than other countries, but this study does not compare overall retraction rates across countries (for a similar result see also Almeida et al., 2015). The general hypothesis that national contexts might influence both the incidence of scientific misconduct and its detection seems plausible. However, most of these studies fail to provide any theoretical justification for which factors should be considered influential and why, and they exhibit a preoccupation with research from developing countries without justifying such a focus. A different hypothesis is put forward by Fanelli et al. (2015), who find evidence that retractions are less likely if the country has a national policy addressing scientific misconduct and a higher education system following an Anglo-American model. These categories, however, remain poorly defined and thus fail to capture the diversity of both the existing national frameworks for handling misconduct and the different systems of higher education.
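The divergence between absolute and normalized country rankings described above can be made concrete with a small calculation. The following Python sketch uses entirely hypothetical figures for three unnamed countries (A, B, C), invented for illustration and not drawn from any of the studies cited; the helper names `rate_per_100k` and `ranking` are likewise hypothetical. It only shows how the same counts can produce different rankings depending on whether retractions are counted in absolute terms or relative to publication output.

```python
# Hypothetical retraction data, invented for illustration only; the figures are
# not taken from any of the studies discussed in the text.
# country label -> (retracted articles, all articles published in the same period)
COUNTS = {
    "A": (300, 1_500_000),  # large research output
    "B": (120, 200_000),    # smaller research output
    "C": (40, 400_000),
}


def rate_per_100k(retracted, total):
    """Retractions per 100,000 published articles."""
    return retracted / total * 100_000


def ranking(counts, normalized):
    """Order country labels from most to least prone to retraction, using either
    absolute retraction counts or output-normalized rates."""
    if normalized:
        def key(country):
            return rate_per_100k(*counts[country])
    else:
        def key(country):
            return counts[country][0]
    return sorted(counts, key=key, reverse=True)


if __name__ == "__main__":
    print("absolute:  ", ranking(COUNTS, normalized=False))
    print("normalized:", ranking(COUNTS, normalized=True))
    for country, (retracted, total) in COUNTS.items():
        print(f"{country}: {retracted} retractions, "
              f"{rate_per_100k(retracted, total):.1f} per 100,000 articles")
```

With these assumed figures, country A ranks first on absolute counts (300 retractions), while country B ranks first once output is taken into account (60 versus 20 retractions per 100,000 articles), which mirrors how studies using different measures can arrive at different rankings.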
The influence of disciplines on retractions seems clearer: Medicine, Chemistry, Life Sciences and Multidisciplinary Studies are consistently among the fields most prone to retractions (Grieneisen and Zhang, 2012; He, 2013; Van Leeuwen and Luwel, 2014; Zhang and Grieneisen, 2013). Other studies, however, find no influence of the (sub)field on retraction rates (Furman et al., 2012; Steen and Hamer, 2014; Trikalinos et al., 2008).
Articles in journals with higher impact factors (Fang and Casadevall, 2011; Fang et al., 2012; Gasparyan et al., 2014) and highly cited articles (Furman et al., 2012) are both retracted more often. This could mean either that high visibility increases the risk of retraction, or that researchers committing misconduct more often target high impact journals and claim spectacular results that will earn them many citations (see also Abritis, 2015: 63). In contrast, He (2013) identifies a correlation between journal impact factor and the number, but not the rate, of retractions, arguing that high impact journals publish more articles but are not especially prone to retractions.
A number of studies note the great influence of ‘repeat offenders’ on the number of retractions: 13 individuals are found to account for 54% of misconduct retractions (Grieneisen and Zhang, 2012), while a different study estimates that 10.6% of the responsible authors account for almost 80% of all misconduct retractions (Mongeon and Larivière, 2013). Steen et al. (2013) conclude that the influence of repeat offenders has been declining over time (see also Steen, 2011b; Zhang and Grieneisen, 2013).
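Concentration figures of this kind, such as the share of misconduct retractions attributable to the most prolific individuals, amount to a simple cumulative sum over authors ranked by their number of retractions. The Python sketch below uses invented per-author counts (not data from Grieneisen and Zhang or from Mongeon and Larivière), and the helper names are hypothetical; it only illustrates the two ways of expressing concentration used in these studies: the share accounted for by the top k authors, and the smallest number of authors needed to reach a given share.

```python
# Hypothetical number of misconduct retractions per responsible author.
# These counts are invented for illustration only; they are not data from
# any of the studies discussed in the text.
RETRACTIONS_PER_AUTHOR = [50, 30, 10, 3, 2, 1, 1, 1, 1, 1]


def share_of_top_k(counts, k):
    """Share of all retractions attributable to the k authors with the most retractions."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


def min_authors_for_share(counts, target_share):
    """Smallest number of authors, taken from the most prolific downwards,
    whose retractions together reach at least `target_share` of the total."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    running = 0
    for k, count in enumerate(ranked, start=1):
        running += count
        if running / total >= target_share:
            return k
    return len(ranked)


if __name__ == "__main__":
    n_authors = len(RETRACTIONS_PER_AUTHOR)
    top_share = share_of_top_k(RETRACTIONS_PER_AUTHOR, 1)
    print(f"Top author accounts for {top_share:.0%} of retractions")
    k = min_authors_for_share(RETRACTIONS_PER_AUTHOR, 0.8)
    print(f"{k} of {n_authors} authors ({k / n_authors:.0%}) account for at least 80%")
```

With these invented counts, one author accounts for half of all retractions, and two of the ten authors (20%) together account for 80%, the same form of statement as the estimates reported above.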
Stern et al. (2014) estimate the amount of funding from the US National Institutes of Health (NIH) involved in publications that were subsequently retracted and find that, between 1992 and 2012, no more than 0.02% of the NIH budget went to studies that were later retracted.
Existing investigations of the correlates of retractions rely heavily on the metadata available in databases. Little effort has been made to add data sources that could supplement information on factors such as the intensity of oversight or the procedures of journals or universities, which could shed light on patterns of social control. In general, the literature is limited to descriptive findings and offers little explanation. When explanations are presented, they rest more on common sense than on theoretical or empirical support. Thus, more theoretical and explanatory work remains to be done in this area.
Retraction notices
Whether misconduct or error is the more prevalent cause of retractions remains unclear. Studies identifying error as the main cause (Bilbrey et al., 2014; Budd et al., 1998; Madlock-Brown and Eichmann, 2014; Nath et al., 2006; Steen, 2011b; Wager and Williams, 2011) stand opposed to studies finding misconduct more prevalent (Fang et al., 2012; Grieneisen and Zhang, 2012; Samp et al., 2012). Besides drawing on different samples, these studies employ different definitions of misconduct. Some define plagiarism and duplicate publication as error (Bilbrey et al., 2014; Madlock-Brown and Eichmann, 2014; Redman et al., 2008; Steen, 2011b, 2011c), others define them as misconduct (Casadevall et al., 2014; Fang et al., 2012; Mongeon and Larivière, 2013; Nath et al., 2006; Stretton et al., 2012), and one study classifies plagiarism as misconduct and duplication as error (Samp et al., 2012). However, this difference in classification does not consistently account for the different estimates of error and misconduct. A number of studies also treat plagiarism and duplication as separate categories alongside misconduct and error (Decullier et al., 2013; Grieneisen and Zhang, 2012; Van Leeuwen and Luwel, 2014; Wager and Williams, 2011). Additionally, some studies refrain from identifying misconduct altogether (Amos, 2014; Rosenkrantz, 2016), classify it independently of the reasons for retraction (Azoulay et al., 2012), or fail to specify whether plagiarism is counted as misconduct or error (Budd et al., 1998). Idiosyncratic terms, such as ‘questionable interpretations’ (Grieneisen and Zhang, 2012) or ‘process’ (Madlock-Brown and Eichmann, 2014), render interpretation even more difficult.
The information given in retraction notices is ambiguous, resulting in disagreement about the proportion of notices that provide too little information to identify the reason for the retraction, with estimates ranging from 4.9% (Amos, 2014) to 22% (Bilbrey et al., 2014). This general problem is seldom acknowledged (for exceptions see Rosenkrantz, 2016; Wager and Williams, 2011), and only Bilbrey et al. (2014) address it directly by rating retraction notices according to whether they clearly state the reason for the retraction, finding that journals are mostly inconsistent in the information they provide. Some studies also draw on additional information from the ORI and from the science blog Retraction Watch (Fang et al., 2012; Grieneisen and Zhang, 2012; Zhang and Grieneisen, 2013), which likewise indicates that the information in the notices is of limited reliability. Starting from a comparison of ORI misconduct findings with retractions, Abritis (2015) also highlights the problem of treating retractions as a reliable indicator of misconduct, arguing that retractions ‘do not adequately convey the incidence of misconduct occurring in hard science research’ (Abritis, 2015: 55). The disagreement between studies thus highlights the secretive and opaque nature of retraction notices, which makes them ambiguous and open to interpretation by readers and researchers of scientific misconduct alike.
Retraction notices usually state who retracted an article; however, this might not accurately reflect the decision process and those involved in it. In general, most papers are retracted by the authors (Bilbrey et al., 2014; Grieneisen and Zhang, 2012; Redman et al., 2008; Wager and Williams, 2011), and only a minority by editors, publishers or others. While almost all retractions for errors identify the authors as responsible for the retraction (Nath et al., 2006; Samp et al., 2012), retractions due to misconduct are mostly issued by the editor or the publisher (Madlock-Brown and Eichmann, 2014; Nath et al., 2006; Samp et al., 2012). Authors may be more likely to contact the journal when they find problems that stem from honest error (see also Lu et al., 2013). Retraction notices written by journals contain more information than those written by authors (Bilbrey et al., 2014). Authors might hence, as a defence strategy, either refer to error as the cause of retraction or provide as little detail as possible. Most likely, authors take an active part in how misconduct is publicized and in how their own behaviour is labelled.
A smaller number of studies systematically compare retraction notices with information obtained from other sources. Neale et al. (2007) find that journals react to misconduct findings by the ORI in numerous ways, including retractions, errata or commentaries. A small number of publications are not retracted despite violating scientific integrity, and only a minority of the retractions or corrections explicitly mention ethics in their notices (Resnik, 2012). A study of Robert Slutsky’s case reveals that, after notification by the university’s investigative body, only 9 of the 12 fraudulent and 37 of the 48 questionable publications were retracted (Friedman, 1990). In a similar study of Joachim Boldt’s case, Elia et al. (2014) find that 10% of the incriminated articles were not retracted and that the retractions do not follow a standard format.
The format of retractions also varies with regard to typography, placement and title phrasing: the majority of retracted articles carry headings or watermarks across the pages of the PDF files, but up to almost a third are not marked and a small fraction are deleted (Decullier et al., 2013; see also Rosenkrantz, 2016; Steen, 2011c). Snodgrass and Pfeifer (1992) identify 14 different headings that journals use for retraction notices, and most retractions are not placed prominently in the respective issue (see also Friedman, 1990). In addition, some journals do not include retraction notices in their tables of contents, which hinders indexing in Medline, the database behind PubMed (Yank and Barnes, 2003). Yet a more recent case study suggests that most retracted articles are indexed in Medline and all of them are correctly marked as retracted (Wright and McDaid, 2011). Other databases, such as the Cochrane Central Register of Controlled Trials and Embase, yield more uneven results. This problem also applies to the Web of Science (Van Leeuwen and Luwel, 2014). Moreover, because a single article might be found in many places on the internet, 20% of retracted publications remain unmarked on non-publisher websites and 80% remain unmarked in personal libraries such as Mendeley (Davis, 2012; see also Rosenkrantz, 2016).
The existing literature demonstrates that journals take a variety of visible actions when confronted with problematic articles. Even if the idea of retraction implies a uniform mark across articles, the actual notification comes in many forms and uses different terms. Retractions, in general, neither use a consistent format nor distinguish uniformly and unambiguously between misconduct and error, thus suggesting a compromise between publicly labelling instances of misconduct, on the one hand, and disguising them, on the other.
Consequences of retractions
The visibility of retractions in the scientific community has been investigated by measuring how retractions affect subsequent citations. Whitely et al. (1994) compare the citations to articles by Robert Slutsky with those of a control group and find a significant decline in citations after news media covered the story and the investigative committee released its findings, but no additional impact of the subsequent retractions on the citation rate. However, this case might be exceptional, as the retractions followed only after the case had already been widely publicized. Several other case studies demonstrate that citation rates decrease after retraction; however, retracted articles continue to be cited (Bornemann-Cimenti et al., 2015; Garfield and Welljams-Dorof, 1990; Korpela, 2010). One study notes that citations increasingly come from research areas other than that of the original articles, indicating that the visibility of retractions is highest within the immediate field (Bornemann-Cimenti et al., 2015). Comparing retracted papers with a control group, studies observe a significant loss of citations following a retraction, ranging from 65% (Furman et al., 2012) to over 80% (Lu et al., 2013). Citations to retracted articles come from journals distributed over the entire range of impact factors (Pfeifer and Snodgrass, 1990). Steen (2011b) finds that studies retracted for fraud and for error do not differ significantly in the number of post-retraction citations, which contrasts with the findings of Redman et al. (2008), who find that only retractions for misconduct cause a decrease in citations.
Further studies focus on the content of citing papers. Citations to the articles of John Darsee mostly endorse the papers’ content (Kochan and Budd, 1992); unfortunately, the authors do not distinguish retracted from unretracted articles. A larger sample of retracted articles by different authors likewise receives predominantly positive citations (Budd et al., 1999). Almost 39% of citations to articles retracted after ORI findings of misconduct explicitly mention the retracted publication (Neale et al., 2010). Generally, only a small fraction (no more than 7%) of citing publications acknowledge the retraction (Budd et al., 1999; Neale et al., 2010; see also Pfeifer and Snodgrass, 1990). In the case of Scott Reuben, this fraction is substantially higher, at 25% (Bornemann-Cimenti et al., 2015).
Earlier articles by incriminated authors also suffer decreases in citation rates. These decreases are more substantial for unknown authors than for eminent ones, especially if the retracted paper was co-authored by an unknown and an eminent author (Jin et al., 2013). Losses are also greater if the retraction was not initiated by the authors (Lu et al., 2013). The latter result highlights that retractions do not only have a general effect: specific wording, such as who is designated as issuing the retraction, also makes a difference. Even innocent co-authors suffer losses in citations (Mongeon and Larivière, 2013) and a decline in subsequent productivity (Mongeon and Larivière, 2016). Azoulay et al. (2015) consider the effect of retractions on entire scientific fields: they show that publication activity and citations decrease in the affected fields after a retraction, especially if the retraction mentions misconduct. Moreover, authors are unlikely to reappear in a related field, which holds especially true for scientists at the beginning of their career (Azoulay et al., 2012; see also Mongeon and Larivière, 2016).
These negative consequences indicate that, although retractions are not formally designed to punish authors (Kleinert, 2009), they have at least some stigmatizing effects, which are well documented in the literature. Interestingly, the correction of the scientific literature, which the Committee on Publication Ethics (COPE) deems the primary function of retractions, is hardly addressed empirically in research on retractions: to our knowledge, no studies investigate whether and how the information provided in retraction notices relates to the scientific content of the respective article, or how it in turn influences subsequent research. Such analyses would require intensive qualitative investigation as well as an intimate understanding of the scientific claims of the original paper and its retraction, which may partly explain the lack of such studies.
Errata, corrections and expressions of concern
Journals have other formats available for correcting the scientific literature, namely errata, corrections and expressions of concern. Compared to retractions, which are the subject of a large number of studies, these formats receive little attention. Even though corrections are formally reserved for minor errors, a comparison of ORI cases with subsequent actions by journals (Neale et al., 2007) illustrates that misconduct might be followed by corrections instead of retractions. In some cases, a retraction might be formally warranted, but a negotiation process results in a correction instead.
Focusing on corrections in Physics, Poworoznek (2003) finds that corrections are difficult to locate online and that linking is inconsistent. Thomsen and Resnik (1995) examine citations to 17 corrected articles in two Physics journals, finding that 37–40% of the citing articles draw upon the incorrect article and 2–9% even use it as a starting point. Only about a third of all citations mention the correction. For corrections in Biomedicine, Molckovsky et al. (2011) find that 4% of the articles in high impact Oncology journals are corrected. The 14% of these corrections that address serious errors take significantly longer to be issued than those for minor errors (8 vs 3 months). Furthermore, corrections significantly reduce citations to the affected articles. These results are mostly in line with the findings of Royle and Waugh (2004), who, however, report that only 1.2% of articles in high impact biomedical journals are corrected. Fanelli et al. (2015) find that authors with higher numbers of citations and more published papers have an increased likelihood of correction, suggesting that more experienced authors might be more willing to correct their papers. Overall, in contrast to retractions, the number of corrections has been stable for the last 30 years (Casadevall et al., 2014).
Only a single study addresses expressions of concern, a relatively new format: Noonan and Parrish (2008) report 16 expressions of concern published up to 2008, of which 4 led to subsequent retractions.
Policies and guidelines by journals
Journals play an essential role in the process of issuing retractions and thus in the production of media-type visibility. The literature addresses the role of (biomedical) journals mainly through the institutionalization of, and compliance with, policies. Whether journals have policies addressing misconduct or retractions is determined by direct requests to journals (Resnik et al., 2009, 2010, 2015), by surveying editors (Enders and Hoover, 2004), or by searching online, especially within instructions for authors (Atlas, 2004; Bosch et al., 2012; Redman and Merz, 2006). These studies are based on the following samples. Atlas (2004): 122 high impact biomedical journals from Thomson Reuters' Journal Citation Reports (JCR); Redman and Merz (2006): the 50 journals with the highest impact from JCR biomedical categories; Bosch et al. (2012): 339 journals representing the 15 highest impact journals from each of 27 biomedical categories in JCR; Enders and Hoover (2004): surveys sent to 470 Economics journals, with a response rate of 28%; Resnik et al. (2009, 2010): 400 randomly selected journals from JCR in science and in social science, respectively, with response rates of 49% and 38%; Resnik et al. (2015): the 200 science journals with the highest impact factors in JCR, with a response rate of 74%.
Atlas (2004) reports that 21% of responding journals either have a specific policy for retractions or at least generally follow the 'Uniform Requirements' for biomedical journals. Formal policies for plagiarism are reported by 19% of responding editors (Enders and Hoover, 2004). A written policy for misconduct was found at 14% of journals by Redman and Merz (2006); however, when journals were asked directly, 55% reported having a policy, albeit only 48% in written form (Resnik et al., 2009). In their follow-up study focusing on the social sciences, Resnik et al. (2010) found that 33% of social science journals have a formal misconduct policy. According to Bosch et al. (2012), 60% of journals have misconduct policies or endorse definitions or guidelines by associations. However, only 35% of the journals provide explicit definitions of misconduct and 45% explain procedures for dealing with misconduct. In their most recent study, Resnik et al. (2015) report that 65% of journals have a misconduct policy.
The implementation of retraction/misconduct policies remains far from complete, with the prevalence of policies being correlated with a journal’s impact factor (Resnik et al., 2010). Also, the existence of policies itself does not necessarily include exact definitions of misconduct and procedures for dealing with misconduct (Bosch et al., 2012; Resnik et al., 2009). In general, even though the available studies are similar in design and objectives, small sample sizes and differing sampling criteria hinder comparability.
A handful of studies review the content of policies and their development over time. Claxton (2005) provides an overview of a variety of authorship guidelines. The development of COPE guidelines for retractions (Kleinert, 2009) and reviewers (Hames, 2013) has also been studied in light of growing retraction numbers, low levels of standardization and visible cases of misconduct. Similarly, the development of authorship definitions by the International Committee of Medical Journal Editors has been studied by Jones (2003), while Parrish (1999) documents the heterogeneity of interests and legal positions of funding bodies, federal agencies and journals in misconduct investigations by the ORI in the early 1990s.
Editors and authors differ in their views on duplicate publications, especially in two respects (Yank and Barnes, 2003): authors are more likely to consider certain types of redundancy legitimate (e.g. a second article in a non-peer reviewed source; a research article and a letter to the editor) and to disapprove of harsh sanctions for duplicate publication, such as publication bans or notification of institutions and indexing services. However, editors' level of concern is generally low, as reported by Wager et al. (2009), who surveyed all editors of one major publisher about their handling of misconduct. Redundant or salami publications were assessed as the most significant problem, followed by undisclosed conflicts of interest by authors, and plagiarism. The majority of editors reported misconduct to be rare or non-existent in their journal. Awareness of guidelines is generally low. Wager (2011) sees a role conflict for editors, as they are not equipped to investigate misconduct allegations themselves while, at the same time, being obligated to make all reasonable efforts to ensure proper investigations. Nonetheless, editors seem confident in their decision-making processes (Wager et al., 2009; Williams and Wager, 2013), even though they admit that every retraction is different and that clear-cut procedures are mostly absent.
There is some indication that compliance with policies depends on how visibly articles are problematized, as journals and institutions react to public discussion. Brookes (2014), the anonymous proprietor of a blog that documented data integrity problems in published journal articles for six months in 2012, compared 274 papers whose data problems were documented on the blog with 233 similar papers that remained unpublicized when the blog closed. Publicly discussed papers exhibited a six-fold higher rate of retractions and an eight-fold higher rate of corrections, and retracting/correcting actions in the public set were more clustered around laboratory groups than in the private set.
In contrast to most studies on retraction policies, Frow (2012) analyses the context and the effect of policies within research fields and argues that policy guidelines should be seen not simply as practical interventions but as reaching deep into the methods and understandings of visual representation in scientific practice.
Handling of misconduct and retractions in organizations
Besides journals, a variety of organizations, i.e. universities, research institutions, or national agencies like the ORI, handle allegations of scientific misconduct and investigate cases, thereby effecting retractions and influencing the visibility of scientific misconduct. A number of publications, more or less based on anecdotal evidence, refer to country-specific organizational processes for dealing with misconduct: the USA (Price, 2013; Steneck, 1994, 1999); Japan (Normile, 2007; Slingsby et al., 2006); Canada (Lytton, 1996); Brazil (Lins and Carvalho, 2014); South Africa (Rossouw et al., 2014); China (Jordan and Gray, 2013; Ren, 2012; Zeng and Resnik, 2010); Nigeria (Adeleye and Adebamowo, 2012); Spain (Puigdomènech, 2014); the UK (Chantler and Chantler, 1998; Khajuria and Agha, 2014); Korea (Kim and Park, 2013); Scandinavia (Nylenna et al., 1999); Germany (Deutsch, 2006; DFG, 2005; Schiffers, 2012; von Bargen, 2013).
Policies by research organizations
The first systematic study of policies for American universities focused on the prevalence of written research misconduct policies, the structure of investigation committees and difficulties in revising policies and procedures (Greene et al., 1985). Of the 423 responding institutions, 116 reported having written policies while 124 had neither a policy nor plans to develop one. One of the major controversial concerns reported in this study pertained to the definition of misconduct and especially the problem of distinguishing fraud from negligence. Behaviours like fabrication, falsification and plagiarism were easily identified as fraud, whereas distinctions between sloppy science and data manipulation appeared more difficult. Additionally, the policies themselves were called into question: while some respondents argued for specific rules, others reported that each case should be dealt with on its own merits.
Schoenherr and Williams-Jones (2011) provide a broad analysis of policies and procedures from the 47 highest ranking Canadian universities. They report that 87.2% of universities have unique research integrity policies. Nearly all policies define misconduct as fabrication, falsification and plagiarism, but also include further forms of misconduct. '[T]his diversity in how institutions define "misconduct" is arguably linked to the current vagueness in the broader academic community about what constitutes research integrity and good conduct' (Schoenherr and Williams-Jones, 2011: 10; see also Faria, 2015).
Lind (2005) examines the accessibility of research integrity policies at the top 25 American universities, measured as 'the minimum number of clicks required to get from the University website's homepage to the research integrity policy' (Lind, 2005: 248). With 4.6 clicks on average, the policies are not particularly accessible. Regarding the amount of information provided, some policies cover a vast number of topics, while others cover only a few aspects. Inquiry and investigation processes and whistleblower concerns are most thoroughly addressed, whereas appeal processes, mentoring and pursuing allegations are mentioned the least.
Handling of scientific misconduct in research organizations
Publications about policies refer to the institution’s formalized expectations regarding scientific misconduct, but contain little information about the actual implementation of policies and how organizations react to allegations of misconduct. As Mazur (1989) points out, observing how universities handle cases of misconduct is difficult because of the low accessibility of documents related to investigations. Accordingly, literature on this topic remains sparse and mainly discusses individual cases of misconduct (Alfredo and Hart, 2011; Epstein, 2010; Mazur, 1989; Rasmussen, 2014; Stroebe et al., 2012; White, 2005). In general, investigations are characterized by a lack of routine and a resulting high variability of processes and outcomes (Breen, 2003; Keranen, 2006; Rhoades, 2000). How cases of scientific misconduct are detected and investigated at the university level thus remains largely unexplored.
Office of Research Integrity
The Office of Research Integrity (ORI) was one of the first institutions explicitly dealing with scientific misconduct. Rhoades (2000) offers an overview of its first 10 years, reporting that about half of the investigations resulted in findings of misconduct. Reynolds (2004) investigates the 249 publicly available ORI case summaries from 1992 to 2002. Debarment from funding, one of the most severe actions, was applied in 85 cases; 6 cases occurred in clinical trials. Moreover, in clinical trials, but not in other cases, junior employees were sanctioned more frequently than senior researchers. Parrish (2004) analyses misconduct cases involving graduate students within the field of medicine, using 26 closed ORI cases and 29 closed cases of the National Science Foundation's Office of the Inspector General, which mostly involved accusations of falsification and fabrication. While the accused was deemed guilty in most of the ORI cases, the NSF made findings of misconduct in only a minority of cases. The sanction administered most often was dismissal from the institution.
Both Pascal (1999) and Rennie (1998) briefly describe the set-up and history of the ORI and LaFollette (1994) discusses the work of congressional oversight. In addition, several studies make use of ORI case files (Davis et al., 2007; Wright et al., 2008) to investigate causes of misconduct, but do not specifically examine procedures or outcomes.
Universities and research institutions
With respect to universities, preliminary investigations of scientific misconduct are carried out by Research Integrity Officers (RIOs). According to Bonito et al. (2011), there is little knowledge about the scope of their responsibilities and their training background. The authors interviewed RIOs about how they would handle allegations of misconduct on the basis of three hypothetical scenarios, comparing their answers to reactions deemed appropriate by selected experts. Only 53.2% of the respondents identified at least between one and five out of 26 recommended actions. About one-third of the respondents had never handled allegations of misconduct and slightly more than half of the respondents had never conducted an investigation. The authors conclude that RIOs are not particularly well-prepared to handle allegations of misconduct.
Pryor et al. (2007) focus on research coordinators' experience with scientific misconduct in the US and on the typical actions they would take. Slightly less than one-fifth of the respondents reported first-hand knowledge of misconduct occurring within the previous year. As their typical action, 37.3% of the respondents identified expressing disapproval without reporting. Reporting to administrative officials correlated with rating organizational effectiveness as high. Habermann et al. (2010) investigated research coordinators who reported having encountered scientific misconduct in the past year. Most commonly, respondents reported having witnessed the event first-hand, and 70% subsequently reported the incident. As reasons for not reporting, respondents mentioned that other authorities were already involved, that there was no risk of harm to patients, and that reporting depended on the degree of misconduct. Reporting resulted in a variety of outcomes, including resignation of the responsible party, but about 6% of the respondents also reported the resignation or dismissal of the research coordinators themselves.
Braxton (1991) investigates whether the quality of the graduate school department influences the formality of action taken for violations of scientific norms, focusing on chairpersons of university departments in four disciplines (Chemistry, Physics, Psychology, Sociology). The quality of a chair's graduate school department was found to negatively influence the formality of sanctioning action only for violations of the Mertonian norm of organized scepticism. However, the author indicates that 'conclusions derived from it are tentative and await replication' (Braxton, 1991: 100). A more recent study on European universities also highlights the importance of informal reactions as compared to formal sanctioning mechanisms (Faria, 2015).
DuBois et al. (2013) asked RIOs and members of the Institutional Review Boards of medical schools about their experience with cases of scientific misconduct and their satisfaction with institutional responses. Respondents reported a modal number of 3–5 cases of experienced misconduct. The most common institutional responses to scientific misconduct were letters of reprimand, increased oversight and internal education. Most respondents reported being satisfied with institutional responses, while nearly one-fifth were somewhat or very unsatisfied.
Wright and Schneider (2010) investigate the role of RIOs and of the legal counsel supporting them. RIOs had been at their institution for an average of 5.2 years, encountering 8 cases of possible misconduct on average (range: 0–50), which resulted in one to two findings of misconduct. Only 10% reported having an independent budget. The authors assess that RIOs variously function 'as prosecutor, judge, mediator, counselor, teacher and regulatory manager' (Wright and Schneider, 2010: 101), and, accordingly, almost all RIOs also reported participating in drafting or revising their institution's policies and procedures. Wilson et al. (2007) examine the role of research records in investigation procedures by interviewing RIOs at major universities: 23% of respondents had no experience with misconduct, 17% mentioned 10 or more cases, and the rest ranged in between. Problems with research records were mentioned in 38% of the reported investigations. A more anecdotal paper discusses cases of misconduct pertaining to image manipulation (Parrish, 2009). Mello and Brennan (2003) review institutional policies for dealing with scientific misconduct and argue that in most cases they do not adequately guarantee due process rights for the accused researchers.
Discussion
Despite the wide range of questions and methods, studies on retractions share a number of topics, most fundamentally the contested nature of the definition of scientific misconduct. In the absence of a stable and universally supported definition of what constitutes scientific misconduct, researchers, journals, universities and others employ a variety of more or less formalized definitions. These cover a wide range of topics, offering varying clarity, degrees of detail and amounts of information. Many studies furthermore demonstrate that this diversity of definitions is acknowledged by the actors themselves, who frequently allude to grey areas and questionable practices that are difficult to classify as either misconduct or accepted practice. Moreover, this absence of an agreed-upon definition is also characteristic of research on scientific misconduct itself. Studies analysing retraction notices in particular exhibit a colourful variety of classification schemes for misconduct, demonstrating dissent about where to draw the line between misconduct and error and how to label specific forms of misconduct.
Ambiguity and equivocality also abound when looking at reactions to misconduct cases. Journal editors and university officials describe their cases as mostly handled on a case-by-case basis, even though more general policies exist. Cases are experienced as rare and peculiar events that share only minimal commonalities and hardly allow for institutionalization. Consequently, studies identify a variety of actions taken and possible outcomes of misconduct cases at journals and research institutions. Even seemingly clear-cut measures like retractions can in fact come in diverse forms and appear to be administered arbitrarily. As with the definition of misconduct, existing policies address appropriate measures only inconsistently. Studies of journal policies reveal that only a minority mention measures like retractions and withdrawals, and the question of when to retract and when to correct an article remains unclear. Interestingly, in addition to being ambiguous or incomplete, policies are not particularly well known. Both students and journal editors frequently report being unaware of policies addressing misconduct at their respective institutions, which might both contribute to and result from the perception that misconduct cases are singular events.
Besides these commonalities, a number of questions can be identified that pertain either to the scientific publishing system or to reactions by other scientific institutions. There is a large body of research investigating the consequences of retractions, both in terms of loss of citations and of professional and personal consequences for the implicated researchers. In contrast, the aftermath of sanctions like reprimands, debarment from funding or having one's name publicly exposed by the ORI is rarely examined. Presumably, this demonstrates that institutional measures are more often seen as full sanctions that are undeniably negative, regardless of their further effects, while retractions are much more ambiguous and considered negative only if they result in serious repercussions like declining citations and publication activity. While sanctions administered by institutions appear to be unequivocal, substantial uncertainty persists about what a retraction actually means. As a sign of scientific misconduct, retractions are hence equivocal not only because they may point to error instead of misconduct, do not clearly identify the problematic behaviour and are employed inconsistently, but also because it seems uncertain whether they are actually negative in a normative sense.
When investigating misconduct procedures at journals, studies place a strong emphasis on whether journals actually have policies and how they make use of them, rather than on the specific characteristics and outcomes of their procedures. Literature addressing investigations at institutions, on the contrary, has a different focus. Studies addressing universities cover questions ranging from the characteristics and outcomes of cases to how procedures are organized and formalized. Studies examining the ORI, however, focus on outcomes, with no studies addressing procedural details or factors influencing the outcomes of investigations. This suggests an implicit assumption that ORI procedures are not only well-defined but that their formal principles are also perfectly implemented, with little variation in how cases are actually handled. Processes at journals, in contrast, are thought of as inherently messy and disorganized, with the level of messiness and disorganization posing the most important research question. Because these processes seem mostly erratic and are not easily accessible, their results do not attract much research interest, as they are assumed to occur mostly at random, with no discernible pattern or regularity. Universities and research institutions take an intermediary position, with enough variation within the procedures to make these inconsistencies a worthwhile research question, but also with enough stability to make investigating outcomes meaningful. This illustrates that the ORI enjoys a high level of trust within the community of researchers examining scientific misconduct, whereas trust in universities and journals seems considerably lower.
The visibility of retraction notices – conclusion
From a more general perspective, two aspects of scientific misconduct are rendered visible through retractions. First, retractions identify individual instances of questionable research and label them as either morally deplorable or merely erroneous. However, whether both labels are stigmatizing remains unclear, and retractions are thus interpreted ambiguously. Second, retractions make visible that someone is issuing retractions and justifying that act. Either the authors of the retraction notice or some other actor, e.g. university committees or the ORI, appear as instrumental in bringing a retraction forward. By referring to policies and guidelines, the act of retracting an article is not only deemed justified but also presented as necessary and as a direct consequence of the authors' misconduct. The intentions and actions of those involved in identifying and retracting problematic articles are concealed by such justifications. As a consequence, retractions are in the process of becoming naturalized through policies and guidelines as classification tools (Bowker and Star, 1999).
As a form of social control, retractions are made more effective by concealing the processes and actors through which the retractions are effected. In general, invisibility is an important factor in the effectiveness of social control, as ‘power can be conceived as a form of external visibility (visibility of effects) associated with internal visibility (invisibility of identification): the effects of power are visible to everyone, but what power is in its essence, where it is really located, will not be disclosed’ (Brighenti, 2007: 338). In the case of retractions, policies and guidelines show how retractions should be handled ideally but disguise the specific actions and actors involved in retracting an article. Retractions are thus effective at highlighting norms of good scientific practice while associating these norms, in the form of policies and guidelines, with those who are responsible for issuing retractions rather than associating them with the specific processes and actions. Clearly visible are thus individual cases of misconduct and the norms that were violated. Less visible or even invisible are the actors and the processes implementing the retractions. As a consequence, attention is mainly drawn to the fact that misconduct exists and that someone is dealing with it in the interest of the scientific community; who this is and how they are doing this, remains opaque.
With respect to visibility, the literature allows inferences to be drawn mainly about the media-type visibility of retractions, in a more limited fashion about their control-type visibility, and almost none about their social-type visibility. As a media format, retractions render misconduct clearly visible, but inconsistencies remain as to whether specific cases are judged as fraud or error. The connection between retractions and misconduct is reinforced through numerous competing policies from journals, universities, funders and others. More specifically, retractions themselves are made visible by being labelled as such, e.g. through typographic means. By being identified as extraordinary, retractions initially receive exceptional attention (halo effect) but are subsequently less visible in databases and repositories. Furthermore, the types of misconduct that are made visible through retractions frequently pertain to visible aspects of misconduct, e.g. image manipulations.
Noticeable are a series of effects that increase visibility through labelling or, vice versa, labelling through increased visibility. Most prominent is the phenomenon of repeat offenders. A substantial part of all retractions is related to just a small number of authors. Conventionally, this would be seen as supporting the 'just a few bad apples' theory. However, as the societal reaction perspective in the sociology of deviance has shown, an initial label of misconduct can result in further incidences of misconduct (Grattet, 2011). Support for such a view comes from the observation that an initial retraction frequently leads to a more systematic investigation of other publications by the same author. It seems self-evident that one retraction leads, or should lead, to further investigation of the same author, so much so that this is practised in many investigations and not deemed noteworthy in most studies on scientific misconduct. Retractions and scientific misconduct may even be exemplary cases for the sociology of deviance in general, as the relation between the general level of social control in science and the increased visibility through labelling from a retraction seems exceptionally disproportionate.
How large such a labelling effect might be is impossible to estimate based on the current literature. That more visibility can lead to more scrutiny and thus to more retractions is plausible. Evidence is provided by Brookes (2014), who compares the number of retractions among studies publicized on the blog science-fraud.org with those not yet publicized. It is also plausible that retractions have negative effects, most notably for the authors of retracted papers, as evidenced by an ensuing loss of citations to the incriminated papers but also to other papers by the same authors. These effects can be far-reaching, as entire research fields affected by multiple retractions can suffer significantly with respect to publication activity.
The way visibility relates to the effects of labelling through retractions is multifaceted. On the one hand, the visibility resulting from retractions has mainly negative effects. Retractions lead to a loss of citations for individual authors, reduced publication activity for entire research fields, and thus to disadvantages in the reward structure of science (Merton, 1973). These disadvantages may vary according to whether retractions are mainly noticed within a specific research field or whether their visibility is more expansive, perhaps even reaching extra-scientific audiences. There is tentative evidence that the negative effects of retractions are increasing over time. While older studies (Pfeifer and Snodgrass, 1990; Whitely et al., 1994) find no or little citation loss, newer work (Furman et al., 2012; Lu et al., 2013) finds stronger effects in similar cases. This may be due to a general increase in the visibility of retractions through changed (electronic) communication practices.
On the other hand, the negative effects of retractions are moderated by previous visibility. Eminent authors are less vulnerable to retractions in that they are less likely to have a paper retracted, the time from publication to retraction is longer, and citation loss is less severe. Two different aspects of visibility, being eminent and having a retraction, thus act as countervailing forces. Further research should clarify whether these findings are related to the finding that high impact journals are more likely to have misconduct policies and disclosure statements from authors. This would allow addressing the question of whether oversight by journals is effective in reducing the likelihood of retractions.
Furthermore, not only the effects of retractions but also the means through which retractions are generated are multifaceted and sometimes arbitrary. The heterogeneity of definitions of misconduct is one probable cause, but the retraction practices of journals also play a role. Evidence comes from analyses of retraction notices showing that authors, editors and publishers are all involved in their formulation. The notices thus reflect the interests of multiple parties, as there is no regulated process leading to retraction. As a case of media-type visibility, retraction notices allow inferences to be drawn about what caused the retraction, who was involved in effecting it, and even what kinds of distortions may be in play when retractions are issued.
These distortions are instrumental in producing what can be called a fragmented pattern of visibility relating to the means through which retractions and misconduct are made visible. Even though retractions make misconduct clearly visible, this is true only in a limited way, in that the specifics of cases are rarely communicated, so that the causes of misconduct and the procedures for dealing with it remain hidden. This also pertains to the actors involved, as the authors of the retracted articles receive the spotlight while those effecting the retraction stay in the background or remain completely unidentified. Furthermore, journals differ substantially in the ways they deal with retractions and misconduct, which is especially noticeable when comparing journals with the ORI, funding bodies or universities. Part of the cause of this situation relates to disagreements about the need for, and the ways of, making misconduct visible, but also to policies that could lead to the standardization of practices but are too little known to be effective. Further explanations for this fragmented pattern of (in)visibility must come from future research.
Whether retractions are an effective instrument for dealing with scientific misconduct is a question completely unexplored in the current literature. As a system of punishment, retraction shows signs of the bifurcation known from the general literature on crime and punishment (Garland, 1996). The high visibility of individual cases, on the one hand, and of general policies, on the other, may resemble the combination of increased punitiveness with managerial styles in national systems of law enforcement and punishment (Cavadino et al., 2013). Bifurcation is usually associated more with efficiency than with effectiveness. However, judging from the same literature, having a visible system of law enforcement and punishment, irrespective of its specifics, is more effective in preventing misconduct than having no system at all. Only in this sense can it be argued that retractions are an effective way of dealing with scientific misconduct.
Footnotes
Acknowledgements
We thank Natascha Trutzenberg and Susanne Förster for help with manuscript preparation and handling of the literature as well as two anonymous reviewers for helpful comments.
Funding
This article originates from project no. 01PY13009 funded by the German Ministry of Education and Research.
