Almost 50 years have elapsed since the Impact Factor (IF), a measure of the average frequency with which articles in a journal are cited, was devised by Eugene Garfield [1]. Surprisingly, little attention was devoted to the IF until the late 1990s. There is still virtually no discussion of it in the psychiatric literature [2–5]. That the IF has serious limitations, is being misapplied and has unwanted consequences has been increasingly noted (e.g. [6–10]), but proposals for alternative ways to assess published work have not been forthcoming.
We critically examine the IF in this paper. We have conducted a comprehensive search of medical databases (Index Medicus/MEDLINE, EMBASE/Excerpta Medica and PsycLIT) and websites (e.g. [11], [12]), and conferred with relevant experts in the research and publishing fields. In performing this task, we do not seek to disparage journals which have a high IF or belittle the pioneering effort of Garfield to sort out the wheat from the chaff. Further, we declare a potential conflict of interest in being, respectively, Editor and Advisory Board member of the Australian and New Zealand Journal of Psychiatry (ANZJP), a journal with a modest IF. Nonetheless, we feel it incumbent on us, with a responsibility to readers, contributors and the profession as a whole, to examine the subject as objectively as we can and, if possible, to propound ideas to optimize evaluation of the quality of published work. Our first task is necessarily to define the IF.
What exactly is the IF?
IFs are calculated and published annually in the Journal Citation Reports (JCR) by the Institute for Scientific Information (ISI), a commercial organization founded by Garfield himself. The IF of a journal is the number of citations in a given year to its articles published in the preceding 2 years, divided by the number of articles (‘source items’) published in that journal during those 2 years. For example, the ANZJP's IF in 1999 of 1.197 (the most recent available, published in the 2000 JCR) represents the number of citations in 1999 to the ANZJP's articles appearing in 1997 and 1998 (237: this includes all article types), divided by the number of articles published during the same period (198: this excludes editorials, letters and meeting abstracts) [13]. A raw citation count on its own would obviously be meaningless, given the varying number of articles published by different journals.
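The arithmetic can be sketched in a few lines; the only data assumed are the 1999 ANZJP figures quoted above:

```python
def impact_factor(citations: int, source_items: int) -> float:
    """IF for year Y: citations in year Y to articles from years Y-1 and Y-2,
    divided by the number of 'source items' published in those same 2 years."""
    return citations / source_items

# ANZJP, 1999: 237 citations in 1999 to 1997-98 articles (all article types),
# over 198 source items (editorials, letters and meeting abstracts excluded).
print(round(impact_factor(237, 198), 3))  # 1.197
```

Note the asymmetry built into the formula: citations to all article types count in the numerator, while only ‘source items’ count in the denominator, a point taken up under the technical problems below.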
The IF has spawned a series of other measures, all linked to citation counts. These include the ‘Scope-adjusted IF’ (the IF adjusted for the number of subject areas published by the journal), ‘Discipline-specific IF’ (the journal's IF standardized according to citation rates in different disciplines), ‘Journal-specific Influence Factor’ (a measure of the influence of papers in one journal on those in another), ‘Immediacy Index’ (an estimate of how quickly the average article in a specific journal is cited) and ‘Cited Half-life’ (a score reflecting the rate of continuing citations to a journal's articles) [12], [14], [15]. Since none of these measures is as well known or as influential as the IF, and none offers any compelling advantage over it, our critique concentrates solely on the ‘parent’.
What problems bedevil the IF?
A host of problems are associated with the IF; these can be categorized into conceptual, technical, liability to manipulation, and misuse. We consider each in turn.
Conceptual
The IF is promoted as an index of scientific quality, yet the premise that citations correlate positively with quality is seriously flawed. In our view, this spurious notion embodies a false assumption – that authors cite an article because it is meritorious [9] (we will return to the thorny question of how we may judge merit). Criteria at least as plausible for citation are utility (e.g. an article describing psychometric properties of a questionnaire) and accessibility [9]. Similarly, the claim that the count represents a measure of quality cannot be sustained. Indeed, papers may be cited frequently because they are regarded as poor (e.g. an obviously biased review attacking psychotherapy's effectiveness [16]) or even because they are known to be fraudulent, and an author wishes to highlight their limitations. Other papers may be commonly cited merely because they provided the first description of a certain research method (e.g. a laboratory assay), which an author does not want to repeat in detail. Conversely, about 50% of articles may never be cited, according to one study [6]. Are we to assume that all of these are of poor quality? Finally, the 2-year window arbitrarily set by ISI is nonsensical. Quality does not always declare itself in so brief a period. Many a Nobel Prize winner in medicine has received the honour several years after the discovery [10]. Nor should classic articles, like Cade's seminal paper on lithium [17], be tossed onto the scrap heap because they have supposedly exceeded a use-by date; their impact is enduring.
Technical
This discussion of the technical problems of the IF is adapted from [7–9]. The IF is technically fraught. Several limitations relate to arbitrary criteria for including or excluding articles and references in the equation. For example, the number of journals in ISI's database, the so-called Science Citation Index (SCI), is only about 3500, a minute proportion of the world total of over 100 000 [12]. Journal selection for the SCI is questionable. For instance, English-language journals – particularly those published in the USA – seem to be favoured. Different fields are covered unequally. Coverage for chemistry is estimated at 90%, in contrast to a mere 30% for biology [7]. The pattern in psychiatry is unknown. Entry into the ISI database is often delayed, so that articles published at the end of a year may be absent. Books are never incorporated into the source item pool. Is it not ludicrous that Erikson's Childhood and Society [18], or Ellenberger's Discovery of the Unconscious [19], or Goodwin and Jamison's Manic Depressive Illness [20] could never be acknowledged under the prevailing IF system? Editorials, letters and meeting abstracts are permitted to claim a spot in the numerator, but not the denominator. Journals with short publication lags and higher circulations tend to have higher IFs.
Hinging as it does on citations, the IF is directly affected by referencing errors and patterns. For example, misprints in lists of references are common (affecting up to one-quarter of references [9]), causing the same article to appear in the SCI more than once. Journal space restrictions preclude citations to all sources that authors draw upon. It has been estimated that two-thirds of the sources authors use in writing a scientific paper do not appear in the reference list [9]. Review articles tend to be cited more than original research. The ISI database does not correct for self-citations (i.e. authors citing their own work), which amount to about one-third of all citations [7]. ‘In-house’ citations (to close colleagues) and ‘sycophantic’ citations (e.g. to seniors, editors, hoped-for referees) also go unscrutinized.
Liable to manipulation
It follows that the IF can be readily manipulated. Imagine that Dr Touchup, the recently appointed, highly ambitious and scheming editor of a psychiatric journal, has just been handed a newly released list of IFs and notes, aghast, that his publication is buried away in the ‘Junior League’. Rather than sighing lugubriously, Dr Touchup resolves to boost the IF in any way he can, however questionable. Accordingly, he publishes fewer articles per year, discourages submission of articles that are unlikely to attract citations, publishes many more reviews, preferably written by experts with an international reputation, and encourages contributors to take pride in their own work – they should show ‘continuity of [recent] scholarship’ and demonstrate ‘regional awareness’ by citing their previous publications in Dr Touchup's journal. He sleeps soundly, believing that he will be amply rewarded in due course and will rub shoulders with the editorial elite. However, he may not escape censure; the editor of the journal Leukaemia routinely sent submitting authors a letter asking them to increase the number of citations to papers published in that journal [21]. This drew strong criticism from the editors of another haematology journal and the Lancet [21], who viewed it as a blatant attempt to increase Leukaemia's IF.
Misapplication
The IF was originally developed to help librarians discern which journals were being used and therefore worth having on their shelves. Later, its utility was extended to determining, indirectly, the quality of the average paper in a specific journal. There is mounting evidence that the IF has been used over the last decade for a variety of other purposes, without corresponding data to support such use. This stems from conclusions being drawn about the quality of an individual article in a journal on the basis of the journal's IF. Since the frequency with which individual articles are cited varies substantially, such an approach is entirely inappropriate. In fact, the most-cited 50% of a journal's articles account for 90% of that journal's citations [8]. Thus, even if a correlation between citation and quality did exist (strongly contestable, as alluded to earlier), one could not possibly make any inference about the quality of a particular article on the basis of the IF of the journal in which it appears. Data also suggest that many articles in a journal with a relatively low IF may be cited more frequently than many of those in a journal with a much higher IF [6]. Despite these findings, decisions are being made, in varying measure, about candidates for academic positions or promotions and applicants for research grants according to the journals in which they have published their work rather than on the basis of the work itself. For example, the Italian Association for Cancer Research requires grant applicants to note the IF of each journal in which they have published for the last 5 years, then calculate a ‘weighted average IF’ [22]. Although the Australian National Health and Medical Research Council (NHMRC) has no policy governing the specific role of the IF in an applicant's track record, according to a former chair of the Council, most assessors rate publication in the highest IF journals very highly [Larkins R: personal communication, 1999]. 
Concerns have been expressed that a similar practice prevails in parts of the UK, Japan and USA [22], [23].
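The ‘weighted average IF’ required of Italian grant applicants is not specified in detail here; one plausible reading – offered purely as a hypothetical sketch, in which each journal's IF is weighted by the number of the applicant's papers it carried – would be:

```python
def weighted_average_if(record):
    """Hypothetical 'weighted average IF': each journal's IF weighted by the
    number of the applicant's papers published in that journal over the period.
    record: list of (journal_if, paper_count) pairs."""
    total_papers = sum(n for _, n in record)
    return sum(jif * n for jif, n in record) / total_papers

# e.g. 3 papers in a journal with IF 4.0 and 2 papers in a journal with IF 1.0
print(round(weighted_average_if([(4.0, 3), (1.0, 2)]), 2))  # 2.8
```

Whatever the exact formula, the objection in the text stands: the score summarizes the journals an applicant published in, not the quality of the applicant's own articles.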
The thinking of authors and the shape of journals are being profoundly affected by the IF. We have observed frequently, for instance, that authors preferentially submit to high IF journals instead of selecting the most appropriate outlet on the basis of the work itself and the intended readership. As a result, readers of a more suitable journal may well be denied ready access to the material. The practice soon becomes established as junior authors imitate their senior colleagues. The shape of journals is already subject to influence by the IF because editors fear lagging behind in the rankings if they do not play what amounts to the only game in town. That some journals – psychiatric among them [24–26] – use the IF for marketing purposes emphasizes the pressures to increase the score. Our own Journal, for example, has been buffeted by authorial forces aggrieved by it not working hard enough to boost the IF. This pressure was initially resisted. Then a decision to decrease the frequency of publishing case reports was made, partly influenced by the limited contribution they make to the IF. Hence, an article that might advance knowledge and influence clinical practice [27] will have a much lower profile in that journal. A sense of editorial integrity and ethical responsibility forbade the ANZJP making any other concessions in the quest to boost its IF. Furthermore, a collegiate journal must necessarily satisfy a range of constituencies, especially clinicians interested in practical aspects of their profession. A journal's essence should not be determined by pursuit of a number.
It is worth digressing to note that, in the arts and humanities, a database records citations to each article published in a given half year [12], a system which allows authors and others to note in what ways the work is being used. However, there is no measure comparable to the IF in these fields. What inferences can we draw? Are the arts and humanities ‘missing the plot’, or are they showing more insight than the sciences in this matter?
A paradox
A measure found to be ill-conceived, unreliable and invalid in medicine will invariably fall out of favour and cease being used. Curiously, the IF has been spared this fate. Au contraire, it is attracting more attention and is being more frequently and pervasively applied. How can we explain this paradox? Could it be that the IF meets the interests of influential figures in publishing and academia by offering them a ‘quick fix’ – to cut a swathe through more time-consuming methods of appraisal? If so, that is an insufficient basis for persevering with its deployment. The perceived burden might well be eased by attending creatively, even radically, to source problems such as the ever-increasing number of medical journals (a virtual explosion to the point of absurdity), the ‘publish or perish’ ethos (no matter the need for certain material to be published in journal form, or at all) and ‘salami’ publishing (dividing data into ridiculously thin slices). Here, we focus our remedial thoughts on judging quality of published articles, and disseminating corresponding ‘league tables’.
The measurement of quality
Explicitly and implicitly, the IF is depicted as a measure of quality yet it fails dismally in this regard. Quality is an elusive concept. In the present context, it probably makes sense to link it to the cardinal purposes of publication, that is, in a review – to distill, analyse and critique; in a research report – to be original, methodologically sound and contain findings with significant clinical or theoretical implications. Even an extensive editorial or essay, such as Leon Eisenberg's ‘Mindlessness and brainlessness in psychiatry’ [28], can break new ground and be regarded as of high quality.
Interestingly, in the areas in which the IF is being used, methods are established to determine quality – albeit not flawless – which have been neglected in the quest for an instant panacea. For example, in evaluating candidates for an academic position or for promotion, quality can be judged by inspecting the overall track record, noting the number and nature of publications in peer-reviewed journals and asking candidates to submit a selection (say 5–10) of recent publications (including books, book chapters and reviews in ‘annuals’ like the American Psychiatric Press ‘Review of Psychiatry’ series) which they themselves regard as their best work. The degree to which published material advances practice, research or theory is evaluated directly through examination of that material. It is encouraging that the main government research agency in Germany (Deutsche Forschungsgemeinschaft) has issued guidelines to universities that they abandon evaluating candidates' published work on the basis of IFs, and instead examine their top five publications directly [29]. As the guidelines emphasize, ‘clearly neither counting publications nor computing their cumulative impact factors are… adequate forms of performance evaluation’ [29]. Perhaps hopes for change are not forlorn.
A proposal
In the sciences and the arts, a tradition prevails of recognizing and celebrating outstanding work of colleagues. The Nobel and Pulitzer Prizes exemplify a process that, by and large, functions most satisfactorily (an occasional surprise choice occurs but no system is foolproof) [30], [31]. Medical journals, such as the Medical Journal of Australia and the Journal of the American Academy of Child and Adolescent Psychiatry, hold competitions in which papers judged to be superior are short-listed for one or more prizes. Common to these is the procedure of peer review and judges' familiarity with the pertinent work.
Psychiatric journals are well placed to adopt these principles and methods. The ANZJP has decided to serve as the guinea pig for the field! In 2001, we shall experiment as follows. Each year the 12 best articles will be chosen from those published, this number representing about 10% of the volume. The judging panel comprises five members of the International Advisory Board, selected by the Editorial Board for their scholarship, integrity and general psychiatric knowledge. We considered the Editorial Board itself too closely involved in deliberations about papers en route to publication to participate. The panel collectively covers a range of areas of psychiatry, and its membership includes representatives from different parts of the world. Judges are regular readers of the ANZJP and therefore in a position to monitor the content of each issue as it is published. They have been asked to note articles which satisfy one or more of the following criteria:
1. Adds consequentially to the field through original, innovative research findings;
2. Expands or challenges current knowledge;
3. Opens additional areas for new research activity;
4. Opens a pathway to advance knowledge;
5. Integrates discoveries obtained by different approaches and/or disciplines through creative synthesis – thus bringing new insights to bear on original research; and
6. Reflects critically on research findings to guide the direction of further research.
The article may be outstanding for other reasons and thus the above are not hard-and-fast criteria but rather a set of overlapping guidelines. The applicable criteria also depend on the article type (e.g. review, original research, conceptual).
After publication of the entire volume in late 2001, each judge will select, independently, the 12 best articles. Judging will be anonymous, thus permitting the panellists to include their own work. The data will be collated and the dozen articles gaining the most votes will be listed in early 2002, in alphabetical order by first author. Ranking of each judge's selected articles will not be sought, to avoid the process becoming overly onerous. We hope other psychiatric journals will follow suit or scrutinize the results of our trial and confer with us as appropriate. In any event, we plan to disseminate the fruits of our experience to fellow editors. Finally, we stress that the experiment is not to be construed as a competition and the journal's selection should not constitute a prize list (although we concede this may occur). Published papers which are not selected will be viewed as worthy in having negotiated the peer-review process successfully.
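The collation step just described can be sketched as follows; the judges' ballots and the first-author surnames used as article identifiers are invented for illustration:

```python
from collections import Counter

def collate_ballots(ballots, n=12):
    """Tally each judge's independent selection, take the n most-voted
    articles, and list them alphabetically by first author - no ranking
    or vote count is published."""
    tally = Counter(article for ballot in ballots for article in ballot)
    winners = [article for article, _ in tally.most_common(n)]
    return sorted(winners)  # alphabetical order, not order of votes

# Three hypothetical judges; articles identified by first-author surname
ballots = [
    ["Brown", "Smith", "Jones"],
    ["Smith", "Brown", "Lee"],
    ["Brown", "Smith", "Lee"],
]
print(collate_ballots(ballots, n=2))  # ['Brown', 'Smith']
```

Listing alphabetically rather than by vote count reflects the stated intent that the selection not read as a ranked prize list.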
Conclusion
It would appear that the IF and its offspring no longer serve a useful purpose. A juggernaut has evolved; it is time to disembark. At stake is the inherent nature of psychiatric publishing, the way research is funded, and academic prospects. We appreciate the realities of the information explosion, keenly contested research grants and intense competition for shrinking university positions, as well as the wish of decision-makers in these areas to make life easier for themselves. However convenient and tempting a simple formula like the IF may be, there are no shortcuts for the proper appraisal of scientific endeavour and, in our view, there is nothing as reliable as the painstaking process of peer review.
Footnotes
Acknowledgements
We thank Per Seglen, Eugene Garfield, Michael Waxman, Martin Van Der Weyden, Richard Larkins, Uri Aviram, Perminder Sachdev, Christopher Tennant, Glenn Hunt, Anthony Jorm, Michael Salzberg, Wendy Jamieson, Ken McNab and Melissa Hardie for their assistance.
Professor Bloch is the Editor of the Australian and New Zealand Journal of Psychiatry. Dr Walter is a member of the Advisory Board of the Australian and New Zealand Journal of Psychiatry and Editor of Australasian Psychiatry.
