Abstract
In the last decade academic promotion and competitive research grant funding have moved away from relying principally on reports by referees – where anonymity and subjectivity risk biased ratings [1] – to more objectively quantifiable strategies [2]. Despite longstanding recognition of the many limitations of citation analysis – with even Garfield observing that the system is not without its flaws in identifying good science [3, 4] – the number of citations an article receives has become a key performance indicator.
When asked informally, most psychiatry researchers find it difficult to predict the future citation status of their publications, because the time lag between a paper's preparation, publication and critical appraisal often results in its citation pattern emerging only after years. Thus, the identification of factors predicting the future citation status of a scientific article would be of some worth.
A recent BMJ study reported that citation counts of articles 2 years after publication could be predicted by data available within 3 weeks of publication [5]. The authors evaluated 20 potential predictors after sampling 1274 articles from the top 105 journals in medicine, with a multiple regression analysis identifying nine as significant and having ‘about 60% surety’ [5] in predicting the 2 year citation rate. The nine significant publication variables were (i) larger number of authors; (ii) abstracted and referenced in a synoptic journal; (iii) higher ‘clinical relevance’ score (rating by practitioners from the relevant discipline for clinical importance); (iv) longer article length; (v) presence of structured abstract; (vi) citing of more references; (vii) being an original article rather than a review; (viii) involvement of a multi-centred study; and (ix) focus on a therapy. Of the non-significant predictors, two are worth noting: there was no impact of the article's newsworthiness (as rated by practitioners from the relevant discipline) or the nation of the first author. The authors’ strategy for rating any article's clinical relevance and newsworthiness was to use the McMaster online rating of evidence system (MORE) [6], which has more than 4000 contributing practitioners and in which each article is rated by at least three practitioners.
Only five of the top 105 journals evaluated in the BMJ report were psychiatric journals. It remains to be established whether the predictive factors hold for this discipline, in which journals generally appear at monthly or longer intervals [7]. In addition, psychiatry lacks laboratory tests and biological markers and has few diagnostic tests (domains that tend to be rapidly cited), and its paradigm shifts (which evoke citation) occur slowly.
We therefore elected to undertake a psychiatry-specific study, to determine the salience of a number of the variables identified in the BMJ study [5], alone and in comparison to early citation status.
Methods
Using the Institute for Scientific Information Web of Science (ISI) [7] as the primary data collection site, we collated information from the top 30 psychiatric journals (determined by an impact factor ≥3) for papers published in January or February 2006 to determine whether (i) sources of information available at publication, and (ii) progressive citation counts (from 1 month to 1 year, up to January or February 2007) best predicted citation numbers at 2 years. The articles were subject to critical appraisal criteria, adapted from the MORE system [6], which rates whether an article meets specific inclusion criteria.
Inclusion was based on the article being an original research report and/or systematic review, thus excluding editorials, case reports, published abstracts and miscellaneous articles (e.g. book reviews, viewpoints, letters to the editor), because predicting the quality of the latter would require different assessment criteria.
Predictor variables consisted of (i) a set identified from the BMJ article (i.e. number of authors; number of pages; number of cited references; whether the article was a review or original research; whether the article topic involved a therapy or not; the number of words in the abstract, and whether the abstract was structured or unstructured) and (ii) citations at 1, 3, 6 and 12 months after publication. Several significant variables from the BMJ study could not be assessed, mostly due to inadequate sample numbers (e.g. indexing in an evidence-based medicine online database was not included because this occurred for only one of the articles).
Results
The ISI journal rating service [7] generated articles from the 30 psychiatric journals. One article was removed from analysis because its 2 year citation count was more than 10 SD in excess of the mean. Of the 428 articles analysed, reviews accounted for 6% (n=26), while 26% (n=111) were therapy-related articles; the rest (n=291) fell into other categories (e.g. ‘diagnosis’, ‘prognosis’, ‘aetiology’ etc.).
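The outlier exclusion described above can be illustrated with a minimal Python sketch. It assumes that the mean and SD are computed over all articles, including the candidate outlier, and that the sample SD is used; the text does not state either choice. The function name and the citation counts are hypothetical and are not the study data.

```python
from statistics import mean, stdev


def drop_extreme_outliers(counts, max_sd=10.0):
    """Split counts into (kept, removed) using a mean + max_sd * SD cut-off."""
    mu = mean(counts)
    sd = stdev(counts)  # assumption: sample SD over the full dataset
    cutoff = mu + max_sd * sd
    kept = [c for c in counts if c <= cutoff]
    removed = [c for c in counts if c > cutoff]
    return kept, removed


# Hypothetical 2 year citation counts: 400 ordinary articles and one extreme one.
counts = [i % 10 for i in range(400)] + [300]
kept, removed = drop_extreme_outliers(counts)
print(len(kept), removed)
```

Because an extreme value inflates the SD against which it is judged, a rule of this kind can only trigger in a reasonably large sample.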
The mean number of citations at 2 years in the whole dataset was 7.00 ± 7.20 (range=0–54). Of the articles, 50% had more than five citations, while 23 had received no citations at 2 years. We first report on the capacity of citation status (dichotomized into never cited or cited at 1, 3, 6 and 12 months) to predict high citation status at 2 years (defined as being above the 2 year mean). Table 1 data show that citation status at all time periods was a significant predictor, and that the strength of the prediction increased with time.
Table 1. Univariate logistic regression of the likelihood of an increased number of citations at 2 years
CI, confidence interval; OR, odds ratio.
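As an illustration of the type of analysis summarized in Table 1, the sketch below dichotomizes early citation status (never cited versus cited) and 2 year status (above versus not above the 2 year mean), then derives an odds ratio with a 95% CI. It relies on the fact that, for a single dichotomous predictor, the odds ratio from a univariate logistic regression equals the cross-product ratio of the 2×2 table, with the Wald interval computed on the log scale. All function names and counts are hypothetical and do not reproduce the study data.

```python
import math
from statistics import mean


def two_by_two(early_counts, two_year_counts):
    """Cross-tabulate early citation (cited vs never cited) by high 2 year status."""
    threshold = mean(two_year_counts)  # "high" = above the 2 year mean
    table = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for early, late in zip(early_counts, two_year_counts):
        table[(early > 0, late > threshold)] += 1
    return table


def odds_ratio_ci(table, z=1.96):
    """Odds ratio and Wald-type CI on the log scale; assumes no empty cell."""
    a, b = table[(True, True)], table[(True, False)]
    c, d = table[(False, True)], table[(False, False)]
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical data: (citations at an early time point, citations at 2 years).
pairs = [(1, 12)] * 30 + [(1, 2)] * 10 + [(0, 12)] * 20 + [(0, 2)] * 40
early = [p[0] for p in pairs]
late = [p[1] for p in pairs]

table = two_by_two(early, late)
odds_ratio, lower, upper = odds_ratio_ci(table)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

A lower CI bound above 1 corresponds to early citation status being a significant predictor at the 0.05 level.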
On multiple logistic regression analysis [8, 9] examining the raw number of citations at each of the four predictor periods, the only predictor to stay in the model was the number of citations at 12 months, with higher citations at this point predicting an increased number of citations at 2 years (R2=0.36, 95% confidence interval (CI)=2.17–3.44). When 12 month data were removed, the accounted variance dropped and only the 6 month data variable was left in the equation (R2=0.13, 95%CI=2.08–4.49). Similarly, when both the 12 month and 6 month data were removed, citation numbers at 1 month were dropped from the equation and only the citation number at 3 months remained significant (R2=0.04, 95%CI=1.68–9.11). The significant difference in accounted variance (R2 change on multiple regression=0.31, β=0.10, 95%CI=0.08–0.12) identified only the 12 month citation numbers as meaningful.
A multiple regression analysis then examined the prediction from both 12 month citation numbers and the derived BMJ predictors, and produced a highly significant result (R2=0.75, 95%CI=2.25–2.54), with three (of the 10 examined) variables entered in the equation, with entry set at a significance level of 0.05. Table 2 data quantify the distinctly greater contribution coming from citation count at 1 year, compared to the other two identified significant predictors (i.e. having more references and more authors). Importantly, the change in explained variance with the addition of the latter two predictors, although significant, accounted for only an additional 1.9% of the variance (R2 change=0.01, 95%CI=0.08–0.27).
Table 2. Multiple regression for 428 articles, predicting citation counts at 2 years
CI, confidence interval; OR, odds ratio.
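The notion of an R2 change can be made concrete with a small sketch: fit a regression using 12 month citations alone, refit with number of references and number of authors added, and take the difference in R2. This is only an illustration under stated assumptions. It uses ordinary least squares solved through the normal equations, synthetic data generated with hypothetical coefficients, and forced entry of all predictors rather than the significance-based entry (0.05) used in the study.

```python
import random


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = A[row][col] / A[col][col]
            for c in range(col, k):
                A[row][c] -= factor * A[col][c]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (b[i] - tail) / A[i][i]
    return coef


def r_squared(X, y):
    """Proportion of variance in y explained by a linear model on X."""
    coef = fit_ols(X, y)
    pred = [coef[0] + sum(c * x for c, x in zip(coef[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic, hypothetical articles: (12 month citations, references, authors, 2 year citations).
rng = random.Random(0)
articles = []
for _ in range(200):
    cites_12m = rng.randint(0, 20)
    n_refs = rng.randint(10, 80)
    n_authors = rng.randint(1, 12)
    cites_24m = 2.0 * cites_12m + 0.05 * n_refs + 0.3 * n_authors + rng.gauss(0, 2)
    articles.append((cites_12m, n_refs, n_authors, cites_24m))

y = [a[3] for a in articles]
r2_base = r_squared([[a[0]] for a in articles], y)        # 12 month citations only
r2_full = r_squared([[a[0], a[1], a[2]] for a in articles], y)  # plus references, authors
r2_change = r2_full - r2_base
print(round(r2_base, 3), round(r2_full, 3), round(r2_change, 3))
```

Because the second model nests the first, its R2 cannot be lower, so a small R2 change indicates that the added predictors contribute little beyond the 12 month citation count.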
An additional multiple regression was undertaken to examine the capacity for early prediction, entering the BMJ-derived predictors with number of citations at 1 month. The results of this model were significant, although accounting for only 13.5% of the variance (R2=0.13, p<0.01). The predictors remaining in the model were number of citations at 1 month (β=3.47, 95%CI=1.63–5.31), structured versus unstructured abstract (β=1.60, 95%CI=0.28–2.91), number of authors (β=0.33, 95%CI=0.15–0.51) and number of cited references (β=0.07, 95%CI=0.03–0.11).
Discussion
The BMJ article quantified a 60% prediction (i.e. derived from an R2 of 0.60) of citation numbers 2 years later from a set of variables that were derivable within 3 weeks of publication [5]. Some appeared intrinsic to the research team (i.e. more authors and multicentred study), some to the paper (i.e. longer and more references, original article rather than a review), one to the journal (i.e. structured abstract), one to the field (i.e. a study about a therapy) and two to the externally evaluated worthiness of the article (i.e. inclusion in a synoptic journal and clinical relevance score).
The present study, examining first whether early citation rate predicts later (i.e. 2 year) citation rate, demonstrated strong support for that proposition as tested at four periods: 1 month, 3 months, 6 months and 12 months after publication. Such a result is not surprising, but it does argue that, despite the assessment and production delays before one paper can be cited in a subsequently published paper, there is an early signal from citation numbers in the psychiatry domain. That early 1 month signal was slight, however, with prediction capacity strengthening as the interval from publication date increased. Such findings clearly warrant a cautionary note. Early citations to any article are likely to be a result of self-citation or citations by colleagues who have had access to preprint or early online versions. This potential confound, however, does reflect real-world publishing, where self-citations are common and not often controlled for in assessing the quality of a research article.
Because the BMJ article indicated that successful prediction could be made as early as 3 weeks, we undertook an analysis of 4 week data. At that time early citation number was the strongest predictor, while three BMJ article predictors (i.e. structured abstract, more references and more authors) also contributed to the outcome data. The overall prediction, however, was only 13% and far lower than the 60% quantified for the BMJ study. By contrast, when the present dataset included 12 month citations, prediction was high at 75% (and in excess of the BMJ data prediction), presumably because of our inclusion of citation numbers at that time, and because the citation trajectory of a paper was established by that period.
Thus, the template generated in the BMJ article for assessing general medical publications appears unlikely to have predictive validity for psychiatry. Psychiatry papers do appear to require a significant interval (of at least 1 year) to develop a citation trajectory that allows their citation status at 2 years to be strongly predicted. Analyses examining predictors of citation status after an extended period (say 10 years) might now be undertaken to examine the comparative utility of the BMJ article and related variables, as well as citations at defined periods, to determine whether current findings are confirmed across a lengthier review period. Additionally, a more substantive analysis may include journals of lower impact (rather than only high-impact factor journals) to assess the comparative utility of journal quality in predicting citation success.
Acknowledgements
Financial support for the present study was provided by grants from the National Health and Medical Research Council of Australia (510135) and NSW Department of Health. Our thanks to Kerrie Eyers for assistance.
