To the Editor,
We read with interest the article by Danelakis et al. (1) on the use of machine learning models to predict five-year citation counts and the translational impact of papers in headache research. The authors hypothesized that machine learning models could identify the bibliometric features most important for predicting five-year citation counts of papers published in headache-oriented journals up to 31 December 2017. They concluded that bibliometric features such as high page count, numerous authors, many references, and a high author h-index were among the strongest predictors of high citation counts. Furthermore, they argue that citation counts alone must be used with caution as a metric of the importance of scientific work, whereas models incorporating publication content could guide researchers, editors, and funders in identifying the most relevant and potentially impactful scientific works.
Here, we wish to share our views on the methodology, results, discussion, and conclusion of this study. Firstly, the authors identified 14,279 publications but considered only 8600 due to missing data. Omitting roughly 40% of the data might significantly change the important predictors derived from the machine learning models. The authors should have explored the imputation methods available for missing data in predictive modelling (2), such as k-nearest neighbours, decision trees and random forests, generative adversarial networks, variational autoencoders, and Bayesian methods (a minimal sketch of one such approach is given below). In the paper, the stated explanation was that the authors chose not to impute data because many features, such as author affiliations, had a near-infinite number of unique values, which would not allow accurate imputation. In an era of flourishing artificial intelligence, retrieving author affiliations should not be an impossible task. Moreover, the authors applied ablation analysis in this study, and the same method could also have been used to analyze papers lacking author-affiliation information in the machine learning models.
Furthermore, of the 8600 publications, 2119 were editorials, errata, letters, notes, reviews, short surveys, or conference papers. We think that including these papers might bias the results: such items typically fall into the first group (papers with few citations), so page count and number of references would be favoured as predictors simply because they distinguish these items from original research and review articles, which might represent a false-positive result in the prediction model. Using only original articles in the machine learning models would therefore yield predictors that are more useful to editors in their decision-making.
Additionally, the authors used journal information, such as the impact factor and immediacy index at the time of data download, in the prediction model, which could introduce serious discrepancies, since these values may change considerably between paper submission and data download. Moreover, this parameter could bias the estimate of a paper's actual citation potential and would thus not help editors in decision-making. In addition, the results of this paper suggest that there is no single main predictor of a paper's future impact; rather, a large number of predictors contribute similarly to the performance of the machine learning models, so from the editor's perspective the decision whether to publish a paper remains a highly intuitive process. Information about page count, number of references, and first- and last-author citation counts and h-indices could therefore be misleading in a significant number of cases.
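To make the imputation suggestion concrete, below is a minimal sketch of k-nearest-neighbours imputation using scikit-learn's KNNImputer; the feature names and values are hypothetical and serve only to illustrate how records with missing bibliometric fields could be retained rather than discarded:

```python
# Minimal sketch: KNN imputation of missing bibliometric features.
# Feature names and values are hypothetical, for illustration only.
import numpy as np
from sklearn.impute import KNNImputer

# Toy bibliometric matrix: [page_count, n_authors, n_references, h_index]
X = np.array([
    [12.0, 5.0, 45.0, 20.0],
    [ 8.0, 3.0, np.nan, 15.0],   # missing number of references
    [np.nan, 7.0, 60.0, 30.0],   # missing page count
    [10.0, 4.0, 50.0, np.nan],   # missing author h-index
])

# Each missing entry is replaced by the mean of that feature among
# the k most similar papers for which the value is present.
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```

The same idea extends to the model-based imputers mentioned above (e.g., random-forest-based iterative imputation), which could be compared against simple record deletion in a sensitivity analysis.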
On the other hand, the authors speculated that natural language processing (NLP) models analyzing titles, abstracts, and keywords could be used to identify the most relevant and potentially impactful scientific works, although the extracted text-based features showed only moderate discriminative ability in the machine learning models (mean training AUC = 0.57 for text-based features, compared with AUC = 0.78 for bibliometric data and AUC = 0.71 for the combined approach; out-of-sample test-set AUC = 0.64 for text-based features, compared with AUC = 0.69 for bibliometric data and AUC = 0.71 for the combined approach). We agree that NLP models should be incorporated into future research in this field, but current state-of-the-art models should be further customized for headache research and new models should be developed. Until then, suggestions based on current NLP models should be taken with caution.
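For completeness, here is a minimal, hypothetical sketch of the kind of text-based pipeline we have in mind, using TF-IDF features of abstracts and a logistic-regression classifier evaluated by AUC; the toy corpus and labels are invented and this does not reproduce the authors' pipeline:

```python
# Minimal sketch: text-based features for citation-class prediction.
# The corpus and "highly cited" labels below are toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

abstracts = [
    "monoclonal antibody targeting CGRP for migraine prevention",
    "case report of a rare secondary headache",
    "randomized controlled trial of a preventive migraine treatment",
    "letter regarding headache classification criteria",
] * 25  # repeated so a train/test split is possible
highly_cited = [1, 0, 1, 0] * 25

X_train, X_test, y_train, y_test = train_test_split(
    abstracts, highly_cited, test_size=0.3, random_state=42)

# Unigram/bigram TF-IDF features extracted from titles or abstracts.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

probs = clf.predict_proba(vectorizer.transform(X_test))[:, 1]
print("out-of-sample AUC:", roc_auc_score(y_test, probs))
```

A domain-adapted model (e.g., a transformer fine-tuned on the headache literature) would replace the TF-IDF step in this sketch, which is the kind of customization we argue for above.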
Finally, we believe that restricting the analysis to papers published up to 2017 is misleading, as from that year onwards there have been profound changes in the citability of many published works and in the translation of research into clinical practice, notably the highly cited literature on calcitonin gene-related peptide (CGRP) monoclonal antibodies, as well as guidelines such as the guideline on the use of monoclonal antibodies targeting the CGRP pathway for migraine prevention (3), ranked as the 5th most cited article in The Journal of Headache and Pain.
In conclusion, we praise Danelakis et al. (1) for spotlighting this interesting topic. It would be interesting to follow up on the results of this study by applying the already developed model to predict the five-year citation counts and translational impact of papers published from 2018 onwards, especially those published in 2023; this would test the machine learning model in a prospective study (a minimal sketch of such an out-of-time evaluation is given below).
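To illustrate what such a prospective follow-up could look like, here is a minimal sketch of an out-of-time evaluation; the DataFrame, feature names, label, and classifier are hypothetical placeholders, not the authors' actual model:

```python
# Minimal sketch: out-of-time ("prospective") evaluation of a model.
# df, features, and label are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def prospective_auc(df: pd.DataFrame, features: list,
                    label: str = "highly_cited", cutoff: int = 2018) -> float:
    """Train on papers published before `cutoff`; test on later papers."""
    train = df[df["year"] < cutoff]   # up to 2017, as in the original study
    test = df[df["year"] >= cutoff]   # later papers form the prospective set
    model = RandomForestClassifier(random_state=0)
    model.fit(train[features], train[label])
    probs = model.predict_proba(test[features])[:, 1]
    return roc_auc_score(test[label], probs)
```

Comparing this out-of-time AUC with the published in-sample figures would directly quantify how well the model generalizes to the post-2017 literature.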
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
