Dear Editor-in-Chief,
We appreciate Dr Petrušić and Prof. Martelletti's interest in our study (1) and thank them for their constructive feedback (2). We welcome the opportunity to clarify the rationale behind our methodological choices and to elaborate on the ultimate objectives our modelling seeks to serve. Such discussions are especially warranted in an era of ever-increasing output of machine learning papers of highly varying methodology, quality and transparency. Petrušić and Martelletti raise several methodological points that we acknowledge. Their views depart from ours on the handling of missingness, the selection of cases and variables, and the utility of contemporary Natural Language Processing (NLP) models.
That only 8,600 of the 14,279 publications available on Scopus were complete is indeed a limitation, as it would be for any bibliographic analysis. As pointed out, we chose not to impute data because many features (e.g. author affiliations) can take a near-infinite number of values, which we believe would lead to imprecise imputations. Attempting to harvest the missing data from the internet, whether manually or with Large Language Models (LLMs), is not straightforward, is in many cases inherently indeterminate, and is liable to inject noise and bias in favor of institutions with a large online footprint. The patterns of observed missingness suggest the presence of structure, i.e. data Missing Not At Random (MNAR), a scenario in which no imputation mechanism can be confidently relied upon. We therefore strongly believe that imputation would in this case have increased both noise and bias (2). Note that any bibliographic database is necessarily an incomplete sample of the population, and most studies in the field rely on samples; indeed, our original study is the first to attempt a fully inclusive analysis of the published literature.

Where a potentially structured variable is observed, we take the view that it is better to model it explicitly than to leave it out. It is only by modelling article types that we could show that the type of publication is one of the important predictors of the future impact of a manuscript (Figures 3 and 4). We agree that using variables that change over time is a clear challenge for any kind of study, and this needs to be addressed in future work. On the other hand, we wish to remain agnostic on whether the predictive signal is distributed across many variables or concentrated in a few. Nothing about the domain allows us to presume a very low-dimensional intrinsic structure, and if the optimal model is high-dimensional, we should adjust our inferential stance rather than tolerate a reductive, less performant solution, at least if fidelity is our aim.
A model does not need to be reductively explainable to be useful.
We were not able to include publications after 2017 because we aimed to predict five-year citation counts. This in turn meant that we could not incorporate the impact of works on calcitonin gene-related peptide monoclonal antibodies, most of which were published in the last five years. Alternatives would be to use a shorter citation window or age-normalized citation counts. We certainly appreciate the suggestion of a future prospective study to evaluate the performance of the model on publications from 2018–2023.
On the other hand, we argue that developing the first machine learning models to predict citation counts and translational impact (defined as inclusion in guidelines or policy documents) in headache research, as done in Danelakis et al.'s study (1), is an important first step that can provide useful insights into the headache domain. For example, there is strong evidence that some established prediction factors recorded in the state of the art of other biomedical fields (3–5) generalize to the headache domain as well.
We would also like to emphasize the translational impact models developed in the study. An important difference we recorded here is that, in the case of citation count prediction, bibliometric predictors alone achieved the best results, whereas for translational impact prediction the best results were achieved by combining bibliometric and semantic (text) features (although bibliometric features still carried more weight). This could be an initial indication that text features do carry meaningful information for this task, and it should be held as one of our take-home messages. This information may be extracted and modelled more robustly if more sophisticated NLP models are recruited, as suggested by recent research (6), but the advantage of moving to semantic analysis is already plain.
To conclude, the primary contribution of our work was not to create a perfect citation predictor, but rather to illustrate the concept, draw attention to the topic, highlight the perspective of bibliometric versus semantic predictors, and serve as a foundation for further research, particularly in the field of headache.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The research was funded by the Research Council of Norway through the Norwegian Centre for Headache Research.
