Abstract

It is well known that artificial intelligence (AI) is revolutionizing medicine, including headache medicine. Its potential clinical implications have been reviewed elsewhere (1). A notable breakthrough in AI is the recent emergence of easily accessible natural language-based AI models. Anyone with access to the Internet can benefit from AI without any knowledge of programming. ChatGPT is one such easily accessible AI program that has gained tremendous popularity in recent weeks (2). As an AI-based language model, its original purpose was to handle language processing tasks: to understand a question and provide relevant answers in any accessible language. Human knowledge is based on language. In this sense, when such a model works properly, it provides not only adequate linguistic responses but also content relevant to the question. Hence, its range of applications is wide: language translation, text summarization or rephrasing, computer code improvement and debugging, and even answering meaningful questions. Such capabilities, available to everyone, will revolutionize scientific research, manuscript preparation, grant writing, and, ultimately, publication in scientific journals.
In light of this recent surge in popularity, we used ChatGPT as an example to conduct experiments on the potential of AI in scientific publishing. The experiments began with language processing tasks. When fed the abstract of an article, the AI easily performed the following tasks with exceptional quality: expanding or shortening the word count, rewriting the content for a different target audience (e.g., the general population), generating a letter to the editor for submission, and reformulating sentences to avoid verbatim copying. We even asked for an abstract written in the language of Shakespeare and were rather impressed. In a more serious example, the AI program was fed abstracts published in two different languages for translation into English. In both cases, the translated text was natural and easy to understand. Cephalalgia, like many other journals, encourages the use of English language editing services, whose purpose is to improve clarity and the communication of the science behind it. How, then, does this differ from asking a native speaker for help? In this regard, AI serves as a powerful language editing tool for native and non-native English speakers alike. Pure language editing does not contradict any of the existing policies of Cephalalgia. However, when the language editing tool is powerful enough, the line between language editing and AI ghostwriting becomes blurred. Scientific objectivity rests on verifiability and transparency. Therefore, content edited or generated by AI services should be clearly labeled. The Cephalalgia authorship guidelines state that the use of AI is allowed; however, it must be disclosed. Notably, other AI services have been developed to detect AI-generated content (3). AI ghostwriters can potentially, and will eventually, be exposed.
In the next experiment, we tested whether the AI could correctly solve content-related tasks, not just language-related ones. We used real-life examples from articles submitted to Cephalalgia, such as whether the interpretation drawn from a specific analysis was correct, or whether, given the study design, the authors had chosen the appropriate statistical analysis for their data. In our examples, the AI answered these questions correctly. We could even describe a study design and ask the AI to suggest appropriate statistical tools for further analysis. Even non-statistical questions, such as 'how to differentiate between the different types of trigeminal autonomic cephalalgias?' or 'what is an indomethacin-responsive headache?', were answered correctly. Given appropriate keywords and further refinement, AI can thus be used to generate text covering relatively unambiguous background information, such as that used in the introduction of a scientific article.
In the next step, we tested the AI with broader questions, where answers might not be clear and unequivocal, and evaluated whether it was capable of generating pertinent and accurate content. In other words, can the AI generate mini-reviews on specific topics of acceptable quality? Will humans soon be replaced in tasks such as writing narrative reviews? The AI was fed three topics and asked to generate a short abstract for each. The AI-generated content was then evaluated for relevance and accuracy by two independent headache researchers. In addition, we verified the originality of the content by checking the AI-generated text with the plagiarism detection software iThenticate®, a process applied by default to all manuscripts submitted to Cephalalgia. Is the AI-generated content a collage of text taken from published review articles or book chapters? Is the AI capable of generating content based on original data rather than existing review content?
In general, the AI-generated text for each topic was easy to read, pitched (at least for now) at a general audience. Accuracy and pertinence varied widely: in some cases (e.g., example 1, the role of the hypothalamus in cluster headache), the content was generally accurate, although some sentences were vague and potentially confusing. In the two other cases, however (example 2: the role of sensory thresholds in migraine; example 3: emerging migraine therapies), there were obvious errors and content of concern. For example, erenumab was categorized as an anti-CGRP antibody when it is, in fact, an anti-CGRP-receptor antibody, and psychedelic drugs were listed as emerging migraine therapies.
A key feature that distinguishes academic writing from general writing is that the former is based on reliable sources such as academic journals. Every single claim requires proper references and citations. Current AI algorithms have been trained on information available on the Internet, which varies in credibility. Furthermore, with the currently available AI tools, individual arguments or claims cannot be linked to specific external sources, so the reliability of the underlying information cannot be verified. When the prompt of our experiment was modified to request references, for example, 'generate an abstract on the role of the hypothalamus in cluster headache with references', the AI duly listed three references in proper scientific journal format. At first glance, all of the listed authors are reputable headache researchers who have published on the topic, and the cited journals are well known in the field. Upon closer inspection, however, all three references are fictitious. None of them actually exists, and all feature fake DOIs (digital object identifiers). The references, like the content, were made up. This problem may well be solved in the foreseeable future, once an academia-oriented AI is trained on more reliable source material. Our findings match those of a recently published preprint on bioRxiv (4).
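Fabricated citations of this kind are, at least, straightforward to catch programmatically. The following is a minimal sketch of such a check, assuming the public Crossref REST API (api.crossref.org) and the third-party Python library requests; the DOIs shown are illustrative placeholders, not the actual fabricated references from our experiment.

    # Minimal sketch: verify that each cited DOI resolves to a real record.
    # Assumes the public Crossref REST API and the 'requests' library
    # (pip install requests). The DOIs below are illustrative placeholders.
    import requests

    def doi_exists(doi: str) -> bool:
        """Return True if Crossref holds a record for the given DOI."""
        response = requests.get(
            f"https://api.crossref.org/works/{doi}",
            headers={"Accept": "application/json"},
            timeout=10,
        )
        return response.status_code == 200

    for doi in ["10.1177/0333102400000000", "10.9999/fabricated.doi"]:
        status = "found" if doi_exists(doi) else "NOT FOUND (possibly fabricated)"
        print(f"{doi}: {status}")

A nonexistent DOI returns an error from the registry rather than a record, so even this simple lookup would have flagged all three of the references in our experiment.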
The report generated by iThenticate yielded a low percentage (16%) of overall similarity to the known literature. This percentage is lower than that of most manuscripts submitted to Cephalalgia, suggesting that the AI did more than copy and paste from the existing literature and is capable of generating (original) sentences. In other words, AI-generated sentences can effectively evade plagiarism detectors. We then tested the AI-generated abstracts in the free versions of two online AI-content detectors (3,5). One detector misclassified the first and third examples as human-generated text; the other classified all three examples as less than 50% human-generated. For now, AI-generated text is potentially detectable, but not easily by humans. However, as the algorithms continue to be refined, both AI-generated text and AI detectors can be expected to become more sophisticated. Which side will triumph will probably depend on which receives more resources (which may or may not be determined by commercial interests).
In general, the emergence of natural language AI offers a variety of applications in academia, especially for language-related tasks. For content-related tasks, the assessment must be cautious and prudent. The accuracy of the answers depends on whether there is a standardized answer to the given question. This is usually the case in school settings, and for tasks involving statistical analysis or the debugging of programming code, the AI consistently generated reliable solutions. One could therefore consider incorporating AI into the review process for questions about methods. However, for questions involving uncertainty (i.e., unsolved or debated research questions), the AI tends to be confused by the conflicting content. Judging the available content as 'true' or 'false' is not possible for AI without assistance. Furthermore, the lack of proper references renders AI-generated content not directly usable in academia. This, again, is likely only a question of time.
AI is here to stay in academia. We predict that, whether we like it or not, writing a grant application, a review, or a research paper will be done differently in 10 years' time. The immediate consequence we foresee is that narrative reviews will lose impact, whereas meta-analyses and systematic reviews will keep theirs, until these tasks can be performed by AI as well. At that point, the role of authorship will have to be discussed again. None of this implies that research will become less valuable; it does, however, raise several as-yet-unsolved problems regarding intellectual property, authorship, and opinion formation in the scientific and lay-press environment. Used carefully, AI can be an efficient and powerful tool. In theory, collaborative intelligence (humans and AI joining forces) can cover each other's weaknesses and enhance each other's strengths. We are on the brink of a new era.
