Abstract
Background:
Artificial intelligence (AI), particularly generative AI and large language models, is being used in nursing education, practice, and scholarly writing. Generative AI applications have been examined specifically for conducting literature reviews, and evidence suggests they can reduce the time needed to produce scholarly work. However, their accuracy in identifying references for a literature review has received limited investigation.
Objective:
The purpose of this study was to compare the citations in human-written literature reviews with citations generated by AI literature review applications.
Methods:
Using a comparative exploratory design, references from 4 human-written literature reviews (2 published and 2 unpublished) on 4 different topics were compared with references retrieved by 2 AI literature review applications, Consensus and Elicit. Three prompting strategies were used, including prompts generated with ChatGPT-4. Agreement between the AI-generated and human reference lists was evaluated.
Results:
Percent agreement between the AI-generated and human-generated reference lists ranged from 0% to 63.6%. Consensus had a higher overall mean match rate (21.3%) than Elicit (3.7%). Using a prompt generated by ChatGPT-4 did not significantly affect results, and results did not differ between published and unpublished literature reviews.
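The abstract does not state exactly how percent agreement was computed. As a minimal sketch, assuming agreement is the share of a human review's references that also appear in an AI tool's output, and assuming references are matched by a normalized title (both are assumptions, not the authors' stated method), the calculation could look like this in Python. The function names are hypothetical.

```python
import re


def normalize(ref: str) -> str:
    """Hypothetical matching rule: lowercase the reference and collapse
    punctuation and whitespace so trivially different renderings compare equal."""
    ref = ref.lower()
    ref = re.sub(r"[^a-z0-9]+", " ", ref)
    return ref.strip()


def percent_agreement(human_refs, ai_refs) -> float:
    """Percentage of the human reference list also found in the AI list.
    The denominator (size of the human list) is an assumption."""
    human = {normalize(r) for r in human_refs}
    ai = {normalize(r) for r in ai_refs}
    if not human:
        return 0.0
    matches = human & ai
    return 100 * len(matches) / len(human)


def mean_rate(rates) -> float:
    """Mean match rate across several reviews or prompting strategies."""
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Hypothetical titles, not data from the study.
    human = ["Burnout in ICU nurses.", "Simulation in nursing education", "Sleep and shift work"]
    ai = ["burnout in icu nurses", "Telehealth adoption in rural clinics"]
    print(round(percent_agreement(human, ai), 1))
```

Matching on normalized titles is a simplification. Real reference lists differ in formatting and edition, so an actual comparison would more likely match on DOI or rely on manual review.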
Conclusion:
The 2 AI literature review applications examined in this study illustrate both their potential uses and their limitations. An AI literature review application may support, but not replace, human work.
