Abstract
The peer review process is fundamental to scientific advancement, fostering quality publications through constructive feedback, yet identifying helpful reviewers remains challenging for journals facing increasing submission volumes. While Large Language Models (LLMs) show promise in manuscript evaluation and can reduce reviewer burden, they still have limitations, including potential biases, vague feedback, and context constraints, that require significant human oversight. Our study collected 9 submissions with 33 human reviews and used Claude 3.5 Sonnet to build two zero-shot systems: an AI submission reviewer and an AI review reviewer, the latter evaluating both AI-generated and human reviews on word count, coverage, and quality. Analysis shows that AI reviewers produce better-structured reviews, though humans and AI rate differently, with human ratings showing greater variance. Reviews create meaningful interactions between authors and reviewers, with the human element contributing domain perspectives and personal flair. AI could further improve reviews through a review-critiquing system that actively engages reviewers rather than relegating them to passive consumers of automation. This study suggests that AI can augment human reviewers rather than replace them, integrating AI-generated holistic reviews with nuanced human insights to improve conference quality.
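For concreteness, the AI review reviewer can be read as a single zero-shot rubric prompt. The sketch below is a minimal illustration assuming the Anthropic Python SDK; the rubric dimensions (word count, coverage, quality) come from the study, but the prompt wording, scoring scale, and the helper name `evaluate_review` are hypothetical, not the authors' actual implementation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative zero-shot rubric; the exact prompt used in the study is not given.
RUBRIC_PROMPT = """You are evaluating a peer review of a manuscript.
Rate the review on each dimension from 1 (poor) to 5 (excellent):
- Coverage: does it address the manuscript's key sections and claims?
- Quality: is the feedback specific, constructive, and actionable?
Also report the review's word count.

Manuscript abstract:
{abstract}

Review to evaluate:
{review}

Respond as JSON with keys: word_count, coverage, quality, rationale."""

def evaluate_review(abstract: str, review: str) -> str:
    """Score a single human- or AI-written review with a zero-shot prompt."""
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": RUBRIC_PROMPT.format(abstract=abstract, review=review),
        }],
    )
    return message.content[0].text  # JSON string with the scores and rationale
```

Applying the same rubric to AI-generated and human reviews alike is what allows the two to be compared on a common scale, which is the comparison the abstract reports.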
