Abstract

In recent months, our journals have seen a surge in submissions evaluating Large Language Models (LLMs), such as ChatGPT, Claude, or Perplexity, and their ability to answer common patient questions about orthopaedic conditions. Most studies follow a familiar structure: authors create a list of common questions on the topic, the chatbot is asked (usually only once) to respond to each question, the responses are recorded, and the answers are rated based on expert opinion. Although these studies are timely, their scientific and clinical value is uneven, and in many cases they are yielding diminishing returns.
These studies may be helpful when they highlight blind spots in AI-generated information, flag risks of patient confusion, or aid clinicians in anticipating what their patients may encounter online. But editors and reviewers are increasingly encountering common limitations:
Beyond these design issues lies a more fundamental concern:
So, What Should Be Published?
These papers are most valuable when they go beyond documenting chatbot outputs. Manuscripts should focus on the following:
Uncovering structural flaws in AI tools (eg, hallucinations, inconsistencies)
Evaluating patient perceptions, trust, and comprehension of AI-generated information
Developing and testing reproducible methods for evaluating AI-generated health content
Comparing AI-generated advice with clinician guidance in ways that inform practice
What Should Editors and Reviewers Ask?
Does the study add meaningful insight beyond what has already been published or generally known?
Are chatbot outputs evaluated with clear standards and transparency?
Is there value for clinicians or patients beyond a static content snapshot?
Editorial Responsibility
As editors, we must guide this emerging area with discernment. Not every chatbot response audit warrants publication. However, well-designed, methodologically sound studies that help define how AI tools can be used appropriately with patients do deserve our attention. At this time, such submissions will be considered for publication in Foot & Ankle Orthopaedics (FAO), rather than Foot & Ankle International (FAI), which prioritizes hypothesis-driven clinical research.
Authors considering such submissions should consult the new AI/Chatbot Evaluation Checklist now available in the FAO “Instructions for Authors,” which outlines minimum criteria for transparency, validity, and relevance.
Only those studies that advance our understanding of AI’s practical, patient-centered value in foot and ankle care will merit peer review and consideration for publication.
Footnotes
This editorial has been copublished in Foot & Ankle International.
