Abstract

Dear Editor,
Lee and Park deliver a rigorous, transparent comparison of AI- versus therapist-written assessment and plan sections on standardized occupational therapy cases, showing higher ratings for AI on completeness, correctness, concordance, and perceived empathy, while human-written notes exhibit stronger inter-rater reliability. 1 This pattern supports a pragmatic stance: AI as a draft generator under clinician oversight, not a substitute for clinical judgment. 1
We endorse this trajectory and, building on the authors’ call for real-world validation, recommend bridging studies when moving to newer models. Crucially, the study's six Likert-rated subscales (quality: completeness, correctness, concordance; empathy: cognitive, affective, behavioral) are outcome measures, whereas intraclass correlation coefficient (ICC) is a reliability index; they are not commensurate “co-equal metrics.” Replications should therefore reproduce the outcome framework and, in parallel, evaluate inter-rater reliability with ICC—reported as separate endpoints with distinct decision criteria. 1
We propose the following tightened methods for a bridging study. Keep the standardized cases, dual evaluator groups (therapists and patients/caregivers), and blinded ratings to preserve comparability. 1 Primary effectiveness endpoints: subscale and composite means for quality and empathy (5-point scales), analyzed with mixed-effects models that include source (AI vs human) and rater group as fixed effects and case and rater as random effects, with non-inferiority margins pre-specified for model upgrades. Secondary reliability endpoints: ICC(2,1) (two-way random effects, absolute agreement, single rater) for each subscale/dimension within source, with ICCs compared between sources via confidence-interval overlap or bootstrap. Finally, replicate the within-source correlations between quality and empathy to test whether newer models tighten the coupling that was strong in human notes but weak in AI outputs in the original study. 1
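To make the secondary reliability endpoint concrete, the following is a minimal sketch of ICC(2,1) with a percentile bootstrap over rated notes. It is offered as one possible implementation, not as the original authors' analysis code. The function names, the resampling scheme (resampling notes with replacement), and the example ratings are our assumptions for illustration only; the ratings are not data from the study.

```python
import random


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per rated note; each row holds one
    score per rater, and every rater scores every note.
    """
    n = len(ratings)       # number of rated notes
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: notes, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-notes mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bootstrap_ci(ratings, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling notes with replacement."""
    rng = random.Random(seed)
    n = len(ratings)
    stats = []
    for _ in range(n_boot):
        sample = [ratings[rng.randrange(n)] for _ in range(n)]
        try:
            stats.append(icc_2_1(sample))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with no variance at all
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


# Illustrative ratings only (6 notes x 4 raters); not data from the study.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
print(bootstrap_ci(example))
```

Running the same functions separately on AI-written and therapist-written notes, for each subscale, would yield the per-source ICCs and intervals that the comparison requires. With few cases the bootstrap interval will be wide, which is itself worth reporting.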
Beyond research design, implementation should shift from risk avoidance (de-identified educational cases) to risk management in live environments. We recommend privacy-by-default workflows: enterprise/on-prem endpoints with zero data retention; minimum-necessary structured Subjective/Objective prompts; automated de-identification/pseudonymization prior to inference; output scanning for residual identifiers/disallowed content; human sign-off before Electronic Health Record commit; and full audit trails. These safeguards echo governance themes in recent reviews of AI documentation systems and are consistent with the authors’ emphasis on human oversight.1,2
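As a rough sketch of how three of these safeguards could be chained (pseudonymization before inference, output scanning, and a sign-off gate before commit), consider the following. The regular expressions, placeholder format, and function names are hypothetical and deliberately simplistic. A live deployment would need a validated de-identification tool, a locally agreed identifier list, and an audit log, none of which are shown here.

```python
import re

# Hypothetical patterns for illustration only; they are far from a complete
# list of identifiers and are not a substitute for a validated tool.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,}\b"),
}


def pseudonymize(text):
    """Replace matched identifiers with placeholder tags before inference.

    Returns the cleaned text and a token-to-original mapping that stays
    inside the clinical environment.
    """
    mapping = {}
    for label, pattern in PATTERNS.items():
        def _sub(match, label=label):
            token = f"[{label}_{len(mapping) + 1}]"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(_sub, text)
    return text, mapping


def residual_identifiers(text):
    """Return any pattern hits left in a model output (ideally none)."""
    return [m.group(0) for p in PATTERNS.values() for m in p.finditer(text)]


def commit_allowed(draft, clinician_signed_off):
    """Gate the record commit on a clean scan and explicit human sign-off."""
    return clinician_signed_off and not residual_identifiers(draft)


note = "Seen 2024-03-05, MRN: 1234567, call 555-123-4567 re: grip strength."
clean, mapping = pseudonymize(note)
print(clean)
print(commit_allowed(clean, clinician_signed_off=True))
```

The design point is that the commit gate fails closed: a draft is blocked if either the scan finds a residual identifier or the clinician has not signed off.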
Importantly, the broader 2024–2025 literature aligns with Lee and Park's direction. A 2025 systematic review reports consistent gains in note completeness and clinician satisfaction when AI documentation tools are embedded into workflows. 2 A randomized trial in orthopedics found improved documentation quality with AI assistance versus usual practice, and feasibility work in neurosurgery demonstrates acceptable drafts for discharge summaries and operative reports—when clinicians remain in the loop.3,4 Together, these findings strengthen the case for clinician-centered, privacy-safe AI assistance that improves efficiency without eroding reliability or confidentiality.2–4
In sum, we support the authors’ core conclusion: AI can reduce documentation burden and improve perceived quality and empathy, provided it operates within clinician-led workflows and robust privacy and governance guardrails. The next step is disciplined translation—model-and-prompt-locked pipelines, prespecified outcome and reliability thresholds, and continuous monitoring—so that laboratory gains become dependable clinical practice. 1
Footnotes
Acknowledgements
The authors gratefully acknowledge Si-An Lee and Jin-Hyuck Park, who designed the research. The authors also thank Yun-Ling Liu (Taoyuan Psychiatric Center) for her helpful comments and suggestions on this manuscript.
Ethical approval
Since this submission is a Letter to the Editor, it does not fall under the scope of Institutional Review Board (IRB) research ethics review, as it does not involve human subjects, identifiable data, or experimental procedures requiring ethical approval.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
