Abstract
This study examines the effectiveness of artificial intelligence (AI) in psychological report writing by comparing reports written by human psychologists with those produced by OpenAI’s Generative Pre-trained Transformer Version 4 (ChatGPT-4). A total of 249 licensed psychologists evaluated the reports on overall quality, readability, writing style, organization, summary quality, recommendations, preference, and willingness to sign off on the reports. Although human-generated reports were generally rated more favorably and participants expressed greater comfort in approving them, effect sizes were typically small. Two exceptions emerged: a moderate effect size favored human-written summaries, while a moderate effect size favored AI-generated reports on the quality of their recommendations. These findings suggest that AI shows potential for augmenting report writing. Comprehensive guidelines are necessary for the ethical and effective integration of AI into psychological practice, and further research is needed to clarify AI’s role and capabilities in psychological assessment and reporting.
