Comparative Efficacy of AI LLMs in Clinical Social Work: ChatGPT-4, Gemini, Copilot

Abstract

Purpose

This study examines the comparative efficacy of three AI large language models (LLMs)—ChatGPT-4, Gemini, and Microsoft Copilot—in clinical social work.

Method

The three models were presented with scenarios of varying complexity, and their responses were assessed using the Ateşman Readability Index and a Likert-type accuracy scale.
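For readers unfamiliar with the readability measure, the following is a minimal sketch of how an Ateşman-style score can be computed, using the formula commonly given for the index: 198.825 − 40.175 × (average syllables per word) − 2.610 × (average words per sentence). The index was designed for Turkish text, where each syllable contains exactly one vowel. The function names, the punctuation-based sentence splitting, and the vowel-counting shortcut are illustrative assumptions, not the scoring procedure used in the study.

```python
import re

# Assumption: in Turkish, each syllable contains exactly one vowel,
# so counting vowels approximates counting syllables.
TURKISH_VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def count_syllables(word: str) -> int:
    """Approximate the syllable count of a Turkish word by counting vowels."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)


def atesman_score(text: str) -> float:
    """Return an Ateşman-style readability score (higher means easier to read)."""
    # Assumption: sentences end at '.', '!' or '?'. Abbreviations and
    # decimal numbers are not handled in this simplified splitter.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of Unicode letters; digits and underscores are ignored.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")

    avg_syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    avg_words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * avg_syllables_per_word - 2.610 * avg_words_per_sentence


if __name__ == "__main__":
    sample = "Ali eve geldi. Okul bugün kapalı."
    print(round(atesman_score(sample), 2))  # prints 103.95
```

The sample has 6 words, 13 syllables, and 2 sentences, so short, simple sentences like these score above 100; longer words and longer sentences both pull the score down.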

Results

Results showed that Gemini had the highest accuracy, while Microsoft Copilot excelled in readability. Significant differences were found in accuracy scores (p = .003), although readability differences were not statistically significant (p = .054). No correlation was found between case complexity and either accuracy or readability.

Discussion

Despite the differences, none of the models fully met all accuracy standards, indicating areas for further improvement. The findings suggest that while LLMs offer promise in social work, they require refinement to better meet the field's needs.

Keywords

ChatGPT, Gemini, Microsoft Copilot, artificial intelligence, clinical social work

