Sage Journals: Discover world-class research

Abstract

Background

This study compares three large language models (LLMs) in answering common HIV questions, given ongoing concerns about their accuracy and reliability in patient education.

Methods

Each of the three models answered 63 HIV questions. Responses were assessed for accuracy (5-point Likert scale), readability (Flesch-Kincaid, Gunning Fog, and Coleman-Liau indices), and reliability (DISCERN and EQIP instruments).
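The three readability indices named above are closed-form formulas over sentence, word, letter, and syllable counts. The following is a minimal sketch of how such scores can be computed, not the tooling used in the study, which the abstract does not specify. It assumes English text and the grade-level variant of Flesch-Kincaid, and it uses a rough vowel-group syllable heuristic. All function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a rough heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent, except in "-le" endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Collect the raw counts that all three indices are built from."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "letters": sum(len(w) for w in words),
        "syllables": sum(count_syllables(w) for w in words),
        # Gunning Fog counts words of three or more syllables as "complex".
        "complex_words": sum(1 for w in words if count_syllables(w) >= 3),
    }


def flesch_kincaid_grade(s: dict) -> float:
    """Flesch-Kincaid Grade Level."""
    return 0.39 * s["words"] / s["sentences"] + 11.8 * s["syllables"] / s["words"] - 15.59


def gunning_fog(s: dict) -> float:
    """Gunning Fog index."""
    return 0.4 * (s["words"] / s["sentences"] + 100 * s["complex_words"] / s["words"])


def coleman_liau(s: dict) -> float:
    """Coleman-Liau index, based on letters and sentences per 100 words."""
    letters_per_100 = 100 * s["letters"] / s["words"]
    sentences_per_100 = 100 * s["sentences"] / s["words"]
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


if __name__ == "__main__":
    # Illustrative sentence only; not taken from the study's question set.
    sample = "HIV is a virus that attacks the immune system. Treatment keeps it under control."
    stats = text_stats(sample)
    print(flesch_kincaid_grade(stats), gunning_fog(stats), coleman_liau(stats))
```

In all three indices a higher score means harder text. Because the indices weight different features (syllables, long words, letters), they can rank the same set of responses differently.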

Results

Claude 3.7 Sonnet achieved significantly higher accuracy (4.54 ± 0.44) than ChatGPT-4o (4.29 ± 0.49) and Gemini Advanced 2.0 Flash (4.31 ± 0.50) (p < .001). ChatGPT-4o was less accurate on questions about disease definition, follow-up, and transmission routes, while Gemini Advanced 2.0 Flash performed poorly on daily life and treatment-related questions. In the readability analyses, ChatGPT-4o produced the most accessible content according to the Flesch-Kincaid and Coleman-Liau indices, whereas Claude 3.7 Sonnet was the most comprehensible by the Gunning Fog index. Gemini Advanced 2.0 Flash consistently generated more complex texts than the other two models across all readability measures (p < .001). Regarding reliability, Claude 3.7 Sonnet achieved “good” quality on DISCERN, while the other two models were rated “moderate” (p = .059). On EQIP, Claude 3.7 Sonnet (median 61.8) and ChatGPT-4o (55.3) were classified as “good quality with minor limitations,” whereas Gemini Advanced 2.0 Flash (41.2) was rated “low quality” (p = .049).

Conclusions

Claude 3.7 Sonnet is preferable for accuracy and reliability, while ChatGPT-4o offers superior readability. The choice of an LLM for HIV education should weigh accuracy, readability, and reliability together, and the content these models produce should be assessed regularly for quality and cultural sensitivity.

Keywords

Human immunodeficiency virus; AIDS; artificial intelligence; patient education as topic

Artificial intelligence meets HIV education: Comparing three large language models on accuracy, readability, and reliability
