Abstract
Background
Professional opinion polling has become a popular means of seeking advice for complex nephrology questions in the #AskRenal community on X. ChatGPT is a large language model with remarkable problem-solving capabilities, but its ability to provide solutions for real-world clinical scenarios remains unproven. This study seeks to evaluate how closely ChatGPT's responses align with current prevailing medical opinions in nephrology.
Methods
Nephrology polls from X were submitted to ChatGPT-4, which generated answers without prior knowledge of the poll outcomes. Its responses were compared with the poll results (inter-rater agreement) and with a second set of responses generated after a one-week interval (intra-rater agreement) using Cohen's kappa statistic (κ). Subgroup analysis was performed based on question subject matter.
Results
Our analysis comprised two rounds of testing ChatGPT on 271 nephrology-related questions. In the first round, ChatGPT's responses agreed with poll results for 163 of the 271 questions (60.2%; κ = 0.42, 95% CI: 0.38–0.46). In the second round, conducted to assess reproducibility, agreement improved slightly to 171 out of 271 questions (63.1%; κ = 0.46, 95% CI: 0.42–0.50). Comparison of ChatGPT's responses between the two rounds demonstrated high internal consistency, with agreement in 245 out of 271 responses (90.4%; κ = 0.86, 95% CI: 0.82–0.90). Subgroup analysis revealed stronger performance in the combined areas of homeostasis, nephrolithiasis, and pharmacology (κ = 0.53, 95% CI: 0.47–0.59 in both rounds), compared to other nephrology subfields.
Conclusion
ChatGPT-4 demonstrates modest capability in replicating prevailing professional opinion in nephrology polls overall, with varying performance levels between question topics and excellent internal consistency. This study provides insights into the potential and limitations of using ChatGPT in medical decision making.
Background
Healthcare is a constantly evolving landscape in which medical professionals frequently encounter complex clinical scenarios without straightforward solutions that align neatly with established guidelines.1,2 This highlights the necessity of drawing on the expertise of colleagues who can share unique insight gleaned from personal experience.3–6 Nephrology is a field in which this practice has become common due to the increased prevalence of co-morbid conditions, new etiologies of kidney injury (e.g. immunotherapies), and novel therapeutics (e.g. SGLT2 inhibitors, GLP-1 receptor agonists, endothelin receptor antagonists) that have added to disease complexity and burden.7,8 The intricacy of nephrology cases and the absence of clear guidelines for specific situations accentuate the need for expert opinion and consensus in clinical decision making.9,10
Professional collaboration with the goal of optimizing patient care is critically important, as kidney diseases impact millions worldwide. The Global Burden of Disease study indicates that chronic kidney disease (CKD) was a leading cause of morbidity and mortality globally in 2017, with an estimated prevalence of 9.1%.11 As the field of nephrology adapts to meet these evolving challenges, professionals have recognized the importance of staying connected and leveraging collective expertise to address the complexities inherent in managing kidney disease. To that end, the #AskRenal community on X has emerged as a valuable platform for nephrologists to exchange knowledge, seek advice, and engage in discussions about challenging cases.12 The community uses an automated account to engage the nephrology community by broadcasting questions related to the field. This approach has allowed those with smaller social media followings or who are still in training to participate in discussions, enabling the widespread dissemination of nephrology knowledge to all members of the community.12 Queries are typically posed using X's polling feature, which allows users to quickly survey a large audience of nephrologists. The collective intelligence gained from this practice not only helps in navigating intricate medical scenarios but also fosters a deeper sense of community among specialists who might otherwise be isolated by the specifics of their practice.
ChatGPT, a sophisticated large language model (LLM) developed by OpenAI, has demonstrated remarkable capabilities in various fields including healthcare.13–18 Its proficiency in interpreting natural language inputs and using deep learning techniques to produce human-like responses has spurred considerable interest in its potential applications in medical decision making.19,20 However, its ability to address real-world scenarios that arise in day-to-day clinical practice remains unproven. We set out to examine this ability in the present study by assessing ChatGPT's effectiveness in answering polls posted by the nephrology community on X. These queries reflect typical issues encountered in practice, ranging from diagnosis and management of kidney diseases to patient care decisions underpinned by intricate medical data. Our objective was to determine how well ChatGPT aligns with the current medical consensus among nephrologists. By doing this, we aimed to identify both the strengths and potential limitations of using such advanced AI systems in a healthcare environment. This evaluation not only helps in understanding ChatGPT's capabilities but also assists in pinpointing areas where the model might need further refinement or additional training to enhance its utility in clinical decision support systems. Through this analysis, we seek to contribute to the broader discourse on the integration of AI technologies like ChatGPT in medicine, emphasizing the importance of aligning these tools with professional healthcare practices and standards.
Methods
This study was designed as an observational analysis comparing responses to nephrology-related polls from the social media platform X with those generated by the large language model ChatGPT-4. The research was conducted at Mayo Clinic in Rochester, Minnesota, USA, over a 2-week period from April 1 to April 15, 2024. We utilized publicly available poll data from the #AskRenal community on X, which represents an international group of medical professionals and individuals interested in nephrology-related topics. The study aimed to evaluate the alignment between prevailing professional opinions in nephrology and AI-generated responses across various subspecialty areas within the field.
#AskRenal dataset
Nephrology-related opinion polls were obtained from posts by independent users on the social media site X. Posts targeted toward the professional nephrology community were identified by their inclusion of the hashtag #AskRenal, and all polls posted between April 2021 and March 2024 were considered. To mitigate the potential impact of nonexpert responses, we implemented strict inclusion criteria for the polls. Each poll under consideration was reviewed qualitatively by members of our team. Polls were included if they were deemed to pose a medically relevant question pertaining to a topic within nephrology and had a definitive voting result (a majority of respondents selecting a particular answer). Exclusion criteria included: non-multiple-choice format, irrelevant topics (issues unrelated to nephrology or those soliciting personal opinions on nonmedical topics), insufficient response (fewer than 10 respondents), and lack of clarity (e.g. excessive typos, unclear phrasing, or extraneous text that created ambiguity in how the query could be interpreted). These criteria yielded 271 polls, which were then submitted to ChatGPT-4, the most recent version of ChatGPT available from OpenAI at the time of the study (April 2024). Each poll was submitted to ChatGPT in its complete form with the answer choices provided. Polls were proofread prior to submission, and edits were made in rare circumstances for the sake of clarity; for instance, if major typos were present that could cause reader confusion. In each circumstance, the content and phrasing of the text were preserved as much as possible to match the original post. Extraneous text and hashtags were removed if they were not germane to the query.
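These screening rules amount to a simple conjunctive filter. As a minimal illustration (the column names below are hypothetical, and the actual screening was performed qualitatively by the study team rather than programmatically), the logic could be expressed in R as follows:

```r
# Hedged sketch of the inclusion screen; column names are hypothetical.
# In the study itself, each poll was reviewed qualitatively by the team.
eligible <- subset(
  all_polls,
  is_multiple_choice &        # exclude non-multiple-choice formats
    is_nephrology_related &   # exclude irrelevant topics
    n_respondents >= 10 &     # exclude insufficient response
    is_clearly_phrased &      # exclude ambiguous or garbled wording
    has_majority_answer       # require a definitive voting result
)
nrow(eligible)                # 271 polls met all criteria in this study
```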
ChatGPT queries
ChatGPT was provided with the following prompt a single time at the beginning of the inquiry process: “I am going to ask you a multiple choice question. Please pick the best answer choice of the options provided.” The 271 polls were then entered individually into the ChatGPT interface (Figure 1). ChatGPT was blinded to the poll results and generated each response without knowledge of the outcome of the popular vote. Every response was recorded. In cases where ChatGPT did not commit to a single answer, it was re-prompted with the phrase “please choose the single best answer.” Agreement was documented if ChatGPT's response matched the popular vote for a given poll, and disagreement was documented otherwise. This process was performed twice, with the two inquiry rounds spaced one week apart. The sequence in which the polls were entered into the ChatGPT interface was randomized for each round.

Figure 1. Examples of ChatGPT-4 responses to nephrology polls posted in the #AskRenal community on X.
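Although responses were collected by hand through the ChatGPT interface, the per-round bookkeeping described above (randomizing poll order and flagging agreement with the poll majority) is straightforward to express in R. The sketch below is illustrative only; the file and column names are hypothetical and not drawn from the study's actual workflow.

```r
# Illustrative bookkeeping for one inquiry round (hypothetical file and
# column names; ChatGPT responses were entered and recorded manually).
polls <- read.csv("askrenal_polls.csv")    # the 271 screened polls
set.seed(2024)                             # order re-randomized each round
polls <- polls[sample(nrow(polls)), ]      # shuffle presentation order

# After recording ChatGPT's choice for each poll:
polls$agree <- polls$chatgpt_answer == polls$poll_majority
mean(polls$agree)                          # raw proportion of agreement
```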
Quantitative analysis
The percentage of agreement between ChatGPT's responses and the polling outcomes was recorded. In addition, Cohen's kappa statistic (κ) was calculated to quantify the degree of inter-rater agreement between ChatGPT-4 and the poll results for each of the two inquiry rounds separately, as well as the intra-rater agreement between the two rounds themselves. Cohen's kappa was chosen for our analysis as it provides a straightforward measure of agreement that accounts for chance agreement and is particularly suitable for categorical data. Kappa values, which range from 0 to 1 for agreement, were interpreted using the following thresholds: ≤0.20 (no agreement), 0.21–0.39 (minimal agreement), 0.40–0.59 (weak agreement), 0.60–0.79 (moderate agreement), 0.80–0.90 (strong agreement), and >0.90 (almost perfect agreement).21 Each of the 271 questions was then categorized into one of three categories based on its subject matter for subgroup analysis to explore variability in agreement across medical topics. Data were managed and analyzed using R statistical software (version 4.1.0).
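For reference, Cohen's kappa can be computed in base R without additional packages. The sketch below is not the study's analysis script: the answer vectors are invented for illustration, and the bootstrap confidence interval is an assumed method, as the paper does not state how its intervals were derived.

```r
# Cohen's kappa for two vectors of categorical answers (base R).
cohens_kappa <- function(r1, r2) {
  lv  <- union(r1, r2)
  tab <- table(factor(r1, levels = lv), factor(r2, levels = lv))
  n   <- sum(tab)
  po  <- sum(diag(tab)) / n                      # observed agreement
  pe  <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Interpretation thresholds used in this study (McHugh, 2012):
interpret_kappa <- function(k) {
  cut(k, breaks = c(-Inf, 0.20, 0.39, 0.59, 0.79, 0.90, Inf),
      labels = c("none", "minimal", "weak", "moderate", "strong",
                 "almost perfect"))
}

# Hypothetical example: poll majorities vs. ChatGPT's answers.
poll    <- c("A", "B", "A", "C", "D", "B", "A", "C")
chatgpt <- c("A", "B", "C", "C", "D", "A", "A", "C")
k <- cohens_kappa(poll, chatgpt)
interpret_kappa(k)

# Bootstrap 95% CI (assumed method; not specified in the paper).
set.seed(1)
boots <- replicate(2000, {
  i <- sample(length(poll), replace = TRUE)
  cohens_kappa(poll[i], chatgpt[i])
})
quantile(boots, c(0.025, 0.975))
```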
Qualitative analysis
A qualitative assessment of ChatGPT's responses was conducted to complement the quantitative analysis, with the aim of identifying patterns, strengths, and limitations in the answers. Two nephrologists on our team independently reviewed each response. Responses were evaluated with attention to answer accuracy (alignment with established medical knowledge and guidelines), relevance (appropriateness of the response to the question asked), depth (the level of detail and explanation provided), and clarity (the ease with which the response could be understood by a medical professional). The responses were also assessed thematically to identify common themes, strengths, and potential areas for improvement in ChatGPT's performance. The two nephrologists' evaluations for each question were compared, and any major discrepancies were resolved through discussion and consensus.
Results
Dataset characteristics
The dataset comprised 271 nephrology-focused poll questions spanning a diverse range of medical topics. A substantial portion of the polls received dozens to hundreds of responses within a few hours of being posted. All but one of the polls were written in English; the remaining poll was written in Spanish. The questions were categorized into three broad subject areas: 1) CKD, end-stage renal disease (ESRD), and kidney transplantation (n = 117); 2) glomerular disease, hypertension, acute kidney injury (AKI), and critical care (n = 79); and 3) homeostasis, nephrolithiasis, and pharmacology (n = 75) (Table 1). The homeostasis, nephrolithiasis, and pharmacology category comprised all questions concerning electrolyte and acid-base disorders; mineral, bone, and stone diseases; and pharmacotherapy.
Table 1. Reliability assessment of responses against poll results across two rounds and internal comparison between rounds for different medical categories.
# (%) refers to the number of items and the percentage of the total.
CKD: chronic kidney disease; ESRD: end-stage renal disease.
Inter-rater agreement
ChatGPT responses agreed with the poll results for 163 of the 271 questions (60.2%; κ = 0.42, 95% CI: 0.38–0.46) in the first round of inquiry and 171 out of 271 (63.1%; κ = 0.46, 95% CI: 0.42–0.50) in the second (Table 1). Agreement was highest for questions related to homeostasis, nephrolithiasis, and pharmacology, with the same level of inter-rater agreement (66.7%; κ = 0.53, 95% CI: 0.47–0.59) observed across both rounds. For questions related to CKD, ESRD, and kidney transplantation, there was 62.4% agreement (κ = 0.43, 95% CI: 0.37–0.49) between ChatGPT and the poll results in the first round of inquiry and 64.1% (κ = 0.45, 95% CI: 0.39–0.51) in the second. The glomerular disease, hypertension, AKI, and critical care category had the lowest agreement rates: 50.6% (κ = 0.28, 95% CI: 0.20–0.36) in the first inquiry round and 58.2% (κ = 0.39, 95% CI: 0.31–0.47) in the second. Inter-rater results by subject are summarized in Figure 2(a).

Figure 2. Agreement by question category.
Intra-rater agreement
Comparison of the two sets of responses given by ChatGPT demonstrated internal agreement in 245 out of 271 responses (90.4%; κ = 0.86, 95% CI: 0.82–0.90) overall. Agreement was highest for questions related to homeostasis, nephrolithiasis, and pharmacology at 94.7% (κ = 0.93, 95% CI: 0.87–0.99). This was followed by the CKD, ESRD, and kidney transplantation category (89.7%; κ = 0.85, 95% CI: 0.79–0.91). Agreement was slightly lower for the glomerular disease, hypertension, AKI, and critical care category (87.3%; κ = 0.82, 95% CI: 0.74–0.90). Internal agreement results by subject are summarized in Figure 2(b).
Discussion
Previous studies have investigated the performance of LLMs in various medical disciplines, such as radiology and dermatology.22–25 To our knowledge, this is the first study to assess the agreement between an LLM and expert opinion polls in nephrology. Published work thus far has generally shown the effectiveness of AI and machine learning models in processing and analyzing medical data, particularly in diagnostic imaging and patient data management. Their application in directly aiding clinical decision making through interpreting complex case questions remains less explored.26–28 Assessing the alignment between a language model's responses and popular poll outcomes in complex nephrology cases offers insights into the potential of advanced LLMs to support healthcare professionals in navigating challenging situations. Furthermore, it identifies areas where a model's performance requires further refinement, and it contributes to the expanding knowledge base on artificial intelligence (AI) applications in healthcare, setting the stage for future research and development on the potential applications and limitations of AI in healthcare decision making.24–26
We found that ChatGPT demonstrated modest overall ability in replicating prevailing medical opinion in nephrology polls in terms of the percentage of agreement between its answers and the poll results, with slight improvement from the first round of inquiry to the second. Cohen's kappa scores indicated weak overall inter-rater agreement in both rounds. Inter-rater agreement was minimal for questions related to glomerular disease, hypertension, AKI, and critical care nephrology in both rounds. ChatGPT exhibited excellent internal consistency in its answers between rounds, with near-perfect intra-rater agreement for questions related to homeostasis, nephrolithiasis, and pharmacology. Though other suitable metrics of inter-rater agreement exist, Cohen's kappa was chosen for this study as it is a robust, validated, and widely accepted measure that accounts for chance agreement.29 The variability ChatGPT exhibited between different nephrology topics suggests that its performance may be influenced by the depth and quality of its training data in specific medical subtopics. Areas with extensive and well-represented data likely led to better alignment with nephrologist responses, while those with less representation could have resulted in poorer performance. Though these results show some initial promise, further development is required to establish ChatGPT's utility in real-world clinical practice.
One major consideration regarding the use of ChatGPT and similar LLMs in research and medicine is their propensity for generating false or fabricated information, often referred to as “hallucinations.”30,31 Our qualitative analysis of ChatGPT's answers to the multiple-choice questions did not reveal any obvious hallucinations or falsified information. However, it is important to note that the format of our study (multiple-choice questions) may have limited the opportunity for such fabrications to occur. Responses were found to be generally accurate and relevant to the questions posed. The model demonstrated a solid understanding of nephrology concepts, particularly in areas with well-established clinical guidelines and those concerning physiology, such as homeostasis, nephrolithiasis, and pharmacology. The depth of ChatGPT's explanations varied across questions. While some responses were detailed and provided comprehensive explanations, others were more simplistic. It typically excelled with questions requiring straightforward factual knowledge but showed limitations in navigating complex scenarios requiring nuanced clinical judgment. An interesting observation made during the inquiry process was that, for questions in which “it depends” was one of the available answer choices, ChatGPT would almost invariably choose it over one of the other pre-defined answer choices (Figure 1). In doing so, it seemed to give itself room to “hedge” by elaborating on why the other answer choices might be equally valid but more appropriate in specific settings. This approach, while cautious, may reflect an understanding of the complexity of medical decision making, and could prove useful in clinical practice when a given problem may have multiple solutions. In contrast to ChatGPT, the poll respondents tended to favor predefined answers. ChatGPT's performance was less consistent in areas with less clear guidelines or that were more situationally dependent, such as glomerular disease, hypertension, AKI, and critical care. The consensus among the nephrologists on our team was that the model would require further refinement before it could be clinically useful in these areas. Impressively, there was no lapse in performance with the use of emojis (e.g. up or down arrows) to replace text, or with context-specific abbreviations (e.g. p-uria for proteinuria). These findings suggest that ChatGPT is capable of processing a wide range of nontraditional natural language inputs and could potentially be utilized in diverse linguistic and clinical settings, enhancing its adaptability and usefulness in global healthcare environments.
Machine learning-driven systems have been developed to assist nephrologists in specific aspects of clinical practice, such as predicting the development of ESRD in patients with chronic kidney disease and optimizing renal allograft allocation.32,33 While the potential of AI tools in nephrology is promising, it is important to note that their impact on patient outcomes and healthcare costs remains largely theoretical at this stage. LLMs like ChatGPT could potentially support clinical practice, but rigorous clinical studies are needed to validate their effectiveness and safety in real-world healthcare settings. It is crucial to emphasize the importance of human oversight and collaboration in AI-assisted nephrology decision making.23 While AI tools can provide valuable insights and support, they should not replace the expertise and judgment of nephrologists. Collaborative decision-making processes that combine AI-generated insights with the knowledge and experience of nephrologists are essential for ensuring the safe and effective use of AI in managing disease.
To facilitate the responsible integration of AI tools like ChatGPT into nephrology, several key aspects must be addressed. First, there is a need for standardized evaluation frameworks and benchmarks to assess the performance and reliability of AI tools in various nephrological domains, such as glomerular diseases, tubular disorders, and electrolyte imbalances. This would enable the comparison of different AI models and help ensure their safe and effective integration into nephrological practice. Second, the continual monitoring and updating of AI models in nephrology are essential to ensure their performance remains optimal as nephrological knowledge evolves and new clinical data on kidney diseases becomes available. The potential for real-time learning and adaptation could enhance the long-term utility of AI tools in managing kidney health. Finally, interdisciplinary collaboration among nephrologists, AI researchers, ethicists, and policymakers is crucial for driving the responsible development and deployment of AI tools in nephrology. Such collaborations can help address the technical, ethical, and regulatory challenges associated with AI-assisted nephrology decision making, ultimately improving patient care and outcomes in the management of kidney diseases.
The integration of advanced AI tools like ChatGPT into medical decision-making processes highlights a transformative phase in healthcare, particularly in specialized fields such as nephrology. The utilization of the #AskRenal dataset to assess the agreement between ChatGPT's generated responses and popular poll outcomes is an innovative approach that not only tests the applicability of AI in real-world medical scenarios but also explores its potential as a supportive tool for healthcare professionals. As AI continues to permeate the medical field, addressing ethical considerations and ensuring transparency in AI decision-making processes is paramount. It is essential to maintain a clear protocol for AI's use in clinical settings, safeguard patient privacy, and provide transparent documentation of AI's reasoning paths, which could be crucial for gaining trust among healthcare providers and patients alike.27,34,35 The study underscores the potential and challenges of using AI tools like ChatGPT in medical decision making. As AI technology evolves, its integration into healthcare could significantly enhance the efficiency and accuracy of medical consultations and patient care strategies.
Limitations
Despite the promising findings, this study has several limitations. First, the #AskRenal dataset, while diverse, may not be representative of all nephrology questions encountered in clinical practice. Second, the study relied on a specific version of ChatGPT (4.0) and may not reflect the performance of other language models or future iterations. Given the dynamic nature of AI models like ChatGPT, future research should establish protocols for regular re-evaluation of these tools in nephrology contexts. This could involve creating a standardized set of nephrology questions that can be used to benchmark different versions of AI models over time, allowing for tracking of performance improvements or changes across iterations. Third, the multiple-choice nature of the poll questions may not capture the full spectrum of expert opinions. Fourth, while the polls were specifically sourced from nephrologists and meant to be answered by the nephrology community, they were open to public engagement, and it was not possible to ascertain the qualifications of all the respondents. We aimed to use polls with a large number of respondents to minimize the effect of this potentially confounding factor and to best capture prevailing professional medical opinions. Analyzing data from platforms such as X demonstrates how AI tools like ChatGPT perform in practical real-world settings where information is crowdsourced and not always curated. However, we acknowledge that this approach may not provide the same level of certainty as a comparison with verified nephrology experts. Future studies should consider comparing ChatGPT's performance against responses from a panel of verified nephrology experts. This could involve creating a standardized set of nephrology questions and having both ChatGPT and a group of certified nephrologists answer them. Such an approach would provide a more controlled evaluation of ChatGPT's capabilities and could serve as a complementary analysis to the real-world, crowdsourced data used in the present study. Lastly, future studies with larger datasets are needed to conduct additional subgroup analyses based on question types (e.g. diagnosis vs. data interpretation vs. management).
Future directions
This study's reliance on a single AI model and the static nature of the dataset may limit its applicability to the full range of clinical nephrology practice. To establish the generalizability of these findings, further validation using different AI platforms across various medical settings and disciplines is necessary. Future research should explore factors influencing ChatGPT's performance in different nephrology topics and investigate strategies to improve its accuracy and consistency. Incorporating a broader and more varied dataset, as well as integrating domain-specific knowledge and expert feedback into the training process, may enhance AI performance. Training AI models on specific subfields within nephrology could improve their agreement with human expert opinions and provide more precise support to nephrologists, accommodating the unique challenges of each subfield. Additionally, the development of specialized AI tools tailored for different subfields and implementing systems capable of real-time learning from ongoing medical cases and expert feedback could refine AI's decision-making capabilities, making them more adaptable to the dynamic nature of medical knowledge. Expanding AI capabilities through diversified training and interactive medical settings and integrating AI in multidisciplinary teams could lead to more personalized and precise medical care, enhancing decision-making processes in nephrology and beyond.
Conclusion
This study provides insights into the potential and limitations of using AI-based language models like ChatGPT-4 in medical decision making, specifically in complex nephrology cases. The language model demonstrated modest capability and excellent internal consistency in replicating prevailing professional judgments. While this indicates that ChatGPT has potential to provide relevant medical insights in real-world scenarios, its full capabilities as a clinical adjunct remain unproven. The complexity of medical decision making will require continuous enhancements in AI technology. This study contributes to the understanding of AI's current capabilities and limitations in healthcare and sets the stage for future advancements that could revolutionize how medical knowledge is utilized and disseminated in the field of nephrology. As AI becomes an integral part of medical practice, continuous evaluation and adaptation will be essential to fully realize its potential in improving patient care outcomes. It will also be crucial to engage healthcare professionals, researchers, and ethicists in the development and evaluation of these technologies to ensure their safe, effective, and equitable integration into clinical practice.
Footnotes
Contributorship
JHP and WC were involved in conceptualization, funding acquisition, visualization, and writing—original draft; JHP in data curation, formal analysis, investigation, and resources; JHP, JM, and IMC in methodology; CT and SS in project administration; CT, JM, and IMC in supervision; WC in validation; and CT, SS, JM, and IMC in writing—review & editing. All authors have read and agreed to the published version of the manuscript.
Data availability statement
The data underlying this article will be shared on reasonable request to the corresponding author.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethics approval
This study does not require Ethics Committee or Institutional Review Board approval because it does not involve human or animal subjects, nor does it include patient information or identifiable personal data. Consequently, participant consent was waived for the same reasons.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Language model use
The use of ChatGPT in this study was strictly limited to the response-generating protocol described in the methods section. ChatGPT was not used for data analysis, writing, or any other aspects of the production of this manuscript.
Guarantor
WC
