Abstract

This is a response to the published article “Artificial Intelligence can Facilitate Application of Risk Stratification Algorithms to Bladder Cancer Patient Case Scenarios.”1 The study is an intriguing test of ChatGPT’s capacity to apply National Comprehensive Cancer Network (NCCN) guidelines to risk-stratify hypothetical patient scenarios involving non-muscle-invasive bladder cancer (NMIBC). Although the findings provide valuable insights into artificial intelligence (AI) capabilities, some elements of the study warrant further investigation. One significant limitation is the use of hypothetical patient scenarios, which may not accurately reflect the complexity and diversity of real-world clinical patients. Patient data in clinical practice are dynamic and varied, and risk classification frequently requires the integration of multiple clinical, pathological, and diagnostic data sources. The use of simplified theoretical scenarios may fail to capture the complexities of treatment decisions, limiting the generalizability of the findings.
Furthermore, the methodology could be clarified regarding the nature of the prompts given to GPT-3.5 and GPT-4, how the scenarios were constructed, and the precise criteria used to classify stratifications as “correct” or “incorrect.” The lack of detail on the accuracy evaluation criteria, particularly what constitutes “overestimation of risk,” introduces ambiguity when interpreting the results. Moreover, while the study evaluated performance across a number of scenarios (both textual and visual), it would be valuable to know how the model’s responses compared with human expert opinion, or with the established gold standard for NMIBC risk stratification. Without such a comparison, it is difficult to determine how well an AI model performs relative to a medical professional.
Indeed, AI is valuable in enhancing the management of patients with bladder cancer. It could be useful not only as a tool for compiling data but also in refining diagnosis and risk stratification. For example, AI can assist in identifying variant histologies or subtypes, as outlined in the World Health Organization (WHO) 2022 classification, a core component of diagnosis. Nevertheless, diagnosing these entities on transurethral resection of the bladder (TURB) specimens remains challenging, even for experienced pathologists, which highlights the potential for AI to improve accuracy in this area.2-5
This study raises the question of how reliably an AI system such as ChatGPT can interpret complex medical guidelines and incorporate them into tailored patient care. Given that GPT-4 performed well in textual contexts but showed discrepancies when dealing with moderate-risk NMIBC, serious questions arise regarding AI’s trustworthiness in clinical settings, especially when patient cases are more sensitive. Should AI be used to inform decisions in high-stakes domains such as oncology, despite its limitations in accuracy? And how can AI models be continually trained and updated to account for changes in clinical practice and forthcoming research?
Future avenues for this research could involve testing AI models in more complex real-world patient scenarios with a broader range of variables, such as comorbidities, demographic characteristics, and incomplete or contradictory diagnostic data. Furthermore, the use of AI in conjunction with human supervision may provide significant benefits for clinical practice: AI could assist health care practitioners in risk stratification, but the final decision would most likely be made collaboratively. In terms of innovation, future research may explore the integration of multimodal inputs (for example, text, images, and patient history) as well as the development of real-time learning systems that respond to user feedback and evolving clinical guidelines. Finally, more thorough validation studies are required to test AI’s ability to enhance risk assessment, treatment recommendations, and prognostication in complex clinical contexts.
Footnotes
Acknowledgements
NA
Ethics approval and consent to participate
NA
Consent for publication
NA
Author contributions
HP—design of the work, drafting, writing, approval for submission.
VW—design of the work, drafting, supervising, approval for submission.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
AI declaration
Data availability statement
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
