Abstract
Objective
This study aimed to evaluate the effectiveness of ChatGPT in improving the standardization of ultrasound report writing among junior physicians.
Methods
A total of 20 junior residents involved in ultrasound work from January to August 2025 were randomly divided into an experimental group (ChatGPT-assisted group) and a control group (traditional training group). The experimental group utilized ChatGPT for real-time assistance in report writing and review, while the control group followed a conventional training model. The quality of report writing was compared between the two groups, focusing on completeness of content, accuracy of descriptions, use of professional terminology, and diagnostic accuracy.
Results
The quality scores for ultrasound reports in the experimental group were significantly higher than those in the control group across gastrointestinal, gynecological, and thyroid systems (all P < 0.05).
Conclusion
ChatGPT can effectively enhance the standardization of ultrasound report writing among junior physicians, demonstrating significant value for clinical education and training.
Background
In the context of continuous innovation in medical technology, ultrasound medicine has emerged as a core support technology in modern clinical diagnosis and treatment due to its significant advantages, such as non-invasiveness, ease of operation, and real-time dynamic imaging.1 As an important part of the medical workforce, junior physicians face practical challenges in the standardized writing of ultrasound reports, including insufficient professional experience and deviations in executing standardized processes.
Junior physicians encounter the following challenges in ultrasound report writing:
Insufficient Standardization of Terminology: The field of ultrasound medicine encompasses a vast array of specialized terminology. Junior physicians, with limited clinical experience, often lack a systematic understanding of the semantic connotations and application norms of fundamental acoustic concepts—such as “hyperechoic,” “hypoechoic,” and “isoechoic”—across different anatomical sites and pathological conditions. This knowledge gap can lead to misapplication of terms or ambiguous semantics, thereby diminishing the diagnostic value of ultrasound reports. For example, when assessing hepatic lesions, junior physicians may struggle to accurately differentiate between hemangiomas and hepatocellular carcinoma based on echogenic features, often resorting to vague terms like “abnormal echogenicity,” which severely impacts the precision of clinical decision-making.
Lack of Logical Structure in Descriptions: Ultrasound imaging contains multidimensional diagnostic information. During the process of writing structured reports, junior physicians frequently fail to follow standardized information organization strategies, resulting in poor logical coherence of the report’s content. This is evident when lesion descriptions do not adhere to the scientific ordering principles based on clinical risk levels, or when they present adjacent tissue relationships before core lesion features, disrupting the narrative flow and clarity of information. For instance, in cases of multiple nodular lesions in the breast, presenting the locations and sizes of nodules in a disordered manner without highlighting high-risk nodules can significantly reduce the efficiency of clinical assessment.
Inadequate Management of Key Information: Junior physicians demonstrate significant shortcomings in information filtering and integration during ultrasound report writing. On one hand, they often lack sensitivity to critical indicators for assessing malignancy risk, such as the aspect ratio, margin morphology, and microcalcifications of thyroid nodules, leading to omissions of vital diagnostic information and severely undermining the clinical reference value of the reports.2 On the other hand, they may present excessive, non-diagnostic information about normal tissues, creating distractions that significantly diminish the report’s readability and diagnostic efficacy, thereby affecting subsequent treatment strategies such as regular follow-up, further examinations, or surgical interventions.3
This study aims to address the issues of imprecise terminology usage, disorganized logical descriptions, and omissions or redundancies of key information faced by junior physicians in ultrasound report writing. We explore the practical effectiveness of ChatGPT in standardizing report writing. By analyzing ChatGPT’s role in correcting and optimizing report terminology, structuring logical frameworks, and reviewing the completeness of key information, we seek to determine whether it can effectively enhance the standardization, professionalism, and practicality of ultrasound reports by junior physicians, thereby providing higher-quality reference data for clinical diagnosis and treatment.
Materials and methods
Study subjects
A total of 20 junior physicians specializing in ultrasound were selected. Inclusion criteria were: 1–3 years of experience in ultrasound diagnostics, a basic understanding of ultrasound principles and image recognition skills, and the ability to independently perform simple ultrasound examinations. Exclusion criteria were any prior experience with AI-assisted ultrasound report writing and severe communication or cognitive impairments.
Grouping method
The junior physicians were pair-matched and randomly assigned to an experimental group and a control group, with 10 participants in each. Physicians in the experimental group used ChatGPT to assist with writing ultrasound reports, while those in the control group wrote reports independently using traditional methods. Random assignment within pairs ensured balanced grouping.
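A minimal sketch of such pair-matched random assignment, assuming pairing has already been performed (the study does not specify its pairing variable or randomization tooling, and the physician identifiers here are placeholders):

```python
import random

# Hypothetical sketch: physicians are first paired (e.g., by years of
# experience), then one member of each pair is randomly assigned to the
# experimental arm and the other to the control arm.
physicians = [f"physician_{i:02d}" for i in range(1, 21)]  # 20 junior physicians
pairs = [physicians[i:i + 2] for i in range(0, len(physicians), 2)]  # 10 pairs

random.seed(42)  # fixed seed used only to make this sketch reproducible
experimental, control = [], []
for pair in pairs:
    random.shuffle(pair)        # coin flip within each matched pair
    experimental.append(pair[0])
    control.append(pair[1])

print(len(experimental), len(control))  # 10 and 10
```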
Intervention measures
Experimental group
To ensure that all resident physicians in the experimental group used the GPT-4-turbo-2025-08-26 model in a standardized manner and to minimize variability during the intervention, this study provided a structured training session comprising 1 hour of theoretical instruction and 2 hours of hands-on practice, conducted collaboratively by an experienced ultrasound physician (with 12 years of clinical experience) and the hospital’s IT engineer. All 10 resident physicians in the experimental group participated in this training to guarantee consistency in the intervention.
The theoretical instruction covered the following key areas: (1) Technical Principles of the GPT-4-turbo-2025-08-26 Model: the natural language processing capabilities and the logic behind medical terminology optimization; (2) Scope and Limitations of the Tool: the appropriate contexts for use, highlighting its strengths in terminology correction, logical structuring, and key information reminders, while emphasizing that it cannot replace physicians’ interpretation of images or clinical diagnostic decisions; (3) Data Security and Privacy Protocols: the importance of adhering to the hospital’s data protection policies.
Step-by-Step Process Demonstration: The training guided participants through the entire AI-assisted report writing process, including: (1) Drafting the initial report in the ultrasound system; (2) Utilizing standardized prompts to optimize the initial report; (3) Accessing a side-by-side comparison interface (original draft vs. AI revision) to interpret modifications highlighted in red; (4) Cross-validating AI suggestions with ultrasound images and clinical data; (5) Manually adjusting and finalizing the report.
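For illustration, the side-by-side comparison step (3) could be reproduced with standard tooling. The sketch below uses Python’s difflib to render an original draft and an AI revision in two columns with changed fragments highlighted; the study’s own interface is not specified beyond red-highlighted modifications, so this is an assumption-laden stand-in, and the sample report text is invented:

```python
import difflib

# Render a two-column HTML comparison of a draft report and its AI revision.
draft = [
    "Liver: round bright thing in right lobe, about 2 cm.",
    "No other findings.",
]
revision = [
    "Liver: a round hyperechoic nodule (~2.0 cm) in the right lobe.",
    "Intrahepatic bile ducts are not dilated.",
]

html = difflib.HtmlDiff(wrapcolumn=60).make_file(
    draft, revision, fromdesc="Original draft", todesc="AI revision"
)
with open("report_diff.html", "w") as f:
    f.write(html)  # open in a browser to view the side-by-side comparison
```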
Given the substantial performance and reliability differences across releases, all participants in this study used the identical snapshot, GPT-4-turbo-2025-08-26, to ensure procedural consistency and full reproducibility.4,5 Before the experiment began, sixty model reports (twenty each for the gastrointestinal, thyroid, and gynecological domains) were uploaded to ChatGPT to prime the system. For each draft, the physician entered a standardized prompt: “Please optimize this ultrasound report according to clinical ultrasound reporting guidelines, including correcting unprofessional terminology, organizing descriptive logic in a structured manner, supplementing missing key diagnostic information, and removing redundant content. Ensure the revised report complies with the professional norms of [digestive/gynecological/thyroid] ultrasound.” ChatGPT generated a revised report within 10–30 seconds, highlighting modifications (e.g., terminology corrections, logical rearrangements, supplementary information) in bold for the physician’s reference. In subsequent iterative exchanges, physicians could vary the length and focus of the submitted text at will, without following the standardized prompt format. Junior physicians in the experimental group could use ChatGPT for assistance during ultrasound examinations and report writing, inputting preliminary report content to receive suggestions for terminology correction, logical structuring, and key information supplementation. Only the draft text of ultrasound reports was pasted into the ChatGPT dialog box; no images or DICOM data were uploaded. Consequently, the AI’s role was restricted to language refinement and did not include image interpretation.
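A minimal sketch of the optimization step expressed as a programmatic call, assuming access to the named snapshot through the OpenAI Python SDK; in the study itself, physicians pasted draft text into the ChatGPT web dialog rather than calling an API, and the draft report and system message here are invented:

```python
from openai import OpenAI  # official OpenAI Python SDK (`pip install openai`)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Please optimize this ultrasound report according to clinical ultrasound "
    "reporting guidelines, including correcting unprofessional terminology, "
    "organizing descriptive logic in a structured manner, supplementing missing "
    "key diagnostic information, and removing redundant content. Ensure the "
    "revised report complies with the professional norms of thyroid ultrasound."
)

draft_report = "Thyroid: bright spot in left lobe, edges not clear ..."  # text only; no images

response = client.chat.completions.create(
    model="gpt-4-turbo-2025-08-26",  # snapshot named in the study
    messages=[
        {"role": "system", "content": "You standardize ultrasound report language."},
        {"role": "user", "content": f"{PROMPT}\n\n{draft_report}"},
    ],
)
print(response.choices[0].message.content)  # revised report for physician review
```

Because only draft text is submitted, the call mirrors the study’s text-only constraint: the model refines language and structure but performs no image interpretation.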
Control group
Physicians in the control group completed ultrasound report writing independently according to traditional training methods, referencing ultrasound textbooks, diagnostic guidelines, and internal report writing standards. They were allowed to consult relevant ultrasound diagnostic literature and previous report templates but could not use any AI-assisted tools.
Both groups maintained consistent working conditions and patient sources, and received the same routine professional guidance throughout the study. In accordance with standard departmental practice, all reports in both groups were checked against the original images and clinical history before final submission; in the experimental group, this included cross-validating all ChatGPT-generated text and amending the reports accordingly.
Observation indicators and data collection methods
Ultrasound report standardization scoring
The standardization capabilities of ultrasound report writing were assessed for both groups of junior physicians (Supplement 1). A uniform standardized case set was used, including two gastrointestinal reports, two thyroid reports, and two gynecological reports. Two experienced ultrasound physicians (with ≥10 years of experience) scored the reports blindly, and the average score was taken as the final score for each report. The accuracy and response time for identifying critical cases were also evaluated.
A standardized scoring sheet for ultrasound reports was developed, assessing completeness of content, accuracy of descriptions, appropriateness of terminology, and diagnostic accuracy, for a total of 100 points. Completeness of content accounted for 40 points, requiring coverage of all key information from the ultrasound examination, such as organ size, morphology, echogenicity, presence of abnormal lesions, and lesion characteristics. Accuracy of description, defined as correspondence with the actual images without errors or omissions, accounted for 25 points. Appropriate and accurate use of professional terminology was worth 15 points. Diagnostic accuracy, including the correctness of the diagnostic conclusion and any omissions, accounted for 20 points.
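To make the rubric concrete, the sketch below encodes the four weighted components and validates sub-scores against their maxima; the weights come from the text, while the validation logic and sample values are our own illustration, not the authors’ scoring tool:

```python
# The study's 100-point rubric: component -> maximum points.
RUBRIC_MAX = {
    "completeness": 40,   # coverage of key examination information
    "accuracy": 25,       # descriptions correspond to the actual images
    "terminology": 15,    # appropriate professional terminology
    "diagnosis": 20,      # correctness of the diagnostic conclusion
}

def total_score(scores: dict[str, float]) -> float:
    """Validate each sub-score against its maximum and return the total."""
    for item, value in scores.items():
        if not 0 <= value <= RUBRIC_MAX[item]:
            raise ValueError(f"{item} score {value} exceeds maximum {RUBRIC_MAX[item]}")
    return sum(scores.values())

print(total_score({"completeness": 33.5, "accuracy": 21.0,
                   "terminology": 11.5, "diagnosis": 16.0}))  # 82.0
```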
Feedback from junior physicians on ChatGPT
A questionnaire was designed to survey the experimental group physicians regarding their satisfaction and experiences with ChatGPT in assisting ultrasound report writing. The questionnaire included satisfaction levels with ChatGPT’s functionalities (rated as very satisfied, satisfied, neutral, dissatisfied, and very dissatisfied), perceived helpfulness for report writing (rated as extremely helpful, fairly helpful, somewhat helpful, not very helpful, and not helpful at all), issues encountered during use, and suggestions for improvement. The questionnaire was administered to the experimental group immediately after the study, achieving a 100% effective response rate.
Statistical methods
Data were analyzed using SPSS 27.0 software. Shapiro-Wilk tests and histograms were used to assess the normality of continuous data, and Levene’s test was used to assess the homogeneity of variances. Normally distributed data were expressed as mean ± standard deviation (x̄ ± s), with comparisons between groups performed using independent-samples t-tests. Non-normally distributed data were expressed as M (Q1, Q3) and compared using the Mann-Whitney U test. Categorical variables were expressed as counts (percentages), with group comparisons performed using chi-square tests. A p-value of <0.05 was considered statistically significant.
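The test-selection flow described above can be sketched as follows, using simulated data and SciPy in place of SPSS (shown only to make the decision logic concrete, not to reproduce the study’s analysis):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
exp = rng.normal(33.7, 1.1, 60)   # simulated sub-scores, not study data
ctrl = rng.normal(30.0, 1.7, 60)

# Normality (Shapiro-Wilk) and homogeneity of variances (Levene)
normal = stats.shapiro(exp).pvalue > 0.05 and stats.shapiro(ctrl).pvalue > 0.05
equal_var = stats.levene(exp, ctrl).pvalue > 0.05

if normal:
    # Independent-samples t-test (Welch's correction if variances differ)
    result = stats.ttest_ind(exp, ctrl, equal_var=equal_var)
else:
    # Mann-Whitney U test for non-normal data
    result = stats.mannwhitneyu(exp, ctrl)

print(result.pvalue < 0.05)  # significant at the 0.05 level?
```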
Results
Baseline characteristics of subjects
Baseline characteristics of the experimental and control groups.
Scores for standardization of ultrasound reports
Overall performance of the experimental and control groups in generating ultrasound reports.

Violin plots comparing ultrasound reports generated by the experimental group (n = 60) and the control group (n = 60) in terms of content completeness (a), descriptive accuracy (b), use of professional terminology (c), diagnostic accuracy (d), total score (e), and time required for report generation (f).
In terms of completeness of content, the experimental group scored an average of 33.72 ± 1.10, while the control group scored 29.97 ± 1.71 (P < 0.001). For accuracy of descriptions, the experimental group achieved a score of 21.35 ± 0.90 compared to the control group’s score of 17.17 ± 1.57 (P < 0.001). Regarding the use of professional terminology, the experimental group averaged 11.30 ± 1.33, while the control group scored 8.58 ± 1.89 (P < 0.001). For diagnostic accuracy, the experimental group scored an average of 15.95 ± 1.10, in contrast to the control group’s score of 10.42 ± 1.29 (P < 0.001). Each sub-score indicated that the experimental group outperformed the control group. Individual analyses of the six reports also showed that the experimental group excelled in all aspects (Supplement 2).
Comparison of report completion time
The average time to complete reports in the experimental group was 9.77 ± 1.06 minutes, while the control group took 13.15 ± 1.72 minutes. Independent samples t-test results indicated that the difference in report completion times between the two groups was statistically significant (P < 0.001), suggesting that with the assistance of ChatGPT, junior physicians in the experimental group were able to complete ultrasound report writing more quickly. Individual analyses of the six reports also revealed that the experimental group consistently finished faster than the control group (Supplement 1).
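As a quick arithmetic check, the group means imply a relative time saving of roughly one-fourth, consistent with the conclusion below:

```python
# Relative time saving implied by the group means reported above.
exp_mean, ctrl_mean = 9.77, 13.15  # minutes
reduction = (ctrl_mean - exp_mean) / ctrl_mean
print(f"{reduction:.1%}")  # ~25.7%, i.e. roughly one-fourth less writing time
```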
Feedback from junior physicians on ChatGPT
In the satisfaction survey regarding ChatGPT’s functionalities, 60% of respondents were very satisfied, 30% satisfied, and 10% neutral; none were dissatisfied or very dissatisfied. Regarding perceived helpfulness, 70% rated ChatGPT as extremely helpful for report writing, 20% as fairly helpful, and 10% as somewhat helpful; none rated it as not very helpful or not helpful at all.
The main issues reported by junior physicians during usage included that some content generated by ChatGPT did not fully match the actual ultrasound images, requiring further verification and modification (60%); descriptions of certain complex cases were not accurate or detailed enough (30%); and occasional internet connectivity issues affected usability (10%). Additionally, residents suggested improvements such as enhancing ultrasound image recognition capabilities, optimizing the medical knowledge database, and increasing the accuracy and relevance of generated content.
Discussion
ChatGPT, based on the Transformer architecture, is a powerful language model built through unsupervised learning on vast amounts of text data.6 In the medical field, ChatGPT has already shown potential applications across various domains.7–10 In medical education, it can assist medical students by answering their queries and helping them understand complex medical concepts.11–13 In ultrasound medicine, studies have begun to explore the use of ChatGPT for generating ultrasound examination reports. Its advantages include the ability to quickly create standardized report frameworks, prompt junior physicians on key information to include, reduce omissions and errors in report writing, and improve writing efficiency, allowing junior physicians to devote more time and energy to analyzing and diagnosing ultrasound images.
ChatGPT possesses strong natural language processing capabilities.14 Trained on extensive text data, it can accurately understand human natural language expressions. In the context of ultrasound report writing, junior physicians can describe ultrasound findings in everyday language, and ChatGPT can swiftly convert these into formal, professional ultrasound terminology. For instance, when a physician writes, “There’s a round bright thing in the liver,” ChatGPT can accurately translate this to “A round hyperechoic nodule is observed within the liver parenchyma,” significantly reducing the difficulty junior physicians face in memorizing and applying complex terminology and ensuring the accuracy and professionalism of the report language.
The extensive knowledge base of ChatGPT encompasses various medical fields, including typical and atypical ultrasound manifestations of diseases in ultrasound diagnostics, diagnostic criteria, and key points for differential diagnosis. Junior physicians can consult ChatGPT at any time while writing reports or assessing cases. For example, when encountering an uncommon superficial mass in ultrasound imaging, physicians can inquire about the ultrasound features of the mass, potential pathological types, and differentiation methods from other superficial lesions. ChatGPT can swiftly provide detailed and authoritative answers, acting like an experienced expert available online, thereby offering solid knowledge support to junior physicians.
ChatGPT’s contributions to standardizing ultrasound report writing manifest in several ways:

Terminology Correction and Optimization: After completing a draft of an ultrasound report, junior physicians can input the text into ChatGPT to request checks and corrections of professional terminology. ChatGPT identifies erroneous expressions based on medical terminology standards and common practice in ultrasound diagnostics and suggests corrections. For example, it might correct the colloquial “bright area in the liver” to “hyperechoic area within the liver,” while explaining the precise meanings and applicable contexts of different terms, thereby deepening junior physicians’ understanding of terminology. With long-term use, this can significantly enhance their ability to use terminology correctly and align report language with professional standards.

Logical Structure Organization: ChatGPT can analyze the logical structure of report content and reorganize descriptive sequences according to ultrasound report writing standards. It can prioritize multiple lesions by size and importance, describing core features first before detailing surrounding tissue relationships and other relevant information, resulting in clear and well-structured reports. For instance, in a thyroid ultrasound report, ChatGPT can organize nodule descriptions by listing larger, characteristic nodules first, followed by their location, size, echogenic structure, echogenicity, shape, margins, and echogenic foci, enhancing the clinical applicability of the report.15,16

Completeness of Key Information Review: By learning from ultrasound report templates and common diagnostic points, ChatGPT can check the completeness of key information in reports. It prompts junior physicians to include important ultrasound features that may be overlooked, such as blood flow signals in the assessment of thyroid nodules, while helping to eliminate redundant descriptions of normal tissues. For example, if a physician fails to describe the biliary system in a liver ultrasound report, ChatGPT will remind them to include this information, ensuring the report is comprehensive and contains the necessary diagnostic details, thereby improving its quality and diagnostic value. However, ChatGPT is not without flaws: its medical knowledge may lag behind the latest updates, and its accuracy in judging complex and rare cases still requires further enhancement.17
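As an illustration of the key-information completeness review, a minimal sketch follows; the required-feature lists and keyword matching are simplified examples of the idea, not the study’s actual checklist or tooling:

```python
# Illustrative completeness check: flag required features absent from a draft.
REQUIRED_FEATURES = {
    "thyroid": ["size", "echogenicity", "shape", "margins", "calcification", "blood flow"],
    "liver": ["size", "echotexture", "lesions", "biliary system", "portal vein"],
}

def missing_features(exam_type: str, report_text: str) -> list[str]:
    """Return required features that are not mentioned in the draft report."""
    text = report_text.lower()
    return [f for f in REQUIRED_FEATURES[exam_type] if f not in text]

draft = "Liver normal in size with homogeneous echotexture; no focal lesions."
print(missing_features("liver", draft))  # ['biliary system', 'portal vein']
```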
Notably, our study identified no instance in which ChatGPT misled junior physicians into a diagnostic error, delayed recognition of a critical finding, or prompted an inappropriate management decision. This reassuring outcome is best explained by the mandated verification protocol: every AI-generated statement had to be reconciled with the original images and clinical history, and amended as necessary, before the report was finalized. This systematic workflow of AI assistance followed by human cross-checking ensured that potential inaccuracies, including hallucinations, were intercepted and corrected, preventing erroneous content from being incorporated into the signed report.
Although ChatGPT assistance significantly improved short-term report quality scores, its long-term educational impact must be interpreted cautiously. From a clinical-practice perspective, junior residents already demonstrate limited proficiency in recognizing critical sonographic features, such as microcalcifications within thyroid nodules or marginal irregularities within breast masses. If ChatGPT generates a report that is terminologically precise and structurally polished yet contains factual errors (for instance, stating that microcalcifications are present when none exist), the trainee may be reassured by the apparent professionalism of the text and accept the erroneous conclusion without further verification. This type of occult mistake is far more hazardous than a report that is stylistically imperfect but factually accurate: the latter is easily detected and corrected by a supervising physician, whereas the former can lead to missed or incorrect diagnoses, for example, upgrading a benign nodule to a high-risk category and prompting unnecessary interventions. Educationally, an exclusive emphasis on using ChatGPT to improve report standardization, without parallel training in validating AI-generated facts, risks entrenching a form-over-verification mindset. Such an approach may weaken independent judgement and the habit of cross-checking imaging features, undermining the fundamental teaching goal of fostering autonomous diagnostic reasoning. By providing immediate terminology corrections and structural optimizations, the model reduces opportunities for active information retrieval, error recognition, and self-correction, potentially diminishing the depth of knowledge encoding and subsequent retention. In contrast, participants in the conventional group, deprived of instantaneous external support, repeatedly consulted guidelines, templates, and images, underwent a longer cognitive processing cycle, and thereby may have established more stable conceptual networks and durable clinical reasoning pathways. Owing to the single-time-point design of the present study, delayed testing and post-intervention reassessment without AI were not performed; consequently, we cannot determine whether the experimental cohort would sustain comparable performance once ChatGPT was withdrawn, nor can we exclude the risk of cognitive dependence. Future investigations should incorporate longitudinal follow-up, repeating report quality evaluations at four and twelve weeks after training under AI-free conditions, and employ retention rates and error recurrence as outcome measures to objectively appraise skill durability and independence. Additionally, mandatory reflective steps, such as requiring users to document the rationale for each revision, can be embedded within the AI-assisted workflow to foster deeper processing and to ensure that gains in efficiency are accompanied by genuine competency development.
Despite the promising findings, this study has certain limitations. It was conducted in a single hospital with a relatively small sample size, which may limit the generalizability and applicability of the results. The short duration of the study did not allow for a long-term follow-up assessment of the effects of ChatGPT usage on the report writing abilities of junior physicians. Furthermore, this research focused solely on the standardization and time efficiency of report writing without delving into its clinical application value, such as its impact on diagnostic accuracy in actual clinical practice. Moreover, the present study was restricted to a single, cross-sectional assessment with a standardized case battery administered at the conclusion of the training block. Future work should therefore incorporate longitudinal learning-curve analyses to quantify how AI-supported instruction influences residents’ progressive mastery of ultrasonography in everyday clinical practice. Finally, ChatGPT is only one representative large language model. Future studies could expand the sample size, conduct multi-center research, and extend the study duration to comprehensively evaluate the effectiveness of various large language models in standardizing ultrasound report writing across different clinical environments.
Conclusion
With the real-time assistance of GPT-4-turbo-2025-08-26, ultrasound reports generated by junior physicians can achieve a level of quality comparable to that of senior physicians, characterized by standardized terminology, clear structure, and comprehensive key information, while reducing writing time by nearly one-fourth. This advancement provides a valuable quality control tool for clinical practice education and holds significant promise for expansion into multi-center and multi-modal imaging applications.
Supplemental material
Supplemental material - Leveraging ChatGPT to Assist Junior Physicians in Standardizing Ultrasound Report Writing: A Promising Tool
Supplemental material for Leveraging ChatGPT to assist junior physicians in standardizing ultrasound report writing: A promising tool by Li Zhu, Huohu Zhong, Jing Bai, Xiaoying Wang, Peifeng Huang and Zhenhong Xu in Digital health.
Footnotes
Acknowledgements
The research is supported by the Science and Technology Bureau of Quanzhou.
Ethical considerations
The study was approved by the Second Affiliated Hospital of Fujian Medical University (2025-127).
Consent for publication
The study was reviewed by the senior authors’ institutional review board and approved for publication.
Author contributions
L.Z.: Conceptualization; H.Z.: Methodology; J.B.: Validation; X.W.: Formal analysis; X.Z.: Investigation; P.H.: Resources; H.Z.: Data Curation; L.Z.: Writing - Original Draft; H.Z.: Writing - Review & Editing; J.B.: Visualization; Z.X.: Supervision, Project administration; L.Z.: Funding acquisition. All authors read and approved the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Science and Technology Bureau of Quanzhou [2025QZNY085].
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets used and analyzed in the current study are available from the corresponding author, [Z.X.], upon reasonable request.
Supplemental material
Supplemental material for this article is available online.
