
Abstract

Large language models (LLMs) such as ChatGPT are entering clinical practice, yet how their clinical reasoning compares with speech-language therapists (SLTs) is not well understood. This comparative multi-case qualitative study used 3 hypothetical vignettes. Ten experienced SLTs (≥10 years) participated in semi-structured interviews, providing assessment, diagnosis and therapy plans for each vignette. ChatGPT-4o was presented with identical, standardized Turkish prompts over five consecutive days to evaluate the model’s temporal consistency in clinical reasoning. All outputs were analyzed with content analysis, and day-to-day consistency of ChatGPT themes was examined. ChatGPT-4o and SLTs showed substantial overlap in core practices such as case history, spontaneous speech analysis, key diagnostic labels, and emphasis on generalization and caregiver involvement. However, SLTs utilized broader, locally normed assessment tools and offered more flexible, individualized and context-sensitive therapy approaches. ChatGPT-4o’s responses were more standardized and showed stable thematic patterns across days, yet they did not reflect the clinical nuance or contextual adaptation observed in SLTs’ reasoning. ChatGPT-4o can approximate expert-like reasoning in structured scenarios and may serve as a clinical decision support aid. Nonetheless, it does not replace experienced SLTs, particularly for culturally grounded, person-specific assessment and intervention planning.

Keywords

ChatGPT, speech and language therapy, clinical decision-making, artificial intelligence, qualitative analysis

Introduction

Just as the invention of writing, the development of the printing press, the rise of the steam engine, and the introduction of smartphones into everyday life have created profound transformations throughout human history, the rise of artificial intelligence (AI) is regarded as a similarly pivotal turning point in the 21st century.¹ Artificial intelligence is a broad field that includes computational methods such as machine learning, which in turn support a range of application domains, including natural language processing, speech and language technologies, robotics, and AI-assisted systems.² Moreover, due to its ability to analyze large volumes of data, it can support evidence-based clinical decision-making processes and is therefore rapidly advancing and becoming increasingly important across a wide range of domains, particularly in medical and health services.^3-7 Recently, it has been reported that AI’s success in medical applications has increased markedly.³ Accordingly, understanding how healthcare professionals can employ AI technologies is of great importance for improving and enhancing healthcare services.²

The rapid expansion of AI in healthcare also extends to language-based generative models. In recent years, Generative Artificial Intelligence (GAI) tools powered by Large Language Models (LLMs) have demonstrated significant utility in healthcare. These systems assist practitioners by automating clinical documentation, simplifying complex medical information for patient education, refining the linguistic structure of scientific writing, and providing interactive scenarios for professional development.^1-8

ChatGPT and Its Use in Healthcare

The integration of LLMs enables these GAI systems to provide easy accessibility and the capacity to generate contextually relevant, human-like responses suitable for clinical dialogs.⁸ With a level of convenience similar to instant access through smartphones, the widespread availability of Generative AI models, exemplified by ChatGPT (which offers both free and subscription-based professional versions), has significantly impacted modern technology. Its ability to generate human-like responses across a broad range of topics, combined with its capacity to process multimodal inputs such as text, visual, and auditory data, has established it as a prominent tool in the field.⁹ ChatGPT has attracted unprecedented public attention; reports indicate that it reached around 100 million active users by January 2023, making it one of the fastest-growing consumer applications to date.^10,11

While several LLMs offer rapid processing and documentation support, ChatGPT is particularly noteworthy as a primary, highly accessible interface that has set a benchmark for public and professional AI interaction in healthcare.^12,13 Beyond its functional utility in facilitating documentation and preliminary patient explanations, its widespread adoption has catalyzed global discussions on the integration of Generative AI into clinical workflows. As newer versions continue to evolve, diverse medical use cases, such as clinical documentation, patient education/communication, and diagnostic decision support in emergency departments, have rapidly appeared in the literature.^14-16

However, the risk of generating inaccurate information, the need for external verification of its responses, and its occasional misinterpretation of inputs make its current use in healthcare limited yet valuable in terms of potential.¹⁶ Recent reviews emphasize that ChatGPT performs at only a moderate level across many clinical tasks and that it can produce “hallucinated” information presented in a highly confident tone, which may lead users to perceive the output as factual even when it contains inaccuracies.¹⁷ Additionally, the lack of transparency in LLM decision-making processes, ambiguity regarding responsibility for erroneous recommendations, and the risk of reproducing biases present in training data, thereby disadvantaging certain patient groups, are identified as key ethical limitations.¹⁸

ChatGPT and Its Use in the Field of Speech and Language Therapy

In recent years, AI has also begun to be used in assessment and therapy processes within the field of speech and language therapy.¹⁹ Multimodal models such as ChatGPT-4o, with their capacity to analyze complex linguistic inputs, have been shown in several studies to offer new possibilities for clinical practice.^20-22

One such study examined the applicability of AI-based visual generation (DALL-E 2) to develop materials for aphasia assessment and intervention and found that 94.5% of 200 target stimuli accurately represented their core concepts.²³ Similarly, recent studies support the usability of AI-generated images in language assessment processes. For example, images generated using Bing Image Creator (DALL·E 3) for Turkish nouns and verbs produced naming accuracy and response time patterns consistent with classical psycholinguistic findings in neurotypical adults; moreover, they largely met criteria for imageability and clinical usability.²⁴ Taken together, these findings suggest that AI-based visual generation offers significant potential for developing low-cost, rapid, and culturally adaptable materials for speech and language disorders, while also underscoring the need for careful evaluation of model outputs in terms of accuracy and cultural appropriateness.

Consistent with these clinical applications, Birol et al²¹ further demonstrated how ChatGPT-4o could be effectively integrated into assessment, diagnosis, and therapy processes within the SLT context. In this study, ChatGPT’s responses related to language, speech, and swallowing disorders were evaluated by 15 experts in terms of accuracy, comprehensiveness, and appropriateness. ChatGPT was found to provide innovative support in functions such as preparing clinical documents and generating materials; however, improvements were recommended regarding its effectiveness in therapeutic processes. Another study presented the most frequently asked questions related to stuttering to the ChatGPT-4o mini model and had SLTs evaluate the responses in terms of content quality and readability.²⁵ The findings indicated that ChatGPT has promising potential in providing appropriate responses to common stuttering-related questions. However, the study emphasized that generative AI tools are intended solely for educational purposes and should not replace diagnosis or treatment provided by qualified SLTs.²⁵

Nevertheless, reviews summarizing the current literature highlight that research on ChatGPT remains largely in its early stages; that most studies examine the model descriptively or exploratorily within limited contexts such as academic writing, education, or patient communication; and that significant limitations persist due to issues such as information accuracy, reliability of responses, and hallucinatory outputs.^26-28 Existing studies typically evaluate ChatGPT’s responses through qualitative or quantitative expert ratings, thereby offering valuable insights into content quality. Yet, to understand how ChatGPT might be positioned within clinical decision-making processes, it is necessary to examine not only expert evaluations but also the extent to which the model’s clinical reasoning across the assessment, diagnosis, and therapy planning chain aligns with the decision patterns of experienced clinicians.

This study aims to systematically compare ChatGPT-4o’s assessment, diagnosis, and therapy planning responses to case histories involving speech and language disorders with the responses of speech and language therapists who have at least 10 years of clinical experience. This 10-year threshold was specifically chosen to ensure that the human comparison group represents “expert” clinical reasoning, as a decade of experience is a widely recognized benchmark for achieving professional mastery in healthcare. In doing so, the study examines the stages of the clinical reasoning chain in which ChatGPT-4o converges with expert decisions, the stages in which it diverges, and the degree of internal consistency within its own responses to the same cases. The findings are expected to clarify the validity and safe use boundaries of AI-supported clinical decision-making systems within the SLT field. By directly comparing ChatGPT-4o’s responses to structured hypothetical case scenarios with the decisions of these highly experienced therapists, this study also evaluates the reliability and clinical nuance of AI-generated assessment and therapy planning across identical cases.

Materials and Methods

This study is a comparative, multi-case qualitative study conducted using 3 hypothetical clinical cases: adult stuttering, developmental language disorder (DLD), and speech sound disorder (SSD). The study’s design and reporting follow the Consolidated Criteria for Reporting Qualitative Research (COREQ) guidelines to ensure methodological rigor and transparency.²⁹ The aim of this qualitative, multi-case design is to enable an in-depth and detailed examination of ChatGPT-4o’s clinical reasoning by systematically comparing its outputs with responses from licensed Speech and Language Therapists (SLTs). Each participant was required to have at least 10 years of clinical experience in the assessment, diagnosis, and therapy of the specified disorders. Furthermore, the study evaluates the temporal consistency and variations of ChatGPT-4o’s responses by examining outputs generated across five consecutive days using identical standardized prompts.

Participants

A purposeful sampling strategy was used to recruit licensed Speech and Language Therapists (SLTs) with at least 10 years of clinical experience in the field. Recruitment occurred via professional networks and clinical associations across Türkiye, utilizing a snowball sampling approach to identify and contact highly experienced clinicians with diverse specialization profiles. Ten SLTs (9 females, 1 male) working in diverse clinical and academic contexts participated in the study.

The inclusion criteria required participants to: (a) hold at least a master’s degree in speech and language therapy, (b) be registered and actively practicing, (c) have a minimum of 10 years of professional experience, and (d) provide informed consent to participate. To ensure the specificity and quality of the clinical insights, the following exclusion criteria were applied: (1) having less than 10 years of professional experience, (2) a lack of active clinical practice within the last 2 years, and (3) failure to complete the full interview process for all 3 clinical vignettes. No potential participants or data points were excluded based on these criteria in the final analysis.

Participants represented a variety of professional backgrounds, including university-affiliated clinics, hospitals, private practices, and rehabilitation centers. This diversity in professional settings was intended to capture a broad range of clinical perspectives. While these settings do not represent the entire heterogeneity of the field, they provide a multifaceted view of clinical reasoning across frequent disorder areas with distinct diagnostic and intervention frameworks.

As shown in Table 1, participants’ primary workplaces fell into 3 settings: university-affiliated clinics, private practices, and rehabilitation centers. Each participant completed a brief demographic questionnaire capturing age, gender, years of experience, highest degree, and primary workplace. They were also asked to list the most frequent disorder areas they encounter in their daily clinical practice. The characteristics of the participating SLTs are summarized in Table 1.

Table 1. Demographic and clinical characteristics of the participants.

Participant Code	Gender	Years of Experience	Highest Degree	Primary Workplace	The Most Frequent Disorder Areas
P1	Male	15	PhD	Private practice	DLD, communication disorders associated with additional disabilities, SSD
P2	Female	16	MSc	Private practice	Fluency Disorders, DLD, SSD
P3	Female	14	MSc	Private practice	DLD, Communication Disorders associated with ASD
P4	Female	19	MSc	Private practice	Language Delay, Fluency Disorders, SSD
P5	Female	14	MSc	Private practice	DLD, Fluency Disorders, Communication Disorders associated with ASD
P6	Female	15	MSc	Rehabilitation center, Private practice	Communication Disorders associated with ASD, Language Delay, SSD
P7	Female	15	MSc	Rehabilitation center, Private practice	Communication Disorders associated with ASD, DLD
P8	Female	17	PhD	University clinic	SSD, Speech disorders associated with cleft lip and palate
P9	Female	19	MSc	Private practice	DLD, SSD, Fluency Disorders
P10	Female	14	Associate Professor	University clinic	Fluency Disorders

DLD: Developmental Language Disorder; SSD: Speech Sound Disorder; ASD: Autism Spectrum Disorder.

Development of Interview Questions and Case Scenarios

Three hypothetical clinical case vignettes (adult stuttering, developmental language disorder [DLD], and speech sound disorder [SSD]) were developed by the researchers, along with a semi-structured interview guide, to elicit and compare responses from both experienced speech-language therapists and ChatGPT-4o regarding assessment, diagnosis, and therapy. These instruments were designed based on current evidence-based clinical guidelines in speech-language pathology. While the vignettes and interview guide were not previously validated as standardized scales, they underwent a pilot-testing phase with 2 independent senior SLTs (not included in the final study sample) to ensure content validity, clinical realism, and linguistic clarity. Minor refinements were made to the phrasing of the prompts and questions based on the pilot feedback. The complete set of vignettes and the semi-structured interview guide are provided as a separate “Supplemental File” for transparency.

The initial case drafts were developed and edited by a primary research team consisting of 2 PhD students and 1 PhD-level researcher, all of whom are licensed Speech and Language Therapists (SLTs). Each vignette was written to reflect typical clinical decision points relevant to its disorder domain (for instance, fluency features and avoidance behaviors in stuttering; receptive–expressive balance and contextual language use in DLD; and phonological processes and auditory discrimination in SSD). Phrases that could directly imply a diagnosis or bias responses toward specific interpretations were intentionally avoided. To maintain neutrality, diagnostic labels such as “severe stuttering” or “phonological delay” were replaced with descriptive clinical data; for example, instead of stating a diagnosis, the vignettes provided objective observations such as “the speaker exhibits involuntary repetitions of initial syllables in 12% of words” or “the child consistently replaces velar stops with alveolar stops.” This approach required both ChatGPT-4o and the expert SLTs to independently synthesize the raw information to reach their clinical conclusions.

To ensure objective validation, these drafts were subsequently reviewed by an independent expert panel comprising 1 Associate Professor, 1 PhD-level SLT, and 3 PhD students who were not involved in the initial development of the vignettes. This panel, also consisting entirely of licensed SLTs, assessed the cases for content coverage, linguistic clarity, and neutrality. The clinical background of both the research team and the independent panel ensured that the vignettes were technically accurate and representative of real-world SLT scenarios. While this domain expertise was essential for creating high-fidelity research materials, potential for subjective bias was minimized by using these vignettes to elicit independent, blind-coded responses from 10 external expert participants, rather than relying on the internal panel’s clinical judgments for the final AI comparison. Following the panel’s feedback, necessary revisions were made and the final versions were constructed for use in data collection.

Finally, a concise, semi-structured interview guide paralleled the vignette design and was organized into 3 neutral prompts per case:

Assessment: planned procedures and rationale;

Diagnosis: most likely diagnosis, differentials, and justification;

Therapy: initial targets, hierarchy/procedures, and generalization strategies.

Procedure

Interview with SLTs

The 3 finalized vignettes were presented in the same fixed order for every participant. Each vignette was standardized in length, ranging between 150 and 200 words, to ensure a comparable level of clinical detail across the different disorder domains. The semi-structured guide from the development phase was used verbatim (Assessment → Diagnosis → Therapy).

Individual interviews were conducted online (MS Teams) by the first author, typically lasting 30 to 45 min. During the sessions, the vignettes were presented visually via the “screen sharing” feature, allowing participants to read the text at their own pace. Simultaneously, the interviewer read the vignettes aloud to ensure full comprehension. Participants provided their professional clinical reasoning orally, which was captured in real-time. The interviewer used minimal, non-leading follow-ups only when clarification was needed (eg, “Could you elaborate?”), as specified in the guide.

All interviews were audio- and video-recorded with participants’ consent and transcribed verbatim. Transcripts were checked for accuracy and anonymized by removing any identifying information.

ChatGPT-4o’s Responses

Model responses were collected under 2 independent laptop-based environments running Windows and macOS. To prevent any prior conversational context or personalization effects, 2 new OpenAI accounts were created specifically for this study, 1 for each operating system environment.

For each vignette, the model was instructed to assume the role of a speech-language pathologist with at least 10 years of clinical experience and to provide detailed responses addressing assessment, diagnosis, and therapy components. The prompts were entered in Turkish to match the linguistic context of the participating clinicians and were identical in wording across all sessions to ensure consistency. The standardized prompt presented to ChatGPT-4o stated:

“Imagine that you are a speech and language therapist with at least ten years of clinical experience. Based on the case vignettes I will share, provide detailed explanations under the following three headings:

1. Assessment: Which assessment tools or procedures would you use in this case? Explain why you would select each and what information you aim to obtain.

2. Diagnosis: What is the most likely diagnosis for this client? Describe the findings or observations that led you to this conclusion.

3. Therapy: What therapy goals and methods would you plan for this client? Explain the process in detail (e.g., specific techniques, session structure, and generalization strategies).

Please respond with the level of detail expected from a clinician with at least ten years of professional experience.

If anything is unclear, please indicate. When you are ready, I will share the first case vignette.”

Each vignette was queried on five consecutive days within each account and environment. Every run began in a fresh session with no retained chat history or contextual memory. Outputs were exported verbatim and systematically labeled with the corresponding case ID, environment (Windows/macOS), date, and run number to facilitate alignment across days and between sources.
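The labeling scheme described above can be sketched in a few lines of Python. This is only an illustration: the record fields follow the four attributes named in the text (case ID, environment, date, run number), but the class name `RunLabel`, the case identifiers, the file-name pattern, and the placeholder start date are all hypothetical and are not taken from the study. The sketch also assumes one run per vignette, per environment, per day.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import product


@dataclass(frozen=True)
class RunLabel:
    """One exported model output, tagged with the attributes named in the text."""
    case_id: str
    environment: str
    run_date: date
    run_number: int

    def filename(self) -> str:
        # Hypothetical naming pattern; the study does not report one.
        return (f"{self.case_id}_{self.environment}_"
                f"{self.run_date.isoformat()}_run{self.run_number}.txt")


# Hypothetical identifiers for the three vignettes and two environments.
CASES = ("case1_stuttering", "case2_dld", "case3_ssd")
ENVIRONMENTS = ("windows", "macos")
N_DAYS = 5  # five consecutive days, as described in the Procedure


def build_run_labels(start: date) -> list[RunLabel]:
    """Enumerate every expected run, assuming one run per case, environment, and day."""
    labels = []
    for case_id, env in product(CASES, ENVIRONMENTS):
        for day in range(N_DAYS):
            labels.append(RunLabel(case_id, env, start + timedelta(days=day), day + 1))
    return labels


# Placeholder start date (assumption); the actual collection dates are not reported here.
labels = build_run_labels(date(2000, 1, 1))
print(len(labels), "expected runs; first file:", labels[0].filename())
```

Enumerating the expected runs up front makes it straightforward to check that no output is missing before outputs are aligned across days and between environments.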

Data Analysis

The qualitative data analysis process was carried out systematically and transparently. First, the audio recordings of interviews with the 10 speech and language therapists (SLTs) were transcribed verbatim by the first and second authors, who are licensed SLTs and PhD candidates with expertise in both clinical terminology and qualitative methodology. To ensure the fidelity and accuracy of the textual data, a cross-check procedure was implemented in which each transcript was reviewed against the original audio recordings by a third member of the research team, a senior PhD-level SLT. This resulted in 1438 lines and 38 pages of textual data. Similarly, the responses generated by ChatGPT-macOS and ChatGPT-Windows for the 3 clinical vignettes were converted into text format (641 lines, 26 pages) and incorporated into the analysis.

All transcripts were imported into MAXQDA software and analyzed through conventional content analysis.^30,31 Each transcript was examined line by line, and coding was conducted according to the study’s analytical framework. During the process, the responses of the human participants and the ChatGPT models were coded separately, and 3 overarching themes were defined for each vignette: Assessment, Diagnosis, and Therapy.

In alignment with these themes, the data were organized into detailed thematic categories, enabling a systematic and comparative examination of how clinical reasoning patterns emerged across human and AI-generated responses.

Trustworthiness and Rigor

To ensure analytic rigor and transparency, multiple strategies were applied throughout the data analysis process. First, the content validity of the research instruments (vignettes and interview guide) was established through a pilot-testing phase and expert review by 2 independent senior SLTs. Two researchers independently coded all transcripts, followed by consensus meetings to refine category definitions and resolve discrepancies. Inter-coder agreement was established on a random 20% subset of the data, yielding satisfactory reliability (κ ≥ 0.87).
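For readers unfamiliar with the κ statistic reported above, the following is a minimal sketch of Cohen's kappa for two coders, written in plain Python. It assumes the agreement statistic was the standard two-coder Cohen's kappa, which the text does not state explicitly; the function name and the example codes are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments given the same code by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement expected from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for ten transcript segments (illustrative only).
coder_1 = ["assessment", "assessment", "diagnosis", "therapy", "therapy",
           "assessment", "diagnosis", "therapy", "diagnosis", "assessment"]
coder_2 = ["assessment", "assessment", "diagnosis", "therapy", "assessment",
           "assessment", "diagnosis", "therapy", "diagnosis", "assessment"]

kappa = cohen_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this illustrative example the coders agree on 9 of 10 segments (observed agreement 0.90) and chance agreement is 0.35, so κ = (0.90 − 0.35) / (1 − 0.35) ≈ 0.85.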

Furthermore, data saturation was rigorously monitored; recruitment and analysis continued until no new clinical themes emerged, confirming that the sample size (n = 10) was sufficient to capture the depth of the clinical reasoning patterns. All transcripts were anonymized prior to analysis, and a detailed audit trail was maintained to document coding decisions, thematic refinements, and interpretive shifts. Additionally, researcher reflexivity was carefully considered throughout the analysis to minimize interpretive bias.

A multi-level comparative analysis was conducted across primary data sources (human vs. ChatGPT-4o responses) and clinical vignettes (stuttering, DLD, SSD). Additionally, data collection for ChatGPT-4o was performed across 2 distinct technical environments (Windows/Chrome and macOS/Safari) to verify the stability and replicability of the AI’s responses. While the operating system and browser were included in the initial procedural design to rule out any session-specific technical artifacts, our analysis confirmed that these factors had no discernible impact on the content or clinical accuracy of the model’s outputs. Therefore, findings reflected consistent systematic patterns rather than isolated instances or environment-specific variations.

Ethical Approval

Inclusion criteria required participants to: (a) hold at least a master’s degree in speech and language therapy, (b) be registered and actively practicing, (c) have a minimum of 10 years of professional experience, and (d) provide written informed consent prior to the commencement of the study. The written consent form, approved by the Bahçeşehir University Scientific Research and Publication Ethics Committee, outlined the study’s purpose, the voluntary nature of participation, and the confidentiality of the data. This study was reviewed at the meeting of the Bahçeşehir University Scientific Research and Publication Ethics Committee dated November 6, 2025 (Meeting No: 2025/09; Document No: E-85646034-604.01-115398) and was approved, indicating that it complies with the principles of scientific research and publication ethics.

Results

The findings obtained from the analysis of the data are presented in this section, in which the assessment, diagnostic, and therapeutic processes for each case are compared between the SLTs and ChatGPT-4o and supported with direct quotations. To provide a holistic overview before detailing each disorder domain, Table 2 summarizes the primary similarities and differences across the assessment, diagnosis, and therapy themes for Stuttering, DLD, and SSD. This summary serves as a roadmap for the more granular comparisons presented in Figures 1 to 9.

Table 2. Summary of Clinical Reasoning Patterns Across SLTs and ChatGPT-4o.

Clinical Domain	Shared Clinical Focus (Consensus)	SLT-Specific Emphasis (Human Nuance)	ChatGPT-4o Specific Emphasis (AI Pattern)
Case 1: Stuttering	Case history, spontaneous speech analysis, frequency of disfluency, and therapy planning.	Psychosocial focus: Deep analysis of “Unhelpful Thoughts and Beliefs” (UTBAS) and secondary behaviors.	Standardized Assessment: Suggestion of formal emotional assessment scales and structured clinical tools.
Case 2: DLD	Use of standardized language tests, balance of receptive/expressive language, and intervention goals.	Contextual performance: Evaluation of pragmatic skills and language use in classroom/social settings.	Developmental Markers: Broader focus on general language acquisition milestones and structured coding.
Case 3: SSD	Phonological process analysis, oral-motor examination, and stimulus-specific therapy goals.	Auditory/Sensory: Focus on auditory discrimination, sensory processing, and stimulus-specific errors.	Systematic Screening: Consistent recommendation of formal speech-sound screening tools and error-coding protocols.

Figure 1. Case I: assessment.

Figure 2. Case I: diagnosis (Please refer to Figure 1 for the visual interpretation key and coding legend).

Figure 3. Case I: therapy (Please refer to Figure 1 for the visual interpretation key and coding legend).

Figure 4. Case II: assessment (Please refer to Figure 1 for the visual interpretation key and coding legend).

Figure 5. Case II: diagnosis (Please refer to Figure 1 for the visual interpretation key and coding legend).

Figure 6. Case II: therapy (Please refer to Figure 1 for the visual interpretation key and coding legend).

Figure 7. Case III: assessment (Please refer to Figure 1 for the visual interpretation key and coding legend).

Figure 8. Case III: diagnosis (Please refer to Figure 1 for the visual interpretation key and coding legend).

Figure 9. Case III: therapy (Please refer to Figure 1 for the visual interpretation key and coding legend).

Analysis of the Thematic Consistency of ChatGPT Responses Across Days

When the ChatGPT outputs obtained on five different days were coded in a binary present/absent manner with respect to the pre-specified core themes for the 3 cases (diagnostic formulation, core assessment components, and main intervention approaches), theme-level consistency for the core themes was 97.5% in Case 1 and 100% in Cases 2 and 3. When auxiliary themes (eg, technology-supported approaches, group/family involvement, and the structuring of goals) were also included in the analysis, overall theme-level consistency was calculated as 88% for Case 1, 92% for Case 2, and 100% for Case 3.
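The exact formula behind these percentages is not given in the text. One plausible reading, sketched below in Python, treats theme-level consistency as the share of theme-by-run cells whose present/absent code matches the majority code for that theme. This definition, the function name, and the example coding matrix are assumptions for illustration only and do not reproduce the study data.

```python
def theme_consistency(coding):
    """Percentage of theme-by-run cells that match each theme's majority code.

    `coding` maps a theme name to a list of 0/1 flags (absent/present),
    one flag per ChatGPT run.
    """
    if not coding:
        raise ValueError("at least one theme is required")
    agreeing = 0
    total = 0
    for theme, flags in coding.items():
        if not flags:
            raise ValueError(f"no runs coded for theme {theme!r}")
        present = sum(flags)
        # Cells that agree with the majority (modal) code for this theme.
        agreeing += max(present, len(flags) - present)
        total += len(flags)
    return 100.0 * agreeing / total


# Hypothetical coding of four themes over five daily runs (not study data).
example = {
    "case_history": [1, 1, 1, 1, 1],
    "spontaneous_speech_analysis": [1, 1, 1, 1, 1],
    "stuttering_label": [1, 1, 1, 0, 1],
    "desensitization": [1, 1, 1, 1, 1],
}

score = theme_consistency(example)
print(f"theme-level consistency: {score:.1f}%")
```

Here 19 of the 20 cells match their theme's majority code, giving 95.0%. Under this reading, a single deviating run lowers the score in proportion to the total number of coded cells, which is why adding less stable auxiliary themes can pull the overall figure down.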

Case I (Stuttering)

The responses of the SLTs and ChatGPT regarding the assessment, diagnostic, and therapeutic processes for the first case are presented comparatively in Figures 1 to 3.

According to Figure 1, in the assessment process for the first case, both overlapping and distinct assessment methods were used by the SLTs and ChatGPT. Common assessment methods proposed by both SLTs and ChatGPT include taking a case history (8), spontaneous speech analysis (7), analysis of secondary behaviors (4), monitoring blocks (4), calculating the percentage of stuttered syllables (4), and tracking overall fluency severity (4).

However, among the assessment methods suggested only by the SLTs were monitoring disfluency samples (2), the use of formal assessment tools (2), and the use of UTBAS (Unhelpful Thoughts and Beliefs About Stuttering) (1).

P10: “First, if he doesn’t speak – I mean, I first try to obtain some output over 1–2 sessions, to build that kind of ‘telepathic’ connection. If we really can’t get anywhere, then perhaps UTBAS could be administered as a tool to talk things through, or I might ask him to write a piece about his stuttering. I would also ask for this in order to use it as a tool in therapy, to work on how it affects his life.”

On the other hand, among the assessment approaches suggested by ChatGPT were the use of formal tools for emotional assessment (2), the assessment of fluency (2), and formal cognitive-behavioral assessments of stuttering (1).

ChatGPT: “Emotional and Cognitive Assessment: Attitude Scales Related to Stuttering: OASES (Overall Assessment of the Speaker’s Experience of Stuttering): Assesses the psychosocial impact of stuttering on the individual.

Cognitive-Behavioral Assessment:

Attitude Scales Related to Stuttering:

KiddyCAT (for children), CAT (for adults): Measure emotional and cognitive attitudes toward stuttering. Murat’s cognitive distortions such as ‘I must always speak fluently’ should be identified.”

According to Figure 2, for the diagnosis of the first case, the labels stuttering (7) and fluency disorder (4) stand out among the speech and language therapists. A similar pattern is observed in the ChatGPT responses as well; of the 3 outputs produced, 2 use the diagnosis stuttering (2), while 1 uses fluency disorder (1).

P1: “Of course, this person does have a stuttering condition. I don’t focus too much on the percentage of stuttered moments. I carry out an assessment. I look at his normal disfluencies, whether these lead to avoidance, whether he changes words, whether there is a tense posture, and based on these I go through a diagnostic process. Afterwards, following a stuttering assessment, by looking at his overall behavioural attitudes and his stuttering pattern, I can say that there is a stuttering condition.”

ChatGPT: “Looking at Murat’s history, symptoms of stuttering that began in childhood (around age 4) and developed over time are observed. In particular: Core stuttering behaviours: syllable repetitions, long blocks, word-substitution strategies. Secondary behaviours: avoiding eye contact, bodily tension. Cognitive components: increasing reactions to disfluencies, perfectionism. Impact on quality of life: it influences even his choice of profession.”

According to Figure 3, in the therapeutic approaches proposed for the first case, there are strategies that substantially overlap as well as diverge between the SLTs and the ChatGPT responses. Common therapy approaches suggested by both SLTs and ChatGPT include the accompaniment of psychotherapy (8), desensitization (7), fluency-shaping techniques (6), a modification approach (5), and cognitive-behavioral therapy (3).

ChatGPT: “Stuttering Modification Techniques: These techniques teach the individual to manage and cope with their stuttering: Cancellation: After completing a word during which stuttering occurred, the word is repeated, but this time produced in a more controlled way. Pull-out: The person becomes aware while they are stuttering and tries to continue speaking in a more fluent manner. Preparatory Set: When the person senses that stuttering may occur, they plan the word in advance and say it in a more fluent way.”

P10: “I would probably start with desensitisation and then work on strengthening his communication skills a bit, helping him to speak more. Once we obtain a sufficient amount of spoken output and his communication skills are strengthened, then, depending on his and my preference, if we agree, we could work a bit more on his speech within certain agreed goals. In that case, I think it would be beneficial to proceed mainly with a modification-based approach. ACT could be added, ACT components.”

However, among the therapy methods suggested only by the SLTs were an additional component of cognitive-behavioral therapy (1), components of ACT (Acceptance and Commitment Therapy) (1), and work on strengthening communication skills (1).

On the other hand, among the therapy approaches suggested only by the ChatGPT responses were transfer and generalization work (2), technology-supported therapy with DAF (Delayed Auditory Feedback) (1), and techniques to reduce avoidance behaviors (1).

ChatGPT: “Technology-Supported Therapies: DAF (Delayed Auditory Feedback): Technological devices used to help control speaking rate and reduce disfluencies. Mobile applications or digital speech therapy tools.”

Case II (Developmental Language Disorders)

The responses of the SLTs and ChatGPT regarding the diagnosis, therapy, and assessment processes for the second case are presented comparatively in Figures 4 to 6.

As shown in Figure 4, both similarities and differences were observed between SLTs and ChatGPT in the assessment of the second case. The assessment tools commonly used by both SLTs and ChatGPT included the Turkish Early Language Development Test (TELD 3-TR), case history, natural language sample analysis, analysis of pragmatic language skills (2), APT (Turkish Articulation and Phonology Test) (1), AS (Articulation Subtest-APT) (1), ADS (Auditory Discrimination Subtest-APT) (1), CLT-TR (Crosslinguistic Lexical Task-Turkish version) (1), and the TİFALDİ (Turkish Inventory for Expressive and Receptive Language Development) (1).

The assessment methods suggested only by ChatGPT focused more on international standardized language tests, connected speech analysis, and vocabulary assessment. Codes appearing exclusively on the ChatGPT side include connected speech analysis (2), PLS-5 (Preschool Language Scale, Fifth Edition) (2), Turkish Communicative Development Inventory (TİGE) (2), vocabulary analysis (1), vocabulary tests (1), and EOWPVT (Expressive One-Word Picture Vocabulary Test) (1).

According to Figure 5, broadly similar patterns were observed between SLTs and ChatGPT in the diagnosis of the second case. The majority of SLTs described the case as a developmental/specific language disorder (8), while 2 therapists (2) indicated an expressive language disorder. ChatGPT showed a comparable distribution: 2 responses (2) classified the case as a developmental/specific language disorder, whereas 1 response (1) proposed a diagnosis of expressive language disorder.

According to Figure 6, the common therapy approaches proposed for the second case by both SLTs and ChatGPT include collaboration with the family and teacher (9), play-based therapy (8), interactive book reading (6), expanding the child’s vocabulary (5), generalization work (4), engaging in structured activities (4), planning an interaction-focused therapy process (4), and supporting peer interaction (2).

P6: “I would ask the family to send me videos. I do work with the child myself, of course, but educating the family is also one of my main goals. Regarding school, depending on which school the child attends, I usually get in touch with the teacher. And I always ask about the child’s interactions with peers, because I only see the child individually and cannot observe them in a group setting. I ask the teachers how the child behaves in the group. Usually they are either very quiet or they are the children who show behavioural problems, or they just keep to themselves, maybe have one or two close friends and stay on their own. First I try to find out whether the child has any behavioural problems. Because if there is a communication problem in the group, especially if the child cannot use grammatical morphemes and the others cannot understand them, any bullying or negative behaviours will show up in the group setting.”

ChatGPT: “Learning through play: While playing with toy cars, verbs such as ‘go’, ‘stop’ and ‘fast’ can be used. . . .” “Peer interaction should be encouraged: The child should be supported to communicate by playing with same-age peers.”

However, among the therapy methods suggested only by SLTs were a phonological approach (4), psychologist support (4), a DIR/Floortime-based therapy process (3), the use of natural language techniques (3), and continuing with traditional articulation therapy (1).

P1: “There is probably a risk that the child is also experiencing difficulties in their phonological system. With these children, we focus on phonological structures and strengthen the phonological system. In the end, there will likely be some residual articulation problems. We then correct the articulation issues one by one and conclude the therapy process. The likelihood of a cognitive problem is low, and such a condition has not been identified anyway.”

P7: “For that reason, in addition to speech and language therapy, I would want to know how the family responds to this issue, in what ways they support the child, and how they reinforce the child’s strengths. If necessary, I would also want them to receive some form of psychological support – for example, play therapy between the parent and child or another type of psychological intervention that the child might need.”

P9: “With funny sound repetitions, like making the car go ‘vroom,’ for example, and because I work on the floor using Floortime and play-based methods, I structure the play around my language goals and then simply leave the toys in the room, always following the child’s lead. Because when the child chooses themselves, they feel as if they have chosen and worked on it on their own. It doesn’t feel like something imposed on them, and I gain a lot from that.”

Among the therapy approaches suggested only by ChatGPT were the use of words in meaningful contexts (2), modeling (2), and creating opportunities for expressive language (1).

ChatGPT: “Meaningful and functional word teaching should be provided. New words should be modelled during daily activities and play. For example, while working in the kitchen, action words such as ‘I’m stirring’ and ‘I’m cutting’ can be taught. . . .” “Modelling: The therapist teaches the target words by repeating and emphasising them. . . .” “Increasing Speech Initiations: Opportunities are created for Ozan to express himself. The therapist uses modelling to help the child feel more comfortable expressing himself.”

Case III (Speech Sound Disorders)

According to Figure 7, there are substantially overlapping as well as diverging approaches between SLTs and ChatGPT in the assessment processes for the third case. The assessment methods commonly used by both SLTs and ChatGPT include taking a case history (6), administering auditory discrimination tests (5), analyzing spontaneous speech (4), conducting phonological process analysis (3), oral–motor examination (3), analyzing error types and consistency (3), phoneme recognition tests (2), analysis of speech in context (2), and the Ankara Articulation Test (AAT) (2).

ChatGPT: “Detailed Case History: Pregnancy and birth history. Key milestones in speech and language development. The family’s observations regarding client’s speech intelligibility and sound errors. . . .” “Phonological Process Analysis: His speech is recorded and analysed. It is examined whether phonological processes have been suppressed in an age-appropriate manner. The phonological processes that are still present are identified. . . .” “Error Types and Consistency: It is evaluated whether the errors are consistent and how they change across different contexts.”

P8: “A detailed case history, an oral–motor examination, SST (Turkish Articulation and Phonological Test) or AAT, and a natural language sample – I forgot to mention it a moment ago, but these could also be considered for the previous case. The Intelligibility in Context Scale can be used to see how the family perceives the child’s speech – how intelligible they think it is, a scale that can be completed by the parents. Although no vowel errors are reported here and the processes seem to be more developmental, if the therapist has observed inconsistent errors, they can also carry out an assessment focusing on inconsistency. This would again be to rule out consistent vs. inconsistent phonological disorder. Conducting a stimulability assessment is also important here, as it is a procedure we need in order to determine therapy targets.”

Among the assessment methods suggested only by SLTs were the SST (7) and the Turkish Early Language Development Test (TELD 3-TR) (1).

By contrast, among the assessment approaches suggested only by ChatGPT were the Minimal Pairs Test (1), target–actual production comparison (1), GFTA-3 (Goldman-Fristoe Test of Articulation) (1), Turkish phonological process tests (1), and articulation and phonology tests (1).

ChatGPT: “Minimal Pairs Test: It is assessed whether the client can auditorily distinguish similar sounds (for example, /t/ and /k/).” . . . “Target vs. Actual Production Comparison: It is recorded which sounds the child uses instead of the intended target sounds. . . .” “Turkish Phonological Process Tests: These assess whether the phonological processes are age-appropriate. . . .” “GFTA-3 (Goldman-Fristoe Test of Articulation): Can be used to evaluate speech sounds.”

The responses of the SLTs and ChatGPT regarding the assessment, diagnosis, and therapy processes for the third case are presented comparatively in Figures 8 and 9.

According to Figure 8, there is a high degree of consistency between SLTs and ChatGPT in the diagnostic evaluations for the third case. The vast majority of SLTs classified the case as a phonological disorder (8), and ChatGPT likewise proposed a diagnosis of phonological disorder (2).

ChatGPT: “Based on that case, a diagnosis of phonological disorder can be considered. This diagnosis is characterised by an age-inappropriate delay or deviation in the child’s ability to organise and use speech sounds. In the case description, it is stated that the client continues to use phonological processes such as fronting, stopping, and gliding of liquids, and has difficulty auditorily discriminating certain phonemes. This indicates that the phonological disorder involves both production and auditory discrimination components.”

According to Figure 9, the therapy methods commonly recommended by both groups for the third case include auditory discrimination training (10), the minimal pairs method (7), generalization work (7), collaboration with the family (7), reducing stopping processes (7), reducing gliding processes (7), reducing fronting processes (6), articulation therapy (3), developing phonological awareness skills (3), and motor exercises (2).

However, among the therapy methods suggested only by SLTs were the maximal oppositions approach (1), the minimal oppositions approach (1), and collaboration with the teacher (1).

P6: “I do phonological awareness activities with them. I use minimal pairs, maximal pairs. If there are many sound errors, I work on sounds that are completely unrelated to each other and focus on helping the child generalise them. I largely leave the generalisation to the parents. I always work in collaboration with the teacher as well, because the teacher may already have been guided about what to do. Is there anything written about that here? No. But in such cases, the teachers have usually been guided. And since they have been guided, I actually work more with the teacher on how they handled the word lists, which words we worked on, and what path I followed. At these ages, schools in Türkiye do a lot of phonological awareness activities anyway, so the information I provide is very useful for the teachers. They also use it with other children, and I continue in that way.”

On the other hand, among the therapy approaches suggested only by ChatGPT were sound manipulation activities (1), therapy targeting the reduction of phonological processes (1), and auditory listening activities (1). These findings suggest that ChatGPT places greater emphasis on technically focused and structured approaches within the therapy process.

ChatGPT: “Sound Manipulation Activities: The child creates a new word by changing the first sound in a word. For example, the word ‘bal’ can be turned into ‘tal’ by changing the initial sound to /t/. . . . Auditory Listening Activities: The therapist plays recordings of correct and incorrect productions and helps the child notice the difference. For example, when the words ‘kap’ and ‘tap’ are presented, the child is asked to identify which one is correct.”
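The shared and exclusive code lists reported for each case amount to a set comparison. The following is a minimal sketch of that partitioning, using the Case III therapy codes transcribed from the text above; the function name `partition_codes` and the Jaccard-style overlap ratio are illustrative assumptions and are not part of the study's reported analysis.

```python
# Case III therapy codes, transcribed from the Results text (frequencies omitted).
shared = {
    "auditory discrimination training", "minimal pairs method",
    "generalization work", "collaboration with the family",
    "reducing stopping processes", "reducing gliding processes",
    "reducing fronting processes", "articulation therapy",
    "developing phonological awareness skills", "motor exercises",
}
slt_only = {
    "maximal oppositions approach", "minimal oppositions approach",
    "collaboration with the teacher",
}
chatgpt_only = {
    "sound manipulation activities",
    "therapy targeting the reduction of phonological processes",
    "auditory listening activities",
}

# Full code sets per source, as a coder would have them before comparison.
slt_codes = shared | slt_only
chatgpt_codes = shared | chatgpt_only


def partition_codes(a, b):
    """Split two code sets into shared codes and codes exclusive to each side."""
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


result = partition_codes(slt_codes, chatgpt_codes)

# Illustrative overlap ratio (Jaccard index): shared codes / all distinct codes.
# This is NOT a statistic reported by the authors.
overlap = len(result["shared"]) / len(slt_codes | chatgpt_codes)

for label, codes in result.items():
    print(label, len(codes), sorted(codes))
```

Working with sets ignores the code frequencies in parentheses, so the sketch describes which codes overlap, not how often each was mentioned.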

Discussion

This study compared the responses provided by ChatGPT-4o regarding assessment, diagnosis, and therapy processes for specific case histories with the responses given by SLTs working in the field. The findings indicate that ChatGPT displays approaches similar to those of therapists in providing standardized information, decision-making, and managing structured processes. However, the results also show that ChatGPT falls behind human experts in areas requiring clinical experience and individualized (case-specific), multi-faceted diagnostic assessment. This supports the prevailing view in the literature that AI is a complementary and supportive technology designed to lighten the therapist’s workload and support decision-making mechanisms, rather than a tool that can replace the therapist. This is reflected in our data, where ChatGPT primarily generated structured, literature-based and standardized recommendations, while SLTs incorporated individualized, context-dependent, and experience-driven clinical reasoning. These findings suggest that AI contributes to organizing clinical processes and offering preliminary guidance, whereas final clinical judgments remain grounded in human expertise.

With respect to the consistency of ChatGPT responses across days, these findings indicate that, at least at the level of broad clinical themes, ChatGPT-4o behaves in a highly reproducible way across days for the same case vignettes. In other words, while the exact wording or the specific examples suggested by the model may vary, the core diagnostic formulation, key assessment components and main intervention approaches are largely preserved. This pattern is partly in line with previous work showing moderate to high repeatability of ChatGPT when the same clinical or examination questions are posed multiple times, even though item-by-item agreement is not always perfect. Importantly, this consistency was observed across both system environments (Windows and macOS), suggesting that the operating system did not have a discernible impact on the model’s clinical reasoning outputs.³²
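One way to quantify theme-level reproducibility of this kind is the mean pairwise Jaccard similarity between the theme sets coded on each day. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the five daily theme sets are invented placeholders rather than study data, and the study itself examined consistency qualitatively.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(theme_sets):
    """Average Jaccard similarity over every pair of daily theme sets."""
    pairs = list(combinations(theme_sets, 2))
    if not pairs:
        raise ValueError("need at least two theme sets to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# HYPOTHETICAL theme sets for five daily outputs on one vignette.
# These are placeholders for illustration only, not data from the study.
days = [
    {"case history", "spontaneous speech analysis", "stuttering", "desensitization"},
    {"case history", "spontaneous speech analysis", "stuttering", "desensitization"},
    {"case history", "spontaneous speech analysis", "stuttering"},
    {"case history", "spontaneous speech analysis", "stuttering", "desensitization"},
    {"case history", "spontaneous speech analysis", "stuttering", "generalization"},
]

score = mean_pairwise_jaccard(days)
print(f"mean pairwise Jaccard: {score:.2f}")
```

A value near 1 would indicate that the same themes recur across days even when wording differs, which is the sense of consistency described above.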

According to the findings of the current study, both ChatGPT-4o and SLTs use common fundamental assessment methods, such as taking case histories, analyzing spontaneous speech, and assessing auditory discrimination, when evaluating speech and language disorders. This finding aligns with the work of Birol et al,²¹ which noted that ChatGPT-4.0 demonstrates high accuracy and appropriateness in report writing, clinical decision support, and creating assessment materials. In the current study, ChatGPT and SLTs also generally used similar diagnostic categories. Similarly, as noted in a review by Balo et al,²⁰ AI algorithms show promise for diagnostic classification across many areas, from swallowing disorders to voice disorders. The results of another study also provide a foundation for advancing AI-supported diagnostic tools adapted to varied linguistic environments, thereby strengthening early intervention approaches within pediatric speech pathology.³³

On the other hand, in this study SLTs were observed to adopt a broader perspective when making a diagnosis, considering multiple diagnostic possibilities based on clinical experience, whereas AI tended to adhere more strictly to standard tests found in the literature. This tendency may be related to the nature of large language models, which are trained predominantly on large-scale, publicly available datasets, including a substantial proportion of English-language scientific literature. As a result, AI systems may prioritize widely established, standardized assessment frameworks over context-specific, culturally embedded clinical reasoning. This may also help explain why, despite the Turkish context of the cases, the model occasionally suggested non-Turkish assessment tools. This pattern is consistent with the results of a systematic mapping study, which emphasized that while AI is successful in screening, it cannot fully replace speech and language therapists in diagnosis, where data must be interpreted to reach a final decision.³⁴ Correspondingly, the findings of another study showed that ChatGPT can serve as an additional resource that helps patients find suitable clinical services and make informed decisions.³⁵

Another study drew attention to a similar distinction, stating that while AI is successful in analyzing measurable data such as acoustic features, it remains more limited in multi-dimensional clinical assessment.³⁶ Consequently, the finding that SLTs in this study viewed cases from a wider perspective supports the conclusion that AI lacks the human flexibility required to interpret potentially complex clinical pictures. Similarly, in the study by Dronkers et al,³⁷ ChatGPT and Llama showed limitations in accurately suggesting appropriate treatments for a complex disorder such as bilateral vocal fold paralysis (BVFP).

The finding that both AI and SLTs place common importance on generalization work in therapy and collaboration with the client’s caregiver is thought to stem from AI’s capacity to scan evidence-based information in the literature. However, the most distinct difference revealed in the current study is that while SLTs prefer individualized techniques, AI focuses more on technology-supported and structured techniques.³⁸⁻⁴⁰

In another recent study it is noted that SLTs tend to use AI to reduce administrative burdens and produce materials such as personalized stories and visuals.⁴⁰ Nevertheless, the structured suggestions offered by AI cannot always meet cultural and individual needs. A recent study on this subject reported that ChatGPT experiences limitations in creating therapy materials (particularly those specific to culture and language) and can make erroneous suggestions involving phonological impossibilities in languages like Turkish.²¹ In another study regarding stuttering, Saeedi and Bakhtiar²⁵ stated that even when AI responses are correct, they are sometimes overly complex and do not always fully align with clinical observation. The fact that SLTs in this study used a wider range of flexible, appropriate, and individualized (person-specific) techniques while AI lagged in this regard may be attributed to AI’s limited contextual adaptation. This is reflected in the data, where SLTs incorporated context-sensitive practices such as collaboration with teachers and caregivers and tailoring therapy based on individual needs, whereas ChatGPT predominantly suggested more structured and technique-driven activities, such as sound manipulation and auditory listening tasks. These differences highlight the model’s tendency to rely on standardized approaches rather than adapting to the specific clinical and environmental context of the client.

The finding in this study that SLTs adopt individual assessment methods based on clinical observation and experience, while AI suggests standard tests, supports the warning by Green¹⁹ that computer algorithms should not be confused with clinical observation. While AI is quite successful at processing big data to reveal patterns, it faces challenges regarding transparency and clinical trust.²² Furthermore, as stated in the study by Azevedo et al³ on aphasia rehabilitation, AI currently provides support mostly in classification and diagnosis. Rather than acting as a tool to replace therapeutic interaction, it assumes the role of an assistant that saves time for the therapist. Despite a positive outlook and belief that AI tools are supportive for clinical practice, Austin et al⁴¹ indicate that ChatGPT and other AI tools are rarely used by SLTs and students for clinical goals, generally limited to administrative activities.

Montazeri et al¹⁶ also drew attention to AI’s lack of emotional intelligence and empathy, emphasizing that the bond established with the patient and the ability to understand the patient’s emotional state cannot be easily mimicked by AI. Consistent with this perspective, our findings suggest that AI’s reliance on standard and structured methods limits its ability to integrate extra-data variables (such as the client’s immediate psychosocial state, motivation, and environmental factors) into clinical observation. This limitation may partly explain why AI systems struggle to capture the nuanced, context-dependent, and emotionally informed aspects of clinical decision-making. Even though the cases provided to both AI and SLTs were in Turkish, ChatGPT’s suggestions of non-Turkish assessment tools (eg, GFTA-3, PLS-5, and EOWPVT) raise concerns regarding its reliability. This may be related to the nature of its training data, which predominantly consists of large-scale, publicly available datasets with a strong representation of English-language research literature. As a result, the model may prioritize widely recognized international assessment tools over context-specific or locally validated instruments.

Limitations

This study has several limitations. First, the sample consisted of a limited number of SLTs from a single country, which may restrict the generalizability of the findings. Second, the comparison between human clinicians and AI was based on written responses to brief case vignettes involving only 3 cases; real-world, longitudinal assessment and therapy processes were not observed. Third, only a single general-purpose large language model (ChatGPT-4o, accessed in 2025) was evaluated, so the results cannot be generalized to other AI tools or future model versions. Fourth, the clinical vignettes and interview guide used for data collection were researcher-developed rather than formally validated standardized instruments. While these tools underwent a pilot-testing phase with independent experts to ensure clinical realism and content validity, this may still limit the psychometric consistency of the findings. Finally, the consistency and appropriateness of the responses were examined mainly through qualitative analysis; it is recommended that future studies complement this approach with quantitative indicators of reliability and clinical impact.

Conclusions

In conclusion, the findings of this study, consistent with current literature, reveal that ChatGPT and similar AI tools hold potential as strong clinical decision support systems in speech and language therapy. AI stands out as a time-saving tool, particularly in reporting, decision-making and standard assessment processes. However, it is evident that it cannot reach the competence of SLTs in the human, individualized, case-specific, and cultural aspects of therapy and assessment. It is believed that the most efficient clinical outcomes can be achieved when the structured and literature-based framework offered by AI is combined with the clinical experience and flexibility of SLTs. It is suggested that future studies focus on how AI can be transformed from a tool that merely suggests standard tests into a more inclusive model capable of understanding cultural and individual differences and constructing guidelines for its appropriate use. Additionally, it should be noted that current chatbots and large language models are general-purpose systems rather than tools specifically designed for speech and language therapy, which may partly explain their limitations in domain-specific clinical reasoning. Future developments in domain-adapted or specialized AI systems may help address these limitations.

Supplemental Material

sj-docx-1-inq-10.1177_00469580261445316 – Supplemental material for Artificial Intelligence in Speech and Language Therapy: A Qualitative Comparative Analysis of Clinical Applications and Outcomes

Supplemental material, sj-docx-1-inq-10.1177_00469580261445316 for Artificial Intelligence in Speech and Language Therapy: A Qualitative Comparative Analysis of Clinical Applications and Outcomes by İbrahim Can Yaşa, Muhsin Dölek, Seda Eyilikeder Tekin, Ayşe Serra Kaya, Pınar Akgün, Sakine Deniz Yılmaz and Selin Tokalak in INQUIRY: The Journal of Health Care Organization, Provision, and Financing

Supplemental Material

sj-docx-2-inq-10.1177_00469580261445316 – Supplemental material for Artificial Intelligence in Speech and Language Therapy: A Qualitative Comparative Analysis of Clinical Applications and Outcomes

Supplemental material, sj-docx-2-inq-10.1177_00469580261445316 for Artificial Intelligence in Speech and Language Therapy: A Qualitative Comparative Analysis of Clinical Applications and Outcomes by İbrahim Can Yaşa, Muhsin Dölek, Seda Eyilikeder Tekin, Ayşe Serra Kaya, Pınar Akgün, Sakine Deniz Yılmaz and Selin Tokalak in INQUIRY: The Journal of Health Care Organization, Provision, and Financing

Footnotes

Acknowledgements

We would like to thank all the therapists who agreed to participate in this study.

ORCID iD

İbrahim Can Yaşa

Ethical Considerations

The study was conducted in accordance with the Declaration of Helsinki and approved by the Bahçeşehir University Clinical Research Ethics Committee (Approval No: E-85646034-604.01-115398; Date: 06.11.2025).

Consent to Participate

Written informed consent was obtained from all participants involved in the study. The consent form, approved by the Bahçeşehir University Ethics Committee, outlined the study’s purpose, the voluntary nature of participation, and the confidentiality of data.

Author Contributions

Conceptualization: I.C.Y. Methodology: I.C.Y., M.D., S.E.T. Formal analysis: M.D., S.E.T., A.S.K., P.A., S.D.Y., S.T. Investigation: M.D., S.E.T., A.S.K., P.A., S.D.Y., S.T. Writing (original draft): M.D., S.E.T., A.S.K., P.A., S.D.Y., S.T. Writing (review and editing): I.C.Y. Supervision: I.C.Y. Project administration: I.C.Y. All authors approved the final version of the manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Supplemental Material

Supplemental material for this article is available online.

References

Mesko

The ChatGPT (Generative Artificial Intelligence) revolution has made artificial intelligence approachable for medical professionals. J Med Internet Res. 2023;25:e48392.

Chen

Decary

Artificial intelligence in healthcare: an essential guide for health leaders. Health Manage Forum. 2020;33:10-18.

Azevedo

Kehayia

Jarema

Le Dorze

Beaujard

Yvon

How artificial intelligence (AI) is used in aphasia rehabilitation: a scoping review. Aphasiology. 2024;38:305-336.

Jiang

Zhi

, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. 2017;2:230-243.

Liu

Faes

Kale

, et al. A comparison of deep learning performance against healthcare professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health. 2019;1:e271-e297.

Rajpurkar

Chen

Banerjee

Topol

EJ.

AI in health and medicine. Nat Med. 2022;28:31-38.

Topol

EJ.

High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44-56.

De Angelis

Baglivo

Arzilli

, et al. ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health. Front Public Health. 2023;11:1166120.

Chow

JCL

Sanders

. Impact of ChatGPT on medical chatbots as a disruptive technology. Front Artif Intell. 2023;6:1166014.

10.

Eysenbach

The role of ChatGPT, generative language models, and artificial intelligence in medical education: a conversation with ChatGPT and a call for papers. JMIR Med Educ. 2023;9:e46885.

11.

Hulman

Dollerup

Mortensen

, et al. Chatgpt- versus human-generated answers to frequently asked questions about diabetes: a turing test-inspired survey among employees of a Danish diabetes center. PLoS One. 2023;18:e0290773.

12.

Neha

Bhati

Shukla

Amiruzzaman

ChatGPT: transforming healthcare with AI. AI. 2024;5:2618-2650.

13.

Sallam

ChatGPT utility in healthcare education, research, and practice: a systematic review. Healthcare. 2023;11:1374.

14.

Hoppe

Auer

Strüven

Massberg

Stremmel

ChatGPT with GPT-4 outperforms emergency department physicians in diagnostic accuracy: retrospective analysis. J Med Internet Res. 2024;26:e56110.

15.

Kharko

McMillan

Hagström

, et al. Generative artificial intelligence writing open notes: a mixed methods assessment of the functionality of GPT 3.5 and GPT 4.0. Digit Health. 2024;10:1-14.

16.

Montazeri

Galavi

Ahmadian

What are the applications of ChatGPT in healthcare: gain or loss?

Health Sci Rep. 2024;7:e1878.

17.

Bélisle-Pipon

JC.

Why we need to be careful with LLMs in medicine. Front Med. 2024;11:1495582.

18.

Haltaufderheide

Biller-Andorno

Elger

Ethical challenges of large language models in healthcare. J Med Ethics. 2024; 661-675.

19.

Green

JR.

Artificial intelligence in communication sciences and disorders: Introduction to the forum. J Speech Lang Hear Res. 2024;67:4157-4161.

20.

Balo

Ökte

Selvi Balo

Artificial intelligence in assessment and intervention of speech and language disorders: a literature review. Eur J Res. 2025;11:1235-1243.

21.

Birol

Çiftci

Yılmaz

Çağlayan

Alkan

Is there any room for ChatGPT AI bot in speech-language pathology?

Eur Arch Otorhinolaryngol. 2025;282:3267-3280.

22.

Georgiou

GP.

Transforming speech-language pathology with AI: opportunities, challenges, and ethical guidelines. Healthcare. 2025;13:2460.

23.

Pierce

JE.

AI-generated images for speech pathology-an exploratory study. Am J Speech Lang Pathol. 2024;33(1):443-451.

24. Özdemir, İyigün Uzunöz, Kayhan Aktürk, Tunçer. Having the best of both worlds: naming performances of neurotypical individuals through AI-generated images. Aphasiology. 2026;40(1):187-207.

25. Saeedi, Bakhtiar. Assessing the response quality and readability of ChatGPT in stuttering. J Fluency Disord. 2025;85:106149.

26. Fatima, Shafique, Alam, Ahmed TKF, Mustafa MS. ChatGPT in medicine: a cross-disciplinary systematic review. Med. 2024;103:e39250.

27. Ghasemi, Amiri, Galavi. Advantages and limitations of ChatGPT in healthcare: a scoping review. Health Sci Rep. 2025;8:e71219.

28. van Dis EAM, Bollen, Zuidema, van Rooij, Bockting CL. ChatGPT: five priorities for research. Nature. 2023;614:224-226.

29. Tong, Sainsbury, Craig. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. Int J Qual Health Care. 2007;19(6):349-357.

30. Creswell, ed. Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. 5th ed. Pearson; 2015.

31. Creswell, Poth, eds. Qualitative Inquiry & Research Design: Choosing Among Five Approaches. 5th ed. Sage; 2018.

32. Funk, Hoch, Knoedler, et al. ChatGPT’s response consistency: a study on repeated queries. Eur J Investig Health Psychol Educ. 2024;14:657-668.

33. Turki AF. Machine learning-based identification of phonological biomarkers for speech sound disorders in Saudi Arabic-speaking children. Diagnostics. 2025;15:1401.

34. Tbaishat, Al-Shafei, Odeh. The role of AI in the diagnosis of speech and language disorders: a systematic mapping study. Digit Health. 2025;11:20552076251379769.

35. Saeedi, Rong. Assessing the performance and reliability of ChatGPT in answering patients’ questions on voice disorders across time. J Voice. Published online November 21, 2025. doi:10.1016/j.jvoice.2025.10.044

36. Lee, Kim, et al. Exploring voice acoustic features associated with cognitive status in Korean speakers: a preliminary machine learning study. Diagnostics. 2024;14:2837.

37. Dronkers EAC, Geneid, al Yaghchi, Lechien JR. Evaluating the potential of AI chatbots for treatment decision-making in vocal fold paralysis. J Voice. 2025;39:871-881.

38. Juefei-Xu. Generative AI for therapy? Opportunities and barriers for ChatGPT in speech-language therapy. In: Proceedings of ICLR, 2023.

39. Gallano, Giglio, Ferre. Artificial intelligence in speech-language pathology and dysphagia: a review from Latin American perspective and pilot test of LLMs for rehabilitation planning. J Voice. Published online April 30, 2025. doi:10.1016/j.jvoice.2025.04.010

40. Lewis, Dangol, Suh, Olszewski, Fogarty, Kientz. Exploring AI-based support in speech-language pathology for culturally and linguistically diverse children. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 2025, pp.1-19.

41. Austin, Benas, Caicedo, Imiolek, Piekutowski, Ghanim. Perceptions of artificial intelligence and ChatGPT by speech-language pathologists and students. Am J Speech Lang Pathol. 2025;34(1):174-200.

Supplementary Material

Please find the following supplemental material available below.

