Assistive, not autonomous: Generative artificial intelligence in head and neck cancer care

Abstract

Objectives

To synthesize current evidence on the clinical applications of generative artificial intelligence (GenAI), particularly large language models (LLMs), in head and neck oncology, with a focus on translational readiness, clinical safety, and real-world applicability.

Methods

A scoping review was conducted using structured searches of PubMed and Scopus for studies published between January 1, 2020, and December 15, 2025. Search strategies combined controlled vocabulary and free-text terms related to generative AI and head and neck oncology. Eligible studies evaluated GenAI/LLMs in tasks including TNM staging, treatment planning, tumor board support, and patient education. Non-GenAI and non-oncologic studies were excluded. Following duplicate removal, records underwent title and abstract screening with full-text review of potentially relevant studies. Due to heterogeneity in study design, outcomes, and reporting, findings were synthesized qualitatively.

Results

Evidence remains early-stage and heterogeneous, dominated by simulation-based and small cohort studies with limited real-world validation. GenAI performs best in structured, language-based tasks such as clinical documentation, case summarization, and patient education. Moderate agreement with clinical standards is reported for TNM staging and guideline navigation in common scenarios, with reduced reliability in complex cases. In tumor board settings, GenAI supports summarization but produces variable treatment recommendations. Patient-facing outputs are generally readable but may lack accuracy or completeness. Common limitations include hallucination, omission of key clinical factors, and overgeneralization.

Conclusion

GenAI shows promise as an assistive tool in head and neck oncology but is not yet suitable for autonomous clinical decision-making. Prospective, workflow-integrated evaluation and standardized validation are needed before safe clinical adoption.

Keywords

generative artificial intelligence, large language models, head and neck oncology, TNM staging, multidisciplinary tumor boards, clinical decision support, patient education

Introduction

Generative artificial intelligence (GenAI), driven largely by advances in transformer-based large language models (LLMs), has rapidly transitioned from experimental technology to a visible presence across clinical medicine. In oncology, GenAI systems are increasingly explored for tasks ranging from clinical decision support and guideline interpretation to documentation, patient education, and research synthesis.^1–4 This rapid uptake has been accompanied by growing scrutiny. Recent oncology-wide reviews describe both the transformative potential of GenAI and its intrinsic risks, including hallucinations, guideline drift, uncalibrated probability estimates, and unresolved medico-legal responsibility.^5–8

Despite this expanding literature, head and neck oncology remains notably under-represented in existing syntheses. This omission is consequential. Head and neck cancer (HNC) care is distinguished by exceptional clinical complexity: deeply subsite-specific anatomy, laterality-dependent treatment decisions, granular and evolving Tumor–Node–Metastasis (TNM) staging systems, and therapeutic trade-offs that directly affect speech, swallowing, airway protection, and appearance.^9–12 Decision-making is inherently multidisciplinary, relying on close coordination among surgeons, radiation and medical oncologists, radiologists, pathologists, speech-language pathologists, and supportive care teams. In this setting, even modest inaccuracies or overconfident recommendations may propagate through workflows with disproportionate clinical consequences.^9–12

Crucially, GenAI systems differ fundamentally from traditional diagnostic AI. Whereas diagnostic models are trained and validated to optimize specific predictive endpoints, GenAI systems generate probabilistic language outputs based on learned statistical patterns.^13,14 Their apparent “reasoning” reflects linguistic coherence rather than causal or mechanistic understanding. As a result, fluent outputs may convey a misleading sense of authority, especially when responses are delivered in professional clinical language. This phenomenon is increasingly described as synthetic confidence: the tendency of GenAI systems to generate fluent, authoritative-sounding outputs regardless of underlying accuracy.^14–16 In oncology, where uncertainty is common and nuance matters, this epistemic mismatch is not trivial.^13–16

Beyond these general limitations, head and neck oncology poses distinctive cognitive and anatomical challenges that make it an especially rigorous test case for GenAI. Staging frequently depends on subtle descriptors (paraglottic fat invasion, pre-epiglottic space involvement, skull base foraminal extension, retropharyngeal nodal spread), where seemingly minor misinterpretations can dramatically alter treatment plans and functional outcomes. Treatment decisions regularly require balancing oncologic control against airway preservation, swallowing integrity, voice outcomes, and long-term quality of life. These complex trade-offs, central to HNC practice, are rarely represented in the textual patterns that LLMs learn from and thus expose the limits of GenAI’s ability to engage with uncertainty, nuance, and competing clinical priorities.^9,11,17

At the same time, early head and neck–specific studies have begun to emerge. These include evaluations of GenAI for TNM staging from real-world clinical records, tumor board case summarization, and generation of patient education materials.^11,18–20 Collectively, these studies suggest that GenAI may perform adequately, or even well, in certain constrained, language-centric tasks, while remaining unreliable for autonomous clinical decision-making.^11,18–22 However, the evidence is fragmented, heterogeneous in methodology, and often interpreted without sufficient attention to the unique risks of the head and neck oncology context.

To date, no comprehensive review has synthesized this emerging literature through a head and neck–specific lens. This gap limits the field’s ability to distinguish where GenAI offers genuine value from where its limitations pose unacceptable risk. To our knowledge, this is the first review to systematically collate GenAI applications specific to head and neck oncology and interpret them through a clinical risk–stratified framework. Accordingly, this review critically examines the peer-reviewed evidence on GenAI in head and neck oncology, emphasizing its performance in TNM staging and treatment support, its influence within multidisciplinary tumor boards, its role in patient education, and its ethical, governance, and safety implications. Rather than advocating uncritical adoption, this work seeks to articulate a principled framework for responsible integration, positioning GenAI as an assistive technology that may augment, rather than replace, expert clinical judgment in one of oncology’s most complex domains.

Methods

This scoping review synthesizes early applications of GenAI, primarily LLMs, in head and neck oncology, with an emphasis on translational readiness, clinical safety, and real-world applicability. A structured literature search was conducted in PubMed and Scopus for articles published between January 1, 2020, and December 15, 2025. Search strategies combined controlled vocabulary (MeSH terms in PubMed) and free-text terms related to generative AI (e.g., “generative artificial intelligence,” “large language models,” “ChatGPT,” “GPT-4”) and head and neck oncology (e.g., “head and neck neoplasms,” “HNSCC,” and subsite-specific malignancies). Searches were limited to peer-reviewed, English-language human studies. Records were exported to reference management software (Zotero), and duplicates were identified and removed prior to screening. Titles and abstracts of all unique records were then screened for relevance, and potentially eligible studies underwent full-text review. Studies meeting predefined inclusion criteria were selected for qualitative synthesis. A detailed summary of the search strategy and screening approach is provided in Table 1. Reference lists of relevant articles were hand-screened to identify additional contextual publications.

Table 1.

Search strategy.

Item	Specification
Date of search	December 2025 (final update prior to manuscript preparation)
Databases searched	MEDLINE (PubMed); Scopus
Search strategy overview	Structured searches combining generative artificial intelligence (GenAI) terms with head and neck oncology terms. Database-specific syntax and controlled vocabulary were applied where appropriate.
PubMed search string	(“generative artificial intelligence” [Title/Abstract] OR “large language model*” [Title/Abstract] OR “LLM” [Title/Abstract] OR “ChatGPT” [Title/Abstract] OR “GPT-4” [Title/Abstract] OR “GPT-4o” [Title/Abstract] OR “GPT” [Title/Abstract]) AND (“head and neck cancer” [Title/Abstract] OR “head and neck neoplasms” [MeSH Terms] OR “head and neck oncology” [Title/Abstract] OR “HNSCC” [Title/Abstract] OR “squamous cell carcinoma” [Title/Abstract] OR “oropharyngeal cancer” [Title/Abstract] OR “laryngeal cancer” [Title/Abstract] OR “hypopharyngeal cancer” [Title/Abstract] OR “oral cavity cancer” [Title/Abstract] OR “thyroid cancer” [Title/Abstract])
PubMed results (n)	142
Scopus search string	TITLE-ABS-KEY (“generative artificial intelligence” OR “large language model” OR “LLM” OR “ChatGPT” OR “GPT-4” OR “GPT-4o” OR “GPT”) AND TITLE-ABS-KEY (“head and neck cancer” OR “head and neck oncology” OR “head and neck neoplasm” OR “HNSCC” OR “squamous cell carcinoma” OR “oropharyngeal cancer” OR “laryngeal cancer” OR “hypopharyngeal cancer” OR “oral cavity cancer” OR “thyroid cancer”) AND PUBYEAR > 2019 AND PUBYEAR < 2026
Scopus results (n)	175
Total records identified	317
Duplicate removal	Records were exported to Zotero and duplicates were removed
Duplicates removed (n)	93
Unique records screened	224
Screening process	Titles and abstracts were reviewed for relevance to GenAI applications in head and neck oncology
Study selection approach	Potentially eligible articles underwent full-text review, and studies were retained based on thematic relevance, clinical applicability, and contribution to the conceptual synthesis
Inclusion criteria	English-language peer-reviewed studies (2020–2025) evaluating generative AI/LLMs in head and neck oncology or clearly related clinical tasks (e.g., TNM staging, treatment support, tumor board applications, patient education)
Exclusion criteria	Studies focused on non-generative AI methods, non-oncologic otolaryngology, or without meaningful head-and-neck–specific relevance; non-English publications; duplicate records
Additional sources	Reference lists of relevant studies were manually screened to identify additional contextual literature
Synthesis approach	Qualitative, narrative synthesis grouped by clinical application, with emphasis on performance, failure modes, and translational readiness

Note. The search and screening process was reported in a PRISMA-ScR–informed manner adapted to the scoping nature of the review.
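The record counts in Table 1 can be checked with simple arithmetic. The following is a minimal sketch that reproduces the flow from database hits to unique records screened; the function name `screening_flow` and its dictionary keys are illustrative assumptions, and only the three input counts come from Table 1.

```python
def screening_flow(pubmed_hits: int, scopus_hits: int, duplicates_removed: int) -> dict:
    """Derive record-flow totals from per-database hits and the duplicate count."""
    if min(pubmed_hits, scopus_hits, duplicates_removed) < 0:
        raise ValueError("counts must be non-negative")
    identified = pubmed_hits + scopus_hits
    if duplicates_removed > identified:
        raise ValueError("cannot remove more duplicates than records identified")
    return {
        "identified": identified,
        "duplicates_removed": duplicates_removed,
        "screened": identified - duplicates_removed,
    }


# Counts as reported in Table 1 (PubMed, Scopus, duplicates removed).
flow = screening_flow(pubmed_hits=142, scopus_hits=175, duplicates_removed=93)
print(flow)
```

With the Table 1 inputs, the derived totals match the reported 317 records identified and 224 unique records screened.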

We included studies that explicitly evaluated, deployed, or analyzed GenAI/LLMs for HNC–related tasks, including TNM staging, treatment planning, tumor board support, patient counseling, and patient-facing informational content. Articles restricted to non-GenAI approaches (e.g., traditional machine learning, radiomics, convolutional neural networks) or to non-oncologic otolaryngology were excluded to maintain topic specificity. Given the scoping review design and heterogeneity of the evidence base, inclusion decisions were guided by thematic relevance, clinical applicability, and contribution to the overall conceptual synthesis.

Because the included studies varied substantially in design, clinical task, comparator, and evaluation metrics, quantitative meta-analysis was not attempted. Instead, studies were grouped thematically by clinical application (e.g., staging, tumor boards, patient education). A qualitative synthesis approach was used, emphasizing reported performance, contextual factors influencing outputs, reproducibility issues, and documented failure modes such as hallucination, omission, and overgeneralization. Formal risk-of-bias assessment tools were not applied due to the heterogeneity and early-stage nature of the literature, consistent with scoping review methodology.

Our objective was not to rank models but to identify which GenAI applications appear closest to real-world implementation, which remain exploratory, and what safety or governance gaps must be addressed before clinical use. This approach aligns with best practices for scoping reviews, which prioritize conceptual clarity and translational relevance when evidence heterogeneity precludes statistical aggregation.

Clinical applications of generative artificial intelligence in head and neck oncology

The earliest clinical interest in GenAI in head and neck oncology has focused on its potential to support complex cognitive tasks that are heavily language-dependent, time-consuming, and vulnerable to human error (Table 2). These include TNM staging and guideline navigation, synthesis of multidisciplinary tumor board material, and patient-facing education and counseling.^11,18–20 Across these domains, emerging evidence suggests a consistent pattern: GenAI performs best when constrained to assistive, text-centric functions and becomes unreliable when extended toward autonomous clinical judgment.^11,18–20

Table 2.

Clinical applications of generative AI in head and neck oncology.

Domain	Example tasks	Key studies	Approx. study scale	Metrics reported	Performance & evidence strength	Readiness
TNM staging & treatment support	TNM staging; adjuvant therapy selection; NCCN navigation	Vural Camalan 2025; Marchi 2024; Kayaalp 2025; Lorenzi 2024	Small-to-moderate cohorts and vignette-based case sets	Accuracy; agreement with NCCN; expert scoring; occasional agreement statistics	Moderate evidence from predominantly simulation-based studies with limited real-world confirmation. Stronger performance in common subsites; reduced reliability in rare or complex cases.	Emerging assistive use only
Multidisciplinary tumor boards	Case summarization; TNM explanation; diagnostic/therapeutic workup	Schmidl 2024 (primary and R/M cohorts); Aubreville 2025; Alami 2024	Small prospective cohorts to larger retrospective datasets	Concordance with MDT decisions; qualitative scoring systems; occasional agreement metrics	Low-to-moderate evidence. Reliable for summarization and explanation; variable and sometimes inappropriate treatment recommendations.	Exploratory (preparatory synthesis only)
Patient education & counseling	Plain-language explanations; prognosis Q&A; FAQ generation	Wei 2024; Lee 2024; Mnajjed 2024; Hack 2026	Small prompt-based evaluations and question sets	Readability indices; quality scores; comparative accuracy assessments	Moderate evidence. High readability with generally acceptable quality, though performance may lag curated sources and omit nuance or uncertainty.	Near-term assistive
General ENT/cross-cutting tasks	FAQ generation; multi-domain counseling; mixed ENT tasks	Hack 2025; Hack 2025 (neck masses)	Cross-domain study sets and meta-analytic synthesis	Accuracy; task-category comparisons; cross-domain performance summaries	Supportive background evidence. Stronger performance in communication tasks than diagnostic or management decisions; indirect applicability to HNC.	Supportive background evidence

Note. Summary of current GenAI applications across head and neck oncology, including TNM staging, tumor board support, and patient education. Approximate study scale and performance metrics (e.g., accuracy, concordance, readability indices, and agreement measures such as Cohen’s κ) are reported descriptively due to heterogeneity in study design and reporting. Evidence strength reflects consistency of findings, study design (simulation-based vs real-world), and degree of clinical validation. Readiness levels indicate assistive, not autonomous, clinical use based on reported performance, benefits, and observed failure modes.

TNM staging and clinical decision support

Several peer-reviewed, head and neck–specific studies have evaluated GenAI for TNM staging and clinical decision support. In one of the largest evaluations to date, a 2024 study assessed ChatGPT-4 on 263 HNSCC cases (oral cavity, oropharynx, hypopharynx, larynx), reporting moderate to substantial concordance with multidisciplinary tumor board decisions and NCCN guidelines, with κ values ranging from 0.48 to 0.78 for treatment recommendations. A 2025 simulation comparing ChatGPT-o1 and DeepSeek-V3 on staged HNC scenarios similarly found statistically significant accuracy across subsites (p < 0.05), with highest performance observed for common, well-represented subsites such as the larynx and early-stage disease.^19,23 Collectively, these findings suggest that GenAI can approximate clinician reasoning under constrained, structured conditions and may function as a staging cross-check or educational aid, rather than an autonomous decision-maker.^19,23,24
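The κ values reported above are chance-corrected agreement statistics. As a minimal sketch, assuming two raters (for example, a model and a tumor board) each assign one categorical label per case, Cohen's κ could be computed as follows. The case labels are invented for illustration and are not data from any cited study. The descriptive bands follow the widely used Landis and Koch convention, which is an assumption about how "moderate" and "substantial" are being used here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same cases (one label per case)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (observed - expected) / (1.0 - expected)


def landis_koch_band(kappa):
    """Descriptive label for a kappa value (Landis and Koch convention)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical treatment labels for eight cases (illustrative only).
model = ["surgery", "CRT", "surgery", "RT", "CRT", "surgery", "CRT", "surgery"]
board = ["surgery", "CRT", "CRT", "RT", "CRT", "surgery", "surgery", "surgery"]

kappa = cohen_kappa(model, board)
print(round(kappa, 3), landis_koch_band(kappa))
```

In this toy example the raters agree on 6 of 8 cases (75%), yet κ is about 0.58 because roughly 41% agreement would be expected by chance alone. Under the same convention, the reported range of 0.48 to 0.78 spans the "moderate" and "substantial" bands.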

Complementary evidence is provided by Marchi et al., who evaluated ChatGPT responses to NCCN-style clinical scenarios in head and neck oncology. The model demonstrated strong alignment with guideline-concordant recommendations for adjuvant therapy and surveillance, particularly when prompts were explicitly structured and limited to guideline interpretation.²⁵ These results reinforce a consistent pattern across studies: the more bounded and rules-based the task, the more reliably large language models tend to perform.

However, these same studies underscore important limitations. Performance degrades in the presence of incomplete clinical data, ambiguous imaging descriptors, or rare anatomic subsites such as the sinonasal tract or skull base. Borderline or “gray-zone” scenarios, including subtle cortical bone erosion distinguishing T3 from T4a oral cavity tumors, equivocal midline involvement in oropharyngeal primaries, or minimal prevertebral fascia contact in hypopharyngeal disease, further expose GenAI’s tendency to default to generic heuristics rather than interrogate uncertainty. In such cases, expert interpretation depends on integrating imaging, endoscopy, pathology, and functional assessment, information that is rarely fully captured in textual prompts. GenAI systems lack intrinsic awareness of missing or indeterminate data and may extrapolate rather than explicitly flag uncertainty. Consequently, erroneous outputs may appear linguistically polished yet clinically misleading, resulting in high-stakes “silent errors.”^19,24,25

Multidisciplinary tumor boards

Multidisciplinary tumor boards (MDTs) represent a second major area of exploration for GenAI in head and neck oncology. Given the volume and heterogeneity of data reviewed in HNC MDTs, GenAI has been proposed as a tool for case preparation and summarization.^26,27 Lechien et al. reported that ChatGPT-4 produced accurate TNM explanations in 95% of cases and appropriately identified diagnostic workup steps in most scenarios.²⁸ These findings suggest that GenAI may reduce preparatory workload and promote more standardized case summaries in high-volume centers.

However, MDTs are not merely information-processing venues; they are deliberative social systems in which clinical reasoning evolves through negotiation, dissent, and iterative hypothesis refinement. Introducing GenAI-generated summaries therefore risks reshaping this communication ecology. Emerging evidence suggests that GenAI may inadvertently pre-frame discussions, emphasizing certain dimensions (e.g., oncologic aggressiveness or guideline alignment) while underrepresenting others (e.g., functional preservation, patient preference, or reconstructive complexity). Such framing may influence the trajectory of multidisciplinary debate even before expert discussion begins.^18,28–30 Social science literature consistently demonstrates that early framing exerts a disproportionate influence on downstream group decisions, particularly among less-experienced clinicians.

Empirical studies reinforce these concerns. Schmidl et al. observed that ChatGPT occasionally proposed non–guideline-concordant recommendations and struggled to integrate competing clinical priorities in real MDT contexts.¹⁸ Similarly, comparative studies of ChatGPT-4 and ChatGPT-4o in recurrent and metastatic HNC tumor boards found that model fluency did not reliably translate into clinical soundness or decision consistency.²⁸ Importantly, the principal risk is not overtly incorrect recommendations, but subtle distortions of reasoning pathways, including anchoring effects, reinforcement of implicit assumptions, and narrowing of the perceived decision space, that clinicians may not consciously detect. For these reasons, current evidence supports the use of GenAI in MDTs only for preparatory synthesis and documentation, with explicit human review and critical discussion prior to clinical decision-making.^18,23,26,28

Patient education and communication

Patient education and communication represent the most mature and consistently supported application of GenAI in head and neck oncology. Unlike decision support, these tasks are fundamentally linguistic and therefore closely aligned with GenAI’s strengths in natural language generation, simplification, and adaptation to varying literacy levels.^11,20 Lee et al. found that ChatGPT-generated explanations for HNC surgeries achieved comparable accuracy and superior readability relative to conventional patient education materials.³¹ Mnajjed and Patel similarly reported high performance on validated patient education metrics, including the Suitability Assessment of Materials and the Patient Education Materials Assessment Tool.³²

These findings are reinforced by broader otolaryngology evidence. In a meta-analysis of otolaryngology-related tasks, Hack et al. reported that communication and education domains outperformed diagnostic and decision-support applications, achieving approximately 83% accuracy overall.²² Given that HNC treatments often involve complex trade-offs among survival, airway safety, swallowing, and voice, improved patient comprehension may meaningfully enhance shared decision-making. GenAI may also help bridge literacy, language, and access gaps.

Nevertheless, GenAI-mediated education is not risk-free. Wei et al. found that ChatGPT responses to common HNC patient questions were occasionally incomplete or less accurate than vetted online sources, particularly for nuanced clinical topics.³³ Oversimplification, omission of uncertainty, and authoritative tone may inadvertently mislead patients if GenAI outputs are used without clinician review. Ethical considerations around transparency, data privacy, and equity further complicate widespread deployment.³⁴

Synthesis across applications

The current evidence base evaluating GenAI in head and neck oncology remains early-stage and methodologically heterogeneous. Most studies are single-center, retrospective, or simulation-based evaluations with relatively small sample sizes and limited external validation. These designs introduce several potential sources of bias, including selection bias in case construction, incomplete representation of real-world clinical complexity, and reliance on curated or structured inputs that may not reflect routine clinical documentation.

A key distinction across studies is the use of simulated-case environments versus real clinical workflows. Simulation-based studies, often using standardized vignettes or fully specified clinical scenarios, tend to report higher agreement with guidelines or expert decision-making. However, these settings reduce ambiguity and omit the fragmented, incomplete, and context-dependent data that characterize real-world head and neck oncology practice. In contrast, the few studies evaluating GenAI within actual clinical workflows or multidisciplinary tumor boards demonstrate more variable and less predictable performance, underscoring the gap between controlled evaluation and real-world deployment.

The literature is also susceptible to publication and reporting bias, with a predominance of proof-of-concept studies reporting favorable or promising results. Negative findings, inconsistent performance, or clinically unsafe outputs may be underreported. In addition, many studies evaluate GenAI under optimized prompting conditions, which may not reflect routine clinical use and may further overestimate real-world performance. Taken together, these factors suggest that current performance estimates should be interpreted cautiously, particularly when extrapolating beyond bounded, assistive use cases or assessing the readiness of GenAI for clinical integration.

In aggregate, the available evidence across clinical applications converges on a central conclusion: GenAI is best suited to assistive, language-centric tasks in head and neck oncology that can be clearly bounded, structured, and reviewed by clinicians.^35–37 When applied to staging verification, guideline navigation, case summarization, and patient education within defined constraints, GenAI can enhance efficiency, standardization, and communication.^{2,13–15,19,24,28,38}

Several use cases therefore appear appropriate for near-term clinical deployment, including documentation assistance, literacy-adapted patient education, structured staging cross-checks against established guidelines, and preparatory tumor board case summarization with mandatory expert oversight. Medium-term targets include benchmark-driven, workflow-embedded evaluations of GenAI-assisted tumor board preparation and guideline navigation tools. By contrast, autonomous diagnosis or treatment recommendation, particularly in rare subsites or gray-zone scenarios, is not supported by the current evidence base.

When extended beyond these assistive roles toward independent clinical judgment, GenAI’s known limitations—including hallucination, omission of functional considerations, overgeneralization, and susceptibility to framing effects—introduce clinically meaningful risk in a domain as anatomically complex and functionally consequential as head and neck oncology.^{9–12,14,15,19,24,39,40}

From a head and neck oncology perspective, several clinically critical domains remain underexplored in the current GenAI literature. Functional outcome prediction—including speech intelligibility, swallowing function, and airway preservation—plays a central role in treatment selection but is rarely incorporated into current GenAI evaluations, which remain predominantly focused on oncologic endpoints.^41–44 Similarly, reconstructive planning, including flap selection and anticipated functional rehabilitation, introduces an additional layer of complexity that is not well captured by existing language-based models.^19,42,45,46 Integration of GenAI into radiotherapy planning workflows and decision-making for less common subsites, such as sinonasal and skull base malignancies, also remains limited.^19,42,45,46 These gaps highlight important areas for future development, particularly for models intended to support comprehensive, multidisciplinary decision-making in head and neck oncology.^{19,42,45–48}

Safety, governance, and the path forward

The integration of GenAI into head and neck oncology raises safety considerations that extend beyond model accuracy to encompass human factors, ethics, and institutional responsibility (Table 3). These risks are not hypothetical. They arise from both the probabilistic nature of LLMs and the sociotechnical environments in which they are deployed. Emerging studies and early clinical evaluations suggest that, in a subspecialty characterized by clinical nuance, multidisciplinary decision-making, and high functional stakes, even subtle GenAI failures may carry disproportionate consequences.^{9–12,14,15,19,24,39,40,49}

Table 3.

Major GenAI failure modes in head and neck oncology and recommended governance measures.

Failure Mode	Description	Example scenario	Potential consequences	Recommended safeguards
Hallucination	Confident but incorrect statements about staging or treatment	Invented contraindications; misclassification of T category	Wrong treatment decisions; patient confusion	Clinician review; RAG to vetted guidelines; explicit AI labeling
Omission	Failure to mention airway, swallowing, QoL, reconstruction issues	Counseling omits aspiration risk or gastrostomy considerations	Under-informed consent; unsafe follow-up	Structured prompts; checklists; mandatory clinician edit
Overgeneralization	Applying common patterns to rare subsites or atypical disease	Sinonasal or skull-base tumors treated like oropharyngeal SCC	Mis-staging; incorrect radiation fields or surgery level	Scope boundaries; warnings; subspecialist review
Temporal drift	Model uses outdated NCCN/AJCC versions	Recommends obsolete staging or therapy algorithms	Guideline-discordant care; legal defensibility issues	Version-locking; scheduled revalidation; provenance logs
Automation bias & framing	Clinicians anchor on fluent AI outputs	Tumor board discussion shaped by AI-suggested plan	Narrowed debate; reduced exploration of alternatives	Limit AI to prep; explicit discussion; MDT training
Equity & access gaps	Language/cultural mismatches; unequal deployment quality	Underserved groups receive lower-quality explanations	Widening disparities; reduced care quality	Multilingual validated content; monitoring; targeted support

Note. Key failure modes observed across head and neck oncology studies, with representative clinical scenarios, potential consequences, and governance strategies. Framework emphasizes human-in-the-loop oversight, guideline anchoring, version control, and safeguards against automation bias, temporal drift, and equity gaps.

Across the head and neck–specific literature, several recurring failure modes are consistently observed. Hallucinations remain a central concern, particularly when GenAI systems are queried beyond narrowly constrained tasks.^30,50,51 Omission errors, in which critical considerations such as airway risk, swallowing function, or quality-of-life trade-offs are absent from generated outputs, pose equal risk. Overgeneralization occurs when common disease patterns are inappropriately applied to rare subsites or atypical presentations. This reflects biases in training data and is especially hazardous in anatomically complex regions. Finally, temporal drift threatens factual reliability as staging systems and guidelines evolve, particularly when models are not explicitly anchored to version-controlled sources.^{20,30,40,52,53} These patterns have been reported primarily in controlled or retrospective settings, and their real-world frequency and impact remain incompletely characterized. Collectively, these failure modes may be particularly consequential in head and neck oncology, where small inaccuracies in tumor extent or reconstructive implications can lead to major deviations in care pathways.

Beyond model-intrinsic limitations, GenAI introduces important human–machine interaction risks. Automation bias, the tendency to over-trust algorithmic outputs, may be amplified by GenAI’s fluency and professional tone. In time-pressured clinical environments, clinicians may unconsciously defer to GenAI-generated summaries or recommendations, particularly when outputs appear confident or align with initial impressions. Of greater concern is subtle cognitive anchoring. Once a GenAI-generated frame or hypothesis is introduced, it can disproportionately shape subsequent reasoning, even when incorrect.^39,40,49,52 These effects are well described in decision science and may be amplified in AI-assisted contexts, particularly in multidisciplinary settings where early framing strongly influences group consensus.

These safety concerns carry unresolved medico-legal implications. Most GenAI systems currently used in oncology fall outside formal medical device regulation, placing accountability primarily on clinicians and institutions.^4,49 When GenAI-assisted content contributes to patient harm, the attribution of responsibility among clinician, institution, and AI vendor remains unclear, particularly when AI-generated text is incorporated into the medical record without explicit labeling. This uncertainty is compounded by the rapid and opaque update cycles of commercial LLMs, whereby identical prompts may yield different outputs over time, undermining reproducibility and legal defensibility. Emerging guidance, including recommendations from the 2025 NCCN AI Summit, suggests explicit labeling of GenAI-assisted content in clinical documentation to ensure auditability and medico-legal transparency.^39,40,49

A related governance challenge is the need for rigorous AI provenance tracking. As models evolve, institutions require mechanisms to document the specific model version, prompt structure, and contextual inputs used to generate a given output. Without such provenance logs, analogous to the metadata stored in radiologic picture archiving and communication systems (PACS), it becomes difficult to retrospectively evaluate decisions, conduct morbidity-and-mortality review, or support institutional learning. This lack of traceability represents a major barrier to quality assurance in head and neck oncology.^12,54
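As an illustration of what such a provenance log might capture, the following is a minimal Python sketch, not a validated or standardized schema. It records the three elements named above (model version, prompt structure, contextual inputs) together with a digest of the output. All names, including `ProvenanceRecord`, `prompt_template_id`, and `make_record`, are hypothetical, and the choice to store SHA-256 digests rather than raw text is an assumption made here so that the log itself holds no clinical content.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


def sha256_hex(text: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """One audit entry per generated output. All field names are hypothetical."""
    model_name: str          # vendor or local model identifier
    model_version: str       # exact version or snapshot string reported by the system
    prompt_template_id: str  # institution-defined ID of the prompt structure used
    prompt_sha256: str       # digest of the fully rendered prompt
    context_sha256: str      # digest of the contextual inputs supplied to the model
    output_sha256: str       # digest of the text the model returned
    created_utc: str         # ISO 8601 timestamp in UTC


def make_record(model_name: str, model_version: str, prompt_template_id: str,
                prompt: str, context: str, output: str,
                now: Optional[datetime] = None) -> ProvenanceRecord:
    """Build a record; only digests are stored, never the clinical text itself."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return ProvenanceRecord(
        model_name=model_name,
        model_version=model_version,
        prompt_template_id=prompt_template_id,
        prompt_sha256=sha256_hex(prompt),
        context_sha256=sha256_hex(context),
        output_sha256=sha256_hex(output),
        created_utc=stamp,
    )


def to_log_line(record: ProvenanceRecord) -> str:
    """Serialize a record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


def matches_output(record: ProvenanceRecord, text: str) -> bool:
    """Check whether a piece of text is the output this record describes."""
    return record.output_sha256 == sha256_hex(text)
```

During retrospective review, `matches_output` would let a reviewer confirm that a passage in the medical record is the text a logged model version produced. The raw prompt and context would need to be retained elsewhere, under the institution's own access controls, for the digests to be verifiable.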

Responsible integration therefore requires carefully designed and enforceable governance frameworks. Human-in-the-loop oversight should be considered a foundational safeguard, whereby GenAI outputs are reviewed, edited, and explicitly endorsed by clinicians before influencing decision-making or entering the medical record.^2,39,49,52 Retrieval-augmented generation (RAG), anchored to curated and version-controlled guideline sources (e.g., NCCN), may improve factual grounding but does not eliminate error or bias. Complementary explainable-AI techniques, such as attribution methods highlighting which guideline clauses or clinical features informed a recommendation, may further enhance interpretability, although head and neck–specific validation remains limited.^12 GenAI-assisted materials should be clearly identified, and patients should be informed when such systems contribute to education or communication.^2,39,49,52
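The human-in-the-loop safeguard described above can be pictured as a simple release gate. The Python sketch below is illustrative only and makes several assumptions: the class and function names (`Draft`, `release_to_record`, `NotEndorsedError`) and the label wording are hypothetical, and a real deployment would sit inside an electronic health record workflow rather than a standalone script. The point it shows is that a draft cannot be released without an explicit clinician endorsement, that any later edit clears a prior endorsement, and that released text always carries a GenAI label.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical label wording; an institution would define its own.
GENAI_LABEL = "[GenAI-assisted; clinician-reviewed]"


class NotEndorsedError(RuntimeError):
    """Raised when a draft is released without clinician endorsement."""


@dataclass
class Draft:
    text: str
    endorsed_by: Optional[str] = None
    edited_by: List[str] = field(default_factory=list)

    def revise(self, new_text: str, clinician: str) -> None:
        """Apply a clinician edit; any edit clears a prior endorsement."""
        self.text = new_text
        self.edited_by.append(clinician)
        self.endorsed_by = None

    def endorse(self, clinician: str) -> None:
        """Record explicit endorsement of the current text."""
        self.endorsed_by = clinician


def release_to_record(draft: Draft) -> str:
    """Return labeled text for the record, or refuse if not endorsed."""
    if draft.endorsed_by is None:
        raise NotEndorsedError("draft has not been endorsed by a clinician")
    return f"{GENAI_LABEL} {draft.text} (endorsed by {draft.endorsed_by})"
```

Clearing the endorsement on every revision is a deliberate choice in this sketch: the endorsement then always refers to the exact text that enters the record.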

Finally, institutional policies must clearly define acceptable use cases, documentation standards, and auditability requirements. Professional societies in head and neck oncology are well positioned to establish field-specific guidance, reducing inter-institutional variability and reinforcing shared ethical norms. In parallel, clinician education in GenAI literacy, including recognition of model limitations, common failure modes, and appropriate skepticism, will be essential for safe adoption.^2,39,49,52 As deployment expands, institutions will require dynamic monitoring systems capable of detecting performance drift, equity gaps, and unintended downstream effects on clinical outcomes.
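One simple form such monitoring could take is a rolling agreement check. The Python sketch below assumes that each GenAI output is later compared with a clinician-endorsed reference and scored as agree or disagree. The class name `DriftMonitor` and the `baseline`, `tolerance`, and `window` parameters are hypothetical, and the thresholds an institution would use are not specified in the literature reviewed here.

```python
from collections import deque
from typing import Optional


class DriftMonitor:
    """Flag drift when rolling agreement falls below baseline minus tolerance."""

    def __init__(self, baseline: float, tolerance: float, window: int) -> None:
        if not 0.0 <= baseline <= 1.0:
            raise ValueError("baseline must be a proportion between 0 and 1")
        if tolerance < 0.0 or window < 1:
            raise ValueError("tolerance must be >= 0 and window must be >= 1")
        self.baseline = baseline
        self.tolerance = tolerance
        self.window = window
        # Only the most recent `window` comparisons are kept.
        self._results = deque(maxlen=window)

    def record(self, agreed: bool) -> None:
        """Store whether one output agreed with the clinician-endorsed reference."""
        self._results.append(bool(agreed))

    def rate(self) -> Optional[float]:
        """Rolling agreement rate, or None until the window is full."""
        if len(self._results) < self.window:
            return None
        return sum(self._results) / len(self._results)

    def drifted(self) -> bool:
        """True when a full window shows agreement below the allowed floor."""
        current = self.rate()
        return current is not None and current < self.baseline - self.tolerance
```

The same pattern could be run separately per patient subgroup or language to surface equity gaps, although the choice of subgroups and thresholds would need local validation.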

Future directions

Future research should prioritize prospective, workflow-embedded evaluations of GenAI in head and neck oncology. To date, nearly all studies have relied on retrospective prompts, synthetic vignettes, or simulation-based assessments, which do not capture the real pressures, uncertainties, and incomplete data that shape clinical decision-making. Cluster-randomized or stepped-wedge evaluations of GenAI-assisted tumor board preparation, patient counseling, or staging verification that measure time savings, decision consistency, error rates, and clinician trust represent a critical next phase of validation.

Importantly, many of the limitations observed across current GenAI applications in head and neck oncology are not merely implementation failures but reflect more fundamental constraints of large language model architectures. LLMs are trained predominantly on large-scale internet and text-based corpora and lack exposure to sufficiently granular, high-quality, domain-specific clinical datasets. As a result, their apparent “reasoning” reflects probabilistic linguistic pattern matching rather than true clinical, anatomical, or pathophysiological understanding. This epistemic limitation likely underlies the consistent degradation in performance observed in nuanced, gray-zone scenarios that require integration of imaging subtleties, functional trade-offs, and tacit subspecialty knowledge.

Beyond evaluation design, advances in multimodal AI architectures offer the potential to better address the complexity of HNC care. Models capable of integrating CT/MRI imaging, pathology descriptors, genomic data, endoscopic findings, and structured clinical text may eventually support safer, more holistic decision assistance. However, multimodal foundation models will require rigorous domain-specific training, robust guardrails, and prospective clinical trials; without these safeguards, increased model complexity may simply compound existing risks.

In this context, future progress in AI-assisted clinical decision support may depend less on further scaling of general-purpose LLMs and more on domain-specific models trained on structured clinical data and explicit domain knowledge representations. Emerging approaches such as tabular foundation models, including TabPFN, demonstrate that strong predictive performance can be achieved from comparatively small, structured datasets when appropriate inductive priors are incorporated. Hybrid systems that integrate structured clinical variables, domain priors, and language-based interfaces may therefore offer a more reliable and clinically aligned pathway toward decision support than text-only generative models.

A parallel priority is the development of head and neck–specific benchmark suites. Existing LLM benchmarks are overwhelmingly generic and fail to capture the unique anatomical, functional, and multidisciplinary considerations of HNC. Purpose-built datasets, including expertly annotated staging vignettes, imaging-grounded decision scenarios, reconstruction planning dilemmas, and validated patient FAQ sets across multiple literacy levels, would allow reproducible comparisons across models and facilitate detection of performance drift over time.
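To make "reproducible comparisons across models" concrete, a benchmark of annotated staging vignettes could report a chance-corrected agreement statistic between model-assigned and expert-assigned stages. The Python sketch below uses Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is agreement expected by chance. The choice of kappa as the metric is an assumption made here for illustration, not a recommendation drawn from the reviewed studies, and the stage labels in the usage note are placeholders rather than data.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(expert: Sequence[str], model: Sequence[str]) -> float:
    """Cohen's kappa between expert labels and model labels for the same cases."""
    if len(expert) != len(model) or len(expert) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(expert)
    # Observed agreement: share of cases where both labels match.
    observed = sum(e == m for e, m in zip(expert, model)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    expert_counts = Counter(expert)
    model_counts = Counter(model)
    expected = sum(expert_counts[c] * model_counts[c] for c in expert_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1.0 - expected)
```

For example, with placeholder labels `expert = ["I", "I", "II", "II"]` and `model = ["I", "II", "II", "II"]`, observed agreement is 0.75, chance agreement is 0.5, and the function returns 0.5. Re-running the same fixed vignette set against each new model version would give one way to detect performance drift over time.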

Equity considerations must also be foregrounded. Reliable, cost-effective deployment could, in principle, expand access to expert-quality information in regions with few HNC subspecialists. Yet unequal access to high-quality deployments, language limitations, and the risk of model biases replicating existing disparities could widen gaps in outcomes if not closely monitored. Ensuring equitable access to validated tools, multilingual capability, culturally sensitive content generation, and community-centered evaluation frameworks will be essential.

Finally, the field must avoid treating head and neck oncology as a permissive testing ground for experimental AI tools. HNC decisions can alter long-term airway function, swallowing, voice, appearance, and overall quality of life. The threshold for acceptable AI error is therefore likely to be substantially lower than in less functionally consequential domains. Accordingly, deployment strategies should be approached cautiously and supported by rigorous validation prior to widespread clinical integration.

Conclusion

GenAI holds genuine promise as an assistive technology in head and neck oncology, particularly for language-centric tasks such as documentation, case summarization, guideline navigation, literacy-adapted patient education, and cross-checking structured staging information. At present, its strengths lie in augmenting human expertise rather than replacing it. Autonomous diagnosis or treatment recommendation, whether in primary, recurrent, or metastatic disease, remains unsupported by evidence and carries unacceptable risk.

A principled path forward requires restraint, transparency, and governance, ensuring that GenAI augments rather than replaces expert clinical judgment. Explicit labeling of GenAI-assisted content, human-in-the-loop oversight, version-controlled AI provenance logs, and ongoing monitoring for drift and inequity will be essential components of safe clinical integration. If developed and deployed responsibly, GenAI may streamline preparation, enhance communication, and narrow access gaps, while preserving the clinical reasoning, nuance, and multidisciplinary expertise that define high-quality HNC care.

Footnotes

ORCID iD

Sholem Hack

Author contribution

Jacob E Karni: Conception and design of study, Literature Review, Drafting of article and/or critical revision, Final approval of manuscript.

Christian Simon: Validation, Literature Review, Drafting of article and/or critical revision, Final approval of manuscript.

Sholem Hack: Conception and design of study, Literature Review, Visualization, Project Administration, Drafting of article and/or critical revision, Final approval of manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Ferber, El Nahhas OSM, Wölflein, et al. Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology. Nature Cancer. Nature Publishing Group 2025; 6: 1–13. https://doi.org/10.1038/s43018-025-00991-6

2. Fahad, Rabbi, Benta Hasan, et al. Generative AI in clinical (2020–2025): a mini-review of applications, emerging trends, and clinical challenges. Front Digit Health. Frontiers Media SA 2025; 7: 1653369, [cited 2025 Dec 16]. https://doi.org/10.3389/FDGTH.2025.1653369

3. Tarhini, Dave, El Naqa. General artificial intelligence for the diagnosis and treatment of cancer: the rise of foundation models. BJR|Artificial Intelligence. Oxford Academic 2025; 2, [cited 2025 Dec 16]. https://doi.org/10.1093/BJRAI/UBAF015

4. Derbal. Reimagining cancer treatments in the era of generative AI. Digit Health. SAGE Publications Inc 2025; 11: 20552076251394630. https://doi.org/10.1177/20552076251394631

5. Chen, Parsa, Swanson, et al. Large language models in oncology: a review. BMJ Oncology. BMJ Publishing Group 2025; 4: e000759, [cited 2025 Aug 16]. https://doi.org/10.1136/BMJONC-2025-000759

6. Huhulea, Huang, Eng, et al. Artificial Intelligence Advancements in Oncology: A Review of Current Trends and Future Directions. Biomedicines. Multidisciplinary Digital Publishing Institute (MDPI) 2025; 13: 951. https://doi.org/10.3390/BIOMEDICINES13040951

7. Derbal. Generative AI - Assisted Adaptive Cancer Therapy. Cancer Control. SAGE Publications Ltd 2025. https://doi.org/10.1177/10732748251349919

8. Derbal. Clinical evaluation of GenAI adaptive cancer therapy. EngMedicine. Elsevier 2026; 3: 100116. https://doi.org/10.1016/J.ENGMED.2025.100116

9. Loperfido, Celebrini, Marzetti, et al. Current role of artificial intelligence in head and neck cancer surgery: a systematic review of literature. Open Exploration 2023; 4: 933–940, [cited 2025 Dec 16]. https://doi.org/10.37349/ETAT.2023.00174

10. Broggi, Maniaci, Lentini, et al. Artificial Intelligence in Head and Neck Cancer Diagnosis: A Comprehensive Review with Emphasis on Radiomics, Histopathological, and Molecular Applications. Cancers (Basel). Multidisciplinary Digital Publishing Institute (MDPI) 2024; 16: 3623. https://doi.org/10.3390/CANCERS16213623

11. Pham, Teh, Chatzopoulou, et al. Artificial Intelligence in Head and Neck Cancer: Innovations, Applications, and Future Directions. Current Oncology. Multidisciplinary Digital Publishing Institute (MDPI) 2024; 31: 5255–5290. https://doi.org/10.3390/CURRONCOL31090389

12. Cai, Zhang, Pan, et al. Artificial intelligence in head and neck cancer: a bibliometric and visualization analysis (1995–2025). Discover Oncology [Internet]. Springer Science and Business Media B.V 2025; 16: 2148, [cited 2025 Dec 16]. https://doi.org/10.1007/S12672-025-03992-0

13. Takita, Kabata, Walston, et al. A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians. npj Digital Medicine. Nature Publishing Group 2025; 8: 175, [cited 2025 Dec 16]. https://doi.org/10.1038/s41746-025-01543-z

14. Feldman, Hoffer, Conley, et al. Dedicated AI Expert System vs Generative AI With Large Language Model for Clinical Diagnoses. JAMA Netw Open [Internet]. American Medical Association 2025; 8: e2512994, [cited 2025 Dec 16]. https://doi.org/10.1001/JAMANETWORKOPEN.2025.12994

15. Pandya, Makaryan, Bresler, et al. Advancing AI in oncology: A performance comparison of ChatGPT-4o and ChatGPT-o1 in neuroendocrine tumor clinical decision making. Journal of Clinical Oncology [Internet]. American Society of Clinical Oncology 2025; 43, [cited 2025 Dec 16]. https://doi.org/10.1200/JCO.2025.43.16_SUPPL.E13730

16. Shan, Chen, Wang, et al. Comparing Diagnostic Accuracy of Clinical Professionals and Large Language Models: Systematic Review and Meta-Analysis. JMIR Medical Informatics [Internet] 2025; 13: e64963. https://doi.org/10.2196/64963

17. Colevas, Cmelak, Pfister, et al. NCCN Guidelines® Insights: Head and Neck Cancers, Version 2.2025. J Natl Compr Canc Netw [Internet] 2025; 23: 2–11, [cited 2025 Dec 17]. https://doi.org/10.6004/JNCCN.2025.0007

18. Schmidl, Hütten, Pigorsch, et al. Assessing the role of advanced artificial intelligence as a tool in multidisciplinary tumor board decision-making for primary head and neck cancer cases. Front Oncol [Internet] 2024; 14: 1353031, [cited 2024 Nov 12]. https://doi.org/10.3389/FONC.2024.1353031

19. Vural, Doluoglu, Taraf, et al. ChatGPT versus DeepSeek in head and neck cancer staging and treatment planning: guideline-based study. European Archives of Oto-Rhino-Laryngology [Internet]. Springer Science and Business Media Deutschland GmbH 2025; 282: 4815, [cited 2025 Dec 16]. https://doi.org/10.1007/S00405-025-09524-4

20. Hack, Alsleibi, Saleh, et al. Are chatbots a reliable source for patient frequently asked questions on neck masses? Eur Arch Otorhinolaryngol [Internet] 2025; 282: 4273–4282, [cited 2025 Jul 12]. https://doi.org/10.1007/S00405-025-09433-6

21. Aubreville, Ganz, Ammeling, et al. Prediction of tumor board procedural recommendations using large language models. Eur Arch Otorhinolaryngol [Internet] 2025; 282: 1619–1629, [cited 2025 Aug 13]. https://doi.org/10.1007/S00405-024-08947-9

22. Hack, Attal, Farzad, et al. Performance of generative AI across ENT tasks: A systematic review and meta-analysis. Auris Nasus Larynx [Internet] 2025; 52: 585–596. https://doi.org/10.1016/J.ANL.2025.08.010

23. Alami, Willemse, Quiriny, et al. Evaluation of ChatGPT-4’s Performance in Therapeutic Decision-Making During Multidisciplinary Oncology Meetings for Head and Neck Squamous Cell Carcinoma. Cureus [Internet]. Springer Science and Business Media LLC 2024; 16: e68808, [cited 2025 Oct 26]. https://doi.org/10.7759/CUREUS.68808

24. Kayaalp, Bölek, Yaşar. Confirmation of Large Language Models in Head and Neck Cancer Staging. Diagnostics. Multidisciplinary Digital Publishing Institute (MDPI) 2025; 15: 2375. https://doi.org/10.3390/DIAGNOSTICS15182375

25. Marchi, Bellini, Iandelli, et al. Exploring the landscape of AI-assisted decision-making in head and neck cancer treatment: a comparative analysis of NCCN guidelines and ChatGPT responses. Eur Arch Otorhinolaryngol [Internet] 2024; 281: 2123–2136, [cited 2025 Aug 13]. https://doi.org/10.1007/S00405-024-08525-Z

26. Lechien, Chiesa-Estomba, Baudouin, et al. Accuracy of ChatGPT in head and neck oncological board decisions: preliminary findings. Eur Arch Otorhinolaryngol [Internet] 2024; 281: 2105–2114, [cited 2025 Aug 13]. https://doi.org/10.1007/S00405-023-08326-W

27. Rao, Fernandez-Alvarez, Guntinas-Lichius, et al. The Limitations of Artificial Intelligence in Head and Neck Oncology. Adv Ther [Internet]. Adis 2025; 42: 2559–2568, [cited 2025 Dec 17]. https://doi.org/10.1007/S12325-025-03198-4

28. Schmidl, Hütten, Pigorsch, et al. Assessing the role of advanced artificial intelligence as a tool in multidisciplinary tumor board decision-making for recurrent/metastatic head and neck cancer cases - the first study on ChatGPT 4o and a comparison to ChatGPT 4.0. Front Oncol [Internet] 2024; 14: 1455413, [cited 2025 Dec 16]. https://doi.org/10.3389/FONC.2024.1455413

29. Zanoni, Patel, Shah. Changes in the 8th Edition of the American Joint Committee on Cancer (AJCC) Staging of Head and Neck Cancer: Rationale and Implications. Curr Oncol Rep [Internet]. Current Medicine Group LLC 2019; 21: 52, [cited 2025 Dec 17]. https://doi.org/10.1007/S11912-019-0799-X

30. Hagen, Hornung, Barham, et al. Artificial Intelligence in Head and Neck Cancer: Towards Precision Medicine. Cancers (Basel) [Internet] 2025; 17: 3023. https://doi.org/10.3390/CANCERS17183023

31. Lee, Hamill, Shnayder, et al. Exploring the Role of Artificial Intelligence Chatbots in Preoperative Counseling for Head and Neck Cancer Surgery. Laryngoscope [Internet] 2024; 134: 2757–2761. https://doi.org/10.1002/LARY.31243

32. Mnajjed, Patel. Assessment of ChatGPT generated educational material for head and neck surgery counseling. Am J Otolaryngol [Internet] 2024; 45: 104410. https://doi.org/10.1016/J.AMJOTO.2024.104410

33. Wei, Fritz, Rajasekaran. Answering head and neck cancer questions: An assessment of ChatGPT responses. Am J Otolaryngol [Internet] 2024; 45: 104085. https://doi.org/10.1016/J.AMJOTO.2023.104085

34. Attal, Farzad, Hack. Ethical challenges of artificial intelligence in otolaryngology: balancing transparency, autonomy, and clinical judgment. European Archives of Oto-Rhino-Laryngology. Springer Science and Business Media Deutschland GmbH 2025. https://doi.org/10.1007/S00405-025-09374-0

35. Yim, Khuntia, Parameswaran, et al. Preliminary Evidence of the Use of Generative AI in Health Care Clinical Services: Systematic Narrative Review. JMIR Med Inform [Internet] 2024; 12: e52073, [cited 2025 Mar 17]. https://doi.org/10.2196/52073

36. Spatscheck, Schaschek, Winkelmann. The effects of generative AI’s human-like competencies on clinical decision-making. J Decis Syst [Internet]. Taylor & Francis 2024, [cited 2025 Dec 17]. https://doi.org/10.1080/12460125.2024.2430731

37. Scott, Reddy, Kelly, et al. Using generative artificial intelligence in clinical practice: a narrative review and proposed agenda for implementation. Med J Aust [Internet]. John Wiley and Sons Inc 2025; 223: 664–672, [cited 2025 Dec 17]. https://doi.org/10.5694/MJA2.70057

38. Bhuyan, Sateesh, Mukul, et al. Generative Artificial Intelligence Use in Healthcare: Opportunities for Clinical Excellence and Administrative Efficiency. J Med Syst [Internet] 2025; 49: 10, [cited 2025 Mar 17]. https://doi.org/10.1007/S10916-024-02136-1

39. Kolla, Parikh. Uses and limitations of artificial intelligence for oncology. Cancer [Internet]. John Wiley and Sons Inc 2024; 130: 2101–2107, [cited 2025 Dec 17]. https://doi.org/10.1002/CNCR.35307

40. Chang, Park, Schäffer, et al. Hallmarks of artificial intelligence contributions to precision oncology. Nat Cancer [Internet]. Nature Research 2025; 6: 417–431, [cited 2025 Dec 17]. https://doi.org/10.1038/S43018-025-00917-2

41. Park, Pillai, Deng, et al. Assessing the research landscape and clinical utility of large language models: a scoping review. BMC Med Inform Decis Mak. BioMed Central Ltd 2024; 24: 1–14, [cited 2024 Dec 2]. https://doi.org/10.1186/S12911-024-02459-6

42. Carl, Schramm, Haggenmüller, et al. Large language model use in clinical oncology. NPJ Precis Oncol. Nature Research 2024; 8: 240. https://doi.org/10.1038/S41698-024-00733-4

43. Hao, Qiu, Holmes, et al. Large language model integrations in cancer decision-making: a systematic review and meta-analysis. NPJ Digit Med [Internet]. Nature Research 2025; 8: 450, [cited 2026 Jan 12]. https://doi.org/10.1038/S41746-025-01824-7

44. Benary, Wang, Schmidt, et al. Leveraging Large Language Models for Decision Support in Personalized Oncology. JAMA Netw Open [Internet]. American Medical Association 2023; 6: e2343689, [cited 2024 Dec 2]. https://doi.org/10.1001/JAMANETWORKOPEN.2023.43689

45. Lorenzi, Pugliese, Maniaci, et al. Reliability of large language models for advanced head and neck malignancies management: a comparison between ChatGPT 4 and Gemini Advanced. Eur Arch Otorhinolaryngol [Internet] 2024; 281: 5001–5006, [cited 2025 Aug 13]. https://doi.org/10.1007/S00405-024-08746-2

46. Buhr, Ernst, Blaikie, et al. Assessment of decision-making with locally run and web-based large language models versus human board recommendations in otorhinolaryngology, head and neck surgery. Eur Arch Otorhinolaryngol [Internet] 2025; 282: 1593–1607, [cited 2025 Oct 26]. https://doi.org/10.1007/S00405-024-09153-3

47. Rajendran, Yang, Niedermayr, et al. Large language model-augmented learning for auto-delineation of treatment targets in head-and-neck cancer radiotherapy. Radiotherapy and Oncology [Internet]. Elsevier Ireland Ltd 2025; 205: 110740, [cited 2026 Apr 4]. https://doi.org/10.1016/j.radonc.2025.110740

48. Marcaccini, Seth, Novo, et al. Leveraging Artificial Intelligence for Personalized Rehabilitation Programs for Head and Neck Surgery Patients. Technologies (Basel) [Internet]. Multidisciplinary Digital Publishing Institute (MDPI) 2025; 13: 142, [cited 2026 Apr 4]. https://doi.org/10.3390/TECHNOLOGIES13040142

49. Thind, Tsao. Artificial intelligence in oncology: promise, peril, and the future of patient–physician interaction. Front Digit Health. Frontiers Media SA 2025; 7: 1633577. https://doi.org/10.3389/FDGTH.2025.1633577

50. Sun, Sheng, Zhou, et al. AI hallucination: towards a comprehensive classification of distorted information in artificial intelligence-generated content. Humanities and Social Sciences Communications [Internet]. Palgrave 2024; 11: 1–14, [cited 2025 Jun 15]. https://doi.org/10.1057/s41599-024-03811-x

51. Chelli, Descamps, Lavoué, et al. Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis. J Med Internet Res 2024; 26: e53164, [cited 2025 Oct 26]. https://doi.org/10.2196/53164

52. Nishisako, Higashi, Wakao. Reducing Hallucinations and Trade-Offs in Responses in Generative AI Chatbots for Cancer Information: Development and Evaluation Study. JMIR Cancer [Internet] 2025; 11: e70176. https://doi.org/10.2196/70176

53. Hack, Attal, Geva, et al. Blinded comparative evaluation of GPT-generated, online search-derived, and guideline-based answers for HPV-associated oropharyngeal cancer. Oral Oncol [Internet]. Pergamon 2026; 172: 107813, [cited 2025 Dec 4]. https://doi.org/10.1016/J.ORALONCOLOGY.2025.107813

54. Werder, Ramesh, Zhang. Establishing Data Provenance for Responsible Artificial Intelligence Systems. ACM Trans Manag Inf Syst [Internet]. Association for Computing Machinery 2022; 13: 1–23, [cited 2025 Dec 17]. https://doi.org/10.1145/3503488
