Abstract
Background
Artificial intelligence (AI) is a general term that refers to the use of a computer to model intelligent behavior with minimal human intervention. Generative AI, a subtype of AI, can collect, extract, digest, and generate information in a way that is understandable to humans. 1 Given the unprecedented volume of available data, along with advances in natural language processing and socially aware algorithms, AI applications are set to become increasingly prevalent in healthcare. Reported advantages of AI integration within medicine include elimination of human bias and improved pattern recognition and decision-making. 2 AI use in surgery has grown rapidly, and its integration has demonstrated significant promise in enhancing diagnostic accuracy, personalizing treatment plans, and improving patient education.
Compared to other plastic surgery subspecialties (eg, burn and craniofacial trauma), cleft care is relatively underrepresented in the AI literature. Machine learning (ML) is being used across plastic surgery for pattern discovery, risk stratification, and outcome prediction. 3 In contrast, cleft care has only recently begun leveraging AI for individualized treatment planning and real-time caregiver support. This lag likely reflects the smaller patient population, anatomical complexity, and the need for long-term multidisciplinary follow-up, all of which are factors that complicate data acquisition and algorithm training.
Deep learning (DL) has been widely adopted for preoperative tasks such as anatomical classification, lesion detection, image segmentation, and image registration. These models use convolutional layers to extract input features and fully connected layers for output classification. For example, Chilamkurthy et al 3 demonstrated that DL could detect intracranial hemorrhage, calvarial fractures, midline shifts, and mass effects on CT scans. Similarly, Zhang and co-authors 4 used convolutional neural networks (CNNs) to classify benign or malignant thyroid lesions in ultrasound images, achieving sensitivity comparable to that of clinicians and improved specificity.
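To make the division of labor between convolutional and fully connected layers concrete, the following is a minimal sketch in plain Python. It is not the architecture of any cited study: the 4×4 "image", the kernel, and the dense-layer weights are hand-picked, hypothetical values chosen only for illustration, whereas a real model would learn them from training data.

```python
import math

def conv2d(image, kernel):
    """Valid 2D cross-correlation (what DL libraries call 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def dense(features, weights, biases):
    """Fully connected layer: one weighted sum (logit) per output class."""
    return [sum(f * w for f, w in zip(features, ws)) + b
            for ws, b in zip(weights, biases)]

def softmax(logits):
    """Convert logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4x4 grayscale patch: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to left-to-right brightness increases.
kernel = [[-1, 1],
          [-1, 1]]

# Feature extraction: convolution followed by a nonlinearity.
feature_map = relu(conv2d(image, kernel))
features = [v for row in feature_map for v in row]  # flatten 3x3 -> 9 values

# Classification: hypothetical weights for the classes ["no edge", "edge"].
weights = [[-0.5] * 9, [0.5] * 9]
biases = [1.0, -1.0]
probs = softmax(dense(features, weights, biases))
print(feature_map)
print(probs)
```

The convolution responds only where the brightness changes, and the dense layer turns that localized response into class probabilities; stacking many such learned kernels is what lets clinical models pick out anatomical features.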
AI also plays a growing role in patient education. Its ability to synthesize and present complex medical information in understandable terms is particularly valuable in educating patients about surgical procedures, risks, and postoperative care. Large language models (LLMs) can communicate across literacy levels and languages, offering potential benefits for reaching marginalized populations. 5 ChatGPT is one example of a commercially available LLM that has been shown to provide mostly accurate and informative explanations. 6 Many LLMs, like ChatGPT, encourage patients to consult physicians for all medical inquiries, making them adjunct tools that do not replace patient–surgeon communication. 7 LLMs can also be customized to generate procedure-specific notes, which have shown high consistency in formatting and content, potentially reducing miscommunication and documentation errors. 8
Cleft lip and palate (CL/P) surgery is a highly specialized field that demands precision in diagnosis, surgical planning, and patient education. Previous reviews of the role of AI in CL/P have been limited to diagnosis and treatment prediction,10,11 and orthodontic planning. 12 The aim of the current scoping review is to comprehensively define the use of AI not only in diagnosis/detection but also in surgical planning and patient education, in order to identify gaps in the current evidence and inform future research and clinical practice.
Methods
A scoping review was conducted following PRISMA-ScR 9 and PRISMA-S 10 reporting standards. Studies were included if they discussed the use of AI in CL/P care and were then categorized based on whether AI was used for diagnosis and detection, surgical planning, or patient education.
Search Strategy
The search strategy was developed with the assistance of a medical librarian. A computerized search was conducted across the following databases from their inception to December 30, 2025: PubMed, Embase, and Scopus. No restrictions were placed on study design, outcome type, or clinical setting. Keywords and Medical Subject Headings (MeSH) terms regarding AI and cleft care were combined with Boolean operators for efficiency. Search syntax was adapted for each database. Minor refinements to search terms were made during the screening process to improve relevance and coverage. No substantive changes were made to the core search concepts.
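As an illustration of how keyword groups are typically combined with Boolean operators, the sketch below joins synonyms within a concept using OR and joins the concepts using AND. The term lists are hypothetical placeholders, not the search strings used in this review, and database-specific syntax (field tags, MeSH qualifiers) is deliberately omitted.

```python
def or_block(terms):
    """Join synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def build_query(*concepts):
    """AND together one OR-block per concept."""
    return " AND ".join(or_block(terms) for terms in concepts)

# Hypothetical term lists for illustration only.
ai_terms = ["artificial intelligence", "machine learning", "deep learning"]
cleft_terms = ["cleft lip", "cleft palate"]

query = build_query(ai_terms, cleft_terms)
print(query)
```

A record matches such a query only if it mentions at least one term from every concept group, which is why adding synonyms broadens coverage while adding concepts narrows it.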
To ensure comprehensive coverage, supplementary search methods were employed, including manual screening of reference lists of included articles. Grey literature was searched selectively to identify influential technical or methodological work, but was not exhaustively captured, consistent with the exploratory aims of a scoping review.
All retrieved records were imported into Covidence; this platform was used primarily for database screening and de-duplication. 11 The protocol was completed but not publicly shared or registered with PROSPERO. Ethics approval and informed consent were not required for this scoping review, as the authors used publicly accessible documents as data and evidence.
Inclusion and Exclusion Criteria
Title and abstract screening was conducted independently by 2 reviewers, followed by full-text review for eligibility. Discrepancies were resolved through discussion. Articles citing studies included in this review were screened to identify any additional articles for inclusion. Evidence was included for review if it was published in English and addressed the use of AI in at least one of the following: (1) diagnosis/detection of CL/P abnormalities or conditions, (2) surgical or treatment planning, or (3) patient education or engagement.
Article Selection and Data Extraction
The following data were extracted from each article and used for comparisons: authors, year of publication, country of origin, journal, study design, intervention, study results, primary outcomes, and study outcomes of significance. The evidence level of primary studies was evaluated using the recognized hierarchy adopted by the journal, with high-quality, multicenter or single-center randomized controlled trials (RCTs) with sufficient power, or systematic reviews of such studies representing the highest level of evidence, and expert opinion formed through consensus, case reports, clinical examples, or evidence-based physiological studies representing the lowest level of evidence. Systematic reviews were assessed to identify other potential primary source publications. Data were extracted and compiled using an a priori-designed Excel data collection form capturing study characteristics, AI methodology, outcomes measured, and key findings.
Results
Study Selection
The initial database and literature searches yielded 334 articles, of which 258 remained after duplicates were removed (Figure 1). Following initial title and abstract screening, 132 articles remained, and 88 were excluded because they did not meet the inclusion criteria. This left 45 articles for inclusion in the qualitative synthesis. Of these 45 studies, 17 focused on cleft lip/palate and were, therefore, included in this review (Table 1). There was agreement between all authors on the list of included articles.

Figure 1. Preferred reporting items for scoping reviews flowchart illustrating the search strategy used to identify relevant studies for inclusion.
Study Characteristics
Most papers were from the United States (n = 11), with other authors being from Switzerland, Sweden, South Korea, Japan, China, and Poland (n = 1 each). Across the reviewed literature, AI has been used in various but complementary ways for CL/P diagnosis and detection, ranging from image-based classification to predictive modeling and treatment planning. DL models have become increasingly effective due to the availability of large-scale datasets and advances in model depth and parameterization. Our search yielded 4 studies using AI for the detection and diagnosis of cleft lip and/or palate.
Diagnosis and Detection
Antenatal Diagnosis/Severity
Agarwal et al 12 used a computational method to automatically identify newborns or children with cleft abnormalities from 2D digital camera images. Specifically, frontal face images were classified as healthy (no cleft), unilateral cleft lip, or bilateral cleft lip. A previously trained CNN, combined with a support vector machine, systematically learned latent features from the 2D images, achieving high accuracy in distinguishing between normal, unilateral CL/P, and bilateral CL/P cases. Performance was poorest for bilateral cases, but overall the method achieved a success rate of 95%, demonstrating the potential of AI in cleft diagnosis.
Similarly, McCullough et al 13 demonstrated the use of CNNs to quantify the preoperative severity of unilateral cleft lip from 2D facial photographs. Their system accurately identified facial landmarks and produced severity grades, allowing for reproducible and objective categorization of preoperative severity. This method has the potential to standardize patient assessment and tailor surgical approaches to individual anatomic differences.
Jurek et al 14 used a form of computer image processing to classify ultrasound images of the fetal palate during the first trimester of pregnancy, and compared the results obtained across successive cases. Their AI system used a novel deep neural network to classify ultrasound images as either normal or including a cleft palate. Using standard cross-sectional ultrasound images to isolate the palatal bone, their method could be applied in the first trimester, and achieved a promising average efficiency of 81.6%.
Kuwada et al 15 further advanced DL applications by developing a model to diagnose different types of cleft anomalies using panoramic radiographs. They analyzed radiographs from 383 patients with cleft alveolus with or without cleft palate and 210 patients without cleft anomalies. Using this dataset, they created a computer-aided diagnosis (CAD) system that applies DL object detection techniques, with and without normal data during training, to assess model performance and identify characteristic imaging features influencing diagnostic accuracy. Distinguishing a cleft alveolus with or without a cleft palate is challenging due to image overlap from the cervical spine and the narrow panoramic image layer, which increases the potential for misdiagnosis. The CAD system was especially helpful in addressing these limitations, offering diagnostic support for less experienced radiologists. The model achieved relatively high detection sensitivity, outperforming or matching results reported by radiologists in similar panoramic radiograph studies.
Speech Evaluation
Instead of using imaging, Wang et al 16 focused on speech diagnostics, such as the detection of hypernasality and articulatory impairments, using audio input processed through AI-supported speech recognition systems. To aid in the diagnosis of CL/P patients, they developed a long short-term memory-based deep recurrent neural network (DRNN) system capable of identifying hypernasal speech by learning short-term dependencies in acoustic signals. Their system achieved a detection accuracy of 93.35%.
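For readers unfamiliar with long short-term memory (LSTM) networks, the sketch below shows how a single LSTM cell carries information from one acoustic frame to the next through gated memory. It is a scalar, single-unit toy and not the model of Wang et al; the weights and the per-frame feature values are hypothetical and untrained.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell (scalar input and state)."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate memory
    c = f * c_prev + i * g   # keep part of the old memory, add new content
    h = o * math.tanh(c)     # expose a gated view of the memory
    return h, c

def run_lstm(sequence, p):
    """Process a sequence frame by frame, starting from empty state."""
    h, c = 0.0, 0.0
    hidden_states = []
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
        hidden_states.append(h)
    return hidden_states

# Hypothetical, untrained parameters: every weight and bias set to 0.5.
names = ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")
params = {name: 0.5 for name in names}

# Hypothetical per-frame acoustic feature (e.g., one normalized value per frame).
frames = [0.1, 0.4, 0.9, 0.3]
states = run_lstm(frames, params)
print(states)
```

In a full system, the final hidden state (or a pooled summary of all states) would feed a classification layer that outputs the probability of hypernasal speech; here the point is only that each output depends on the current frame and on what the cell retained from earlier frames.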
Similarly, Cornefjord et al 17 aimed to develop an artificial neural network (ANN) to automatically evaluate velopharyngeal competence (VPC) using retrospective audio recordings and corresponding perceptual speech assessments to determine which children may need speech therapy or secondary palatal surgery. The study successfully designed ANN models, but with accuracies of only 40% to 60%, they are not reliable enough to replace auditory perceptual assessments by trained speech-language pathologists to evaluate VPC in children with CL/P.
Surgical Planning
AI has been applied to surgical planning in CL/P patients in various novel ways. The methods, aims, and outcomes of these AI applications differ significantly between studies, varying in their emphases on automation, accuracy, preoperative diagnosis, and postoperative evaluation.
One of the most significant contributions to presurgical treatment planning is by Schnabel et al, 18 who developed an end-to-end pipeline, based on DL and geometry processing, for producing presurgical orthopedic plates. Their framework takes intraoral scans as input, automatically places landmarks on them, and then designs 3D-printable plates tailored to the patient's specific neonatal CL/P anatomy. This pipeline streamlines the traditionally manual, skill-based process and has been applied in clinical settings, demonstrating its point-of-need translational value in surgical preparation.
Sayadi et al 19 conducted a similar study examining the automated detection and placement of 21 anatomic landmarks, which represent important anthropometric features crucial for understanding cleft nasolabial anatomy and designing various types of nasolabial repairs essential for unilateral cleft lip surgical planning. The AI could reliably mark both images and live video feeds (across a wide range of viewing angles), automating key steps in cleft lip surgical design.
Other studies extended AI applications from preparation to intraoperative and postoperative fields. Seo et al 20 applied AI-assisted 3D landmark auto-digitization to evaluate facial soft tissue changes after bimaxillary orthognathic surgery in patients with cleft. This enabled high-resolution, reproducible measurement of hard and soft tissue surgical outcomes, providing feedback for future surgical planning and technique optimization.
In addition, Miranda et al 21 developed an explainable AI classifier to assess the severity of alveolar bone defects from 3D surface models derived from cone-beam CT scans. Visual justification and classification were provided by the model using attention-based interpretability techniques, a step towards explainable clinical decision support systems in surgical planning.
Patient Education
There is an increasing role for AI, specifically LLMs like ChatGPT, in supplementing patient education for cleft lip and/or palate. These AI applications are being implemented in increasingly sophisticated ways, ranging from enhancing the readability and accessibility of patient education information to developing structured support systems for patients and caregivers. Our search yielded 7 studies using AI for patient education regarding cleft lip and/or palate.
Fazilat et al 22 compared ChatGPT-3.5-generated answers with those from established academic sources in response to common queries on CL/P surgery. Notably, both laypeople and surgeons favored AI-generated content due to its clarity and comprehensiveness, although readability remained above the recommended sixth-grade level for all sources examined. Likewise, Shehab et al 23 used ChatGPT-4 to rewrite available cleft surgery patient education materials from American Cleft Palate-Craniofacial Association (ACPA)-related websites, lowering their grade level from ninth to sixth while improving their clarity scores; however, they observed underuse of interactive learning resources like videos or audio clips.
Zhang et al 24 focused on using ChatGPT to rephrase postoperative instructions for plastic surgical procedures, including CL/P surgery, to increase readability. They concluded that AI could lower the reading level to a sixth-grade level, but not to a fourth-grade level, without compromising medical accuracy. The study emphasizes sixth-grade readability as the best target to maximize comprehension without losing crucial information.
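Reading levels such as "sixth grade" are usually estimated with a readability formula. The sketch below implements one widely used option, the Flesch-Kincaid Grade Level (0.39 × words per sentence + 11.8 × syllables per word − 15.59). Which formula each included study used is not stated here, so this choice is an assumption; the syllable counter is a crude heuristic, and both example sentences are hypothetical.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, ignoring a silent trailing 'e'."""
    word = word.lower()
    if word.endswith("e") and not word.endswith("le") and len(word) > 2:
        word = word[:-1]
    groups = re.findall(r"[aeiouy]+", word)
    return max(1, len(groups))

def fk_grade(text):
    """Flesch-Kincaid Grade Level for non-empty English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Hypothetical instruction before and after simplification.
original = ("Postoperative administration of analgesic medication "
            "is recommended at regular intervals.")
simplified = "Give pain medicine on a set schedule."

print(round(fk_grade(original), 1), round(fk_grade(simplified), 1))
```

Because the formula rewards short sentences and short words only, a lower score does not by itself show that medical accuracy was preserved, which is why the studies paired readability scoring with expert review.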
Chaker et al 25 explored the application of AI in caregiver instruction following CL/P surgery. They posed standard postoperative questions to ChatGPT and compared its responses with those provided by seasoned pediatric plastic surgeons. While ChatGPT achieved a 69% accuracy rate, it sometimes failed to provide context-specific finesse (eg, in answering questions regarding surgical methods and certain feeding restrictions). Despite this, the authors presented an AI-enhanced model that could streamline content generation and deliver consistent support, aiming to reduce caregiver stress.
Mahedia et al 26 also focused on ChatGPT as a patient education aid. Using ChatGPT-generated responses to common patient questions and concerns related to cleft lip repair, the authors analyzed the quality, clarity, relevance, and trustworthiness of the responses to determine whether ChatGPT improves patient understanding. Clarity and content quality received the highest ratings, while trustworthiness received the lowest. Although the authors concluded that ChatGPT showed promise in supplementing patient education, they emphasized that language models should only complement, not replace, traditional physician-led discussions.
Manasyan et al 27 assessed AI's role in improving materials related to alveolar bone grafting. ChatGPT was shown to significantly enhance readability (from an eighth-grade to a sixth-grade level) and understandability, though actionability remained limited. Their findings align with those of other studies, which advocate for better alignment of patient materials with literacy standards.
Finally, Vallurupalli et al 28 evaluated ChatGPT-3.5 as a tool for simplifying craniofacial education material. They demonstrated that AI-generated simplification was equivalent to traditional readability calculators, consistently decreasing complexity enough to meet AMA/NIH standards. This supports considering AI as a standard part of disseminating patient education materials across institutions.
Discussion
Interpretation of Findings
This scoping review reveals a growing role for AI in CL/P care, particularly across the 3 domains of diagnosis/detection, surgical planning, and patient education. While AI has already demonstrated maturity and successful clinical integration in other specialties such as radiology, dermatology, and ophthalmology,29–31 where DL models are now FDA-approved and routinely used for diagnostics, the use of AI in cleft care remains largely exploratory.
The reviewed studies demonstrate encouraging technical performance, such as diagnostic models achieving accuracies exceeding 90% using CNNs 12 and long short-term memory-based DRNNs. 16 Similarly, surgical planning tools and presurgical devices can automate complex preoperative workflows, from nasolabial anthropometry 19 to presurgical orthopedic plate fabrication. 18 While these advances are promising, they have not been externally validated and rely on small, homogeneous datasets, raising concerns about generalizability.
Beyond image-based detection, this review also identified emerging AI applications in speech analysis. Wang et al 16 demonstrated that AI can detect hypernasality and articulatory impairment from audio input with a detection accuracy of 93.35%. This suggests that AI may have future utility as an adjunctive screening tool in cleft-related speech assessment, particularly in settings with limited access to specialized speech-language pathology expertise. However, the findings are mixed. Cornefjord et al 17 developed an ANN to automatically assess VPC using retrospective speech recordings, but reported accuracies of only 40% to 60%, which were insufficient to replace assessment by trained clinicians. Future research should therefore prioritize prospective validation of speech-based AI tools and inclusion of larger and more diverse speech datasets, to determine whether these systems can enhance, rather than replace, clinician assessment.
AI's application in patient education, particularly through LLMs like ChatGPT, reflects a novel direction aimed at improving health literacy and care accessibility.24,27,28 These tools succeeded in reducing the reading level of patient-facing materials to the sixth-grade benchmark set by the American Medical Association and National Institutes of Health. 28 However, studies consistently note limitations in “actionability” and clinical nuance, reinforcing the importance of human oversight in clinical communication.
Limitations and Future Research
Several limitations were identified across the reviewed studies, highlighting key areas for future research. A significant constraint is the reliance on large, annotated datasets to train AI algorithms, which poses challenges due to limited availability and unbalanced datasets with few positive samples.12,16,19 Additionally, the generalizability of findings remains limited. Some pipelines were trained using narrow datasets, such as educational materials only from ACPA-approved teams, which may not reflect real-world diversity in patient populations or health literacy levels.23,24,27 Many studies also focused solely on specific cleft types, reducing applicability across the full spectrum of CL/P conditions. 18
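The concern about unbalanced datasets can be illustrated with a small, hypothetical example: when positive cases are rare, overall accuracy can look strong even for a model that never detects a cleft. The sketch below uses made-up labels (10 positives among 100 cases) and is not a reanalysis of any included study.

```python
def confusion_metrics(y_true, y_pred):
    """Binary metrics; label 1 = cleft present, 0 = cleft absent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Hypothetical test set: 10 cleft cases among 100 images.
y_true = [1] * 10 + [0] * 90
# A degenerate "model" that labels every image as normal.
always_negative = [0] * 100

metrics = confusion_metrics(y_true, always_negative)
print(metrics)
```

In this toy case the classifier reaches 90% accuracy while its sensitivity is 0 and its balanced accuracy is 0.5, which is why reporting sensitivity and specificity alongside accuracy matters when positive samples are scarce.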
Furthermore, AI tools like ChatGPT are constantly evolving, so some studies did not use the latest version available, potentially limiting performance.25,28 The need for specific training parameters, 13 morphological variability among patients (such as alveolar bone defects), 21 and the challenge of segmenting complex anatomy with high accuracy all contribute to limitations in current model performance. While quantitative assessments like landmark accuracy offer a practical approach for evaluation, their clinical significance remains uncertain. 18 Future studies should aim to expand datasets, incorporate diverse patient populations, and validate models across all cleft types to enhance generalizability and clinical utility.15,20,22,28
Beyond noting some inaccuracies in clinical care suggestions, 25 no included studies discussed the risk of AI generating plausible-sounding but inaccurate or unsupported patient information. Examples of this type of confabulation include fabricated references, inconsistent statements, or recommendations that conflict with established guidelines. This limitation underscores the need for clinician oversight and verification of AI-generated outputs against reliable sources before clinical use.
Conclusion
AI holds significant promise in enhancing cleft lip and/or palate care across diagnostic, planning, and educational domains. However, despite the apparent potential, AI use in cleft care remains in its early stages of development. As identified in this scoping review, current limitations include a lack of clinical validation, limited dataset diversity, and inconsistent methodology. With improvements in clinical validation, equitable dataset development, and interdisciplinary collaboration, AI has the potential not only to match the advances seen in other specialties but to redefine standards in cleft care.
Footnotes
Ethical Considerations
This article conforms to the guidelines set forth by the Helsinki Declaration in 1975.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
