Abstract
Deep learning continues to advance imaging-based diagnosis in oral and maxillofacial radiology. This narrative review synthesizes recent deep learning applications for detecting, classifying, and segmenting jaw cystic lesions and maxillofacial tumors on panoramic radiographs and cone-beam computed tomography scans. It summarizes representative one-stage detectors and convolutional neural network/transformer-based classifiers, along with segmentation methods, reported performance metrics, and key use-case considerations. Beyond this synthesis, the review critically examines dataset constraints, spectrum and site bias, device-related heterogeneity, annotation inconsistency, and gaps in model explainability, and describes how these limitations restrict generalizability. Practical considerations for clinical implementation are also discussed, including workflow placement, quality assurance, and governance, followed by emerging research directions such as federated learning, multimodal fusion, and radiomics–deep learning combinations, each evaluated in terms of feasibility and current evidence maturity. Key evaluation metrics are interpreted in the context of dental imaging. Overall, current findings suggest that deep learning may enhance early and consistent recognition of jaw lesions, support surgical planning through automated delineation, and promote standardized interpretation, provided that models undergo external validation, reporting remains transparent, and deployment is guided by appropriate clinical oversight.
Introduction
Odontogenic cysts and tumors of the jaws constitute a substantial portion of oral and maxillofacial pathologies and are second in prevalence only to dental impactions. 1 These lesions, which include periapical cysts, dentigerous cysts, odontogenic keratocysts (OKCs), and benign tumors such as ameloblastomas, often progress insidiously, with many remaining asymptomatic until they enlarge sufficiently to cause swelling, tooth displacement, or even pathologic fracture. 2 Early and accurate diagnosis is essential because management varies considerably by lesion type; for instance, an OKC is typically treated with conservative enucleation, whereas an ameloblastoma frequently requires more extensive resection due to its aggressive nature. Misdiagnosis is a common clinical challenge because these lesions can appear radiographically similar on routine examinations.3–5 For instance, both OKCs and ameloblastomas often present as radiolucent jaw lesions, and distinguishing them on a panoramic radiograph (PR) can be challenging even for experienced clinicians. 6 In clinical practice, such diagnostic errors may result in inappropriate treatment—either overtreatment of a cystic lesion or insufficient surgery for a tumor—with substantial implications for patient outcomes.7,8
Diagnostic imaging is essential for the early detection and characterization of jaw lesions. PR is widely used as an initial screening modality in dentistry and can reveal asymptomatic radiolucent lesions during routine examinations.9–11 However, interpretation of PR is often hindered by overlapping anatomical structures and projection distortions that may obscure true pathology or create misleading appearances. Cone-beam computed tomography (CBCT) is increasingly employed for three-dimensional assessment of maxillofacial pathology and provides improved visualization of lesion boundaries, internal features, and proximity to critical anatomical structures.12,13 CBCT can identify subtle osseous changes that may not be apparent on PR and offers multiplanar views that aid in surgical planning.14,15 Although advanced imaging techniques such as multidetector computed tomography (CT) and magnetic resonance imaging (MRI) are used for large tumors, suspected malignancy, and soft-tissue involvement, PR and CBCT remain the primary imaging modalities for most odontogenic lesions.
Despite the availability of these imaging tools, challenges remain in consistently diagnosing jaw cysts and tumors. Radiographic features are often equivocal, and image interpretation is subject to considerable interobserver variability. 16 Even specialists may disagree on whether a radiolucency represents an OKC or ameloblastoma without biopsy confirmation. In this context, artificial intelligence (AI), particularly deep learning (DL), has the potential to improve diagnostic accuracy and efficiency. DL, a subset of machine learning that uses multilayered artificial neural networks, has achieved notable success in medical image analysis over the past decade. Convolutional neural networks (CNNs), which are loosely modeled on the human visual cortex, can automatically learn complex imaging patterns. In fields such as radiology and pathology, they have achieved expert-level performance in detecting abnormalities when trained on large annotated datasets.17–19 The oral and maxillofacial imaging field has similarly experienced rapid growth in DL applications using dental radiographs and CBCT scans for various tasks such as caries detection and cephalometric landmark identification, with promising outcomes.20,21 Several studies have also investigated CNN-based approaches for identifying and classifying jaw lesions on PRs. 7 Early findings indicate that AI may assist clinicians by highlighting potential lesions and suggesting likely diagnoses, thereby supporting clinical decision-making.
This narrative review critically examines recent DL applications for imaging-based diagnosis of jaw cystic lesions, with a focus on methodological rigor, clinical relevance, and existing evidence gaps. It is intended as a reference for dental researchers, radiologists, and clinicians seeking an overview of current AI approaches in jaw lesion diagnosis and aims to identify opportunities for future improvements that may enhance patient safety and treatment effectiveness.
Methods
This review was conducted following the Scale for the Assessment of Narrative Review Articles (SANRA) guidelines for narrative reviews. 22 Literature searches were performed in PubMed and Scopus from January 2015 to August 2025 using combinations of the following keywords: (“jaw” OR “maxillofacial”) AND (cyst* OR tumor* OR lesion*) AND (panoramic OR orthopantomogram OR CBCT) AND (deep learning OR CNN OR transformer OR segmentation OR detection OR classification). Studies were included if they applied DL to PR or CBCT for the detection, classification, or segmentation of jaw lesions or tumors. Nonimaging studies, purely methodological investigations, and studies not involving DL were excluded. Two authors independently screened titles and abstracts, with disagreements resolved through discussion. Extracted data included imaging modality, task, model architecture, dataset size, validation strategy (including external validation if reported), and performance metrics. This review emphasizes critical appraisal and clinical implications and does not aim to provide a fully systematic summary of all published studies in this field.
DL applications in jaw cystic lesions
Recent research has proposed a range of DL models designed to detect and diagnose odontogenic cysts and related jaw lesions on imaging. The term “jaw cystic lesions” generally encompasses common entities such as radicular cysts, dentigerous cysts, OKCs, nasopalatine duct cysts, and simple bone cysts, many of which appear as radiolucent areas in the jaws. Because these lesions are often radiographically similar, AI faces a twofold task: first, detecting the presence or location of any lesion on an image (object detection), and second, classifying the lesion into the appropriate type (diagnosis). Some studies also incorporate a segmentation step to delineate lesion boundaries for visualization or volumetric analysis. Notable studies and their approaches to these tasks are summarized below. 23
Object detection and localization of jaw lesions
Localizing a lesion on a PR can be challenging because the image is large (often ∼3000 × 1500 pixels) and the lesion may occupy only a small region. 24 Modern one-stage detectors, such as the YOLO family and RetinaNet, have largely replaced sliding-window approaches. Across representative studies, high box-level performance has been reported; for example, a YOLOv3 model trained on 1282 PRs achieved ∼0.87 precision for lesion boxes, 25 and subsequent YOLO iterations for mandibular radiolucencies reported ∼0.95 precision with ∼0.94 recall following data augmentation. 6 These results indicate that the models rarely miss lesions and generate few false alarms on the test sets, demonstrating strong performance for automated radiographic analysis.
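To make the box-level precision and recall figures above concrete, the following is a minimal illustrative sketch of how a detected box is matched to ground truth at an Intersection over Union (IoU) threshold; all boxes and values are hypothetical toy data, not drawn from the cited studies.

```python
# Illustrative sketch: box-level precision/recall at an IoU threshold,
# as commonly reported for one-stage detectors (boxes are hypothetical).

def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(predictions, ground_truth, iou_thr=0.5):
    """Greedy one-to-one matching of predicted to ground-truth boxes."""
    matched = set()
    tp = 0
    for pred in predictions:
        for i, gt_box in enumerate(ground_truth):
            if i not in matched and iou(pred, gt_box) >= iou_thr:
                matched.add(i)
                tp += 1
                break
    fp = len(predictions) - tp   # unmatched predictions: false alarms
    fn = len(ground_truth) - tp  # unmatched lesions: misses
    return tp / (tp + fp), tp / (tp + fn)

# Toy example: two ground-truth lesions, one accurate and one spurious prediction
gt = [(100, 100, 200, 200), (400, 300, 500, 380)]
preds = [(110, 105, 205, 210), (600, 600, 650, 650)]
prec, rec = precision_recall(preds, gt)  # 0.5 precision, 0.5 recall here
```

Average precision (AP), as reported in the detection studies above, extends this matching by sweeping the model's confidence threshold and integrating precision over recall.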
YOLO’s advantage lies in its real-time detection capability; in a previous study, YOLO evaluated a batch of 181 panoramic test images in real time, whereas human experts required over half an hour. 26 Two-stage pipelines, such as a detector followed by U-Net, remain useful when precise contour delineation is needed, but one-stage models typically provide simpler, near-real-time triage.27,28 The choice between two-stage and one-stage approaches is task-dependent: when the clinical goal is to flag any potential lesion for secondary review, a robust one-stage detector is often sufficient; when planning requires lesion shape or extent, a subsequent segmentation step adds value. Tajima et al. 29 validated YOLOv2 on small datasets, achieving 84.0% sensitivity and 85.8% specificity for cyst-like radiolucencies, demonstrating that optimized small-sample training can mitigate performance loss.
Yang et al.’s YOLOv2 model, trained on 1603 PRs, outperformed human clinicians in precision (70.7%) and recall (68.0%), with a diagnostic accuracy of 66.3%, comparable to that of oral surgeons, indicating that AI can efficiently approach expert-level detection. 26 However, most studies are single‑center and rely on per‑lesion average precision (AP) rather than per‑patient outcomes; dataset splits are sometimes performed at the image level rather than the patient level, risking data leakage. Domain shift due to different vendors, acquisition parameters, or metal artifacts is rarely evaluated. Few studies provide probability calibration or decision‑curve/net‑benefit analyses to determine whether alerts aid clinicians. Consequently, high AP values reflect technical capability under controlled conditions rather than clinic‑ready performance.
For clinical translation, detection studies should (a) report patient‑level sensitivity, specificity, and time saved alongside per‑lesion metrics; (b) include multicenter external validation with scanner and vendor stratification; (c) provide reliability plots/expected calibration error and specify operating thresholds for reported claims; and (d) quantify the review burden of false positives. Incorporating these elements can convert strong technical performance into interpretable value for triage and worklist prioritization in PR evaluation.
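The expected calibration error (ECE) mentioned above can be computed with a simple binning scheme: predictions are grouped by confidence, and the gap between mean confidence and observed event frequency is averaged across bins. The sketch below uses hypothetical probabilities and labels purely for illustration.

```python
# Minimal sketch of expected calibration error (ECE) with equal-width bins.
# Probabilities and labels below are hypothetical illustration values.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=5):
    """ECE: bin-weighted gap between predicted probability and observed frequency."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of cases in the bin
    return ece

# Ten alerts issued at 0.9 confidence, of which 9 were true lesions:
# confidence matches observed frequency, so ECE is ~0 (well calibrated).
probs = [0.9] * 10
labels = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
ece = expected_calibration_error(probs, labels)
```

A reliability plot is the visual counterpart of this computation: per-bin observed frequency plotted against per-bin mean confidence, with deviations from the diagonal indicating miscalibration.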
Classification and diagnosis of lesion types
After lesion detection, the subsequent task is determining lesion type. Many DL studies have focused on classifying jaw lesions into diagnostic categories using either entire images or localized regions containing the lesion. 30 CNN classifiers were among the earliest approaches, typically requiring the lesion to be approximately centered in the image or provided as input. For example, Poedjiastoeti et al. 31 adapted the Visual Geometry Group (VGG)-16 model on ∼400 PR crops to differentiate ameloblastomas from OKCs with high screening accuracy. Using larger cohorts and more advanced backbones, Lee et al. 32 applied Inception-v3 to combined PR/CBCT inputs to distinguish OKCs, dentigerous cysts, and periapical cysts, reporting ∼80%–90% accuracy. Transfer learning and data augmentation consistently enhance performance; a simple CNN trained from scratch that achieves ∼78% accuracy can surpass 90% with pretraining and robust augmentation on small datasets.33,34 This improvement highlights that pretraining on large datasets, even of nonmedical images, provides networks with general feature-extraction capabilities that are valuable when data are limited. Analytically, strong results reported using mixed PR and CBCT inputs should be interpreted cautiously, as region of interest (ROI) pre‑selection and cross-modality “shortcuts” can artificially inflate apparent generalization. For clinically asymmetric risks, such as missing an OKC, cost‑sensitive thresholds and an “uncertain—refer” option are preferable to forced single‑label outputs. Studies should routinely report macro/micro-F1 scores, Cohen’s κ, confusion matrices, and calibration to support clinical interpretation. Data augmentation remains equally critical; for instance, Kwon et al. 25 expanded their training set 12-fold using flips, rotations, and other transformations, which significantly improved the YOLO model’s sensitivity and specificity for jaw lesion detection, emphasizing augmentation’s role in mitigating class imbalance and small sample sizes.
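The kind of geometric augmentation described above can be sketched in a few lines; this minimal example produces an 8-fold expansion from flips and 90-degree rotations (real pipelines, such as the 12-fold scheme cited, typically add small-angle rotations, scaling, and intensity jitter as well).

```python
# Sketch of simple geometric augmentation for a radiograph crop.
# A stand-in array is used here; values and shapes are illustrative only.
import numpy as np

def augment(image):
    """Return flipped/rotated variants of the input (8 variants total)."""
    variants = []
    for k in range(4):                       # 0, 90, 180, 270 degree rotations
        rotated = np.rot90(image, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # plus a horizontal flip of each
    return variants

toy = np.arange(12).reshape(3, 4)  # stand-in for a cropped lesion region
augmented = augment(toy)
n_variants = len(augmented)        # 8-fold expansion in this minimal sketch
```

Note that augmentation of this kind perturbs existing cases; as discussed later in this review, it cannot substitute for genuinely diverse pathology.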
Multiple studies have investigated multiclass classification of jaw lesions, which is more challenging than binary classification due to overlapping radiographic features. A meta-analysis by Shoorgashti et al. 35 reported that AI models for OKC detection achieved an overall sensitivity of 83.7% and specificity of 82.9%, with YOLO-based models reaching 96.4% sensitivity and 96.0% specificity, demonstrating their effectiveness on real-world radiographs. Fedato et al. 36 similarly highlighted AI’s strong diagnostic capability for odontogenic lesions while emphasizing study heterogeneity and the need for standardized evaluation methods. Some studies reported area under the curve values as high as 0.95 for specific cyst types, whereas others observed 0.70–0.80 accuracy in more complex scenarios. Overall, AI models trained on high-quality datasets can achieve classification accuracy exceeding 80%–90% for jaw cysts.
A two‑branch CNN achieved an average accuracy of 88.7% across four categories (dentigerous cyst, periapical cyst, OKC, and ameloblastoma), with a mean sensitivity of ∼66.6% and higher specificity of ∼92.7%; when simplified to lesion-versus-healthy classification, the accuracy increased to ∼90.7%. 7 Cascade designs that first detect and then classify, such as MobileNetv2 + YOLOv3, outperform classification-only baselines for apical radiolucency subtyping.25,26,37 These studies reported improved performance using the two-stage approach compared with classification alone, highlighting the synergistic effect of detection and classification. Although precision and recall values were not explicitly stated in all reports, the accuracy gains attributed to the cascaded designs suggest strong performance on the respective test sets.
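The multiclass metrics recommended earlier (macro-F1, Cohen's κ) can be derived directly from a confusion matrix. The sketch below uses a purely hypothetical 3 × 3 matrix to show the computations.

```python
# Illustrative computation of macro-F1 and Cohen's kappa from a confusion
# matrix; the 3x3 matrix below is hypothetical, not from any cited study.
import numpy as np

def macro_f1(cm):
    """Unweighted mean of per-class F1 scores (rows: true, cols: predicted)."""
    cm = np.asarray(cm, dtype=float)
    f1s = []
    for c in range(cm.shape[0]):
        tp = cm[c, c]
        fp = cm[:, c].sum() - tp
        fn = cm[c, :].sum() - tp
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

def cohens_kappa(cm):
    """Agreement beyond chance between predictions and ground truth."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                        # observed agreement
    pe = (cm.sum(0) * cm.sum(1)).sum() / n ** 2  # chance agreement
    return (po - pe) / (1 - pe)

cm = [[40, 5, 5],   # e.g., rows could be true lesion classes,
      [4, 30, 6],   # columns the predicted classes
      [2, 3, 25]]
f1 = macro_f1(cm)
kappa = cohens_kappa(cm)
```

Because macro-F1 weights every class equally, it penalizes models that neglect rare lesion types, which plain accuracy can hide under class imbalance.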
Ensemble and hybrid models have also been investigated. Liu et al. 38 proposed a hybrid VGG-19/ResNet-50 model for ameloblastoma–OKC classification; however, its performance was not directly compared with single VGG-19 or ResNet-50 models, leaving it unclear whether the ensemble design offered advantages beyond individual architectures. These innovative approaches illustrate the field’s evolution from using off-the-shelf CNNs toward developing task-specific networks or hybrid combinations that more effectively capture the nuances of jaw imaging (Figure 1).

Illustration of the deep learning (DL) training process for jaw cystic lesion recognition. DL: deep learning.
Segmentation of cystic lesions
Segmentation, which involves delineating lesion boundaries, is less frequently the primary objective but is included as a component in several studies. 39 Accurate segmentation of a jaw cyst can provide precise information on its size, shape, and volume, which is clinically valuable for surgical planning and follow-up. Although only a few studies employed DL exclusively for jaw lesion segmentation, many incorporated segmentation following detection or for visualization purposes.40–42
The U-Net architecture is the predominant model for medical image segmentation due to its encoder–decoder design, which enables precise localization while preserving contextual information. In jaw imaging, U-Net and its variants have demonstrated strong performance even with limited datasets by leveraging data augmentation and pretraining.42,43 Kirnbauer et al. 44 proposed a two-step approach for periapical lesion analysis on CBCT: first, the tooth and relevant region were identified using a Spatial Configuration-Net, followed by binary segmentation of the lesion using an improved U-Net. This method achieved 97.1% sensitivity and 88.0% specificity for lesion detection on CBCT and reported a high mean Dice coefficient, reflecting overlap between AI segmentation and ground truth. These results indicate that once the ROI was located, the U-Net accurately delineated lesions on CBCT slices. Furthermore, Kirnbauer’s pipeline achieved a “successful diagnosis rate” of up to 97% for dental localization, demonstrating that the method rarely missed lesions when present.
A notable segmentation-focused study by Xu et al. employed a Mask Region-based CNN (R-CNN) to automatically segment ameloblastomas on CT images. 16 Despite a limited training set of 79 cases, extensive data augmentation and cross-validation were applied. The model achieved a Dice coefficient of 0.874 for ameloblastoma volume delineation, indicating high segmentation accuracy. Detection performance, evaluated using AP at an Intersection over Union (IoU) threshold of 0.5, was 91.4%, showing that the model correctly identified lesion regions in the majority of cases. Importantly, external validation was performed on 200 CT images from a separate center, demonstrating strong generalization and providing confidence that the model’s performance is not restricted to the original scanner or patient population.
Mask R-CNN, as applied by Xu et al. and Yeshua et al., is an effective instance segmentation framework that generates both bounding boxes and pixel-level masks.16,45 Yeshua et al. employed the model on 3D CBCT data to detect maxillofacial bone lesions, achieving a per-slice detection sensitivity of 95.9% and precision of 98.9%, with a 3D segmentation Dice coefficient of 83.5%. The high precision reflects minimal false positives, enabling accurate computation of lesion volumes, which supports diagnosis and follow-up. The Dice scores are consistent with Xu et al.’s results for ameloblastomas, demonstrating Mask R-CNN’s reliable performance in jaw lesion segmentation when adequate training data are available.
An innovative variation on U-Net is the Dense U-Net with anatomical constraints. Zheng et al. 30 introduced an anatomically constrained Dense U-Net that incorporated oral anatomical knowledge into the segmentation process. This approach allowed good performance even with a small training dataset, outperforming a standard Dense U-Net in both detection accuracy and Dice coefficient by leveraging known anatomical constraints. The study suggests that integrating domain knowledge with DL can guide models and reduce errors that violate anatomical plausibility. However, the Dice coefficient alone may conceal boundary inaccuracies that could impact surgical planning. Studies should also report metrics such as the 95th-percentile Hausdorff distance (HD95) and relative volume error, as 2D stacking may compromise 3D topological consistency. Direct 3D architectures or the inclusion of shape and anatomical priors may yield more reliable volumetric outputs. Additionally, external validation cohorts remain limited, and linking segmentation accuracy to downstream clinical outcomes, such as operative windows or recurrence, would enhance clinical relevance.
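The point that Dice alone can conceal clinically relevant errors is easy to demonstrate: the sketch below, on hypothetical binary masks, pairs Dice with relative volume error and shows that a seemingly high Dice score can coexist with substantial volume underestimation.

```python
# Sketch of complementary segmentation metrics on hypothetical 2D masks
# (the same definitions apply voxel-wise in 3D). HD95 additionally needs
# boundary distance transforms and is omitted here for brevity.
import numpy as np

def dice(pred, gt):
    """Dice coefficient: overlap relative to the mean mask size."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def relative_volume_error(pred, gt):
    """Signed volume error; negative values indicate under-segmentation."""
    return (pred.sum() - gt.sum()) / gt.sum()

gt = np.zeros((10, 10), dtype=bool)
gt[2:8, 2:8] = True        # "lesion" of 36 pixels
pred = np.zeros((10, 10), dtype=bool)
pred[3:8, 2:8] = True      # prediction misses one boundary row: 30 pixels
d = dice(pred, gt)
rve = relative_volume_error(pred, gt)
```

Here Dice is ∼0.91, yet the predicted volume is ∼17% too small, illustrating why volumetric and boundary metrics should accompany Dice in surgical-planning contexts.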
Collectively, applications in jaw cystic lesions demonstrate that DL can (a) flag radiographs or CBCT scans containing a lesion as a screening aid, (b) suggest the likely diagnosis for decision support, and (c) delineate the lesion for measurement and visualization to assist surgical planning (Figure 2). The synergy of detection, classification, and segmentation is evident: studies combining these tasks often report that each step facilitates the others. For instance, performing segmentation after detection can improve classification accuracy by focusing analysis on the lesion region, while knowledge of the lesion class can, in turn, enhance segmentation performance.

Demonstration of a DL model for jaw cystic lesion recognition. DL: deep learning.
Sources of bias and external validity
Reported diagnostic performance can be influenced by multiple sources of bias, including spectrum and site bias from single-center data, device heterogeneity due to different scanners or acquisition settings, class imbalance, and variability in expert annotations. Small datasets and the absence of external validation further increase the risk of overfitting and inflated performance metrics. To contextualize the results, studies should report detailed cohort characteristics, conduct cross-center evaluations, and include uncertainty estimates where feasible. These considerations are crucial for assessing the readiness of models for clinical deployment.
Clinical integration and limitations
For clinical adoption, three questions are central: (1) Will the model reduce missed lesions without generating an unacceptable number of false alarms? (2) Does it save net time in PR/CBCT reading under calibrated thresholds? (3) Is performance stable across scanners, sites, and patient subgroups after deployment? Building on these considerations, limitations and potential remedies can be organized into five areas: data diversity (multi‑center curation), robustness (external validation and drift monitoring), decision‑making (probability calibration, cost‑sensitive thresholds, and an “uncertain—refer” option), explainability aligned with radiographic signs, and governance (quality assurance, privacy, and regulatory oversight).
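The cost-sensitive, abstaining decision rule referred to above can be sketched as a simple mapping from a calibrated lesion probability to a worklist action; the threshold values here are hypothetical placeholders that would in practice be set from decision-curve or net-benefit analysis, with the flagging threshold lowered when missing a lesion carries the higher cost.

```python
# Sketch of a calibrated triage rule with an abstention band.
# Thresholds are hypothetical; real operating points should be derived
# from decision-curve analysis on the target population.

def triage(prob_lesion, lower=0.30, upper=0.70):
    """Map a calibrated lesion probability to a worklist action."""
    if prob_lesion >= upper:
        return "flag for priority review"
    if prob_lesion <= lower:
        return "routine reading"
    return "uncertain - refer"  # abstain rather than force a single label

decisions = [triage(p) for p in (0.05, 0.50, 0.95)]
```

The width of the abstention band trades reader workload against the risk of over-confident single-label outputs, and should itself be audited after deployment.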
Limited data availability and class imbalance
A key limitation in jaw lesion research is the scarcity of large, diverse datasets. Many institutions encounter only a few cases of specific cysts or tumors annually, and assembling thousands of annotated images typically requires multicenter collaboration. As noted, half of the reviewed studies included fewer than 500 images.16,29,31 Models trained on such small datasets are prone to overfitting, performing well on seen cases but poorly on new patients. Variations in study inclusion criteria further complicate generalization. The issue of class imbalance is closely related: rarer lesions, such as Stafne bone cysts or central giant cell granulomas, may be underrepresented, causing models to favor more common classes.46,47 Data augmentation partially mitigates this by synthetically increasing minority-class samples, but it cannot introduce truly new pathology patterns and only perturbs existing ones. Shi et al. 30 observed category imbalance in many datasets and highlighted augmentation as a frequent remedy. Although augmentation improves model robustness in some cases, it does not replace the need for truly diverse data.
Generalizability and external validation
Generalizability refers to a model’s performance on data outside its training distribution, such as images acquired with different equipment, settings, or populations. Only a few studies have conducted rigorous external validation. Yeshua et al. 45 evaluated their Mask R-CNN on a separate cohort and maintained high Dice and detection metrics. Xu et al. 16 tested their model on CT scans from another center, confirming robustness. Although these results are encouraging, additional external validation studies are needed. Publication bias further complicates assessment: studies reporting favorable results are more likely to be published, whereas those with poor generalization may remain unpublished, potentially skewing perceptions of AI performance. In Shoorgashti et al.’s meta-analysis, Egger’s test indicated possible publication bias (p = 0.042), suggesting that aggregated performance metrics may overestimate the capabilities of an unbiased average model. 35
Lack of explainability
Current DL models often function as black boxes. For many clinicians, especially in fields such as surgery or radiology where nuanced interpretation can influence management, obtaining a result without an explanatory rationale can be unsettling. For instance, an AI system may label a lesion as “OKC with 90% confidence,” but a surgeon would want to understand the basis for this prediction—did the model recognize features such as a scalloped border or minimal expansion, or was the decision influenced by irrelevant factors like image artifacts? Trust in such outputs is difficult to establish without explanation.
Efforts to improve interpretability include visualization tools such as Gradient-weighted Class Activation Mapping (Grad-CAM), a CNN technique that requires no model modification or retraining. Grad-CAM produces a heatmap in which warmer colors indicate regions most influential to the model’s decision, while cooler colors correspond to less relevant areas. This approach helps bridge the “black box” gap, allowing clinicians to verify that the model focuses on meaningful clinical features rather than artifacts when diagnosing jaw lesions. 48 Some studies applied Grad-CAM to confirm that the CNN concentrated on the lesion area rather than extraneous regions during classification. 49 However, these heatmaps have limitations: if the model misclassifies, the highlighted region may be off-target or misleading, and Grad-CAM indicates where the model looked but not which features it used. It cannot convey reasoning such as “the decision was based on the lesion’s scalloped margins and epicenter in the ramus,” which a radiologist would typically provide.
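The core Grad-CAM computation is compact: each channel of the last convolutional layer is weighted by the spatial average of its gradient with respect to the class score, the weighted maps are summed, and a ReLU retains only positive evidence. The sketch below uses random arrays as stand-ins for a real network's activations and gradients, purely to illustrate the arithmetic.

```python
# Conceptual sketch of the Grad-CAM computation described above.
# Arrays are hypothetical stand-ins for a CNN's last conv layer outputs.
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (channels, H, W) from the target conv layer."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k: spatially averaged gradients
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0)                          # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1] for display
    return cam

rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))   # 8 channels of 7x7 feature maps (stand-in)
grads = rng.random((8, 7, 7))  # gradients of the class score w.r.t. them
heatmap = grad_cam(acts, grads)
```

In practice the low-resolution heatmap is upsampled to the radiograph's size and overlaid in color; note that the computation assigns spatial importance only, which is precisely why it cannot name the radiographic features involved.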
Data privacy and regulatory concerns
Medical images constitute protected health information, and sharing them for AI development raises privacy concerns. Strict regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States and the General Data Protection Regulation (GDPR) in Europe, often complicate multi-source data aggregation.50–52 This challenge motivates the use of federated learning, in which images remain on local servers and only model weights or gradients are shared, enabling collaborative training without exposing patient data. 53 Federated learning has emerged as a promising strategy in dentistry for overcoming data silos while maintaining privacy. Early studies indicate that dental AI models can be trained in a federated manner with performance close to that of traditional centralized training. 54 Nonetheless, this approach introduces additional complexity in coordination and regulatory oversight.
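The central mechanism of federated learning, aggregation of locally trained weights without sharing images, can be illustrated with a federated-averaging (FedAvg) sketch; the sites, weight vectors, and sample counts below are hypothetical toy values.

```python
# Minimal FedAvg sketch: each clinic trains locally and shares only model
# weights, which a coordinating server averages weighted by dataset size.
# All values below are hypothetical; a real system would iterate this over
# many communication rounds with genuine local training in between.
import numpy as np

def fedavg(site_weights, site_counts):
    """Average per-site weight vectors, weighted by local sample counts."""
    total = sum(site_counts)
    return sum(w * (n / total) for w, n in zip(site_weights, site_counts))

# Three clinics; only the weight vectors leave each site, never the images.
w_a = np.array([1.0, 2.0])
w_b = np.array([3.0, 4.0])
w_c = np.array([5.0, 6.0])
global_w = fedavg([w_a, w_b, w_c], site_counts=[100, 200, 100])
# Equivalent to 0.25*w_a + 0.5*w_b + 0.25*w_c
```

Even this simple scheme shows why coordination overhead grows: sites must agree on architecture, preprocessing, and round scheduling, and gradient-leakage defenses may be layered on top, which is consistent with the added regulatory complexity noted above.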
Acknowledging limitations does not diminish the achievements of AI. The ability of AI to detect small jaw lesions with high fidelity or differentiate morphologically similar cysts at a level comparable to experts demonstrates substantial research value. In total, 12 DL studies met the inclusion criteria for jaw-related imaging tasks, of which 9 primarily addressed detection and/or classification, and 5 focused on PR/CBCT-based segmentation; several studies contributed to more than one task. As part of our critical evaluation, Table 1 shows representative DL studies on jaw lesions, highlighting their core contributions and primary limitations in line with the discussions presented in this section.
Representative DL studies on jaw lesions and their key limitations.
Acc: accuracy; CBCT: cone-beam computed tomography; CNN: convolutional neural network; CT: computed tomography; DCNN: deep convolutional neural network; DL: deep learning; PR: panoramic radiograph; Prec: precision; Rec: recall; ROI: region of interest; Sens: sensitivity; Spec: specificity; YOLO: You Only Look Once; 3D: three‑dimensional.
Future directions
Short- to mid-term progress in jaw lesion imaging is likely to arise from multicenter curation of diverse datasets, transparent reporting, and externally validated models prospectively tested within clinical workflows. Promising technical approaches, including federated learning, multimodal fusion, and radiomics–DL hybrids, may enhance model robustness, but their clinical utility depends on governance, calibration, and sustained post-deployment monitoring. Explainability methods should advance beyond heatmaps toward clinically meaningful rationales aligned with dental radiology practice. Ultimately, successful integration will require regulatory compliance, attention to human factors, and demonstration of additive value compared with standard care.
Conclusion
DL demonstrates substantial potential in assisting the detection and delineation of jaw cystic lesions and maxillofacial tumors on PR and CBCT. When developed with diverse datasets and externally validated, these tools may facilitate earlier and more consistent diagnosis and inform surgical planning. Safe and effective clinical integration requires transparent reporting, appropriate governance, and prospective evaluation within real-world workflows. However, limitations of this review—including the absence of quantitative synthesis, coverage restricted to selected imaging modalities, and inconsistent study methodologies—may introduce bias, limit insights into practical application, and hinder comparison of AI architectures, highlighting the need for standardized research reporting to improve future work.
Footnotes
Acknowledgments
Prof Kaijin Hu assisted with English language polishing, limited to grammar and style; all authors reviewed and approved the final manuscript.
Author contributions
Conceptualization: B.Z., Y.L., and C.L.; Methodology and investigation: B.Z. and Y.L.; Writing—original draft: B.Z. and Y.L.; Writing—review & editing: J.S., S.L., and C.L.; Supervision: C.L.
Data availability statement
No new data were generated or analyzed in this study. Figures 1 and 2 are author-created schematic illustrations that contain no third-party copyrighted material and no identifiable patient information.
Declaration of conflicting interest
The authors declare that they have no conflicts of interest.
Funding
This work was supported by the Key Research and Development Program of Shaanxi Province-Key Industry Innovation Chain (Group)-Social Development Field under No. 2024SF-ZDCYL-01-15.
