Abstract
Objective
The present retrospective diagnostic accuracy study aimed to evaluate the performance of an AI-based system for automated detection of teeth, caries, implants, restorations, and fixed prostheses on bitewing radiographs.
Methods
A total of 407 bitewing radiographs from 315 adult patients were analyzed using an AI system developed by VELMENI Inc. and compared with tooth-level reference annotations provided by two oral and maxillofacial radiologists. Every tooth was coded for the absence (0) or presence (1) of each radiographic finding: caries, restorations, fixed prosthesis, and implants. Cohen’s kappa (κ) with 95% bootstrap confidence intervals was used to assess inter-rater reliability. The AI system’s diagnostic accuracy was evaluated using sensitivity and specificity, with the human consensus as the reference standard.
Results
The annotated dataset consisted of 2,829 tooth-level observations. The two radiologists showed near-perfect agreement for prosthesis (κ = 0.925) and restorations (κ = 0.872) detection, and substantial agreement for caries (κ = 0.804) and implants (κ = 0.726), the latter being the lowest. The AI system showed substantial to near-perfect agreement with the human observers for restorations (κ = 0.812–0.871), prosthesis (κ = 0.882–0.940), and implant detection (κ = 0.763–0.974), and moderate agreement for caries (κ = 0.454–0.508). After filtering for human consensus, the AI system showed high sensitivity for implant (1.000), prosthesis (0.984), restorations (0.974), and caries (0.972) detection. Specificity was high for implant (1.000), prosthesis (0.984), and restorations (0.936) detection, but slightly lower for caries detection (0.842).
Conclusions
The AI system demonstrated diagnostic performance comparable to that of oral and maxillofacial radiologists for detecting restorations, prostheses, and implants on bitewing radiographs, with slightly lower performance for caries detection. These findings support the potential role of the AI system as a clinical adjunct for improving efficiency and consistency in routine dental imaging interpretation.
Introduction
Artificial Intelligence (AI) is a rapidly evolving field of computer science focused on developing systems capable of performing tasks that typically require human intelligence, such as speech recognition, decision-making, problem-solving, and language translation. By automating complex and repetitive tasks, AI technologies can replicate aspects of human cognition and behavior. Core AI approaches include decision trees, neural networks, and deep learning algorithms, which underpin applications in image and speech recognition.1 AI is increasingly applied in dentistry, enhancing specialties such as oral and maxillofacial radiology, endodontics, periodontics, prosthodontics, implantology, orthodontics, and oral and maxillofacial surgery by improving diagnostic accuracy and treatment planning.2,3 In particular, AI has shown significant promise in the analysis of dental radiographic images.4
Machine learning, a subset of AI, enables computational models to improve performance by learning patterns from data rather than relying on explicitly programmed rules. Traditional machine learning methods often require substantial human input, which may introduce errors and inefficiencies.3,5 Deep learning was developed to address these limitations by using multi-layered neural networks capable of automatically learning complex representations directly from large datasets.6,7
Convolutional neural networks (CNNs), a prominent deep learning architecture, are widely used for analyzing two- and three-dimensional medical images, including radiographs. CNNs iteratively optimize predictions by minimizing errors relative to ground truth labels and are commonly applied in radiology for detection, classification, and segmentation tasks. Detection involves identifying abnormalities (e.g., coronal radiolucencies), classification assigns diagnostic categories (e.g., caries or cervical burnout), and segmentation delineates anatomical structures or lesions and assigns semantic labels for quantitative analysis.1,4,5,8 Automated segmentation techniques are generally more reproducible and less time-consuming than manual segmentation.9 Besides caries detection, CNNs have been widely used in dentistry for teeth recognition, detection of restorations, and diseases such as periapical lesions, tumors, and cysts, among others.10
Conventional radiography is the most frequently used imaging method in dental radiology owing to its low dose and cost-effectiveness. Intraoral radiographs, particularly bitewings, are widely employed to diagnose carious lesions and assess bone levels, providing valuable information on dental conditions from the distal surface of the canine crown to the most distal erupted crown.11 Intraoral radiography is the most effective and precise imaging technique for identifying proximal caries12 and plays an essential role in the diagnosis, management, and treatment planning of dental diseases. Hence, periodic bitewing radiographs are commonly used to monitor caries progression.13 However, the diagnostic quality of these radiographs often depends on the expertise of the operator, and the degree of distortion, artifacts, and superimposition can vary significantly, potentially leading to misinterpretation.14 Automated detection systems may help overcome these limitations and improve diagnostic consistency and accuracy.2
In our previous studies, VELMENI Inc.’s AI system successfully detected teeth, caries, implants, and restorations on panoramic and periapical radiographs.15,16 However, its diagnostic performance on bitewing radiographs, one of the most commonly used intraoral imaging modalities, has not yet been thoroughly evaluated. The present study aims to evaluate the diagnostic performance of VELMENI Inc.’s AI system in automatically detecting teeth, caries, implants, restorations, and other dental features on bitewing radiographs. The null hypothesis is that there is no significant difference between the diagnostic performance of the AI system and oral and maxillofacial radiologists in detecting teeth, caries, implants, restorations, and fixed prostheses on bitewing radiographs.
Materials and methods
Study design and reporting guidelines
This retrospective diagnostic accuracy study was designed to evaluate the performance of an artificial intelligence (AI) system for detecting dental structures and pathologies on bitewing radiographs. The manuscript was prepared in accordance with the STARD (Standards for Reporting Diagnostic Accuracy Studies) 2015 recommendations and the CLAIM checklist.17,18 The study was conducted in accordance with the Declaration of Helsinki of 1975, as revised in 2024. All radiographs were de-identified before statistical analysis.
Radiographic dataset
A total of 930 anonymized bitewing radiographs taken between June 2022 and May 2023 from individuals aged 18 and older were obtained from the EPIC and MiPacs systems of the Department of Oral and Maxillofacial Radiology at the University of Mississippi Medical Center. Of these, 523 radiographs were excluded because of the absence of visible teeth, insufficient diagnostic quality caused by patient positioning, beam angulation, patient motion, overlap, or superimposition of foreign objects, or the presence of image artifacts, resulting in a final dataset of 407 bitewing radiographs. Only de-identified bitewing radiographs from patients with more than 16 erupted teeth were used; this threshold ensured sufficient dental data for reliable analysis. Findings of caries, implants, restorations, and fixed prostheses were included in the study. All bitewing radiographs were obtained using XDR sensors (Los Angeles, CA, USA) at 70 kVp, 8 mA, and 0.16 s. The research protocol was approved by the University of Mississippi Medical Center IRB on June 15, 2023 (UMMC-IRB-2024-146).
Image annotation
Four hundred and seven anonymized bitewing dental radiographs were analyzed and labeled independently by two oral and maxillofacial radiologists, each with a minimum of five years of experience. Before annotation, they underwent a calibration process to standardize diagnostic criteria and annotation protocols. The radiologists identified and labeled teeth with caries (including all types, such as enamel, dentin, secondary, and root caries), implants, restorations (amalgam and composite), and fixed prostheses. Additionally, a convolutional neural network (CNN) architecture developed by VELMENI Inc. (California, USA) was used to detect the same dental categories in the bitewing radiographs. The VELMENI Inc. AI system was trained by the company on proprietary datasets separate from those in the present study; the training model and parameters are confidential intellectual property. The dataset used here therefore represents an external clinical validation of the VELMENI Inc. AI system. Each radiograph was evaluated at the individual tooth level. The presence or absence of each radiographic finding per tooth, including caries, restorations, prosthesis, and implants, was encoded as a binary variable, where 0 indicated absence and 1 indicated presence. This encoding was uniformly applied across observers 1 and 2 and the AI system, yielding a structured dataset of 2,829 individual tooth-level observations. For diagnostic accuracy testing, only teeth for which both human observers assigned the same label were included, while all tooth-level data were retained for inter-rater reliability analysis. Thus, the AI system’s diagnostic performance was evaluated against the judgment of the two radiologists as the reference.
This comparative approach aimed to evaluate AI’s efficacy in detecting dental anomalies and verify the consistency between human and automated assessments.
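The encoding and consensus-filtering steps described above can be sketched as follows. This is a minimal illustration with hypothetical toy data, not the study’s actual pipeline; the variable names are ours.

```python
# Hypothetical tooth-level records for one finding (e.g., caries):
# (observer1, observer2, ai), each coded 0 = absent, 1 = present.
records = [
    (1, 1, 1),
    (0, 0, 0),
    (1, 0, 1),  # observers disagree: kept for reliability, dropped for accuracy
    (0, 0, 1),
    (1, 1, 0),
]

# Inter-rater reliability analysis retains all teeth.
all_pairs = [(o1, o2) for o1, o2, _ in records]

# Diagnostic accuracy testing keeps only teeth with human consensus,
# using the agreed label as the reference standard for the AI output.
consensus = [(o1, ai) for o1, o2, ai in records if o1 == o2]
```

Dropping disagreement teeth from accuracy testing (while keeping them for reliability) ensures the AI system is scored only against an unambiguous human reference.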
Statistical analysis
Inter-rater reliability in the individual tooth-level dataset was analyzed as pairwise agreement using Cohen’s kappa (κ) for each radiographic finding across all rater pairs: observer 1 vs observer 2 and each observer vs the AI system. All κ statistics were computed using the irr package in R (version 4.5.0). Percentile bootstrap 95% confidence intervals, based on 2,000 resampling iterations, were used to estimate uncertainty.
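The agreement statistic and its bootstrap interval can be sketched as below. This is an illustrative Python re-implementation (the study itself used the irr package in R); the guard for degenerate resamples is an assumption of ours.

```python
import random

def cohens_kappa(a, b):
    """Cohen's kappa for two binary raters over the same teeth."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n            # per-rater "present" rates
    pe = p_a * p_b + (1 - p_a) * (1 - p_b)       # chance agreement
    if pe == 1.0:                                # both raters constant (degenerate)
        return 1.0
    return (po - pe) / (1 - pe)

def bootstrap_ci(a, b, iters=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for kappa, resampling teeth with replacement."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohens_kappa([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * iters)], stats[int((1 - alpha / 2) * iters) - 1]
```

Resampling at the tooth level, as here, treats each tooth observation as the sampling unit, matching how the dataset was encoded.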
Results
Inter-rater reliability
Inter-rater reliability between two oral and maxillofacial radiologists and VELMENI Inc’s AI system for the detection of radiographic findings on bitewing radiographs.

Figure 1. Inter-rater reliability between two oral and maxillofacial radiologists and an AI system for the detection of caries, restorations, prosthesis, and implants on bitewing radiographs. Bars represent Cohen’s κ values with 95% bootstrap confidence intervals.
AI system diagnostic accuracy testing results
Figure 2 shows the sensitivity and specificity of the AI system. After filtering for human observer consensus, between 2,657 and 2,820 individual tooth-level observations per finding remained in the dataset used for diagnostic accuracy testing. The AI system showed consistently high sensitivity across all radiographic findings: it was highest for implant detection, with a perfect value of 1.000, followed by prosthesis detection (0.984), restorations (0.974), and caries (0.972). Specificity followed a similar trend. The highest specificity was observed for implant detection, again a perfect 1.000, followed by near-perfect values for prosthesis (0.989) and restorations (0.936). However, the AI system’s specificity for caries detection was comparatively lower (0.842), indicating that the system is highly sensitive to caries at the expense of specificity, producing a higher number of false-positive detections on bitewing radiographs.

Figure 2. Diagnostic accuracy of the AI system for the detection of caries, restorations, prosthesis, and implants on bitewing radiographs, with sensitivity and specificity calculated using human observer consensus as the reference standard. Error bars indicate 95% bootstrap confidence intervals.
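The sensitivity and specificity reported above follow directly from the consensus-filtered binary labels; a minimal Python sketch (with hypothetical labels, not the study’s code):

```python
def sensitivity_specificity(ref, pred):
    """Per-finding sensitivity and specificity of binary AI predictions
    against the human consensus reference standard (1 = finding present)."""
    tp = sum(1 for r, p in zip(ref, pred) if r == 1 and p == 1)  # true positives
    tn = sum(1 for r, p in zip(ref, pred) if r == 0 and p == 0)  # true negatives
    fp = sum(1 for r, p in zip(ref, pred) if r == 0 and p == 1)  # false positives
    fn = sum(1 for r, p in zip(ref, pred) if r == 1 and p == 0)  # false negatives
    return tp / (tp + fn), tn / (tn + fp)
```

Under this definition, the pattern reported for caries (sensitivity 0.972, specificity 0.842) corresponds to relatively few missed lesions (small fn) at the cost of more false-positive detections (larger fp).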
Discussion
This retrospective diagnostic accuracy study evaluated the performance of an artificial intelligence (AI) system for the automated detection of caries, implants, restorations, and fixed prostheses on bitewing radiographs, testing the null hypothesis that there would be no significant difference between the diagnostic performance of the AI system and oral and maxillofacial radiologists.
Diagnosis can be challenging, but it is fundamental to the correct planning and subsequent treatment of patients, and imaging examinations are an important means of diagnostic support for dentists. Among them, bitewing radiography can be used for dental assessment and evaluation of the supporting tissues. The bitewing technique is simple and can be performed both horizontally and vertically; it is easy to acquire, produces high-resolution images with minimal distortion, and is cost-effective. Overlapping structures and potential underestimation of bone defects are among its limitations.19,20 Rather than reiterating these well-established characteristics, the present findings emphasize the role of AI-based analysis in supporting interpretation of bitewing radiographs by improving consistency and reducing the risk of overlooked findings. The analysis of the obtained images is as important as the technique itself, and automatic detection methods can help prevent pathologies, anatomical structures, previous treatments, and other alterations from being overlooked by the professional, especially by inexperienced dentists. These methods can also support early diagnosis and assist in the regular recording of patient data in the digital environment.21
The objective of this study was to evaluate the diagnostic performance of VELMENI Inc.'s AI system in the automatic detection of caries, implants, restorations, and fixed prostheses on bitewing radiographs. Inter-rater agreement, quantified using Cohen’s kappa, and the diagnostic accuracy of the AI system, assessed using sensitivity and specificity, showed a good level of agreement with the interpretations of the two expert human observers. Importantly, this study expands the current literature by assessing multi-category detection on bitewing radiographs and by comparing AI performance with oral and maxillofacial radiologists rather than general dental practitioners.
The identification of caries with the aid of CNNs is one of the most researched topics in bitewing radiography and has shown promising results. A systematic review identified an average accuracy of 82% to 89% for the detection of carious lesions by CNNs, with CNNs performing similarly to or better than professionals in most studies.22 Another systematic review reported an accuracy range from 73.35% to 98.44%, averaging 90.08%.23 Dental caries remains one of the most common chronic diseases globally, with an estimated two billion untreated cases in permanent teeth. Carious lesions on proximal surfaces still present a diagnostic challenge, potentially progressing silently; thus, early detection is crucial and can lead to more cost-effective treatment.24 Consistent with the works cited above, this study also found moderate to substantial agreement for caries detection, as quantified by Cohen’s κ, between the AI system and the two human observers. Bayrakdar et al.25 reported that a CNN-based caries detection model achieved superior results compared with experienced professionals. Similarly, Garcia Cantu et al.26 observed that a CNN outperformed experienced dentists in caries detection, especially for primary lesions. What sets our study apart from these previous studies is that we compared the AI system with professionals specialized in oral and maxillofacial radiology.
Artificial intelligence models for the identification of dental implants are already widely employed in implant dentistry. In their systematic review, Revilla-León et al.27 showed that CNNs are the most commonly used architecture for this application, with average accuracy ranging between 93.8% and 98%, not only for identifying implants in imaging examinations but also for identifying the implant type. This is a significant aid, since dentists occasionally treat patients who cannot provide information about prior treatments, complicating the continuation of prosthetic treatment. High accuracy in identifying dental implants was also reported by Kohlakala et al.,28 who achieved a 94% accuracy rate for the detection and segmentation of dental implants. These results are consistent with our findings, in which the CNN demonstrated near-perfect agreement with human observers in identifying dental implants. The use of bitewing radiographs for evaluating dental implants is not routine in clinical practice, but our study shows favorable results for their use in identifying the number of dental implants. The vertical variant of these radiographs can be used to increase the apical area captured in the image.
Identifying restorative treatments, particularly those performed with a composite resin that matches the tooth’s color, can be complex. The present study found substantial agreement between the AI system and both human observers. A systematic review reported that a convolutional neural network was able to accurately classify teeth with composite resin restorations at 92.9%, amalgam restorations at 99.2%, and gold restorations at 99.4%, indicating that tooth-colored restorative materials can be identified, albeit with greater difficulty.29 Another study used a CNN to classify dental restorations in bitewing and periapical radiographs, achieving an accuracy of 88% for amalgam restorations and 87% for composite resin restorations, demonstrating clinically acceptable performance in the classification of restorations.30 These results highlight the potential use of AI for patient diagnosis and treatment planning, student education, and forensic dentistry.
In the present study, the detection of dental prostheses showed the highest level of agreement across both observers and the AI system. A study that used CNNs to detect full crowns and bridges achieved accuracies ranging from 68.6% to 92.7%.31 However, another study obtained low performance for identification, mainly because of the varied characteristics of fixed prosthetic rehabilitations.32
Several limitations of this study should be acknowledged. Radiographs with artifacts were excluded, which may lead to overestimation of AI performance relative to real-world clinical settings, where image quality is variable. In addition, the study relied on a limited, single-institution dataset without external data, which restricts the generalizability of the results. Future research should use larger and more diverse datasets to strengthen and validate these findings.
In this study, we demonstrated strong agreement between the specialist observers and the VELMENI Inc. AI system, which was robust enough to detect implants, restorations, fixed prostheses, and caries with efficiency, precision, and consistency. These results suggest that AI can be highly useful, potentially reducing professionals’ workload and repetitive tasks.
Future studies should focus on validating this AI system in more diverse clinical settings and exploring its integration into routine bitewing radiograph interpretation workflows to support clinical decision-making and enhance diagnostic efficiency.
Footnotes
Acknowledgements
We thank the University of Mississippi Medical Center School of Dentistry Division of Oral and Maxillofacial Radiology Clinic and VELMENI Inc.
Ethical considerations
The University of Mississippi Medical Center Institutional Review Board granted ethical approval (IRB approval number UMMC-IRB-2024-146).
Author contributions
Conceptualization, methodology, R.J., Y.S. and M.S.; software, Y.S., P.J. and M.S.; validation, R.J., and Y.S.; investigation and statistics, R.J., Y.S., M.D.R. and A.P.; data curation, R.J., Y.S. and P.J.; writing—original draft preparation, A.P., P.T. and M.B.G.; writing—review and editing, R.J., Y.S., A.P., M.B.G. and M.S; supervision, R.J., Y.S. and M.S.; project administration, A.F., M.F., M.S. All authors have read and agreed to the published version of the manuscript.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets generated and analyzed during the current study are not publicly available due to institutional data protection policies but are available from the corresponding author on reasonable request.
