Sage Journals: Discover world-class research

Abstract

Introduction

AI-based automated cephalometric landmark detection streamlines orthodontic diagnosis and treatment planning, providing accurate, efficient, and reliable results. Benefits include saving time, minimizing subjectivity, improving precision, and facilitating continuous improvement. However, such systems should complement, not replace, clinician expertise, with qualified orthodontists making the final diagnosis and treatment plan.

Aim

To propose a method that automatically detects cephalometric landmarks on X-ray images and to compare the results with the manual annotation method.

Methodology

A dataset of 600 X-ray images, each containing 19 landmarks, was collected. Two orthodontists manually marked the 19 landmarks in 300 cephalograms and their coordinates were automatically extracted. The dataset was cleaned for errors, and a pre-trained CNN model with an EfficientNetB7 backbone was used for landmark detection. The model was trained on 80% of the dataset and tested on the remaining 20%. The two-step method included ROI extraction and landmark detection. The RMSE score was used to evaluate inter-examiner reliability and the R² score was used to compare manual and automated models.

Result

Landmark locations predicted by the model were compared with those from the manual method. The mean deviation of the predicted landmarks from the actual landmarks was calculated using the RMSE, and the model showed acceptable accuracy compared with manual annotation. EfficientNetB7 achieved detection accuracies similar to the manual annotation method. For porion, articulare, and soft tissue pogonion, the model outperformed manual annotation and gave consistently better results, whereas for Point A, pogonion, gnathion, and menton, manual annotation was more accurate.

Conclusion

The study introduced an automated deep learning approach to predict landmark locations, and the results demonstrate its accuracy relative to the manual annotation method. This approach effectively detects cephalometric landmarks, suggesting its potential for clinical use under orthodontist supervision.

Keywords

Artificial intelligence, cephalometry, deep learning, dental anatomy, orthodontics, radiography

Introduction

Orthodontics aims to correct craniofacial discrepancies and meet functional and aesthetic demands. Since its introduction in the 1930s, cephalometric measurement has become a popular diagnostic tool for evaluating deviated skeletal parameters in orthodontics. Cephalometric radiographs are used to assess and predict craniofacial growth and to plan and evaluate treatment effects.¹ However, the manual landmarking method used by orthodontists to extract landmarks from X-ray images is time-consuming and prone to errors due to clinician fatigue and high intra- and inter-examiner variation.³ A stable and consistent automated method for precise cephalometric evaluation is therefore needed. Challenges in computer-assisted cephalometric analysis include manual calibration, landmark identification, and the difficulty of automatically identifying anatomical landmarks given variations in skeletal structure, image blur, and overlapping structures. Although deep learning methods are being researched to improve the accuracy of digital cephalometric analysis, limited datasets remain a challenge.

Various studies have addressed lateral cephalometric analysis. In particular, the International Symposium on Biomedical Imaging (ISBI) hosted challenges on this problem in 2014 and 2015, and several approaches were published.⁴,⁵ Several of these approaches used randomized decision trees as pixel classifiers. Despite the wide variety of studies, no model accurate enough for clinical use, with errors within 2 mm, has yet been developed.⁴

Automated cephalometric landmark detection software based on AI technology (deep learning) enables accurate and fast calculations. Such software simplifies landmark identification, and its deep learning algorithm can improve as more analyses become available for training. Deep learning has outperformed existing algorithms in various areas in recent years. Especially since AlexNet won ILSVRC in 2012,⁹ convolutional neural networks (CNNs) have developed rapidly in image processing. A CNN is a multilayer perceptron-style model inspired by the animal visual system;¹⁰ this design enables CNNs to extract specific information from images efficiently. CNNs have been widely applied in medical imaging,¹¹ including image segmentation,¹²,¹³ object/lesion detection,¹⁴,¹⁵ image/exam classification,¹⁶ and registration.

This study builds an artificial intelligence (AI)-based CNN model for landmark detection and evaluates its accuracy. The proposed model’s performance is tested and compared with the manual landmark annotation method.

Materials and Methods

Materials

In this study, we used two datasets. The first, Dataset 1, was obtained from the 2015 ISBI Grand Challenge conducted by IEEE and consists of 300 lateral cephalograms. The second, Dataset 2, was collected from a Dental College and also contains 300 lateral cephalograms. The Dataset 2 cephalograms were captured using a digital cephalometric X-ray machine (Orthophos XG 3D) and processed using Dentsply Sirona Sidexis ver. 4.0 software. We used a marked dataset of 600 files that included 19 landmark annotations and an unmarked dataset of 600 files to train and test our models. The CNN was built in Python using the PyTorch framework together with torchvision and OpenCV. Our models were trained on an Asus ROG laptop with 32 GB RAM, an 8 GB NVIDIA GeForce RTX 2070 Super Max-Q GPU, and an Intel 10th Gen CPU.

Description of Datasets

A CNN needs a considerable amount of data to train well; otherwise, it tends to overfit. The data used in this study were provided in the 2015 ISBI Grand Challenges in Dental X-ray Image Analysis at the IEEE International Symposium on Biomedical Imaging 2015 (website: http://www-o.ntust.edu.tw/~cweiwang/ISBI2015/challenge1/) and were combined with the dataset created at a Dental College. The challenge dataset contains 300 images in TIFF format with a resolution of 1,935 × 2,400 pixels and a pixel size of 0.1 × 0.1 mm. These cephalograms were acquired with a Soredex CRANEX® Excel Ceph machine (Tuusula, Finland) and Soredex SorCom software (3.1.5, version 2.0). The downloaded file contained three folders: the first contained X-rays with the landmarks already identified, and the third contained the coordinates of the marked landmarks for all 300 images.
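Because the ISBI images have a stated pixel size of 0.1 × 0.1 mm, pixel errors such as those reported in Tables 1 and 2 can be converted to millimetres by simple scaling. The sketch below assumes that pixel size and isotropic pixels; the function name `px_to_mm` is hypothetical, and since the pixel spacing of the Dataset 2 images is not stated, the 0.1 mm default should not be assumed to apply to them.

```python
# Pixel spacing stated for the ISBI 2015 challenge images (0.1 mm x 0.1 mm).
ISBI_PIXEL_MM = 0.1


def px_to_mm(error_px: float, pixel_mm: float = ISBI_PIXEL_MM) -> float:
    """Convert a distance in pixels to millimetres (assumes square pixels)."""
    if pixel_mm <= 0:
        raise ValueError("pixel size must be positive")
    return error_px * pixel_mm


# A 20-pixel error corresponds to the 2 mm clinical threshold at 0.1 mm/pixel.
print(round(px_to_mm(20), 3))  # 2.0
```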

The custom-created dataset of 300 images of patients aged 6 to 60 years was collected at a dental college with an Orthophos XG 3D Digital X-ray machine and Dentsply Sirona Sidexis (version 4.0) software. Two orthodontists manually annotated the collected images, and the coordinates of each of the 19 landmarks were listed in the same way as in the Grand Challenge dataset. Thus, after the exclusion criteria were applied, a total of 600 images were used in this study.

Inclusion Criteria

Digital lateral cephalograms of patients with Angle’s Class I, II, or III skeletal patterns.

Exclusion Criteria

Digital lateral cephalograms of patients with a history of facial trauma

Digital lateral cephalograms of patients with syndromes and/or other craniofacial anomalies

Digital lateral cephalograms of patients with gross asymmetry

Digital lateral cephalograms of patients who had undergone surgical correction

Digital lateral cephalograms of poor quality that do not allow manual tracing.

Proposed Method

As described above, the collected dataset contained 600 X-ray images, each with 19 landmarks. Because the study involved human subjects, it was approved by the institutional review board. The 300 X-rays from the Grand Challenge had been manually annotated by two experienced doctors, and the coordinates of the 19 landmarks were listed in tables. To augment the database, two orthodontists with more than 10 years of clinical and academic experience analyzed the 300 cephalograms collected from a Dental College and manually marked the 19 cephalometric landmarks listed in Figure 1. The x, y coordinates of the marked landmarks were then extracted automatically by computer. Once the dataset was created, it was cleaned: the entire dataset was checked for errors of different types, and erroneous entries were removed. To analyze inter-examiner reliability and establish the ground truth for each landmark, the mean error between the two orthodontists was calculated as an RMSE score (Table 1).

After the dataset had been cleaned, the model was trained. Eighty per cent of the dataset was used for training and the remaining 20% for testing. During training, the model learned from the annotated images; after adequate training, the test images were given to the model to identify the landmarks (Figure 2). The method has two steps: (a) ROI extraction and (b) landmark detection. For ROI extraction, each test image was registered to training images with annotated landmarks, and patches were cropped accordingly (Figure 3a). Pre-trained CNN models with an EfficientNetB7 backbone, a state-of-the-art CNN, were then used to detect the landmarks in the extracted ROI patches (Figure 3b).
Once the model had detected the landmarks, the RMSE for each landmark was computed to quantify its deviation from the ground truth; these values are listed in Table 2. Finally, the manual and automated methods were compared using the R² score.
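The two-step pipeline can be illustrated with a minimal NumPy sketch, assuming that registration has already produced an estimated position for a landmark in the test image. A 512 × 512 patch (the size given in Figure 3) is cropped around that estimate and shifted to stay inside the image, and a patch-local coordinate predicted by the CNN is mapped back to full-image coordinates. The function names and the centring strategy are hypothetical, and the registration step itself is not shown.

```python
import numpy as np

PATCH = 512  # ROI patch size in pixels, as in Figure 3


def crop_roi(image: np.ndarray, center_xy: tuple[float, float], size: int = PATCH):
    """Crop a size x size patch centred on center_xy, shifted to stay inside the image.

    Returns the patch and the (x0, y0) offset of its top-left corner.
    """
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the patch")
    cx, cy = center_xy
    # Top-left corner, clamped so that the whole patch lies inside the image.
    x0 = min(max(int(round(cx - size / 2)), 0), w - size)
    y0 = min(max(int(round(cy - size / 2)), 0), h - size)
    return image[y0:y0 + size, x0:x0 + size], (x0, y0)


def patch_to_image(pred_xy, offset):
    """Map a landmark predicted in patch coordinates back to image coordinates."""
    return (pred_xy[0] + offset[0], pred_xy[1] + offset[1])


# Blank image the size of an ISBI cephalogram (1935 wide x 2400 high).
img = np.zeros((2400, 1935), dtype=np.uint8)
patch, off = crop_roi(img, center_xy=(1900.0, 100.0))  # near the top-right corner
print(patch.shape, off)                     # (512, 512) (1423, 0)
print(patch_to_image((300.0, 40.0), off))   # (1723.0, 40.0)
```

Clamping the patch rather than padding the image keeps every patch filled with real pixels, at the cost of the landmark estimate no longer sitting at the patch centre near the borders.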

Table 1.

Mean Deviation Between the Manual Landmarking of the Two Experienced Orthodontists.

Anatomical Landmarks	RMSE Score (Mean Error, pixels)
Sella	8.158992995
Nasion	13.12710301
Orbitale	18.89700153
Porion	20.56398551
Pogonion	7.834805
Menton	8.064738
Gnathion	6.3035045
Gonion	18.8819135
Incisal point Upper	4.5414755
Incisal point Lower	5.1604265
Upper lip	17.1833495
Lower lip	13.8348595
Subnasale	10.2873305
PNS	9.973005
ANS	13.0483715
Articulare	18.3700165
Soft tissue pogonion	42.107541
Point A	20.8449435
Point B	18.9083315
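The inter-examiner comparison behind Table 1 can be sketched as follows. The code assumes that each orthodontist's annotations are stored as an array of shape (images, landmarks, 2) in pixels, that the per-image difference is the Euclidean distance between the two examiners' points, and that the ground truth is the mean of the two annotations. The paper does not state how the two sets were combined, so the averaging step is an assumption, and the function name is hypothetical.

```python
import numpy as np


def inter_examiner(a: np.ndarray, b: np.ndarray):
    """Per-landmark RMSE between two examiners and an averaged ground truth.

    a, b: arrays of shape (n_images, n_landmarks, 2) holding (x, y) in pixels.
    """
    if a.shape != b.shape or a.shape[-1] != 2:
        raise ValueError("expected matching (n_images, n_landmarks, 2) arrays")
    dist = np.linalg.norm(a - b, axis=-1)       # (n_images, n_landmarks)
    rmse = np.sqrt(np.mean(dist ** 2, axis=0))  # one value per landmark
    ground_truth = (a + b) / 2.0                # assumed consensus position
    return rmse, ground_truth


# Two images, one landmark: the examiners disagree by 5 px and by 0 px.
a = np.array([[[100.0, 200.0]], [[50.0, 50.0]]])
b = np.array([[[103.0, 204.0]], [[50.0, 50.0]]])
rmse, gt = inter_examiner(a, b)
print(rmse)  # sqrt((5**2 + 0**2) / 2), about 3.5355
```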

Table 2.

Error Calculated on CNN Automated Landmarking from the Established Ground Truth.

Anatomical Landmarks	Detection Rate (%)	CNN Model Mean Error (RMSE, pixels)
Sella	99.6	8.775
Nasion	95.6	14.16
Orbitale	99.6	14.22
Porion	92.3	15.975
Pogonion	93.5	9.36
Menton	95.5	10.32
Gnathion	93.3	11.31
Gonion	90.7	15.48
Incisal point Upper incisor	96.7	8.7
Incisal point Lower incisor	94.6	8.01
Upper lip	95.5	14.31
Lower lip	99.9	10.17
Subnasale	96.3	9.375
PNS	97.6	9.315
ANS	94.8	13.455
Articulare	91.5	15.165
Soft tissue pogonion	93.8	13.08
Point A	92.9	14.97
Point B	95.8	11.91

Figure 1.

Landmarks Selected for Detection.

Figure 2.

Nineteen Models Trained with the Same Architecture but Different Weights.

Figure 3.

(a) Example of a Training Image; the Yellow Boxes are Cropped Training Patches. (b) The Cropped 512 × 512 ROI Patches for Each Landmark; the Red Dot is the Target Landmark and the Yellow Boxes are the 512 × 512 ROI Patches.

CNN Model

The PyTorch framework with torchvision and OpenCV in Python was used to build the CNN program in this study. Deeper CNNs, with more layers, can achieve better accuracy; in this study, the EfficientNetB7 architecture, developed by Google for computer vision applications, is used as the backbone because it can effectively find features in ROI patches. EfficientNet architectures are built from mobile inverted bottleneck convolution (MBConv) blocks, and the network learns useful features automatically for different computer vision tasks. The EfficientNet family scales the network in depth, width, and input resolution.
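EfficientNet's joint scaling of depth, width, and resolution follows a compound scaling rule: for a compound coefficient φ, depth is multiplied by α^φ, width by β^φ, and input resolution by γ^φ, where the original EfficientNet paper found α = 1.2, β = 1.1, and γ = 1.15 by grid search under the constraint α·β²·γ² ≈ 2, so FLOPS grow by roughly 2^φ. The sketch below only evaluates that rule from the 224-pixel B0 input. The released B1 to B7 models use hand-tuned coefficients (B7 uses a width multiplier of 2.0, a depth multiplier of 3.1, and a 600-pixel input), so the printed values are illustrative rather than the exact B7 configuration, and `compound_scale` is a hypothetical helper.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases (EfficientNet paper)
BASE_RESOLUTION = 224                # EfficientNet-B0 input size in pixels


def compound_scale(phi: float):
    """Depth multiplier, width multiplier, input resolution, and FLOPS factor for phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = round(BASE_RESOLUTION * GAMMA ** phi)
    flops_factor = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi  # roughly 2 ** phi
    return depth, width, resolution, flops_factor


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r}px, FLOPS x{f:.2f}")
```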

First, a baseline network, EfficientNet-B0, is designed using neural architecture search (NAS). The baseline is then scaled in depth, width, and resolution to generate larger models, up to B7, with higher accuracy. In particular, EfficientNet-B7 achieved a state-of-the-art 84.3% top-1 accuracy on ImageNet at the time of its publication. Landmark localization was treated as a regression problem: all features were flattened and passed to a fully connected layer that outputs the coordinates of the landmark in the patch (Figure 4).
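The regression head described above (flatten the backbone's feature maps, then apply one fully connected layer that outputs an (x, y) pair) can be sketched in NumPy. The feature-map shape, the random weights, and the bias are placeholders, not the trained model's values, and `regression_head` is a hypothetical name. In PyTorch the same head would typically be written as `torch.nn.Flatten()` followed by `torch.nn.Linear(in_features, 2)`.

```python
import numpy as np

rng = np.random.default_rng(0)


def regression_head(features: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Flatten a (C, H, W) feature map and map it to an (x, y) landmark coordinate."""
    flat = features.reshape(-1)  # flatten all features into one vector
    if weight.shape != (2, flat.size) or bias.shape != (2,):
        raise ValueError("weight must be (2, C*H*W) and bias must be (2,)")
    return weight @ flat + bias  # fully connected layer with no activation


# Placeholder backbone output: 8 channels on a 4 x 4 grid (real feature maps are larger).
feats = rng.standard_normal((8, 4, 4))
W = rng.standard_normal((2, feats.size)) * 0.01
b = np.array([256.0, 256.0])  # bias set to the centre of a 512 x 512 patch
xy = regression_head(feats, W, b)
print(xy.shape)  # (2,)
```

Using no activation on the output keeps the prediction unbounded, which suits coordinate regression; any clipping to the patch extent would be a separate post-processing step.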

Figure 4.

The Architecture of the EfficientNetB7 Shows the Convolutional Layers Present in the Network.

Evaluation of the Result

The output data are analyzed using the RMSE and the R² score to assess the model’s accuracy and to compare it with the manual annotation method, respectively. The root mean squared error (RMSE) is a common metric for assessing the performance of machine learning models and is expressed in the same unit as the quantity being measured. The study uses the RMSE to compare the results of the manual annotation method and the automated method; it gives the error between the actual point and the marked landmark. Expressing the error in this unit, rather than as a percentage, makes it easier for readers to interpret.

Calculation of RMSE Score

For each landmark prediction, the difference between the predicted landmark and the ground truth is found and squared. The mean of these squared values is then taken, and its square root gives the RMSE score. The result is non-negative, and values closer to zero are better.

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \dot{y}_i\right)^{2}}

Here, “Ʃ” represents the sum, “y_i” the predicted value for the ith observation, “ẏ_i” the observed value for the ith observation, and “N” the sample size. The score gives the average distance between the ground truth values and the predicted values. The RMSE results are expressed as pixel differences, which show how far the predicted landmark point deviates from the actual landmark point. The smaller the pixel difference, the more accurate the predicted landmark; a perfect prediction gives an RMSE of zero.
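A direct implementation of the RMSE formula for 2D landmarks is sketched below. It assumes that the difference for each image is the Euclidean distance between the predicted and ground-truth (x, y) positions in pixels; the paper does not spell out how the two coordinates are combined, so this is an assumption, and `landmark_rmse` is a hypothetical name.

```python
import math


def landmark_rmse(predicted, ground_truth):
    """RMSE (in pixels) between predicted and ground-truth (x, y) points for one landmark.

    predicted, ground_truth: equal-length sequences of (x, y) tuples, one per image.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [
        (px - gx) ** 2 + (py - gy) ** 2  # squared Euclidean distance for one image
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))  # square root of the mean


pred = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
truth = [(3.0, 4.0), (6.0, 8.0), (0.0, 0.0)]  # errors of 5, 10, and 0 px
print(landmark_rmse(pred, truth))  # sqrt((25 + 100 + 0) / 3), about 6.455
```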

Calculation of R² Score

R-squared (R²) is a statistical measure representing the proportion of the variance in a dependent variable that can be explained by an independent variable or variables. It shows how well the model fits the data. To calculate R², the correlation coefficient is obtained and then squared.

r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}

where “r” represents the correlation coefficient, n is the number of pairs in a given dataset, “x” the first variable, and “y” the second variable. Squaring r gives the R² score.

Here, the R² score indicates how well the landmark positions predicted from the images agree with the reference positions. Whereas r ranges from –1 to 1, R² ranges from 0 to 1; values closer to 1 indicate a better fit for predicting the landmarks, and values closer to 0 indicate a poorer fit.
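The correlation formula above can be evaluated directly from its sums (n, Σx, Σy, Σxy, Σx², Σy²). The sketch below computes r and then R² = r² for paired values, for example the ground-truth and predicted x coordinates of one landmark across test images; the function name and the example coordinates are illustrative only.

```python
import math


def pearson_r2(x, y):
    """Return (r, r**2) using the summation form of Pearson's correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    denom = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if denom == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    r = (n * sxy - sx * sy) / denom
    return r, r * r


truth_x = [812.0, 790.0, 845.0, 830.0]  # illustrative ground-truth x coordinates (pixels)
pred_x = [815.0, 788.0, 840.0, 834.0]   # illustrative predicted x coordinates (pixels)
r, r2 = pearson_r2(truth_x, pred_x)
print(f"r = {r:.4f}, R^2 = {r2:.4f}")
```

The summation form mirrors the equation term by term, but it can lose precision when coordinates are large and similar; a mean-centred computation such as `numpy.corrcoef` is numerically safer in practice.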

Results

To evaluate performance, the detection accuracy of EfficientNetB7 was measured: the smaller the distance between a detected landmark and its ground truth, the more accurate the detection. In Table 1, Orbitale, porion, gonion, articulare, Point A, and Point B show moderate differences between the two orthodontists, whereas other points show a high degree of agreement. Soft tissue pogonion shows the largest difference between the two orthodontists.
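One common way to turn distances into a detection rate is the success detection rate (SDR) used in the ISBI challenges: the percentage of predictions whose Euclidean distance from the ground truth falls within a radius such as 2.0, 2.5, 3.0, or 4.0 mm. The paper does not state the radius behind the Detection Rate column of Table 2, so the 2 mm default and the 0.1 mm pixel size below are assumptions, and `success_detection_rate` is a hypothetical name.

```python
import numpy as np


def success_detection_rate(pred: np.ndarray, truth: np.ndarray,
                           radius_mm: float = 2.0, pixel_mm: float = 0.1) -> float:
    """Percentage of predictions within radius_mm of the ground truth.

    pred, truth: arrays of shape (n, 2) with (x, y) positions in pixels.
    """
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")
    dist_mm = np.linalg.norm(pred - truth, axis=-1) * pixel_mm
    return 100.0 * float(np.mean(dist_mm <= radius_mm))


pred = np.array([[100.0, 100.0], [100.0, 100.0], [100.0, 100.0], [100.0, 100.0]])
truth = np.array([[100.0, 110.0], [100.0, 125.0], [112.0, 109.0], [100.0, 100.0]])
# Distances: 10, 25, 15, 0 px -> 1.0, 2.5, 1.5, 0.0 mm at 0.1 mm/pixel.
print(success_detection_rate(pred, truth))       # 75.0
print(success_detection_rate(pred, truth, 3.0))  # 100.0
```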

The mean deviation of the CNN automated landmarking from the established ground truth, computed with the RMSE, is listed in Table 2; it expresses the deviation of the predicted landmarks from the actual landmarks in pixels. The table shows that the landmarks marked by the model have accuracy comparable to the manual annotation method. For pogonion, gnathion, menton, the upper incisal point, and the lower incisal point, the manual method had lower error, as shown in Figure 5.

Figure 5.

RMSE Score for 19 Landmarks. The Blue Line Shows the Mean Error that Occurred During the Manual Annotation Method; the Red Line Shows the Mean Error that Occurred During Automated Detection of Landmarks.

To assess accuracy on the test data, EfficientNetB7 was compared with the manual annotation method; the results on the test data are shown in Table 3. EfficientNetB7 obtains detection accuracies similar to the manual annotation method in terms of the R² score (Figure 6). For porion, articulare, and soft tissue pogonion, the model outperformed manual annotation and gave consistently better results, whereas for Point A, pogonion, gnathion, and menton, manual annotation was more accurate.

Table 3.

Performance Comparison Between Manual and Automated Landmark Identification Based on R²-Score.

Anatomical Landmarks	Manual Landmark Accuracy Score (R²)	Automated Landmark Accuracy Score (R²)
Sella	0.944790597	0.95381
Nasion	0.928399311	0.90111
Orbitale	0.772685175	0.86632
Porion	0.442685418	0.78339
Pogonion	0.985662897	0.95729
Menton	0.987690519	0.94617
Gnathion	0.991861202	0.92341
Gonion	0.840970697	0.90562
Incisal point Upper	0.993262713	0.99738
Incisal point Lower	0.991278187	0.98637
Upper lip	0.914773372	0.98565
Lower lip	0.947982437	0.94655
Subnasale	0.963626204	0.95022
PNS	0.917825451	0.90707
ANS	0.921108837	0.93311
Articulare	0.655885858	0.75922
Soft tissue Pogonion	0.621971525	0.807231
Point A	0.834820777	0.80375
Point B	0.898344184	0.96541

Figure 6.

R² Score for Manual and Automated Landmark Annotation Methods. The Blue Bars Show the Performance of Manual Annotation; the Red Bars Show the Performance of the Automated Model.

Discussion

Unlike other dental radiographs, cephalograms enable both qualitative and quantitative diagnosis. Despite the limitations highlighted by Graber in 1956, the utility of cephalometrics in orthodontic diagnosis and treatment planning is widely accepted.

Landmark detection is a critical aspect of cephalometrics, but it is also its most vulnerable step. The definitions of some landmarks, such as Orbitale, have been criticized as unreliable and unclear. Each landmark has its own characteristic error distribution, and even when the same head film is assessed, identification errors occur and cannot be ignored. Using these landmarks to construct planes and angles can lead to problems, as certain points, such as the apex of the lower central incisor and Point A, are difficult to identify accurately.

Despite technological advancements in imaging systems and software, diagnostic and treatment planning tools have not kept pace. Orthodontic clinicians still rely primarily on cephalometrics, even though it is tedious and time-consuming. Some research indicates that lateral cephalograms are not essential for orthodontic treatment planning and are mainly used for medico-legal purposes, for academic studies, or because of practitioner inexperience.

Over the past decade, the use of digital data and digital imaging has posed a challenge in the orthodontic and craniofacial fields. Because obtaining accurate measurements from cephalograms takes time, some users have become demotivated to use this technology properly. As a result, users may fail to compare superimpositions at different treatment stages, leading to imprecise diagnostics and suboptimal treatment plans. To manually place landmarks on cephalometric images using software such as Dolphin Imaging or QuickCeph, an experienced clinician typically spends 10–15 minutes per image, a process that is time-consuming and prone to errors.⁵

The rapid development of AI in recent years has penetrated many aspects of daily life, including the analysis of extensively available datasets. The accumulation of data in many formats by search engines such as Google and by social media (Twitter, Facebook, and Instagram) has great potential to enhance and improve all aspects of our lives. In orthodontic diagnosis and treatment planning, AI technology could deliver not only an easy, practical, and precise tool for the practising clinician but also a significant increase in the amount of available labelled data. Although many studies demonstrate different processes for auto-detecting craniofacial landmarks, most clinicians use approaches based on image-processing techniques in which the cephalometric radiographs require intensive human preparation, such as rescaling, calibration, and labelling. Calibration and other image preparation are time-consuming and, if not done properly, often generate landmark outliers,²³ as these approaches rely strongly on the quantity and size of the cephalometric images. Many other studies propose novel frameworks for landmark detection in cephalometric radiographs and report an accuracy of 72%, but again, these are not fully automated procedures.⁶ A 2006 study by Yue et al. that combined image processing with a statistical model achieved an accuracy of 71%,²⁷ and a study by Ningru et al. achieved an accuracy of 59.7%. Earlier landmark detection relied on knowledge-based techniques such as edge and pattern detection and genetic programming models; AI-based machine learning techniques are now used. Current advances in this technology have, in turn, produced hardware and software robust enough to support the large computational requirements of complex AI algorithms and their application to deep learning.
Applications of a variety of deep learning architectures, such as convolutional deep neural networks, deep belief networks, and recurrent neural networks, in important fields such as natural language processing, computer vision, speech recognition, and bioinformatics have resulted in efficient and accurate automation of many practical tasks. However, the methods developed so far have been unable to compete with manual landmark identification. In recent years, the Institute of Electrical and Electronics Engineers (IEEE) and the International Symposium on Biomedical Imaging (ISBI) have organized Grand Challenges on this topic to encourage the development of better algorithms, and their results were described as providing a benchmark for any future development.⁴

The study was designed to investigate whether AI might be a viable option for the repetitive and arduous task of identifying multiple cephalometric landmarks in clinical orthodontic practice. The null hypothesis was that there would be no difference between the manual and automated methods. The mean error between the automated and manual methods did not exceed 0.9 mm, and the model identified all landmarks accurately. Mean differences below 2 mm would not generally be considered clinically significant. Moreover, because the model always detected identical positions, its reproducibility over repeated detection tasks was better than that of its human counterparts.

Computer vision, a field of AI that allows machines to analyze and classify images, has made significant progress due to CNNs, which are highly effective at automatically identifying anatomical landmarks in diagnostic images. Two-dimensional cephalometric analysis remains an important tool in orthodontic diagnosis and treatment planning. Computer-assisted cephalometric analysis has already eliminated mechanical errors introduced by human operators, and attempts to fully automate this analysis are ongoing, with the aims of saving time and improving landmark identification accuracy by reducing clinicians’ subjective errors. Existing studies detected fewer than 20 landmarks, and their accuracy was not satisfactory for clinical orthodontic practice. For example, a 2009 study identified 10 landmarks on 41 digital images, and a 2013 study identified 16 landmarks on 40 cephalometric radiographs, with a mean error of 2.59 mm for the automatically identified landmarks. The accuracy of these automated methods was lower than that of manual identification. In addition, cephalometric landmarks need not be limited to obtaining patients’ skeletal characteristics; they could also be used to plan treatment and to predict treatment outcomes, including soft tissue changes. For those purposes, hundreds of anatomical landmark variables are needed.

In the present study, the manual and automated models were compared to determine whether the automated model could attain landmark detection accuracy similar to that of humans. The study used 600 raw X-ray images, randomly collected from patients aged 6–60 years. Of these, 300 images came from an open-source dataset available online (the ISBI 2015 challenge data), and the remaining 300 were collected to build a new, indigenous dataset that has been made publicly available for further studies. These 300 custom-created lateral cephalograms were manually annotated by two orthodontists with over 10 years of clinical and academic experience. The 600 images were randomly mixed to increase the complexity of landmark detection and were divided into two sets: (a) training (520 images) and (b) testing (80 images). The training images were used to build a model from the EfficientNetB7 base architecture for detecting the landmarks. EfficientNetB7 was developed by Google and, at the time, outperformed other available architectures for computer vision applications while offering faster processing with fewer parameters than other high-performing models. EfficientNetB7 models are therefore suitable candidates for comparison with other methods.²⁶ Nonetheless, the models applied to landmark identification in orthodontics are not directly comparable, for several reasons. Older models were built on their own datasets, which were not available for public use. Because the datasets used to build each model differ, comparisons can be misleading. The amount of data used to build each model also varied, and each model’s performance depended on its dataset. Once the model had been trained, its performance was evaluated on the remaining test images, with results expressed in pixels.
The mean error for each of the 19 landmarks was computed with the RMSE for both the manual annotation method and the automated model. By comparing the test results with the manually annotated landmarks, the accuracy of landmark detection was evaluated and compared. To compare the results, the R² score was calculated, which indicates how much of the variation in landmark position is captured by the model. In this way, the model’s accuracy can be assessed more thoroughly.
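The random mixing and train/test split described above can be sketched as a seeded shuffle of image identifiers. The identifiers, the seed, and the function name `split_images` are hypothetical; the 80-image test set matches the split reported in this paragraph.

```python
import random


def split_images(image_ids, n_test: int, seed: int = 42):
    """Shuffle image IDs reproducibly and split them into training and testing lists."""
    if not 0 < n_test < len(image_ids):
        raise ValueError("n_test must be between 1 and len(image_ids) - 1")
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # local RNG, so the global random state is untouched
    return ids[n_test:], ids[:n_test]


# 300 ISBI images and 300 in-house images, mixed before splitting.
all_ids = [f"isbi_{i:03d}" for i in range(300)] + [f"college_{i:03d}" for i in range(300)]
train_ids, test_ids = split_images(all_ids, n_test=80)
print(len(train_ids), len(test_ids))  # 520 80
```

Fixing the seed makes the split reproducible, so the same 80 test images can be reused when comparing manual and automated annotation.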

In the present study, the training and testing data included images from patients with various malocclusions. From the outset, these mixed images were selected to test the model’s performance under more complex conditions, rather than identifying landmarks on images of subjects with well-balanced anatomy.

Because head shape varies between individuals, registration occasionally missed even though the training image closest to each test image was selected as its reference image. In these missed cases, the patch cropped for the test image after registration did not include the ground-truth landmark. In the ISBI dataset there was only one missed patch, a rate of about 0.0002, so overall it had little impact on the results. On Test2 of the ISBI Grand Challenge, landmarks 3, 6, 13, and 16 had relatively low accuracy, whereas the process worked well on Test1. Visual inspection of the test results showed that the anatomy of the failed cases differed markedly from that of the successfully detected ones.
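The missed-patch rate can be computed by checking, for every test landmark, whether its ground-truth position lies inside the patch cropped after registration. The sketch assumes square patches described by their top-left corners, with 512 pixels as the default size from Figure 3; the function name and inputs are hypothetical.

```python
import numpy as np


def missed_patch_rate(offsets: np.ndarray, truth: np.ndarray, size: int = 512) -> float:
    """Fraction of landmarks whose ground truth falls outside their cropped patch.

    offsets: (n, 2) top-left (x0, y0) of each patch; truth: (n, 2) ground-truth (x, y).
    """
    if offsets.shape != truth.shape:
        raise ValueError("offsets and truth must have the same shape")
    rel = truth - offsets  # ground truth expressed in patch coordinates
    inside = np.all((rel >= 0) & (rel < size), axis=1)
    return float(np.mean(~inside))


offsets = np.array([[0, 0], [100, 100], [1000, 1000], [500, 0]])
truth = np.array([[10, 10], [700, 200], [1200, 1300], [600, 511]])
print(missed_patch_rate(offsets, truth))  # 0.25 (only the second landmark is outside)
```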

In general, the pattern of differences between the AI and the orthodontists suggested that the AI behaves like an orthodontist. For example, when the doctors had difficulty identifying landmarks on poor-quality images, so did the AI. This may explain why image factors did not meaningfully affect the AI’s accuracy in finding landmarks relative to the clinicians. In subjects with fixed orthodontic appliances, massive prostheses, and surgical bone plates, difficulties in identifying the landmarks were initially anticipated because of multiple metallic artefacts. However, metal artefacts did not appear to have a clinically significant impact on landmark identification either.

As a limitation of the present study, the way the AI learned during training and later identified landmarks during testing cannot be explained without computer science jargon. Although some technical details were necessary, this study focused on presenting the AI’s results. On repeated trials, the AI always found identical positions. How much training data is sufficient to teach the AI is currently unknown, and the number of target landmarks may also influence how much training data is sufficient. Future studies to determine the quantity of data sufficient for deep learning in this task may be necessary.

From a clinical perspective, however, AI would never replace trained orthodontic specialists, nor is it intended to replace a comprehensive orthodontic training program. Rather, it could supplement, augment, and amplify diagnostic performance by objectively evaluating each patient seeking orthodontic treatment. The AI proposed in the present study is compatible with the current clinical environment and would retain its validity under the constant supervision of orthodontic experts.

For future research, the current study can be utilized to develop a graphical user interface (GUI) that offers either automatic or semi-automatic landmark detection for clinicians. In automatic mode, the system can extract a region of interest (ROI) and select the proper model for each landmark detection based on registration. Alternatively, in the semi-automatic mode, the clinician can manually extract the ROI using a bounding box and select the corresponding model for each landmark detection, which can reduce computational time. Additionally, the dataset created by the study is available as open source for future studies to utilize. However, the current model only detected landmarks in patch images without considering global-context information, which is the relationship among all landmarks. Therefore, utilizing global-context information in future research could improve the model’s performance.

Limitations of the Research

Deep CNN models have limitations such as memory leakage and overfitting, and they require large datasets to achieve good accuracy. Memory leakage can cause poor landmark prediction, and overfitting can occur when the model is too complex for the available data. Very deep networks can also suffer from the vanishing gradient problem. Continuously enlarging the dataset is not feasible, and variations in lateral cephalometric images can reduce landmark accuracy. Clear, distortion-free lateral cephalograms are therefore necessary to build a better model.

Conclusion

The study proposed a deep learning approach to automatically predict landmark locations. The results show that the model detects landmarks with accuracy comparable to the manual annotation method, indicating that the proposed method is effective for cephalometric landmark detection. The proposed method could be used for landmark detection in clinical practice under the supervision of orthodontists.

Footnotes

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical Approval

Approval was obtained from the Institutional Ethics Committee (IEC) of the College before the start of the study (Reference no: SGDC/152/2019/3729).

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Informed Consent

Not applicable

ORCID iD

Jishnu S.


Automated Cephalometric Landmark Detection: A Novel Software Model Compared with Manual Annotation Method

