Abstract
Aim:
The present study aimed to assess the segmentation performance of an artificial intelligence (AI) system based on the deep convolutional neural network (D-CNN) method for segmenting the masseter muscles on ultrasonography (USG) images.
Materials and Methods:
This retrospective study was carried out using the radiology archive of the Department of Oral and Maxillofacial Radiology, Faculty of Dentistry, Ankara University. A total of 195 anonymized USG images were used. The deep learning process was performed using U-net, Pyramid Scene Parsing Network (PSPNet), and Feature Pyramid Network (FPN) architectures. Muscle thickness was first assessed on USG by manual segmentation and measurement with the USG machine’s software. The neural network model (CranioCatch, Eskisehir-Turkey) was then used to segment the muscles, followed by automatic measurement of muscle thickness. Accuracy, ROC area under the curve (AUC), and precision-recall curve (PRC) AUC were calculated on the test dataset to compare the human observer with the AI model. Manual segmentation and measurements were compared statistically with those of the AI, with significance set at P < .05. The Mann–Whitney U test was used to analyze whether there was a statistically significant difference between the predicted and actual values.
Results:
The FPN and U-net models detected and segmented all muscles in the test data, whereas PSPNet failed to detect the muscle in two cases (false negatives). Accuracies of FPN, PSPNet, and U-net were estimated as 0.985, 0.947, and 0.969, respectively. Receiver operating characteristic scores of FPN, PSPNet, and U-net were estimated as 0.977, 0.934, and 0.969, respectively. The D-CNN measurements of the muscles were similar to the manual measurements. There was no significant difference between the two measurement methods in the three groups (P > .05).
Conclusion:
The proposed AI system for the analysis of USG images appears promising for automatic masseter muscle segmentation and thickness measurement. This method can help surgeons, radiologists, and other professionals such as physical therapists make efficient use of their time and save time in diagnosis.
Introduction
The masseter, the muscle that elevates the mandible, is one of the muscles most affected by bruxism. Inflammation of the muscle, chronic local muscular contracture, and localized muscular hypertrophy, which may in turn cause myofascial pain, are among the many negative effects of bruxism on the masseter muscles.1,2 Studies have reported that bruxism is seen in approximately 8% to 31% of the adult population.3,4 Despite recent advances in the treatment and rehabilitation of bruxism and masseter muscle pathologies, the question of an effective treatment or rehabilitation approach remains unanswered. Medical professionals are constantly searching for novel treatments, rehabilitation protocols, and techniques that could optimize rehabilitation time and advance healing. One of the most important principles underlying successful treatment is correct medical diagnosis and assessment.1–4
Ultrasound is one of the most widely used methods for the assessment of muscle hardness in bruxism. Unlike a physical examination, which allows only a subjective judgment of a lesion, ultrasound has the potential to quantify muscles.5,6 Studies evaluating muscle thickness using ultrasonography (USG) and muscle thickness/width and cross-sectional area using computed tomography (CT) have been conducted to assess changes in masseter morphology associated with orthognathic surgery.7–10 Even though these assessments are of great importance in the diagnosis and follow-up of bruxism, they are often affected by differences in application technique and by the radiologist’s experience in image processing. Because elastography measurements are still operator and device dependent and yield nonlinear results, different results may be obtained.11
In the past few decades, advances in medical imaging technology and the growing role of imaging within the diagnostic process have led to a rapid increase in the use of artificial intelligence (AI). AI is widely used for different aims such as assessing risk, detecting or diagnosing diseases, and tracking prognosis and therapy response.12 Of the several models developed for AI image analysis, “deep learning” is one of the latest advancements. Deep learning resembles the workings of the human brain in data processing and is used in creating patterns for decision making.13 Deep learning networks are capable of learning from data and can potentially detect lesions automatically, suggest differential diagnoses, and generate preliminary radiology reports. The “deep” aspect of deep learning refers to the multilayer structure of multilayer perceptrons. By stacking multiple layers, a hierarchy of features that are a complex composition of low-level input features can be represented.14 A hierarchy of artificial neural network (ANN) layers is used to learn to distinguish patterns in data. This hierarchical function of deep learning systems enables data processing with a nonlinear approach.15,16 Deep learning techniques have recently been introduced for the analysis of medical images and have shown promising results in various applications such as segmentation and registration. Recent studies on this technology suggest that it may lead to an overall augmentation of radiology practice, as it will complement irreplaceable and remarkable human skills. Deep learning is expected to help radiologists provide an accurate diagnosis by delivering a quantitative analysis of suspicious lesions, and may enable them to read images in a shorter time because of automatic report generation.17
Therefore, the present study aimed to assess the segmentation performance of an AI system based on the deep convolutional neural network (D-CNN) method for the detection and segmentation of masseter muscles on USG images.
Materials and Methods
Data Preparation
This retrospective study, covering the period between October 2019 and April 2020, used USG images of 24 patients admitted to the Faculty of Dentistry of Ankara University who were diagnosed with bruxism by a diagnostic dentist. The diagnosis of bruxism was based on the criteria defined by Koyano et al.18 via questionnaires and clinical findings. According to the power analysis, at least 40 images were required for the AI application to reach 80% power; 195 anonymized USG images from patients who had only bruxism were used in this study.
Ultrasonography Images Dataset
Ultrasound imaging was performed using a high-resolution ACUSON S2000 ultrasound machine (Siemens, Munich, Germany) with a 9L4 linear transducer. Measurements and analyses were performed by a single radiologist with eight years of experience. Bilateral masseter muscles were measured in the resting position. During the USG procedure, the patient’s head was adjusted according to the side being measured. During imaging, the probe was held perpendicular to the muscle mass, and the anterior–posterior thickness and muscle volume were measured in transverse sections. Muscle thickness was measured at three different points of the masseter muscles, and the data were stored in the USG machine’s system for further analysis (Figure 1).
Image Annotation Used as the Ground Truth
The same oral and maxillofacial radiologist annotated the images to provide the ground truth, using Colabeler annotation software (MacGenius, Blaze Software, CA, USA). The bounding box method (polygonal boxes) was used to define the location of the masseter muscles.
Deep Convolutional Neural Network Architecture
The deep learning process was performed using U-net, Pyramid Scene Parsing Network (PSPNet), and Feature Pyramid Network (FPN) architectures.
Model Pipeline
In this study, an AI algorithm (CranioCatch, Eskisehir-Turkey) was developed to perform the automatic segmentation of masseter muscles (Figure 1).
AI Model (CranioCatch, Eskisehir-Turkey) Pipeline for Masseter Muscle Detection and Segmentation in USG Images
Training Phase
The images were randomly divided into:
Training group: 157 images
Validation group: 18 images
Test group: 20 images
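A random split with these group sizes could be sketched as follows. This is a minimal illustration only: the file identifiers, the seed, and the function name are hypothetical, and the source does not state whether images were split per image or per patient (this sketch splits per image).

```python
import random


def split_dataset(items, n_train=157, n_val=18, n_test=20, seed=0):
    """Shuffle items and cut them into training, validation, and test lists."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must sum to the number of items")
    shuffled = list(items)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical identifiers standing in for the 195 anonymized USG images.
image_ids = [f"usg_{i:03d}" for i in range(195)]
train_ids, val_ids, test_ids = split_dataset(image_ids)
print(len(train_ids), len(val_ids), len(test_ids))
```

Fixing the seed keeps the three groups identical across runs, which matters when several architectures are to be compared on the same test images.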
The CranioCatch (Eskisehir-Turkey) approach was employed for the detection and segmentation of masseter muscles using U-net, PSPNet, and FPN architectures. During training, the following parameters were used in common for each algorithm:
ENCODER = 'se_resnext50_32x4d'
ENCODER_WEIGHTS = 'ImageNet'
ACTIVATION = 'sigmoid'
s = 0.0001
Measurement of Masseter Muscle Thickness
A rule-based pixel measurement was used to measure masseter muscle thickness. The muscle region was located by contour detection on the image. Three vertical sections were then placed across the muscle region along the horizontal axis, from its right side to its left side, and the length of each section was calculated in pixels (Figure 2). The ratio between pixels and centimeters was obtained by comparing these pixel dimensions with the real values in centimeters. The pixel lengths in the test data were converted to centimeters using this ratio.
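The measurement rule described above can be illustrated with a small sketch. It is not the CranioCatch implementation: the binary mask stands in for the contour-detected muscle region, the section positions (25%, 50%, and 75% of the horizontal extent) are assumed, and the calibration pair is hypothetical.

```python
def column_thickness_px(mask, col):
    """Count foreground pixels in one column of a binary mask (rows of 0/1)."""
    return sum(row[col] for row in mask)


def three_section_thickness_px(mask):
    """Thickness in pixels at three vertical sections across the muscle region."""
    n_cols = len(mask[0])
    muscle_cols = [c for c in range(n_cols) if any(row[c] for row in mask)]
    if not muscle_cols:
        raise ValueError("mask contains no foreground pixels")
    left, right = muscle_cols[0], muscle_cols[-1]
    width = right - left
    # Assumed section positions: 25%, 50%, and 75% of the horizontal extent.
    cols = [left + int(width * f) for f in (0.25, 0.5, 0.75)]
    return [column_thickness_px(mask, c) for c in cols]


def cm_per_pixel(reference_cm, reference_px):
    """Scale factor from one measurement known in both centimeters and pixels."""
    return reference_cm / reference_px


# Toy mask: 1 marks pixels inside the segmented muscle region.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
thickness_px = three_section_thickness_px(mask)
scale = cm_per_pixel(reference_cm=1.2, reference_px=4)  # hypothetical calibration
thickness_cm = [px * scale for px in thickness_px]
print(thickness_px, thickness_cm)
```

Counting foreground pixels per column assumes the muscle region has no holes along that column; a contour-based implementation would instead take the distance between the upper and lower contour points.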
The Images Show the Masseter Muscle Segmentation Performed Using AI Models (CranioCatch, Eskisehir-Turkey)
Statistical Analysis
Accuracy, area under the ROC curve (AUC), and precision-recall curves (PRC) were calculated on the test dataset to compare the human observer with the AI model. Four additional indicators were used to evaluate the performance of the classifiers: P [precision = true positives/(true positives + false positives)], R [recall = true positives/(true positives + false negatives)], f1-score [f1-score = P*R*2/(P+R)], and support (total number in the test set). The Mann–Whitney U test was used to analyze whether there was a statistically significant difference between the predicted values and the actual values. Bias was prevented by double-sided blinding. Statistical analyses were performed using the IBM Statistical Package for Social Sciences 21.0 (SPSS, Chicago, IL) program. For the statistical analysis, a computer with a Windows 10 64-bit operating system, a quad-core Intel Skylake Core i5-6500 CPU (3.2 GHz, 6 MB cache), and 8 GB of 2400 MHz DDR4 RAM was used.
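The precision, recall, and f1-score definitions given above translate directly into code. The sketch below uses illustrative counts that are not taken from the study’s confusion matrices; the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and f1-score from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only (not the study's results).
p, r, f1 = precision_recall_f1(tp=18, fp=1, fn=2)
support = 18 + 2  # number of positive cases in this illustrative test set
print(round(p, 3), round(r, 3), round(f1, 3), support)
```

The guards against zero denominators return 0.0 by convention; other conventions (for example, treating the metric as undefined) are equally defensible.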
Results
AI models were used to segment the muscles and measure their thickness automatically on the test data. The FPN and U-net models detected and segmented all muscles in the test data, whereas PSPNet failed to detect the muscle in two cases (false negatives). Accuracies of FPN, PSPNet, and U-net were estimated as 0.985, 0.947, and 0.969, respectively. Receiver operating characteristic (ROC) scores of FPN, PSPNet, and U-net were estimated as 0.977, 0.934, and 0.969, respectively (Table 1). Because the FPN and U-net models segmented all muscles, the masseter muscle measurements were performed using these models. The thickness was measured at the same points that were measured and stored in the USG machine’s software (Figure 2). The average thickness measurements of the bilateral masseter muscles were taken. These were calculated as the mean of three measurements using the USG machine’s own software (Figures 3A and B). Table 2 shows the comparison of observer measurements and the FPN/U-net model. The results show high agreement between the AI and the human observer, without any significant difference (P = .2 for the right, P = .76 for the middle, P = .42 for the left; df = 38; Figure 4). The ROC and PRC graphics show the performance of the models (Figure 5).
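For readers unfamiliar with the Mann–Whitney U test named in the Statistical Analysis section, the U statistic itself can be computed from rank sums as sketched below. The thickness values are hypothetical, and the sketch stops at the statistic; the P values reported here came from the SPSS analysis, not from this code.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples, using average ranks for tied values."""
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical thickness values in centimeters (not the study's data).
manual_cm = [1.10, 1.25, 0.98, 1.40, 1.05]
predicted_cm = [1.12, 1.20, 1.01, 1.38, 1.07]
u_manual, u_predicted = mann_whitney_u(manual_cm, predicted_cm)
print(u_manual, u_predicted)
```

When the two U values are close to each other, as in this toy example, neither sample tends to rank above the other, which is the pattern expected when automatic and manual measurements agree.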
Evaluation for Diagnostic Performance by Three AI Models Set for Masseter Muscle Segmentation
The Comparison of Observer Measurements and the FPN/U-net Model. The Results Show a High Agreement Between AI and Human Observers Without Any Significant Difference
(A) and (B). The Images Show the Masseter Muscle Measurements Performed Using AI Models (CranioCatch, Eskisehir-Turkey)
The Plotted Graphics Show a Similar Distribution of the Human Observer (Original) and AI Measurements (Estimated)
The ROC and PRC Graphics Show the Performance of the Models
Discussion
In this study, we used ANNs to evaluate the masseter muscles in patients diagnosed with bruxism. The advantage of validating the ANN on simulated experimental data rather than on actual experiments is that the evaluation of masseter muscles using elastography is already known, and therefore comparisons between simulated data and ANN output can be made more easily. The results of the noise-free ANN indicate that the ANN is capable of predicting the parameters of the evaluation of masseter muscles in patients with bruxism.
Physiotherapy is one of the preferred treatment options in patients with bruxism. At initial diagnosis and during posttreatment follow-up, musculoskeletal soft tissue imaging is necessary to assess the morphologic and metabolic status of soft tissues. When ultrasonographic thickness measurements of the masseter muscles are compared between patients with bruxism and healthy people, the thickness in people with bruxism has been found to be greater than in healthy subjects.19,20 In recent years, elastography has gradually come into use for examining the mechanical properties of musculoskeletal tissues. This technique takes advantage of changes in soft tissue elasticity in different pathologies to provide qualitative and quantitative information that can be used for diagnostic purposes and follow-up evaluations during rehabilitation.21 Although quantitative measurements of masseter muscle stiffness with elastography exist, little evidence-based published data is available for this assessment. Data acquisition and interpretation are largely operator dependent, which is the major limitation of this imaging system. Moreover, in sono-elastography, a wide variety of techniques can be used to display elastographic images, and therefore the findings, as well as the artifacts or limitations, may be highly dependent on the technique.22,23
So far, the most widely used option for the interpretation of indentation data has been the use of analytical relationships. However, the application of analytical relationships is often associated with many simplifying assumptions, as they are not available for all practical indentation experiments. Only a few studies have investigated the use of ANNs in the evaluation of muscles. Although muscles were not directly assessed, some studies have investigated imaging techniques. Jamaludin et al.24 and Olczak et al.25 used machine learning models to classify pathophysiological changes such as fractures or spinal degeneration on X-ray and magnetic resonance imaging (MRI) images and concluded that the accuracy was equal to or greater than that of human experts. Similarly, Ashinsky et al. studied machine learning models for the evaluation of prospective MRI images of knee cartilage and predicted the onset of arthritis 36 months before it was identifiable to human observers, with 75% accuracy.26 In the orthopedic literature, ANNs have performed about as well as or better than orthopedic surgeons in the detection of fractures of the proximal humerus, hand, wrist, or ankle on X-ray images.25–27
Various studies have also evaluated the performance of deep learning image classification for the diagnosis of dentomaxillofacial conditions. The performance of these AI systems was not significantly different from that of radiologists, suggesting that these systems could be a useful method for diagnostic support.28,29 Deep learning systems are also capable of detecting the impact of systemic diseases on oral tissues. A D-CNN-based computer-aided diagnosis system showed strong agreement with experienced oral and maxillofacial radiologists in detecting osteoporosis; this system could also provide information to dentists for the early detection of osteoporosis, allowing asymptomatic patients to be referred to the appropriate medical professionals for preventive care.30 A recent study by Murata et al.31 showed that deep learning systems could diagnose maxillary sinusitis on a panoramic radiograph at a rate comparable to that of radiologists and that their performance was superior to that of dental residents.
To the best of our knowledge, this is the first model in the literature that segments and measures the masseter muscles on USG images, and the current algorithm achieved high performance and accuracy. It was found that ANNs can be used for the automatic segmentation and evaluation of masseter muscles. Such muscle evaluations can be used for automatic detection before orthognathic surgery and even for predicting changes after surgery. USG, CT, and MRI imaging techniques are used to determine muscle dimensions.32–35 Because it is free of ionizing radiation and easy to apply, USG may be preferred for analyzing muscles and their dimensions. However, because USG evaluation is dynamic, the results may vary among operators (such as radiologists and USG technicians). Moreover, because of the complex anatomy, USG evaluations can also be challenging for inexperienced radiologists or technicians. In hospital information systems, in which almost all records are digital, physicians must maintain accurate dental records. Depending on the experience and attentiveness of the physicians, misdiagnosis or inadequate diagnosis can occur in busy clinics. In these instances, such AI systems can help identify the anatomy as well as pathologies accurately.36–39
Radiological AI systems aim to automate routine, simple evaluations so that radiologists and technicians can perform medical diagnosis and read images faster, saving time for more complex cases. To standardize the interpretation of images and provide equal diagnostic accuracy, anonymized datasets could be made openly accessible, which can be helpful in situations where radiological expertise is limited.40 The proposed model had high sensitivity and precision rates similar to those of human examiners. Thus, it can save time by enabling the automatic segmentation and measurement of masseter muscles. As a limitation of our study, there were no control USG images. The findings would be more valid if the AI system had also been tested on USG images of patients without bruxism; doing so in future studies would further validate the diagnostic robustness of the AI system.
Conclusion
In conclusion, radiologists are currently under a great deal of pressure because they have to diagnose a greater number of cases, or more complicated cases, than in the past. AI can be an opportunity for radiologists to overcome these challenges. The proposed AI system for the analysis of USG images appears promising for automatic masseter muscle segmentation and measurement. The presented algorithm had high sensitivity and precision values, similar to those of human observers. This method can help surgeons, radiologists, and other professionals such as physical therapists make efficient use of their time and save time in diagnosis.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
