Introduction
Scoliosis is a complex 3D deformity of the spine that affects 2% to 3% of the general population. The deformity has a direct functional and cosmetic impact on the individual.1 It can be idiopathic or congenital, the latter occurring in the presence of vertebral anomalies. The current reference standard for measuring the extent of scoliosis is the Cobb angle, which is the degree of lateral spinal curvature from a straight plumb line, with an angle of 10° or more being considered scoliosis.2,3
Cobb angles are measured manually by human specialists, which is time consuming and subject to interobserver and intra-observer variability as well as measurement inaccuracy. Studies have demonstrated greater than 5% variability of Cobb angle measurements between observers.3 Additionally, the selection of the upper and lower vertebral levels of the scoliotic curve and the decision about the primary and secondary curvatures can vary between observers, further exacerbating the difference in measurements.3 Limitations of existing techniques to measure the Cobb angle in scoliosis include (1) human variation of measurements, with an intra- and interobserver variation of at least 5° to 10°, most likely due to subjective errors in the operator’s selection of the vertebral levels,4 for which machine learning (ML) methods can be helpful as support tools; (2) equipment errors; and (3) image errors related to technical variation, for example, inconsistent patient rotation between examinations on follow-up radiographs, which can result in up to 20° of variation.5 For the latter 2 factors, ML methods are less helpful at present. These limitations can lead to both under-measured and over-measured Cobb angles, which may delay referral for bracing at a time when progression of the scoliotic deformity could be halted or minimized and the patient spared surgery. Therefore, developing ML tools that provide accurate measurements of Cobb angles as a complement to the role of radiologists is key.
Different groups of investigators have developed automated Cobb angle estimators using a variety of techniques, including segmentation-based methods, direct estimation methods, and multi-view extrapolation nets (MVE-Net).6-8 Of these approaches, segmentation-based methods have shown limitations due to multiple error transmission.8 Direct estimation methods have had success in measuring angles on anteroposterior (AP) radiographs but are inefficient in the assessment of multiple views, as they cannot combine the data from AP and lateral views.8 MVE-Net has had great success in accurately measuring Cobb angles using multi-view radiographs.6,8
Whilst datasets exist for the testing and validation of artificial intelligence (AI) Cobb angle estimators, these pre-existing tools do not allow for testing in specific clinical scenarios. For example, the largest available published scoliosis Cobb angle database does not detail patient age, an important factor in pediatric and adolescent musculoskeletal assessment, given the rapid changes in radiographic appearance that occur with ossification during this life phase.6 Furthermore, to our knowledge, few if any previously published databases allow for validation of Cobb angle estimators in specific clinical scenarios, such as the presence of congenital vertebral segmentation anomalies, in which the performance of ML estimators still requires validation.
The aim of this study was to validate an ML model for automatic detection of Cobb angles on 3-foot standing spine radiographs of children and adolescents with clinical suspicion of scoliosis across 2 common clinical scenarios in the pediatric setting: idiopathic scoliosis (group 1) and scoliosis in the context of vertebral segmentation anomalies (group 2). We hope this model could serve as an ancillary tool to double-check Cobb angles manually measured by less experienced radiologists.
Methods
Ethics Board Approval
Approval for this retrospective study was granted by our institution’s Research Ethics Board (application #1000078917). A waiver of patient consent was obtained given the retrospective nature of the study and its sample size.
Patient Population
This study included consecutive pediatric patients aged 2.0 to 18.0 years who underwent a 3-foot spine radiograph for scoliosis between January 2013 and January 2022. Our institution is a tertiary pediatric hospital, with a dedicated scoliosis service within the Orthopedic Surgery Department.
Inclusion criteria for group 1 were patients aged 2.0 to 18.0 years whose 3-foot radiograph was obtained in the standing position within the study period and showed no evidence of congenital vertebral anomaly.
Inclusion criteria for group 2 were patients aged 2.0 to 18.0 years whose 3-foot radiograph was obtained in the standing position within the study period and showed a congenital vertebral anomaly such as hemivertebra (Figure 1), butterfly vertebra (Figure 2), abnormal number of vertebrae (Figures 3 and 4), or fused vertebrae.

Figure 1. Frontal projection 3-foot spine radiograph of a 12-year-old male, performed for clinical suspicion of scoliosis, demonstrating a T11 left hemivertebra (arrow, A). Vertebral bodies with 3 vertebral body corners (such as in the setting of hemivertebra) were not assigned landmark coordinates (arrow, B).

Figure 2. Frontal projection 3-foot spine radiograph of a 13-year-old female, performed for clinical suspicion of scoliosis, demonstrating a T10 butterfly vertebra (arrow, A). All vertebral bodies with 4 corners (including butterfly vertebrae) were given landmark coordinates (arrow, B).

Figure 3. Frontal projection 3-foot spine radiograph of a 12-year-old female, performed for clinical suspicion of scoliosis, demonstrating an additional thoracic vertebra (T13) (arrow, A). In the setting of congenital anomalies, thoracic vertebrae were considered to be rib-bearing vertebrae, with lumbar vertebrae considered to be non-rib-bearing vertebrae. Landmark coordinates were assigned to the additional thoracic vertebra (arrow, B).

Figure 4. Frontal projection 3-foot spine radiograph of a 17-year-old female, performed for clinical suspicion of scoliosis, demonstrating an absent thoracic vertebral body, with T11 as the inferior-most thoracic vertebra (arrow, A). In the setting of congenital anomalies, thoracic vertebrae were considered to be rib-bearing vertebrae, with lumbar vertebrae considered to be non-rib-bearing vertebrae. Landmark coordinates were assigned to the 11th thoracic vertebra (arrow, B).
Exclusion criteria were radiographs obtained in the supine or seated position, films with inadequate exposure to visualize the vertebral endplates, and previous spinal surgery.
Patient Identification
Patients who underwent a 3-foot spine radiograph for scoliosis were retrospectively identified via the natural language processing radiology report search function of the Bialogics Analytics Platform (Aurora, Canada). The search term “scoliosis” was used to search “3-foot spine radiograph” studies performed between January 1, 2013 and January 1, 2022. The digital medical imaging report of each patient was then accessed to obtain demographic information such as patient age and sex.
Protocol for Radiographs
Radiographs were obtained using an EOS Imaging System (Paris, France), an ultra-low dose 3D technique that allows for upright, standing, weight-bearing radiographs of the spine. The thoracolumbar spine is acquired in one image rather than by stitching together multiple images. Whilst this technique obtains both frontal and lateral projections within our institution’s standard protocol, only the frontal projections were assessed for the study, as this is the view from which Cobb angles are calculated. Studies have shown that multi-view assessment, such as that obtained with 3D measurements, provides Cobb angles on average 9.2° larger than those obtained with 2D radiography, which affects the variability of Cobb angle interpretation.9 However, because this study had a pilot design, we preferred a standardized view that would be available in most radiograph requests for assessment of scoliosis in children and adolescents. By standardizing on a single 2D frontal X-ray acquisition, we also hoped to avoid these expected differences in Cobb angle measurements. Other authors have likewise used only frontal X-ray views to measure Cobb angles while assessing ML methods.10,11
Data Analysis
Cobb Angle Assessment
Cobb angles were plotted digitally on a General Electric (GE) Centricity Picture Archiving and Communication System (PACS) workstation (Bothell, WA). Cobb angles were calculated by identifying the end vertebrae as those with the maximal tilt toward the apex of the curve. Tangents to the superior endplate of the superior end vertebra and the inferior endplate of the inferior end vertebra were drawn. The angle between these 2 tangents was recorded. Vertical lines were then drawn along the axis of the curvatures of the spine, and the angle between these 2 tangents was reviewed as an internal check of the original Cobb angle recorded.12 Only the dominant Cobb angle was recorded, as this is the clinically significant angle that dictates management.13
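The tangent-based measurement described above can be illustrated with a minimal Python sketch. It assumes each endplate tangent is given as two (x, y) points in image coordinates and returns the acute angle between the two lines; the function names are illustrative and are not taken from the PACS software used in the study.

```python
import math


def line_angle_deg(p, q):
    """Direction of the line through points p and q, in degrees from the x-axis."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def cobb_angle_deg(upper_endplate, lower_endplate):
    """Acute angle between two endplate tangents.

    Each tangent is a pair of points ((x1, y1), (x2, y2)). The point order
    along a tangent does not matter: the result is folded into [0, 90].
    Restricting the result to the acute angle is an assumption of this sketch.
    """
    a = line_angle_deg(*upper_endplate)
    b = line_angle_deg(*lower_endplate)
    diff = abs(a - b) % 180.0
    return min(diff, 180.0 - diff)


# Superior endplate of the upper end vertebra and inferior endplate of the
# lower end vertebra (hypothetical pixel coordinates).
upper = ((0.0, 0.0), (10.0, 1.0))
lower = ((0.0, 50.0), (10.0, 48.0))
print(round(cobb_angle_deg(upper, lower), 1))
```

Folding the difference into the range 0° to 90° keeps the result independent of which endpoint of each tangent is listed first.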
All Cobb angles were assessed by 2 board-certified radiologist readers (S.S., with 7 years of experience in radiology, and one of multiple radiologists of our institution with 5 to 20+ years of experience in radiology after training). One measurement was performed prospectively, and the other had previously been performed and documented in the medical imaging report. If there was a discrepancy of more than 5° between the Cobb angles, the case was referred to a third reader, a musculoskeletal pediatric radiologist with 20+ years of experience in radiology after training (A.S.D.), who recorded the Cobb angle. The mean of the 2 closest readers’ measurements was taken as the final Cobb angle measurement.
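The adjudication rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name and signature are hypothetical, and the sketch assumes that when the first 2 readers agree within 5° the final value is simply their mean, which the text implies but does not state explicitly.

```python
from itertools import combinations


def final_cobb_angle(reader1, reader2, reader3=None, threshold=5.0):
    """Combine reader measurements (degrees) into one reference value.

    If the first two readings differ by more than `threshold`, a third
    reading is required and the mean of the two closest readings is used.
    """
    if abs(reader1 - reader2) <= threshold:
        return (reader1 + reader2) / 2.0
    if reader3 is None:
        raise ValueError("discrepancy above threshold: a third reading is required")
    # Pick the pair of readings with the smallest absolute difference.
    closest = min(
        combinations((reader1, reader2, reader3), 2),
        key=lambda pair: abs(pair[0] - pair[1]),
    )
    return sum(closest) / 2.0


print(final_cobb_angle(30.0, 33.0))        # readers agree within 5 degrees
print(final_cobb_angle(30.0, 40.0, 38.0))  # third reader resolves the discrepancy
```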
Machine Learning for Vertebrae Segmentation
Segmentation-based machine learning consists of classifying individual pixels into foreground and background, followed by extracting landmark coordinates (vertebral body corners) from the segmentations on a frontal view only (Figure 5). It has been shown that Augmented U-Net models with non-square kernels increase the accuracy of vertebral body corner landmark selection by introducing extra kernel constraints.14 In this paper, we use pretrained Augmented U-Net models with non-square kernels 14 (ML model) to segment vertebrae and calculate Cobb angles for all the cases in groups 1 and 2. The radiographs were cropped to exclude cervical vertebrae, which are rarely involved in spinal deformity.15 To create training data for the ML model, each image was labelled with landmark coordinates of the thoracic and lumbar vertebral body corners by radiologists. All vertebral bodies with 4 corners were given landmark coordinates (Figures 1-4). Vertebral bodies with 3 vertebral body corners (such as in the setting of hemivertebrae) were not assigned landmark coordinates. In the setting of congenital anomalies, thoracic vertebrae were considered to be rib-bearing vertebrae, with lumbar vertebrae considered to be non-rib-bearing vertebrae. The ML model using the Augmented U-Net architecture with non-square kernels 14 was trained using a different dataset (AASCE-MICCAI challenge 2019 dataset),16 which contained 609 spinal AP X-ray images along with the coordinates of 17 vertebrae from the thoracic and lumbar regions, provided by radiologists. The ML model was then applied to our dataset, and the resultant largest (dominant) Cobb angles were recorded.
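How a dominant Cobb angle might be derived from corner landmarks can be sketched as follows. This is not the pipeline of the cited Augmented U-Net model; it is a simplified illustration that assumes each vertebra is supplied superior to inferior with corners ordered (top-left, top-right, bottom-left, bottom-right), skips any vertebra without exactly 4 corners (mirroring the labelling rule for hemivertebrae), and searches all vertebra pairs for the largest angle between a superior endplate and a more caudal inferior endplate. All names are hypothetical.

```python
import math


def tilt_deg(left, right):
    """Tilt of the line from a left corner to a right corner, in degrees."""
    return math.degrees(math.atan2(right[1] - left[1], right[0] - left[0]))


def dominant_cobb_angle(vertebrae):
    """Return (angle, upper_index, lower_index) for the largest endplate angle.

    vertebrae: list ordered superior to inferior; each entry is a sequence of
    (x, y) corners. Entries without exactly 4 corners are skipped.
    """
    usable = [(i, v) for i, v in enumerate(vertebrae) if len(v) == 4]
    best = (0.0, None, None)
    for a in range(len(usable)):
        i, upper = usable[a]
        superior = tilt_deg(upper[0], upper[1])      # superior endplate of upper vertebra
        for b in range(a + 1, len(usable)):
            j, lower = usable[b]
            inferior = tilt_deg(lower[2], lower[3])  # inferior endplate of lower vertebra
            angle = abs(superior - inferior)
            if angle > best[0]:
                best = (angle, i, j)
    return best


# Hypothetical landmarks; the second entry has 3 corners and is ignored.
spine = [
    ((0, 0), (10, 1), (0, 5), (10, 6)),
    ((0, 8), (10, 8), (5, 12)),
    ((0, 14), (10, 14), (0, 19), (10, 19)),
    ((0, 21), (10, 20), (0, 27), (10, 25)),
]
angle, upper_idx, lower_idx = dominant_cobb_angle(spine)
print(round(angle, 1), upper_idx, lower_idx)
```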

Figure 5. Frontal projection 3-foot spine radiograph (A) of a 14-year-old male, performed for clinical suspicion of scoliosis. Each thoracic and lumbar vertebral body was segmented by the ML model, and the coordinates were identified (segmentation process, B).
Statistical Analysis
Data were reported using mean, standard deviation (SD) or median, range as appropriate.
The Symmetric Mean Absolute Percentage Error (SMAPE) was used as our evaluation metric to quantify the accuracy of the predicted dominant Cobb angles. The equation for the SMAPE metric is given by
\[ \mathrm{SMAPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{|A_i - B_i|}{A_i + B_i} \times 100\% \]
where N represents the number of samples, and A_i and B_i are the ground truth and predicted angles for sample i, respectively.17 Lower SMAPE corresponds to better performance.17
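For illustration, the metric can be computed with a short Python function. The sketch assumes the single-angle form of SMAPE used with the AASCE-MICCAI challenge, in which each sample contributes |A − B| / (A + B) and the mean over N samples is expressed as a percentage; the function name and the handling of a zero denominator are choices of this sketch, not of the study.

```python
def smape(ground_truth, predicted):
    """Symmetric mean absolute percentage error (in %) for paired angles.

    Assumes non-negative angles in degrees. A pair in which both angles are
    zero contributes no error (a convention of this sketch).
    """
    if len(ground_truth) != len(predicted) or not ground_truth:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(ground_truth, predicted):
        denom = a + b
        if denom > 0:
            total += abs(a - b) / denom
    return 100.0 * total / len(ground_truth)


# Hypothetical dominant Cobb angles in degrees.
print(round(smape([20.0, 40.0], [22.0, 36.0]), 2))
```

Because the denominator is the sum of the 2 angles, a given absolute error in degrees weighs more heavily on small curves than on large ones.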
Normality of the distribution of age was assessed using the Shapiro-Wilk test.
A P value <.05 was considered statistically significant.
Results
Patient Population
A total of 130 patients were included in this study, 50 patients for group 1 and 80 patients for group 2. The median age of the cohort was 11.0 years (Q1-Q3: 11.0-15.0; range: 2-17 years). The mean age of the idiopathic scoliosis cohort (group 1) was 13.40 years (SD: 2.01). The median age of the cohort with segmentation anomalies (group 2) was 13.0 years (Q1-Q3: 9.5-15.0) (Table 1).
Table 1. Demographic Characteristics of the Study Population.
Note. Butterfly = butterfly vertebra; Hemi = hemivertebra; Additional = additional vertebra; Missing = missing vertebra; Fused = fused vertebra; F = female; M = male.
The ages for group 1 were normally distributed as a whole (P = .07) and when males (P = .3) and females (P = .1) were assessed as separate cohorts. There was no significant difference between the ages of males and females in group 1 on t-test (P = .5). The ages for group 2 were not normally distributed as a whole (P < .0001) or when males (P = .005) and females (P = .004) were assessed as separate cohorts. There was no significant difference between the ages of males and females in group 2 on the Mann-Whitney U test (P = .09).
Spectrum of Congenital Anomalies Imaged
Table 1 details the breakdown of congenital anomalies present in the patient population. There were 31 (38.75%) patients with a single anomaly and 49 (61.25%) patients with multiple anomalies.
We carried out 3 experiments. First, the coordinates provided by the radiologists were used to calculate the Cobb angle, which was then compared to the Cobb angle manually calculated by the radiologists (Tables 2 and 3, column 1). Next, the Cobb angles automatically measured by the ML model were compared with the ones calculated using radiologists’ coordinates (Tables 2 and 3, column 2). Finally, the Cobb angles automatically measured by the ML model were compared with the ones manually calculated by the radiologists (Tables 2 and 3, column 3).
Table 2. SMAPE Achieved and Absolute Differences in Degrees for the Whole Dataset and for the Dataset With and Without Anomalies.
Note. diff = difference; ML = machine learning; SMAPE = symmetric mean absolute percentage error. Shaded values are the breakdown of absolute differences in degrees.
Table 3. SMAPE Achieved and Absolute Differences in Degrees for Each Assessed Congenital Anomaly.
Note. diff = difference; ML = machine learning; SMAPE = symmetric mean absolute percentage error. Shaded values are the breakdown of absolute differences in degrees.
Table 2 shows the SMAPE achieved and absolute differences in degrees for the whole dataset and for the dataset with and without anomalies.
Table 3 reveals the SMAPE achieved and absolute differences in degrees for each assessed congenital anomaly.
Quantitative Result for Cobb Angle Measurement Compared to Human Reader: Whole Data Set
When applied to our whole data set (N = 130), the ML model achieved a SMAPE of 11.82% compared with manual measurement by radiologists.
Out of all study cases, 112 (86.15%) cases had less than 10° of absolute difference, and 79 (60.77%) cases, less than 5° of difference. When the results of the ML model were compared with the ones calculated using radiologists’ coordinates, the SMAPE decreased to 9.62%.
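The breakdown of absolute differences by threshold can be reproduced with a small helper. This is a sketch with hypothetical names and made-up angles, shown only to make the counting rule explicit (strictly less than 5° and strictly less than 10°).

```python
def agreement_breakdown(reference, predicted, thresholds=(5.0, 10.0)):
    """Count cases whose absolute difference (degrees) is below each threshold.

    Returns {threshold: (count, percentage_of_cases)}.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(a - b) for a, b in zip(reference, predicted)]
    n = len(diffs)
    result = {}
    for t in thresholds:
        count = sum(1 for d in diffs if d < t)
        result[t] = (count, 100.0 * count / n)
    return result


# Hypothetical manual vs model angles in degrees.
manual = [10.0, 20.0, 30.0, 40.0]
model = [12.0, 27.0, 30.0, 55.0]
print(agreement_breakdown(manual, model))
```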
Quantitative Result for Cobb Angle Measurement Compared to Human Reader: Without Congenital Anomalies (Group 1)
In the setting of idiopathic scoliosis, the architecture achieved an overall SMAPE of 13.02% when compared to human radiologist readers (Table 2). Ninety-six percent of predictions had less than 10° of absolute difference from the human reader ground truths, and 72% had less than 5°. When the results of the ML model were compared with the ones calculated using radiologists’ coordinates, the SMAPE decreased to 8.18%.
Quantitative Result for Cobb Angle Measurement Compared to Human Reader: With Congenital Anomalies (Group 2)
In the setting of congenital spine anomalies, the architecture achieved an overall SMAPE of 11.90% when compared to human radiologist readers (Table 2). Eighty percent of predictions had less than 10° of absolute difference from the human reader ground truths, and 53.75% had less than 5°. When the results of the ML model were compared with the ones calculated using radiologists’ coordinates, the SMAPE decreased to 10.53%.
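As a consistency check on the reported breakdown, and assuming the group percentages refer to the 50 cases of group 1 and the 80 cases of group 2, the per-group counts can be recombined and compared with the whole-dataset figures:

```latex
\begin{align*}
\text{less than } 10^\circ:&\quad 0.96 \times 50 + 0.80 \times 80 = 48 + 64 = 112,
  &\quad 112 / 130 &\approx 86.15\% \\
\text{less than } 5^\circ:&\quad 0.72 \times 50 + 0.5375 \times 80 = 36 + 43 = 79,
  &\quad 79 / 130 &\approx 60.77\%
\end{align*}
```

Both totals match the 112 and 79 cases reported for the whole data set.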
Discussion
The aim of our study was to validate an ML model for automatic detection of Cobb angles on 3-foot standing spine radiographs of children and adolescents with clinical suspicion of scoliosis, both idiopathic and in the setting of congenital anomalies. We retrospectively measured the Cobb angles of 130 patients both manually by radiologists and by an ML segmentation-based approach using an Augmented U-Net model with non-square kernels.
The ML model used in this study achieved a SMAPE of 11.82% in the total cohort; when stratified, it achieved a SMAPE of 13.02% for idiopathic scoliosis and 11.90% for congenital anomalies (Table 2). It was interesting to observe that when the coordinates provided by the radiologist were used to calculate the Cobb angle and then compared to the Cobb angle manually calculated by the same radiologist, the two did not align completely. In fact, there was a SMAPE of 8.75% between the 2 different measurements by the same radiologist. This amount of intra-reader variability indicates the complexity of the task at hand and puts the performance achieved by the ML model (11.82%) in perspective (Table 2). Some may consider the SMAPE results for cases with and without congenital spine anomalies counterintuitive. The ML model achieved a better SMAPE for cases with congenital anomalies (overall SMAPE of 11.90%, Table 2) than for those with idiopathic scoliosis (without congenital anomalies, overall SMAPE of 13.02%, Table 2), whereas we would have hypothesized a smaller discrepancy in Cobb angles for idiopathic scoliosis cases than for cases with congenital anomalies. There are several possible explanations. First, although we were not able to perform cross-validation of our results due to limited test data, the SMAPE results for the 2 aforementioned scenarios are close, and thus the difference is most likely not statistically significant. Second, the sample size of this study is small (50 cases with no anomalies and 80 cases with anomalies), which prevents a proper comparison, since each dataset may contain a different mix of cases (e.g., harder vs easier cases for interpretation). Hence, the differences in the model’s performance could be due to the small sample size of this study.
Finally, the counterintuitive SMAPE results of our study could also be due to discrepancies in the Cobb angle measurements manually obtained by radiologists. There was a SMAPE of 8.75% between the Cobb angles calculated from radiologists’ coordinates and the Cobb angles manually measured by radiologists (Table 2). Our ML model was trained using the coordinates given by the radiologists and then compared against the Cobb angles manually measured by the radiologists. Thus, there is an inherent discrepancy between the way the ground truth data for the ML model was created and how the radiologists manually measured the Cobb angles of study cases. When the results of the ML model were compared with the ones calculated using radiologists’ coordinates, the ML model achieved a better SMAPE for cases with idiopathic scoliosis (without congenital anomalies, overall SMAPE of 8.18%, Table 2) than for those with congenital anomalies (overall SMAPE of 10.53%, Table 2).
Demographics
Our population’s male and female distribution reflects the known female predominance of scoliosis.18 In this study, patients with idiopathic scoliosis were older than those with congenital scoliosis, as expected given that congenital scoliosis anomalies are present from birth. It is important to assess the ability of ML to characterize scoliosis across the range of ages encountered in both congenital and adolescent idiopathic scoliosis, given how greatly the radiographic appearance of bone changes with skeletal development. Only a few recent papers detailing ML approaches in the setting of scoliosis have presented age and sex information. Whilst both Wang et al 19 and Zheng et al 20 have published papers on ML in adolescent idiopathic scoliosis with detailed demographic information, to our knowledge our study is one of the few available papers in the literature to present demographic information in the setting of congenital scoliosis, which is present from birth and thus involves a subset of patients with a greater degree of skeletal immaturity on radiographs.
Performance of Machine Learning Compared to the Radiologist in the Setting of Scoliosis Without Segmentation Anomaly (Group 1)
Similar to the previous study that used the proposed architecture of the current study, 14 the model achieved a low SMAPE in patients without congenital anomaly, as further discussed below when comparing SMAPEs to other state-of-the-art models.
Performance of Machine Learning compared to the Radiologist in the Setting of Scoliosis With Segmentation Anomaly (Group 2)
Previous studies have either included only patients with adolescent idiopathic scoliosis or not specified whether patients with congenital scoliosis were included, with some studies specifically detailing the exclusion of congenital scoliosis.21 It has been previously postulated that variability in manual measurement of angles in congenital scoliosis is larger than that in adolescent idiopathic scoliosis due to skeletal immaturity, incomplete ossification, and anomalous development of the end vertebrae.20 With this in mind, it is important to assess the capability of ML to augment the role of readers of spine radiographs in this subset of patients. Overall, the ML architecture used in this study achieved a better SMAPE for those patients with additional or missing vertebral bodies and a larger SMAPE for those patients with hemivertebrae, fused vertebrae, or butterfly vertebrae. Compared with the patients without vertebral body anomaly, the ML architecture used in this study performed better in the setting of anomaly presence.
Comparison to Previously Published State-of-the-Art Machine Learning Methods
We compared the performance of the ML model used in this study with state-of-the-art regression and segmentation methods.
The performance of the methods proposed in previous work was evaluated by comparing their results with the Cobb angles calculated using radiologists’ manual coordinates, as opposed to the Cobb angles directly measured by radiologists. Hence, for fair comparison, we use the SMAPE calculated in the same way for the ML model used in this study. Table 4 shows a list of prior established benchmarks and their achieved SMAPEs when applied to the AASCE-MICCAI challenge 2019 dataset.16 Compared with prior established benchmarks, the ML model achieved the lowest SMAPE: 9.62% when assessing the mixed congenital and idiopathic scoliosis cases collected in this study and 8.18% with a clean dataset without congenital anomalies, which is comparable with previously reported results for the model 14 applied to the AASCE-MICCAI challenge 2019 dataset 16 (SMAPE of 9.2%). Despite containing congenital scoliosis cases, which are more variable in Cobb angle assessment even when performed manually,22 the SMAPE is substantially lower than the previous segmentation-based benchmark (SMAPE of 16.5%).10
Table 4. Prior Established Benchmarks and Their Achieved SMAPEs.
Note. SMAPE = symmetric mean absolute percentage error.
The regression-based approaches have shown inconsistent performance. Traditional landmark coordinate prediction methods, such as Extrapolation net 8 and Faster-RCNN 23 with DenseNet, deliver rather mediocre results, with SMAPE values of 23.40% and 25.70%, respectively. However, the use of ResNet for predicting vertebral centroids and their corresponding offsets to the vertices has achieved a SMAPE of 10.80%,24 which is higher than that achieved by the ML model used in this study when applied to mixed congenital scoliosis and idiopathic scoliosis cases (SMAPE of 9.62%), a clean dataset with no congenital anomalies (SMAPE of 8.18%), or the AASCE-MICCAI challenge 2019 dataset 16 (SMAPE of 9.2%).
Limitations
This study is limited by its small sample size and preliminary nature. It may carry a sampling bias, as the data were obtained from a single tertiary pediatric institution, where cases are likely to be more severe than in community-based, primary, and secondary healthcare settings. Therefore, the results of this study may be applicable to tertiary healthcare centres, but this selection bias may limit how the diagnostic test performance of the ML model used in this study translates to other settings, particularly concerning the positive predictive value and negative predictive value of tests. It is well known that as the prevalence of the condition increases, the positive predictive value also increases and the negative predictive value decreases, and as the prevalence decreases, the positive predictive value decreases while the negative predictive value increases.25,26 Furthermore, this study excluded all seated and very poor exposure films, meaning that the model has not been tested in these scenarios.
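The dependence of predictive values on prevalence follows from Bayes’ theorem and can be illustrated numerically. The sensitivity and specificity below are hypothetical values chosen for the example; this study does not report them.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test with 90% sensitivity and 90% specificity,
# evaluated in a low-prevalence and a high-prevalence setting.
for prevalence in (0.1, 0.5):
    ppv, npv = predictive_values(0.9, 0.9, prevalence)
    print(f"prevalence={prevalence:.1f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the same test characteristics, the positive predictive value rises and the negative predictive value falls as prevalence increases, which is why performance measured in a tertiary centre may not carry over to community settings.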
A further limitation is the use of prior recorded Cobb angle information in the radiology report as the second reader as there was no opportunity for calibration of these measurements. This was somewhat mitigated by the introduction of a third reader in the case of a discrepancy between the pre-existing and retrospective reads.
Conclusion
The ML model used in this study shows promise for automated Cobb angle measurement in the pediatric setting in both congenital and idiopathic scoliosis. Nevertheless, larger studies are needed to confirm these results prior to translation of this ML algorithm into clinical practice.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This research study was not funded by a research grant. Dr. Doria disclosed receipt of the following financial support unrelated to the conduct of the current research or publication of this article: Baxalta-Shire (Research Grant), Novo Nordisk (Research Grant), Terry Fox Foundation (Research Grant), PSI Foundation (Research Grant), Society of Pediatric Radiology (Research Grant), Garron Family Cancer Centre (Research Grant).
Ethics
Approval has been granted by our institution’s Research Ethics Board (application #1000078917).
