Abstract
Study Design
Retrospective diagnostic study.
Objectives
To develop a fine-grained deep learning classification model that uses X-ray images to screen for scoliosis and, further, for atypical scoliosis patterns associated with Chiari malformation type I (scoliosis secondary to CMI, CMS).
Methods
A total of 508 pairs of coronal and sagittal X-ray images from patients with CMS, patients with adolescent idiopathic scoliosis (AIS), and normal controls (NC) were used to build 5 ResNet-50-based models: ResNet-50 Coronal, ResNet-50 Sagittal, ResNet-50 Dual, ResNet-50 Concat, and ResNet-50 Bilinear. Accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for both the scoliosis diagnosis system and the CMS diagnosis system, and receiver operating characteristic (ROC) curves and heatmaps were generated for CMS diagnosis.
Results
The classification results for the scoliosis diagnosis system showed that the ResNet-50 Coronal model had the best overall performance. For the CMS diagnosis system, the ResNet-50 Coronal and ResNet-50 Dual models demonstrated optimal performance. Specifically, the ResNet-50 Dual model reached the diagnostic level of senior spine surgeons, and the ResNet-50 Coronal model even surpassed senior surgeons in specificity and PPV. The CMS heatmaps revealed that major classification weights were concentrated on features such as atypical curve types, significant lateral shift of scoliotic segments, longer affected segments, and severe trunk tilt.
Conclusions
The fine-grained classification model based on the ResNet-50 network can accurately screen for atypical scoliosis patterns associated with CMS, highlighting the importance of radiographic features such as atypical curve types in model classification.
Keywords
Introduction
Chiari malformation (CM), also known as cerebellar tonsillar herniation, is a congenital malformation at the craniovertebral junction characterized primarily by underdevelopment of the posterior cranial fossa and herniation of the cerebellar tonsils through the foramen magnum.1,2 CM can be divided into 4 types based on anatomical and radiological features, among which Chiari malformation type I (CMI) is the most common subtype. 3 Its characteristic feature is a small posterior cranial fossa with cerebellar tonsillar descent below the line of the foramen magnum by more than 5 mm, but normal development of the cerebellum and brainstem.4,5 Scoliosis is one of the most common complications in children with CMI (scoliosis secondary to CMI, CMS), with an incidence rate of 15-50%. Unlike adolescent idiopathic scoliosis (AIS), CMS patients often have syringomyelia, and even present with neurological symptoms, and the scoliosis is more likely to progress.5,6 Early posterior fossa decompression (PFD) surgery is necessary to halt disease progression, hence early diagnosis is necessary.
Full-spine X-ray is the most widely used method for diagnosing scoliosis; however, it is difficult to distinguish CMS from AIS on X-ray images. It has been reported that 2-26% of patients initially diagnosed with idiopathic scoliosis have neurological anomalies, with CMS and syringomyelia being the most common. 7 Missed or incorrect diagnoses can severely affect treatment outcomes. Magnetic resonance imaging (MRI) is the gold standard for diagnosing CMI; however, full-spine MRI screening of all patients with scoliosis is not economically justifiable. 8 Therefore, it is of great importance to quickly and accurately distinguish CMS from AIS on X-ray images. While previous research has applied machine learning to Chiari malformation screening using MRI data, 9 to date, no studies have reported the use of X-ray radiographs for screening scoliosis associated with Chiari malformation. Previous research has found that CMS patients exhibit radiological features of neuromuscular scoliosis, with a higher incidence of atypical scoliosis types such as left thoracic, double thoracic, and left thoracic-right lumbar curves. 10 These atypical curve types are observed more frequently in CMS than in AIS. Unlike typical idiopathic scoliosis, where the curve tends to be a right-sided thoracic curve, CMS often presents with unusual scoliosis patterns that may include more complex curve configurations and more severe trunk tilt. As a result, CMS is often categorized as a neuromuscular rather than idiopathic form of scoliosis, due to its distinct radiological features and early progression. However, recognizing these atypical characteristics depends on extensive clinical experience, and most spine surgeons still face significant challenges in deciding when to perform a full-spine MRI to identify CMS.
Early diagnosis is crucial, as timely identification of CMS could lead to early surgical intervention which may prevent further curve progression and alleviate potential neurological deficits. This highlights the clinical need for more efficient, accurate, and non-invasive screening tools that can help differentiate atypical scoliosis related to CMS from typical AIS.
Fine-grained image classification refers to the identification of subcategories within a major image category. The challenge and distinction of fine-grained image analysis tasks from general or generic image tasks lie in the finer granularity of the image categories involved. In recent years, fine-grained classification models based on deep learning technology have rapidly developed, with models such as Inception, 11 ResNet, 11 DenseNet, 12 and EfficientNet 13 emerging and achieving remarkable success in image recognition, natural language processing, medical image analysis, and bioinformatics. Our team’s previous research developed the Bilateral CNN fine-grained classification model, which can accurately identify the atrophic characteristics of scoliosis secondary to neurofibromatosis type 1 (NF1-S), thus diagnosing atrophic and non-atrophic NF1-S with an accuracy of 80.36%. 14
The objectives of this study are to: (1) develop a fine-grained classification model based on deep learning technology for accurate diagnosis of scoliosis from X-ray images; (2) apply the fine-grained classification model to achieve accurate identification of X-ray images of atypical scoliosis associated with CMS patients and explore radiological features that influence the model’s classification performance using heatmaps.
Materials and Methods
Subjects
This study protocol was approved by our hospital’s ethics committee (IRB20030115), and informed consent was obtained from all patients and their families. We conducted a retrospective analysis of patients diagnosed with CMS who underwent coronal and sagittal standing full-spine X-ray and cranial MRI examinations at our hospital from June 2016 to November 2023. The inclusion criteria were as follows: (1) patients diagnosed with CMS with syringomyelia, patients with AIS (case controls), and normal adolescents (normal controls, NC), all with complete clinical and radiological records; (2) standing full-spine coronal and sagittal X-ray examinations performed for diagnostic purposes at the first visit. The exclusion criteria were as follows: (1) age younger than 11 years or older than 18 years; (2) history of spine surgery or PFD surgery; (3) concurrent spinal trauma or spinal tumors; (4) congenital spinal developmental abnormalities. Finally, a total of 128 CMS cases, along with 200 age- and gender-matched AIS cases and 180 normal adolescents, were included in the study dataset, serving as the CMS group, AIS group, and NC group, respectively.
Datasets and Preprocessing
This study included data from 508 subjects at our hospital, comprising 256 coronal and sagittal X-ray images of 128 CMS patients, 400 coronal and sagittal X-ray images of 200 AIS patients, and 360 coronal and sagittal X-ray images of 180 normal adolescents. The X-ray images of 70% of the subjects were used for training, those of 20% of the subjects for validation, and the remainder for testing. To ensure that image quality was suitable for deep learning, all images were collected as original data in DICOM format. Because AIS cases outnumber CMS cases in real-world classification tasks, the test dataset contained more AIS patients than CMS patients.
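The subject-level 70/20/10 split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_subjects`, the per-group shuffling, the rounding rule, and the seed are all assumptions. The point it shows is that splitting by subject keeps each subject's coronal and sagittal images in the same partition.

```python
import random

def split_subjects(subject_ids_by_group, fractions=(0.7, 0.2), seed=0):
    """Split subjects (not images) into train/val/test within each group.

    Hypothetical helper: the paper only states the 70/20/remainder
    proportions; the per-group shuffle and rounding are assumptions.
    """
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for group, ids in subject_ids_by_group.items():
        ids = list(ids)
        rng.shuffle(ids)
        n_train = round(fractions[0] * len(ids))
        n_val = round(fractions[1] * len(ids))
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]  # remainder goes to testing
    return splits

# Group sizes taken from the text; the subject identifiers are placeholders.
groups = {
    "CMS": [f"CMS-{i}" for i in range(128)],
    "AIS": [f"AIS-{i}" for i in range(200)],
    "NC": [f"NC-{i}" for i in range(180)],
}
splits = split_subjects(groups)
```

The per-partition counts produced by this sketch depend on its rounding rule and need not match the dataset volumes reported in the paper's table.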
Training, Validation and Testing Dataset Volumes.
Experimental Environment and Hyperparameters
Model Hyperparameter Settings.
a This study employs a step learning rate schedule in which the learning rate is multiplied by 0.1 at epochs 30, 60, and 90. That is, the initial learning rate of 0.1 is reduced to 0.01 after the 30th epoch, to 0.001 after the 60th epoch, and to 0.0001 after the 90th epoch.
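The schedule in this footnote can be written as a small function. This is a sketch under stated assumptions: the name `step_lr` is hypothetical and epochs are counted from 0, so index 30 is the first epoch after the 30th.

```python
def step_lr(epoch, base_lr=0.1, milestones=(30, 60, 90), gamma=0.1):
    """Learning rate at a given 0-indexed epoch.

    The rate is multiplied by `gamma` once for every milestone already
    reached; the values are those stated in the footnote.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

schedule = [step_lr(e) for e in (0, 29, 30, 60, 90)]
# approximately 0.1, 0.1, 0.01, 0.001, 0.0001
```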
Model Development
ResNet-50 Model Architecture
The image classification model constructed in this study is based on the structure of ResNet-50, developed by Microsoft Research in 2015. ResNet-50 is a classic deep convolutional neural network whose core idea is to introduce residual blocks with skip connections, which effectively alleviates the vanishing gradient problem in deep networks. ResNet-50 consists of a 50-layer network structure (Figure 1), mainly including the following 3 parts: (1) Backbone: Serving as the main structure of the model, it contains 4 stages, with the output from the last stage used for feature extraction. (2) Neck: The “neck” structure after the output of the Backbone, where we employ Global Average Pooling (GAP). The GAP operation averages each channel of the feature map over its spatial dimensions, compressing it into a fixed-size vector that retains channel-wise information but not spatial layout. (3) Head: For the classification head, we use a fully connected layer, setting the number of output categories of the model to 3. To compute the loss for imbalanced datasets, we chose Focal Loss.15 When assessing the model’s performance, we utilized Top-1 accuracy.
This modified ResNet-50 architecture for X-ray image classification consists of 4 stages of convolutional layers for feature extraction, followed by a classification head. Two variants are shown: ResNet-50 Concat, where outputs from 2 streams are concatenated before classification, and ResNet-50 Bilinear, which uses bilinear pooling of the streams’ features for final classification with a SoftMax layer.
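As an illustration of the Focal Loss mentioned above, the following sketch computes the loss for a single 3-class prediction. The focusing parameter `gamma=2.0` and weight `alpha=1.0` are assumed defaults, since the text does not report the values used; the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the true class. With gamma = 0
    this reduces to ordinary cross-entropy; gamma and alpha here are
    assumptions, not values taken from the study.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss([4.0, 0.0, 0.0], target=0)   # confident, correct prediction
hard = focal_loss([0.0, 2.0, 0.0], target=0)   # true class receives low probability
```

Because the factor (1 - p_t)^gamma shrinks toward zero for confidently correct samples, well-classified cases contribute little and the loss is dominated by hard cases, which is why it is often chosen for imbalanced data.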

X-ray Classification Model
To test the diagnostic performance of coronal and sagittal X-ray images, we constructed 5 ResNet-50 models: ResNet-50 Coronal, ResNet-50 Sagittal, ResNet-50 Dual, ResNet-50 Concat, and ResNet-50 Bilinear. Among these, the first 3 are single-stream models, and the last 2 are dual-stream models:
(1) ResNet-50 Coronal Model: A single coronal X-ray image is input into the ResNet-50 model to predict the scoliosis classification.
(2) ResNet-50 Sagittal Model: A single sagittal X-ray image is input into the ResNet-50 model to predict the scoliosis classification.
(3) ResNet-50 Dual Model: Both coronal and sagittal X-ray images are concatenated along their longest axis and then input into the ResNet-50 model to predict the scoliosis classification.
(4) ResNet-50 Concat Model: Coronal and sagittal X-ray images are separately input into the ResNet-50 model, and the extracted image features are concatenated along the channel dimension (Figure 1) to predict the scoliosis classification using the merged features.
(5) ResNet-50 Bilinear Model: Coronal and sagittal X-ray images are separately input into the ResNet-50 model, and the tensor outer product of the extracted image features is computed (Figure 1) to predict the scoliosis classification using the merged features.
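The three ways of combining the two views can be sketched with NumPy arrays standing in for images and backbone features. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, feature maps are assumed to have shape (channels, height, width), and averaging the outer product over spatial locations is one common form of bilinear pooling that the text does not confirm.

```python
import numpy as np

def dual_input(coronal, sagittal):
    """Dual model: join the two images along their longest axis,
    then feed the result to a single network."""
    axis = int(np.argmax(coronal.shape))
    return np.concatenate([coronal, sagittal], axis=axis)

def concat_fusion(feat_cor, feat_sag):
    """Concat model: stack the two feature maps along the channel
    axis, then apply global average pooling."""
    merged = np.concatenate([feat_cor, feat_sag], axis=0)   # (2C, H, W)
    return merged.mean(axis=(1, 2))                         # (2C,)

def bilinear_fusion(feat_cor, feat_sag):
    """Bilinear model: outer product of the two streams' feature
    vectors at each location, averaged over all locations."""
    a = feat_cor.reshape(feat_cor.shape[0], -1)   # (C1, H*W)
    b = feat_sag.reshape(feat_sag.shape[0], -1)   # (C2, H*W)
    pooled = a @ b.T / a.shape[1]                 # (C1, C2)
    return pooled.reshape(-1)                     # (C1*C2,)

rng = np.random.default_rng(0)
f_cor = rng.standard_normal((8, 4, 2))   # toy sizes; real ResNet-50 features are far larger
f_sag = rng.standard_normal((8, 4, 2))
print(concat_fusion(f_cor, f_sag).shape, bilinear_fusion(f_cor, f_sag).shape)
```

The shapes make the trade-off visible: concatenation doubles the descriptor length, whereas the bilinear descriptor grows with the product of the two channel counts and captures pairwise interactions between the streams.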
Performance Evaluation
Diagnostic Test Indicators
Diagnostic test indicators are calculated based on the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) as follows:
(1) Accuracy: Accuracy is the ratio of correctly predicted observations to the total observations. It describes the classifier’s overall ability to classify all cases correctly.
(2) Sensitivity (also known as Recall): Sensitivity describes the classifier’s ability to correctly identify positive instances. It is the proportion of all actual positive instances that were correctly predicted as positive.
(3) Specificity: Specificity describes the classifier’s ability to correctly identify negative instances. It is the proportion of all actual negative instances that were correctly predicted as negative.
(4) Positive Predictive Value (PPV, also known as Precision): The positive predictive value describes the proportion of instances predicted as positive that are actually positive.
(5) Negative Predictive Value (NPV): The negative predictive value describes the proportion of instances predicted as negative that are actually negative.
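The five indicators follow directly from the four counts. The sketch below uses the standard definitions; the function name `diagnostic_metrics` and the example counts are illustrative only and are not results from this study.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Standard diagnostic indicators from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # precision
        "npv": tn / (tn + fn),
    }

# Made-up counts for illustration, not taken from the paper.
metrics = diagnostic_metrics(tp=8, tn=46, fp=2, fn=4)
```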
X-ray Image Classification Tasks
Task One: Scoliosis Diagnosis System
CMS and AIS patients are considered positive cases, while normal adolescents (NC) are considered negative cases. The classification system’s accuracy, sensitivity, specificity, PPV, NPV, and the confusion matrix are calculated.
Task Two: CMS Diagnosis System
CMS patients are considered positive cases, while AIS patients and NC are considered negative cases. The classification system’s accuracy, sensitivity, specificity, PPV, NPV, and the confusion matrix are calculated. Additionally, 2 spine surgery experts (Expert 1 with 24 years of clinical experience and Expert 2 with 10 years of clinical experience) assessed the same images so that their diagnostic performance could be compared with that of the model.
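Both tasks can be evaluated from the same 3-class confusion matrix by choosing which classes count as positive. The following is a sketch of one reading of the task definitions: the helper name `collapse_to_binary` is hypothetical, the class order (CMS, AIS, NC) is assumed, and the counts are invented for illustration. Under this reading, a CMS patient predicted as AIS still counts as a true positive in Task One, because scoliosis was detected.

```python
def collapse_to_binary(cm, positive):
    """Reduce a multi-class confusion matrix to (tp, fn, fp, tn).

    cm[i][j] is the number of cases with true class i predicted as
    class j; `positive` is the set of class indices treated as positive.
    """
    tp = fn = fp = tn = 0
    for i, row in enumerate(cm):
        for j, n in enumerate(row):
            if i in positive and j in positive:
                tp += n
            elif i in positive:
                fn += n
            elif j in positive:
                fp += n
            else:
                tn += n
    return tp, fn, fp, tn

# Rows = true class, columns = predicted class, order: CMS, AIS, NC.
# Toy counts for illustration only, not results from this study.
toy_cm = [
    [8, 3, 1],
    [2, 25, 3],
    [0, 1, 17],
]
task_one = collapse_to_binary(toy_cm, positive={0, 1})  # scoliosis vs NC
task_two = collapse_to_binary(toy_cm, positive={0})     # CMS vs AIS + NC
```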
Visualization of Experimental Results
To explore how the ResNet-50 model makes decisions, we use Class Activation Mapping (CAM) 16 to “visualize” the image areas that the model focuses on when making decisions. We generate CAMs using the convolutional feature maps from the model’s last convolutional layer and the gradients of these feature maps for the target class. The generated CAM is a 2D array the same size as the original image, representing the importance of each area of focus for the model. To make it more readable for spinal surgeons, we convert the CAM into a heatmap and overlay it on the original image. In this heatmap, red areas indicate regions considered most important by the model, while blue or darker areas indicate regions considered less important.
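The computation described above, which weights the last convolutional feature maps by the gradients for the target class, corresponds to the gradient-weighted form of CAM. A minimal NumPy sketch follows. It assumes the feature maps and gradients have already been extracted from the network as arrays of shape (channels, height, width); the function name `grad_cam` is hypothetical, and nearest-neighbour upsampling is a simplification of the interpolation a real pipeline would likely use.

```python
import numpy as np

def grad_cam(feature_maps, gradients, out_shape):
    """Gradient-weighted class activation map, resized to out_shape.

    feature_maps, gradients: arrays of shape (K, H, W) from the last
    convolutional layer for the target class.
    """
    weights = gradients.mean(axis=(1, 2))                 # one weight per channel
    cam = np.tensordot(weights, feature_maps, axes=1)     # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                            # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                             # scale to [0, 1]
    # Nearest-neighbour upsampling to the original image size.
    rows = np.arange(out_shape[0]) * cam.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * cam.shape[1] // out_shape[1]
    return cam[np.ix_(rows, cols)]

# Toy example: two 2x2 feature maps, one supporting and one opposing the class.
fm = np.zeros((2, 2, 2))
fm[0, 0, 0] = 1.0
fm[1, 1, 1] = 1.0
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
heat = grad_cam(fm, grads, out_shape=(4, 4))
```

The resulting array can then be mapped to a color scale and overlaid on the radiograph, with values near 1 rendered in red.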
Statistical and Data Analysis
Results are presented as single values, with means ± standard errors of the mean (SEM). The comparison between CMS and AIS groups is performed using independent sample t-tests, while comparison among 3 groups is performed using One-way analysis of variance (ANOVA), with the significance level α set at 0.05 for two-tailed tests. We performed statistical analyses with SPSS 22.0 software (SPSS Inc., USA). To evaluate the model’s performance in diagnosing CMS, we use Python (v3.8.16) and various Python packages (including NumPy v1.23.5, scikit-learn v1.3.2, Matplotlib v3.7.1) to plot Receiver Operating Characteristic (ROC) curves and calculate the Area Under the Curve (AUC).
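For readers unfamiliar with how an ROC curve and its AUC are obtained, the sketch below computes both in plain Python for a CMS-versus-rest decision. It is not the authors' script: the function names are hypothetical, the score is assumed to be the model's predicted CMS probability, and the labels and scores are invented. The `roc_curve` and `auc` functions in scikit-learn's `sklearn.metrics` module perform the equivalent computation.

```python
def roc_points(y_true, scores):
    """ROC points (FPR, TPR) obtained by sweeping the threshold downward.

    y_true: 1 for CMS, 0 otherwise; scores: predicted CMS probability.
    Tied scores are processed together so they yield a single point.
    """
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Invented labels and scores for illustration only.
labels = [1, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.1]
area = auc_trapezoid(roc_points(labels, probs))
```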
Results
Demographic Information of the Datasets.
Task One: Scoliosis Diagnosis System
The Quantitative Evaluation of Scoliosis Diagnosis System.
a The best-performing results are highlighted in bold.

The confusion matrix for the classification system involving CMS, AIS, and NC: The vertical and horizontal axes represent the true labels (from top to bottom: CMS, AIS, and NC) and the predicted labels (from left to right: CMS, AIS, and NC), respectively.
Task Two: CMS Diagnosis System
The Quantitative Evaluation of CMS Diagnosis System.
a The best-performing results are highlighted in bold.

The ROC curves for diagnosing CMS with the ResNet-50 Coronal model (left) and the ResNet-50 Dual model (right).
Visualization Heatmaps Based on CAM
Figure 4 presents heatmaps generated by the ResNet-50 Dual model using CAM, showcasing the visualized image features of 2 correctly classified CMS patients. In the image, areas with higher weights are colored red, primarily located on the right side of the thoracic curve and the lumbar curve in the coronal X-ray images, as well as the thoracic kyphosis and the lumbar-sacral region in the sagittal X-ray images. Based on this figure, it can be inferred that atypical curve types, significant lateral shift of scoliotic segments, length of the affected segments, and severe trunk tilt are important factors influencing the model’s classification outcome.
Heatmaps highlighting the critical areas affecting accurate CMS classification by the ResNet-50 Dual model. (A) A 16-year-old male with CMS, characterized by atypical left thoracic curvature and severe vertebral segment lateral shift; (B) A 13-year-old female with CMS, characterized by an atypical long thoracic curve and significant trunk tilt.
Discussion
In recent studies, machine learning approaches have been used to assist in the identification of Chiari I malformation, often relying on MRI for diagnosis and surgical decision-making.9,17-19 Mesin et al 9 proposed a machine learning model that leverages morphometric indices extracted from sagittal MRI to predict patients at higher risk of syringomyelia and clinical deterioration following surgery. The study achieved a classification accuracy of 71%, demonstrating the potential of MRI-based machine learning approaches to support neurosurgical management.
Our approach differs significantly from the aforementioned studies: we employ full-spine X-ray images to detect atypical scoliosis associated with Chiari malformation, making our method more accessible and cost-effective. Full-spine X-rays are routinely performed in clinical practice for scoliosis diagnosis, offering a readily available and non-invasive imaging modality that avoids the need for more expensive and resource-intensive MRI scans.
The pathogenesis of CMS is complex and remains not fully understood. Previous research by our team indicated that 70% of Chiari malformation patients have the primary direction of cerebellar tonsillar herniation aligned with the primary curve direction of their scoliosis. 20 Yeom et al found that 83% of patients with syringomyelia had cavities that were biased toward the same direction as the primary curve of their scoliosis. 21 Attenello et al discovered that for CMS patients with a Cobb angle less than 40°, posterior fossa decompression (PFD) surgery could halt the progression of scoliosis, whereas for those with a Cobb angle greater than 40°, deformity continued to worsen after PFD. 22 Based on these clinical observations, it can be inferred that early-stage CM and syringomyelia-induced neurological damage are significant factors contributing to the progression of scoliosis. Subsequently, our team, using multiple invigorated high-definition diffusion imaging methods and fiber disappearance technology, identified microstructural damage in spinal cord nerve fibers in CMS patients with syringomyelia,23,24 further corroborating clinical observations.
Therefore, compared to idiopathic scoliosis, CMS more frequently exhibits characteristics of neuromuscular scoliosis, including longer affected segments, significant lateral shifts in scoliotic segments, a higher occurrence of atypical curve types, and severe trunk tilt. These atypical features are distinct from those observed in AIS. Additionally, CMS is characterized by earlier onset, faster progression, higher brace treatment failure rate, more clinical symptoms, and a higher proportion of males,25-27 all of which can serve as clues for achieving precise diagnosis of CMS in clinical settings.
Studies have reported that among patients initially diagnosed with AIS, the prevalence of neuroaxial abnormalities detected by MRI varies considerably, typically ranging between 2% and 12%. Consequently, routine MRI screening is recommended for scoliosis patients to identify underlying neuroaxial anomalies, as these findings significantly impact treatment decisions. As highlighted by Heemskerk et al, 28 the presence of neuroaxial anomalies often necessitates neurosurgical interventions prior to scoliosis correction to prevent potential complications during deformity surgery.
Furthermore, missed diagnoses of Chiari malformation can lead to severe clinical consequences. Studies indicate that among patients undergoing spinal fusion, those with Chiari malformation combined with syringomyelia experience intraoperative neuromonitoring abnormalities—indicative of potential neural injury—at rates as high as 28%. 29 Additionally, untreated Chiari-associated syringomyelia may progressively enlarge, compressing or damaging spinal neural pathways and neurons, potentially leading to worsening symptoms or even tethered cord syndrome. Importantly, delays in diagnosing and managing these neuroaxial abnormalities can result in irreversible neurological impairment, emphasizing the necessity for timely PFD.
While MRI remains the gold standard for detecting neuroaxial abnormalities, its routine application imposes substantial economic costs and resource burdens. Overuse of MRI in asymptomatic patients or those without clear clinical indications can lead to unnecessary healthcare expenditures, patient anxiety, and potentially unwarranted interventions. Although current literature lacks direct comparative cost-effectiveness analyses between routine and selective MRI screening strategies, existing studies indicate poor predictive ability for MRI screening criteria, resulting in inefficient selection processes. Routine MRI screening thus generates significant economic pressure, strains healthcare resources, and may exacerbate patient and family anxiety due to overdiagnosis and uncertain clinical implications. 30
Studies have consistently shown that CMS often presents with left-sided thoracic curves, multiple curve patterns, and severe sagittal imbalances, all of which are challenging to differentiate from other forms of scoliosis. Such atypical curve features, while distinguishable to some extent, are often difficult to identify based solely on visual inspection or traditional radiological methods due to the overlap with other scoliosis subtypes. This is where our model plays a pivotal role. By leveraging advanced image classification techniques, our deep learning-based model is capable of identifying subtle and complex patterns in radiographs that are difficult to distinguish using conventional methods. These atypical scoliosis patterns, such as those commonly found in CMS, represent a crucial area where machine learning can significantly enhance early detection and diagnosis, providing a more accurate and efficient tool for clinicians to differentiate between AIS and neurogenic forms of scoliosis.
Fine-grained image classification has emerged as a promising technique to address the diagnostic challenges mentioned above. The task is challenging due to minimal inter-class differences, substantial intra-class variations, and factors such as perspective, background, and occlusion. The classification strategies can be generally categorized into several approaches. The first is the generic deep convolutional neural network (DCNN) method, represented by our ResNet-50 Coronal, Sagittal, and Dual models; generic DCNNs have demonstrated excellent performance across various image classification tasks. Another approach involves localization-based methods that identify distinctive local regions within images; these include supervised methods requiring detailed annotations and weakly supervised methods using only category labels. 31 A third approach is the network ensemble strategy, which combines multiple specialized DCNNs to enhance accuracy, exemplified by our ResNet-50 Concat model. Lastly, higher-order feature encoding techniques, such as the bilinear model used in our ResNet-50 Bilinear approach, transform DCNN-extracted features into higher-dimensional descriptors, capturing more complex image patterns.
The backbone network used in this study, ResNet-50, is a widely utilized convolutional neural network structure known for its efficiency and superior performance due to residual connections that effectively mitigate gradient vanishing issues. To enhance the generalization capability and reduce overfitting, we diversified the limited training dataset through common data augmentation methods such as flipping, rotating, translating, and scaling. Additionally, to address dataset imbalance, we employed the Focal loss function, which emphasizes hard-to-classify positive samples, thereby improving performance on minority classes.
The diagnostic test results of this study demonstrate that coronal X-ray images alone can achieve rapid and accurate diagnosis of scoliosis. When sagittal X-ray images are used alone or in combination with coronal X-ray images, model performance is actually reduced, indicating that sagittal X-ray images introduce more noise than supplementary information into the task of scoliosis screening. In diagnosing CMS patients, the ResNet-50 Dual and ResNet-50 Coronal models achieved levels comparable to or even exceeding those of senior spine surgery experts. However, although these models surpassed both human experts in confirmatory indicators (specificity and PPV), effectively reducing the occurrence of false positives, in screening indicators (sensitivity and NPV) they exceeded only the mid-career spine surgeon and merely matched the senior spine surgeon, indicating a significant risk of missed diagnoses. Therefore, further optimization of the model’s performance is needed in the future.
Due to sample size limitations, this study did not attempt to use models with a larger number of parameters. Increasing the sample size and testing additional models might achieve better diagnostic results. The current model focuses specifically on detecting atypical scoliosis patterns associated with CMS. While the model demonstrates promising results for CMS, its applicability to other neuroaxial abnormalities remains limited. Future work should aim to enhance the generalization ability of the model, allowing it to accurately recognize atypical scoliosis patterns associated with other neuroaxial abnormalities, such as cerebral palsy, tethered cord syndrome, and spinal cord tumors. Expanding the dataset and incorporating a wider range of neuroaxial conditions will be crucial for improving the model’s performance and ensuring its broader clinical utility. This was a single-center study, and future work should include external test sets to further evaluate the model’s classification performance.
Conclusion
This is the first study using a fine-grained image classification algorithm based on the ResNet-50 network for identification of atypical scoliosis associated with CMS by X-ray images. Using a generic DCNN approach, a strategy based on network ensemble, and a higher-order encoding strategy of convolutional features, we constructed 5 models for scoliosis screening and CMS diagnosis. Among them, the ResNet-50 Dual and ResNet-50 Coronal models achieved favorable classification results, with heatmaps indicating that image features such as atypical curve types hold significant weight in the classification task. Therefore, spinal surgeons, combining the diagnostic results of the model with patients’ clinical data, can hope to reduce the rates of missed diagnoses of CMS, enabling early intervention to halt the progression of scoliosis. Moreover, by providing a more accurate and efficient screening tool, this model could help reduce the over-reliance on MRI scans, saving healthcare resources and improving clinical decision-making.
Footnotes
Author Note
The manuscript submitted does not contain information about medical device(s)/drug(s). No relevant financial activities outside the submitted work.
Author Contributions
Z.Z.Z., A.Y.L., X.D.Q. initiated the project and the collaboration. Y.C., Z.H., K.G.Y. developed the network architectures, training, and testing setup. Z.L., J.C.C., W.Y.L. designed the clinical setup. E.C.C., Y.Q., X.P.C. created the data set and defined clinical labels. N.L., X.L.L. contributed to the software engineering. Z.H., X.L.L. contributed algorithmic expertise. Y.C., Z.H., Z.Z.Z., A.Y.L. wrote the paper. Y.C., Z.H., K.G.Y. have accessed and verified the underlying study data.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Key R&D Program of China (2023YFC2507700), the Natural Science Foundation of Jiangsu Province (BK20230147), the AO Spine Asia Pacific National Research Grant (AOSRG2024023), the China Postdoctoral Science Foundation (2022M711581), the Nanjing Medical Science and Technology Development Foundation (YKK22098), and the Jiangsu Provincial Key Research and Development Program (BE2023658).
Ethical Statement
Data Availability Statement
The source code and trained models of ResNet-50 can be requested from the corresponding author upon reasonable request. Images and sensitive patient privacy information will not be disclosed. Additionally, access to this data will require signing a data transfer or access agreement, subject to case-by-case review by Nanjing Drum Tower Hospital.
