Abstract
Objective
This systematic review evaluates the performance and limitations of AI-based models for the diagnosis of degenerative cervical disease (DCD) using MRI.
Methods
A comprehensive literature search was conducted in three databases—PubMed, Embase, and Web of Science—covering studies published between January 2010 and March 2024. Studies were included if they employed AI techniques for the diagnosis or prognosis of DCD using MRI. Key performance metrics, methodological details, and limitations were extracted and analyzed.
Results
Eleven studies met the inclusion criteria, with AI models showing high diagnostic performance. Accuracy ranged from 81.58% to 98%, sensitivity from 84.62% to 98.83%, specificity from 90% to 100%, and AUC values reached up to 0.971. Convolutional neural networks (CNNs) were the most frequently used models (four studies), followed by support vector machines (three studies). Comparative analysis revealed that CNN-based approaches showed consistently high performance in detecting ossification of the posterior longitudinal ligament, while traditional machine learning methods demonstrated varying effectiveness in classifying cervical spondylotic myelopathy. Sample sizes varied significantly, ranging from 28 to 900 patients. MRI protocols also differed across studies, with variations in field strengths, slice thicknesses, and sequences. Seven studies assessed inter-rater reliability. Most studies lacked external validation, which raises concerns about the generalizability of the models. Additionally, hardware configurations were inconsistently reported, and data augmentation techniques were underutilized, limiting the robustness of models trained on smaller datasets.
Conclusion
While AI models for DCD diagnosis using MRI show high diagnostic potential, methodological weaknesses such as insufficient external validation and small sample sizes hinder broader clinical adoption. Future research should focus on larger, standardized, multi-center studies to improve the robustness and clinical relevance of AI-driven tools for DCD diagnosis.
Introduction
Degenerative cervical diseases (DCD), including cervical spondylotic myelopathy (CSM), ossification of the posterior longitudinal ligament (OPLL), and cervical spinal stenosis, are common age-related disorders that lead to considerable neurological disabilities and a decline in health-related quality of life. 1 Magnetic resonance imaging (MRI) is the primary imaging modality for diagnosing and assessing the severity of DCD due to its superior ability to provide soft tissue contrast and detailed visualization of spinal structures. 2 However, traditional MRI interpretation in DCD diagnosis faces several significant challenges. Inter-observer variability is a major concern, as different radiologists may interpret the same MRI images differently, leading to inconsistent diagnoses. 3 The complexity of multi-level pathologies in DCD often makes it difficult to accurately assess the overall severity and pinpoint the primary problematic areas. 4 Furthermore, subtle changes in early-stage DCD can be easily overlooked, potentially delaying crucial early interventions. 5 The detailed examination of multi-sequence MRI scans for DCD is also time-consuming, which can delay diagnosis and treatment. 6 These challenges, combined with the difficulty in early-stage diagnosis, can lead to delayed or inaccurate diagnoses, potentially compromising patient care and outcomes.
Artificial intelligence (AI) models are designed to address these issues by providing quantitative and objective assessments, thereby enhancing diagnostic consistency. 7 AI, particularly machine learning and deep learning, has advanced rapidly in recent years and has driven substantial improvements in the accuracy and efficiency of medical imaging-based diagnostics. 8 AI-based techniques have proven highly successful in various medical image analysis tasks, such as segmentation, classification, and the detection of pathological features across multiple imaging modalities.9,10 Recently, researchers have focused on developing AI models that can predict the diagnosis or severity of DCD from MRI of the cervical spine. 11 Such AI-driven models can be integrated into the workflow of radiologists and clinicians, facilitating timely and accurate diagnoses of DCD and enabling early intervention through personalized treatment strategies. 12 Additionally, they provide objective and quantitative descriptions of the disease and its stages, complementing the subjective and qualitative assessments of human experts and improving reliability.13,14
Given the increasing focus on AI applications in DCD diagnosis and prognosis, it is essential that these models are both explainable and interpretable so that human experts can understand and trust the decision-making processes of these models. This interpretability is crucial for fostering effective collaboration between AI systems and healthcare professionals in patient care. 15 Despite this growing interest, there remains a significant lack of comprehensive evaluations of existing AI models in this specific domain, which limits the identification of effective models and the understanding of challenges that must be addressed for clinical translation. To address this, our systematic review aims to critically evaluate the performance, methodological quality, and limitations of current AI-based diagnostic models for DCD using MRI data. Using the PROBAST tool, 16 it will assess specific diseases, modeling techniques, and performance metrics while evaluating the methodological quality and risk of bias (RoB) of the included studies. By synthesizing available evidence, this review seeks to fill a crucial gap in the literature, providing clinicians and researchers with valuable insights that will advance the development of clinically applicable AI tools, improve patient outcomes, and inform future research directions in enhancing diagnostic accuracy and patient care in DCD.
Materials and methods
Literature search
This systematic review did not involve direct patient participation or intervention. All data were extracted from previously published studies that had obtained appropriate ethical approvals; therefore, ethical approval and patient consent were not required for this review. The scope of this review was confined to studies focusing on the diagnostic performance of AI models in DCD using MRI. The review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 17 Studies were included if they met all of the following criteria:
1. Published in English between 1 January 2010 and 30 March 2024.
2. Employed AI techniques for the assisted diagnosis or prognosis of DCD using MRI. AI techniques considered included, but were not limited to, machine learning algorithms (e.g., support vector machines, random forests), deep learning models (e.g., convolutional neural networks, recurrent neural networks), and other computational intelligence methods.
3. Utilized MRI data for analysis. Acceptable MRI parameters included various sequences (e.g., T1-weighted, T2-weighted, STIR) and field strengths (e.g., 1.5T, 3T), without restrictions on specific imaging protocols.
4. Focused on DCD, including but not limited to CSM, OPLL, and cervical disc degeneration.
5. Reported quantitative performance metrics for the AI models (e.g., accuracy, sensitivity, specificity, AUC).
To identify relevant studies, we conducted comprehensive searches of Web of Science, PubMed, and Embase databases, focusing on four groups of search terms: (a) AI techniques (e.g., artificial intelligence, deep learning), (b) diagnosis-related terms (e.g., segmentation, prediction), (c) MRI terminology (e.g., T1-weighted, diffusion-weighted imaging), and (d) degenerative cervical disease terms (e.g., cervical spondylosis, ossification of the posterior longitudinal ligament). These groups ensured a broad capture of literature addressing AI applications in MRI-based DCD diagnosis. The complete search query used in the different databases is shown in Supplementary File 1.
After exclusion of duplicates, the titles and abstracts were screened, and only relevant publications proceeded to full-text screening. The decision as to whether a study met the inclusion criteria of the review was performed by two authors (D.Q. and S.X.X.) without the use of automated tools. A third author (C.G.R.) acted as a referee in case of a potential disagreement between the two authors responsible for screening. All articles that did not focus on the use of AI techniques for aided diagnosis in patients with DCD were excluded at the full-text screening stage.
Data extraction
Two authors (D.Q. and S.X.X.) independently extracted data and discussed any discrepancies. Data were extracted with regard to:
Study and clinical parameters: authors, title, year, design, number of patients in training/test sets, development/test split, ground truth, inter-/intra-rater variability, task, specific diseases, conflicts of interest, and sources of funding.
Imaging parameters: MRI machine, number of images, field strength, slice thickness, and sequences.
AI parameters: algorithm, dimensionality, training duration and hardware, libraries/frameworks/packages, data augmentation, performance measures, explainability/interpretability features, and code/data availability.
Quality assessment
PROBAST is a comprehensive tool designed to assess the RoB and applicability concerns of diagnostic model studies. 16 It is structured around four main domains:
Participants: evaluates the selection of study participants.
Predictors: assesses how predictors are managed and measured.
Outcome: examines how outcomes are defined and determined.
Analysis: reviews the statistical analysis approaches used.
Each domain is evaluated through several signaling questions, which guide the user in identifying potential issues. The RoB for each article is classified as Low RoB, High RoB, or Unclear RoB, based on the following criteria:
Low RoB: If no relevant shortcomings were identified in the RoB assessment, that is, all domains had low RoB.
High RoB: If a model is developed without any external validation on different participants, downgrading to high RoB should still be considered even if all four domains had low RoB, unless the model development was based on a very large data set or included some form of internal validation.
Unclear RoB: If unclear RoB was noted in at least 1 domain and all other domains had low RoB.
Two authors (D.Q. and S.X.X.) independently assessed the RoB concerns of the included studies using the PROBAST tool. Their independent results were then compared, and any discrepancies were resolved through discussion. A third author (C.G.R.) acted as a referee in case of a potential disagreement between the two authors responsible for the quality assessment. The quality assessment process was conducted without the use of automated tools to ensure a thorough and comprehensive evaluation of each included study.
Results
The inclusion workflow is illustrated in Figure 1. The query yielded 69, 37, and 70 publications from the Web of Science, PubMed, and Embase databases, respectively. After removing 63 duplicate records, 113 publications remained. During screening, 102 articles were excluded; a comprehensive list of the excluded articles and the reasons for exclusion is provided in Supplementary File 2. Specifically, three and eight articles were excluded as literature reviews and conference reviews, respectively.18–28 Additionally, 85 publications were excluded as irrelevant to the topic,29–113 and two articles were excluded because they had been retracted.114,115 Furthermore, two animal studies were excluded, along with one study investigating multiple anatomical sites and one study involving cadavers.116–119 After these exclusions, 11 articles remained; their key data are shown in Table 1.120–130 Detailed extracted data from the 11 included articles can be found in Supplementary File 3a, b, and c.

Inclusion workflow diagram according to PRISMA 2020. 17
Key characteristics of the selected research articles.
Abbreviations: OPLL, ossification of the posterior longitudinal ligament; CSM, cervical spondylotic myelopathy; CI, confidence interval; DTI, diffusion tensor imaging; LR, logistic regression; SVM, support vector machine; DT, decision tree; RF, random forest; AUC, area under the receiver operating characteristic curve; STM, support tensor machine; NB, naive Bayes; CNN, convolutional neural network; NPV, negative predictive value; PPV, positive predictive value; CycleGAN, cycle generative adversarial network; DNN, deep neural network.
Study characteristics
Among the 11 included studies, 10 were retrospective, while one was a post hoc pilot study. 120 Sample sizes varied widely, from 28 patients to 900 patients.120,127 Eight studies were published after 2020, indicating a recent surge in research interest. The studies covered various DCDs, including CSM,120,122,126,128,129 OPLL,124,127 spinal canal stenosis,121,123 and cervical disc degeneration.125,130
Diagnostic performance of AI models
AI models demonstrated high diagnostic performance across studies, although the reported metrics varied (Table 1). Seven studies reported accuracy,120,122,124,126,128–130 with values ranging from 81.58% 120 to 98%. 127 Area under the receiver operating characteristic curve (AUC) values were reported in five studies,120,122,124,129,130 ranging from 0.85 to 0.971. Sensitivity and specificity were reported in seven studies,122–124,126–129 with sensitivity ranging from 84.62% to 98.83% and specificity from 90% to 100%. Other metrics, such as precision, recall, and F1-score, were reported in only three studies.122,125,130
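The metrics tabulated above can be illustrated with a short sketch. The confusion-matrix counts and prediction scores below are hypothetical and not drawn from any included study; AUC is computed via the rank-sum (Mann-Whitney) equivalence, i.e., the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
# Illustrative computation of the metrics reported across studies:
# accuracy, sensitivity, and specificity from confusion-matrix counts,
# plus AUC via the rank-sum formulation. All numbers are hypothetical.

def basic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), and specificity from counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, sensitivity, specificity

def auc_rank(scores, labels):
    """AUC as the probability a positive case outranks a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

acc, sens, spec = basic_metrics(tp=45, fp=5, tn=40, fn=10)
auc = auc_rank([0.9, 0.8, 0.7, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0, 0])
print(round(acc, 3), round(sens, 3), round(spec, 3), round(auc, 3))
```

Reporting all four quantities together, as only a minority of the included studies did, makes results comparable across different class balances.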
CSM
Five studies focused on CSM.120,122,126,128,129 Accuracy was reported in four studies, ranging from 80% to 95.73%. AUC values were reported in three studies, with the highest being 0.947. 120 Sensitivity and specificity varied across studies, with the highest sensitivity being 93.41%, and specificity reaching 98.64%. 126 Only one study assessed inter-rater reliability. 129
OPLL
Two studies focused on OPLL detection.124,127 Qu et al. 124 reported an accuracy of 97.66%, while Shemesh et al. 127 reported 98%. AUC was reported only by Qu et al. 124 (0.971). Sensitivity and specificity ranged from 85% 127 to 100%.120,124 Both studies reported inter-rater reliability.
Cervical disc degeneration
Two studies addressed cervical disc degeneration.125,130 Xie et al. 130 reported an accuracy of 89.51% and an AUC of 0.95. Niemeyer et al. 125 did not report accuracy or AUC but provided Cohen's kappa for classification tasks, achieving 0.722. Neither study provided sensitivity or specificity, and only Niemeyer et al. assessed inter-rater reliability.
Spinal canal stenosis
Two studies focused on spinal canal stenosis.121,123 Neither reported accuracy or AUC values. Both assessed inter-rater reliability, with Jardon et al. 121 reporting an increase in Cohen's kappa from 0.76 to 0.81, and Kim et al. 123 showing improvements in weighted kappa values from 0.48–0.71 to 0.61–0.75.
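Cohen's kappa, the statistic these studies used to quantify inter-rater reliability, corrects raw agreement for the agreement expected by chance; the weighted variant credits partial agreement on ordinal grades. A minimal sketch with hypothetical ratings (not data from the included studies):

```python
# Cohen's kappa (unweighted and linearly weighted) for two raters
# grading the same cases on an ordinal scale. Computed as
# 1 - observed_disagreement / chance_disagreement, which reduces to
# the usual kappa when the weight is the 0/1 disagreement indicator.

from collections import Counter

def cohen_kappa(r1, r2, weighted=False):
    cats = sorted(set(r1) | set(r2))
    n = len(r1)
    span = max(cats) - min(cats) or 1
    # disagreement penalty: 0/1 for plain kappa, |i-j| scaled for linear
    w = lambda a, b: (abs(a - b) / span) if weighted else float(a != b)
    observed = sum(w(a, b) for a, b in zip(r1, r2)) / n
    p1, p2 = Counter(r1), Counter(r2)
    expected = sum(w(a, b) * p1[a] * p2[b] for a in cats for b in cats) / (n * n)
    return 1 - observed / expected

grader_a = [0, 1, 1, 2, 2, 3, 3, 3]   # hypothetical stenosis grades
grader_b = [0, 1, 2, 2, 2, 3, 2, 3]
print(round(cohen_kappa(grader_a, grader_b), 3))
print(round(cohen_kappa(grader_a, grader_b, weighted=True), 3))
```

The weighted form is the natural choice for graded scales such as stenosis severity, since adjacent-grade disagreements are less serious than distant ones.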
Comparative analysis of AI models across different DCD types
A comparative analysis of different AI models across DCD types is shown in Figure 2, with detailed performance metrics presented in Table 1. Among the CSM studies, Wang et al. 126 and Wang et al. 129 conducted classification tasks using different machine learning approaches. The traditional machine learning combination (SVM/STM/NB) 126 achieved higher accuracy, sensitivity, and specificity than the newer ensemble method, 129 which reported accuracy and AUC values. In the OPLL detection studies, Qu et al. 124 systematically compared different CNN architectures, showing that a deeper network (ResNet-101) outperformed a shallower one (ResNet-34) on all metrics (accuracy, sensitivity, specificity, and AUC). Additionally, Shemesh et al. 127 employed the VGG16 network, achieving comparably high performance (accuracy: 98%, sensitivity: 85%, specificity: 98%). For cervical disc degeneration, two studies focused on different aspects with distinct evaluation metrics: Xie et al. 130 employed a combination of traditional machine learning methods (DT/RF/SVM/XGBoost) for general degeneration classification, reporting an accuracy of 89.51% and an AUC of 0.95, while Niemeyer et al. 125 used a CNN for specific degenerative phenotype classification, evaluating performance with Cohen's kappa for different phenotypes (ranging from 0.271 to 0.741). Unlike the other DCD types, the spinal canal stenosis studies by Jardon et al. 121 and Kim et al. 123 focused primarily on improving inter-rater reliability, using kappa and weighted kappa values, respectively.

Performance comparison of AI models across different DCD types. Performance metrics include accuracy, sensitivity, specificity, and area under the curve (AUC). In CSM classification, traditional machine learning methods were evaluated. OPLL detection employed both CNN-based approaches (ResNet-34, ResNet-101) and VGG16 network. For disc degeneration, two different approaches were used: combined machine learning methods for general classification and CNN for specific phenotype classification with different evaluation metrics (Cohen's kappa). For spinal canal stenosis studies, different evaluation metrics (kappa values) were used due to their focus on inter-rater reliability improvement.
RoB assessment
A wide range of AI methodologies was employed in the included studies, reflecting the exploratory nature of AI applications in diagnosing DCD (Table 1). Convolutional neural networks (CNNs) were the most commonly used method, appearing in four studies,122,124,125,127 followed by support vector machines (SVMs), which were utilized in three studies.126,128,129 Additionally, three studies combined multiple machine learning algorithms, such as SVM, STM, naive Bayes, and XGBoost, indicating a trend toward more diverse approaches.126,128,129 Generative adversarial networks (GANs) were used in one study, 123 and support tensor machines (STMs) were implemented in two studies.126,128 Regarding hardware configurations, only four studies specified the computational resources used.121,123–125 According to the PROBAST tool, six studies were assessed as having a low RoB,122,124–126,129,130 four as having an unclear RoB,121,123,127,128 and one as having a high RoB (Figure 3). 120 Detailed assessments are provided in Supplementary File 4.

This chart provides a comprehensive summary of the risk of bias evaluations across the 11 studies. The left bar chart illustrates the overall risk of bias judgement and the right chart depicts the domain-specific risk of bias assessments.
Analysis of data heterogeneity across studies
Significant heterogeneity was observed in data acquisition and processing across studies (Supplementary File 3a, b, and c). MRI field strengths varied, with studies using 1.5T,120,124,125 3.0T,121,122,126–130 or both.123,124 Slice thickness ranged from 0.7 121 to 7.0 mm,125,128 with most studies using 3.0–4.0 mm. Imaging protocols differed, from single T2-weighted sequences121,122,125,127 to multiple sequences including T1-weighted, T2-weighted, and DTI.126,128
Sample sizes varied substantially (28 120 to 900 127 patients), as did the number of analyzed images (up to 9737 123 ). Most studies employed 2D analysis,120,122–124,126,128,130 while some utilized 3D121,129 or combined approaches. 125 Training and testing data distribution also showed considerable variation across studies, with different splitting ratios adopted for model development and validation.
Ground truth establishment
Ground truth labels were defined by experienced clinicians or radiologists in nine studies.
Limitations identified
A key limitation observed was the small sample sizes, such as Hopkins et al. 120 with 28 patients, which may impact generalizability. Only one study 122 made code and data publicly available, limiting reproducibility. MRI protocol differences (machines, field strengths, slice thicknesses, and sequences) may have introduced variability, affecting image quality and model results. Most studies did not incorporate explainability features, and performance metric inconsistencies further hampered comparison across studies.
Summary of key findings
In summary, our review reveals that AI models demonstrated high diagnostic performance across studies, with accuracies ranging from 80% 120 to 98%, 127 and AUC values up to 0.971. 124 However, we also identified common limitations such as small sample sizes (ranging from 28 to 900 patients), lack of external validation in most studies, and inconsistent reporting of performance metrics. These findings highlight both the potential of AI in DCD diagnosis and the need for more robust, standardized research methodologies.
Discussion
Sample size and data splitting challenges
AI techniques for analyzing DCD using MRI show significant promise, although the included studies varied widely in scale, with sample sizes ranging from 28 to 900 patients. Data splitting approaches also varied, including random splitting, cross-validation, and cases in which no separate test set was used. Cross-validation offers a rigorous estimate of model performance but is computationally expensive, particularly for larger datasets. Some studies applied stratified sampling to balance disease severity subtypes across training and test sets, though this approach was inconsistent across studies.131,132 Stratified sampling remains crucial for balancing datasets and ensuring robust model evaluation. 133
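The stratified splitting described above can be sketched in a few lines: samples are grouped by class label and dealt round-robin into folds, so every fold preserves the overall class balance. The severity labels below are hypothetical.

```python
# Minimal stratified k-fold split: each severity class is distributed
# evenly across folds, so train/test sets preserve the class balance.

from collections import defaultdict

def stratified_folds(labels, k):
    """Return a list of k folds, each a list of sample indices."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):   # round-robin within each class
            folds[pos % k].append(idx)
    return folds

labels = ["mild"] * 6 + ["moderate"] * 6 + ["severe"] * 3
for i, fold in enumerate(stratified_folds(labels, 3)):
    print(f"fold {i}: {sorted(labels[j] for j in fold)}")
```

In a real study the indices would first be shuffled with a fixed seed; the point here is only that rare severity grades (here "severe") appear in every fold rather than landing in a single split by chance.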
Ground truth labeling and standardization
Variability in ground truth labeling—from expert annotations to automated methods—introduces trust and consistency issues during model training. Disagreements among clinicians highlight the need for standardized protocols and guidelines to improve label consistency and model reliability in cervical spine MRI. 134 Multiple expert annotations, adjudication steps, and interrater reliability checks can help ensure dependable ground truth data. However, the lack of standardized evaluation protocols and benchmark datasets hinders objective comparisons. Publicly available benchmark datasets and shared tasks would foster a more competitive and replicable research landscape. Collaboration among clinicians, radiologists, and AI developers is essential to ensure clinical relevance, interpretability, and trust in AI systems.
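A consolidation step of the kind described above, combining multiple expert annotations with an adjudication fallback, might look like the following sketch; the case identifiers and labels are hypothetical.

```python
# Sketch of ground-truth consolidation from multiple expert annotations:
# majority labels are accepted, ties are flagged for an adjudication
# step by a senior reader. Case IDs and labels are hypothetical.

from collections import Counter

def consolidate(annotations):
    """annotations: dict mapping case_id -> list of labels, one per rater."""
    consensus, needs_adjudication = {}, []
    for case, labels in annotations.items():
        (top, votes), *rest = Counter(labels).most_common()
        if rest and rest[0][1] == votes:      # tie between top labels
            needs_adjudication.append(case)
        else:
            consensus[case] = top
    return consensus, needs_adjudication

annotations = {
    "case01": ["stenosis", "stenosis", "normal"],
    "case02": ["normal", "normal", "normal"],
    "case03": ["stenosis", "normal"],          # two raters, tied
}
consensus, disputed = consolidate(annotations)
print(consensus)   # majority labels accepted as ground truth
print(disputed)    # cases routed to adjudication
```

Pairing such a consolidation step with an inter-rater reliability check (e.g., kappa between raters before voting) gives a quantitative handle on label quality before model training begins.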
AI techniques and comparative analysis
A variety of AI methodologies were used, with CNNs being the most common,122,124,125,127 followed by SVMs,126,128,129 GANs, 123 and decision trees. In CSM classification, traditional machine learning approaches showed varying performance levels,126,129 which might be attributed to differences in sample sizes and feature selection strategies. For OPLL detection, CNN-based approaches consistently demonstrated high performance, with both a deeper architecture (ResNet-101) and the VGG16 network showing superior results,124,127 suggesting that hierarchical feature learning is particularly well suited to detecting complex anatomical changes. The diversity of approaches for cervical disc degeneration and spinal canal stenosis reflects the complexity of these conditions,121,123,125,130 highlighting the importance of matching model architecture to the specific diagnostic requirements.
The generalizability of these findings is primarily limited by methodological considerations. Most studies focused on diagnostic accuracy without considering other important clinical factors such as inference time and model interpretability. Moreover, the lack of standardized evaluation metrics makes it challenging to conduct comprehensive model comparisons across studies.
Impact of data heterogeneity
Technical heterogeneity in data acquisition poses unique challenges for AI applications in DCD diagnosis. Specific variations in MRI parameters, particularly field strengths (1.5T vs. 3.0T) and slice thickness (ranging from 0.7 to 7.0mm), directly affect image quality and feature representation.131,132 This imaging protocol diversity necessitates careful consideration in model development and validation processes.
The diversity in study scale also significantly impacts model development. While some studies utilized large datasets with thousands of images,123,127 others were limited by smaller sample sizes. 120 These differences in data availability influence not only model training strategies but also the reliability of performance evaluations. Future multi-center studies with standardized imaging protocols are essential to establish more reliable benchmarks for model performance.133,134
Clinical relevance and implementation challenges
Our review reveals promising potential for AI applications in DCD diagnosis using MRI, with several studies reporting high diagnostic accuracies. There is a predominant focus on CSM,120,122,126,128,129 reflecting its clinical significance, while other conditions like OPLL also show promising results.124,127 Qu et al. 124 achieved 97.66% accuracy in OPLL detection, while Shemesh et al. 127 reported 98% accuracy. This focus on CSM, while important, also highlights a research gap for other DCD subtypes. AI models have demonstrated potential in diagnosis, offering objective and quantitative assessments. However, their current clinical relevance is limited by the variability in methodologies, which affects comparability between studies. To truly impact clinical practice, AI models must be further refined and standardized.
Methodological quality and external validation
The PROBAST RoB assessment provides valuable insights into the methodological quality and limitations of the included studies, highlighting common study design weaknesses and areas for improvement. Notably, some diagnostic models deemed low risk across all domains lacked external validation, underscoring the importance of externally validating models to ensure their performance and generalizability. This finding reinforces the need for future studies to follow established reporting guidelines, such as the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, to enhance transparency and reproducibility while minimizing potential bias due to incomplete reporting. 135 In the medical domain, clinician trust in AI models is paramount: statistical correlations and models alone are insufficient if clinicians do not trust the outputs. For AI models to be trusted and adopted, they must provide transparency, allowing clinicians to understand why particular outputs are generated.136,137 Explainability techniques such as class activation mapping can highlight the anatomical regions that drive a model's decision, giving clinicians insight into areas that may warrant further review.138,139 However, most studies in this review did not incorporate explainability features, a significant limitation that may delay clinical adoption. The development of interpretable AI systems will be critical for fostering clinician trust and ensuring these tools are effectively integrated into decision-making processes.
Limitations and potential biases
Variations in AI algorithms, MRI protocols, performance metrics, and patient populations complicate direct comparisons and limit the potential for meta-analysis. This diversity reflects the early stage of AI applications in DCD diagnosis using MRI and highlights the urgent need for standardized methodologies. The wide range of sample sizes across studies (28‒900 patients) also introduces potential selection bias, with smaller studies risking underpowered results and larger ones facing computational challenges and class imbalance issues.131,132 Publication bias is another concern, as studies with positive results are more likely to be published, potentially leading to an overestimation of AI's effectiveness in DCD diagnosis. Inconsistencies in ground truth labeling and the lack of standardized evaluation protocols further hinder objective comparisons across studies. Additionally, many studies lacked external validation, raising concerns about the generalizability of AI models.133,134 Despite these limitations, this systematic review provides a valuable overview of the current research landscape. It highlights existing gaps, methodological challenges, and the critical need for standardization, offering a foundation for future studies in this promising field.
Future research directions
This review identifies critical gaps in AI applications for DCD diagnosis that require urgent attention. There is a pressing need for standardized protocols for data collection, ground truth labeling, and model evaluation, along with the creation of public benchmark datasets. Many studies also lack robust external validation, raising concerns about the generalizability of AI models; future research should prioritize this aspect.133,134
While CSM has been well-studied,120,122,126,128,129 other DCD subtypes require further exploration. Enhancing model interpretability is crucial for fostering clinician trust and adoption, as most studies have not incorporated explainability features.136–139 Additionally, integrating MRI data with other clinical information and biomarkers through multimodal approaches could enhance the accuracy of AI models.140,141 The development of AI systems for early detection and diagnosis remains a priority, as this could significantly impact patient outcomes through timely intervention. Future research may also explore the potential of AI in understanding disease patterns and subtypes, which could enhance our understanding of DCD progression. By addressing these areas systematically, the field can advance toward more reliable and clinically applicable AI tools for DCD diagnosis.
Conclusions
AI has significant potential to enhance DCD diagnosis using MRI, particularly for early detection and personalized treatment. However, challenges such as the need for larger, multi-center studies, robust external validation, and improved model interpretability hinder widespread adoption. Addressing these issues through collaboration between AI developers and clinicians is essential for ensuring the clinical relevance and usability of these models. By focusing on these areas, the integration of AI into clinical practice can be more effectively achieved.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076241311939 - Supplemental material for Artificial intelligence in degenerative cervical disease: A systematic review of MRI-based diagnostic models
Supplemental material, sj-docx-1-dhj-10.1177_20552076241311939 for Artificial intelligence in degenerative cervical disease: A systematic review of MRI-based diagnostic models by Qian Du, Xinxin Shao, Minbo Zhang and Guangru Cao in DIGITAL HEALTH
Supplemental Material
sj-docx-2-dhj-10.1177_20552076241311939
sj-docx-3-dhj-10.1177_20552076241311939
sj-docx-4-dhj-10.1177_20552076241311939
sj-docx-5-dhj-10.1177_20552076241311939
sj-docx-6-dhj-10.1177_20552076241311939
Footnotes
Acknowledgments
The authors express their sincere gratitude for the unwavering support of their families.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Science and Technology Plan Projects of Zunyi City, the Science and Technology Fund Project of Guizhou Provincial Health Commission, the Basic Research Program of Guizhou Provincial Department of Science and Technology (grant numbers: Zunshi Kehe HZ [2024] No. 432, gzwkj2022-480, Qiankehe Foundation-ZK [2024] General-347).
References
