Abstract
Study Design
Retrospective case–control study.
Objectives
To identify risk factors for proximal junctional kyphosis (PJK) after long-segment fusion in adult degenerative scoliosis (ADS) and to develop a machine learning–based prediction model with external validation.
Methods
We retrospectively analyzed 142 ADS patients from two institutions undergoing posterior long-segment fusion with ≥24 months follow-up. Patients from center A (n = 105) formed the training cohort, and those from center B (n = 37) served as the external validation cohort. Demographic, radiographic, and surgical parameters were compared between patients with and without PJK. Independent predictors were determined with multivariate logistic regression. Least absolute shrinkage and selection operator (LASSO) regression identified key variables. Six supervised machine learning algorithms were trained using center A data and validated on center B data. Model interpretability was assessed using Local Interpretable Model-agnostic Explanations (LIME).
Results
PJK occurred in 24 patients (16.9%). Logistic regression identified lower T-score and higher T1–pelvic angle (T1PA) as independent risk factors and female sex as a borderline protective factor, with ASA grade III showing a marginal effect. LASSO retained five features: T score, ASA grade, T1PA, sacral slope, and pelvic incidence. Among algorithms, the back-propagation neural network with LASSO feature selection yielded the best discrimination (external validation AUC = 0.882). LIME analysis confirmed T score, T1PA, and PI as the most influential predictors.
Conclusions
Reduced bone density, impaired sagittal balance, and higher ASA grade were associated with increased PJK risk after long-segment fusion in ADS. A neural network combined with LASSO feature selection demonstrated superior predictive performance, supporting its potential for individualized preoperative risk assessment and surgical planning.
Keywords
Background
Adult degenerative scoliosis (ADS) is a common spinal deformity in adulthood, characterized by asymmetric disc degeneration and facet joint arthropathy, which can lead to spinal imbalance and a series of functional impairments.1,2
The development of ADS is closely associated with progressive disc degeneration, dehydration and collapse, facet joint disease, and ligamentous laxity, eventually resulting in spinal instability, rotation, spondylolisthesis, and kyphotic deformity. 1 Epidemiological studies have reported a high prevalence of ADS in individuals over 60 years of age, with rates ranging from 8.3% to 68%, and a mean onset age of approximately 70 years. Given its typically progressive natural course, ADS not only causes back pain and lower-extremity neurological symptoms but also markedly reduces patients’ quality of life. 3 While some patients may undergo conservative treatment, evidence suggests that such strategies often fail to achieve satisfactory outcomes. Consequently, surgery has become the primary treatment option for those with refractory pain, deformity progression, or severe neurological deficits. 4 The surgical goals are to restore overall balance through decompression, deformity correction, and fusion with instrumentation.
However, adult spinal deformity surgery is associated with a relatively high complication rate, and proximal junctional kyphosis (PJK) is one of the most frequent modes of failure. A more severe manifestation is proximal junctional failure (PJF), characterized by vertebral fracture at the upper instrumented vertebra (UIV) or UIV+1, screw pullout, adjacent segment subluxation, pronounced instability, or associated neurological deficits. PJF often necessitates revision surgery: its reported acute incidence is approximately 5.6%, and 41% of patients with PJF require reoperation. 5 Both PJK and PJF not only reduce quality of life but also substantially increase revision rates and overall healthcare costs. 6
Numerous studies have investigated the risk factors for PJK/PJF, and their etiology is widely considered multifactorial, encompassing patient-, surgery-, and alignment-related parameters. Patient-specific risk factors such as advanced age, osteoporosis or low bone mineral density, sarcopenia, comorbidities, and smoking have all been reported. 7
Despite the introduction of various preventive strategies in recent years—including minimizing proximal soft tissue disruption, careful UIV selection, application of transition rods or U-hooks, prophylactic vertebral augmentation, and preoperative osteoporosis treatment—no universally effective method has been established. Importantly, PJK/PJF continues to occur frequently even when radiographic parameters are optimized. 5 Therefore, accurate preoperative risk prediction of PJK/PJF based on multiple clinical and radiographic factors is critical for improving surgical outcomes. 8
In this context, machine learning (ML) offers a novel avenue of investigation. As a core methodology in artificial intelligence, ML can automatically detect complex, nonlinear patterns in large medical imaging and clinical datasets and may outperform conventional linear models such as logistic regression.9-13 The present study aimed to develop and validate a machine learning–based prediction model for PJK risk in ADS patients undergoing long-segment fusion, with the goal of enhancing risk assessment and clinical decision-making. Furthermore, we sought to explore the performance of different ML algorithms in small-sample settings.
Methods
Study Design and Patient Population
From 2013 to 2025, consecutive patients diagnosed with ADS who underwent posterior spinal fusion at two independent tertiary care institutions were retrospectively reviewed. Patients from Center A (n = 105) were assigned to the training cohort, while those from Center B (n = 37) formed the external test cohort to evaluate model generalizability. All patients completed a minimum of 24 months of follow-up. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments, and was approved by the institutional ethics committee. Owing to the retrospective nature of the analysis, the requirement for individual informed consent was waived.
The inclusion criteria were as follows: patients aged over 50 years; availability of preoperative spinal CT and MRI; preoperative Cobb angle greater than 10°; and a minimum follow-up duration of 24 months.
The exclusion criteria were: history of childhood or adolescent scoliosis; secondary scoliosis due to other conditions, such as autoimmune diseases, tuberculosis, malignancy, post-traumatic deformity, or syndromic scoliosis; and follow-up of less than 24 months.
Patients were divided into PJK and non-PJK groups according to whether proximal junctional kyphosis (PJK) occurred during follow-up.
Outcomes
The diagnostic criteria for PJK were based on established definitions in prior literature: an increase in the proximal junctional angle (PJA) of ≥10° compared with the preoperative measurement, or an absolute PJA of at least 10° at the final 24-month follow-up. Proximal junctional failure (PJF) was defined as a complication involving fracture at the proximal junction, fixation failure, or kyphosis requiring cranial extension of the fusion.14,15 All radiographic parameters were independently assessed by two spine surgeons; discrepancies were resolved by discussion with a third senior surgeon.
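The PJK definition above can be expressed as a simple rule applied to each patient's preoperative and follow-up PJA. The following is a minimal sketch, not the authors' code: `is_pjk` and its `require_both` switch are hypothetical names, and angles are assumed to be in degrees. The default mirrors the "or" wording in the text; because many studies instead require both an absolute PJA of at least 10° and an increase of at least 10°, the switch makes that choice explicit.

```python
def is_pjk(pja_preop: float, pja_final: float,
           require_both: bool = False, threshold_deg: float = 10.0) -> bool:
    """Classify proximal junctional kyphosis from PJA values in degrees.

    pja_preop: PJA on the preoperative standing lateral radiograph.
    pja_final: PJA at final follow-up.
    require_both: False follows the 'or' wording in the text; True applies the
        stricter variant (absolute PJA >= threshold AND increase >= threshold).
    """
    increased = (pja_final - pja_preop) >= threshold_deg
    absolute = pja_final >= threshold_deg
    return (increased and absolute) if require_both else (increased or absolute)


# Hypothetical angles, not study data
print(is_pjk(4.0, 12.0))                     # True: absolute PJA >= 10°
print(is_pjk(4.0, 12.0, require_both=True))  # False: increase is only 8°
```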
Data Collection
A total of 142 patients met the inclusion criteria. Baseline demographic and clinical variables collected included: age, sex, body mass index (BMI), smoking history, alcohol use, hypertension, chronic obstructive pulmonary disease (COPD), malignancy, and diabetes.
Radiographic parameters were measured from pre- and postoperative full-length standing radiographs, including the coronal Cobb angle, pelvic incidence (PI), pelvic tilt (PT), lumbar lordosis (LL), sacral slope (SS), sagittal vertical axis (SVA), and T1–pelvic angle (T1PA). Bone quality was assessed using both the vertebral bone quality (VBQ) score from preoperative MRI and the femoral neck T-score from dual-energy X-ray absorptiometry (DEXA).
Surgical variables included fusion length; systemic health status was recorded as the American Society of Anesthesiologists (ASA) physical status grade.
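Several of the radiographic and bone-quality measures listed above are linked by simple geometry or ratios, which is useful when checking extracted data. A minimal sketch, assuming angles in degrees and 2D landmark coordinates on a lateral radiograph: PI = PT + SS holds by construction; T1PA is computed here as the unsigned angle at the femoral head axis between the lines to the S1 endplate midpoint and to the T1 centroid (a simplification, since clinical T1PA carries a sign); and the VBQ score follows its commonly cited definition (median L1–L4 trabecular signal intensity divided by L3 cerebrospinal fluid signal intensity on T1-weighted MRI). Function names and landmark inputs are hypothetical, and the authors' exact measurement protocol is not described.

```python
import math
from statistics import median


def pelvic_incidence(pt_deg: float, ss_deg: float) -> float:
    """PI = PT + SS (geometric identity); useful as a consistency check."""
    return pt_deg + ss_deg


def t1pa(hip_axis, t1_centroid, s1_midpoint) -> float:
    """Unsigned angle (degrees) at the hip axis between the lines to the
    S1 endplate midpoint and to the T1 centroid; points are (x, y) tuples."""
    ax, ay = s1_midpoint[0] - hip_axis[0], s1_midpoint[1] - hip_axis[1]
    bx, by = t1_centroid[0] - hip_axis[0], t1_centroid[1] - hip_axis[1]
    cross = ax * by - ay * bx   # proportional to sin of the angle
    dot = ax * bx + ay * by     # proportional to cos of the angle
    return abs(math.degrees(math.atan2(cross, dot)))


def vbq_score(l1_l4_signal, csf_signal_l3: float) -> float:
    """VBQ = median L1-L4 trabecular signal / L3 CSF signal (T1-weighted MRI)."""
    return median(l1_l4_signal) / csf_signal_l3


# Hypothetical values, not study data
print(pelvic_incidence(20.0, 35.0))
print(t1pa((0.0, 0.0), (30.0, 30.0), (0.0, 10.0)))
print(vbq_score([300, 320, 310, 330], 100.0))
```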
Statistical Analysis
Continuous variables were presented as mean ± standard deviation and compared between groups using independent-sample t-tests, while categorical variables were expressed as counts and percentages and compared using chi-square tests. Variables identified as significant in univariate analyses were further tested in multivariate logistic regression to determine independent risk factors for PJK. The predictive performance of the regression model was also evaluated using accuracy, recall, precision, F1 score, and area under the curve (AUC).
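The univariate comparisons described above can be sketched with the standard library alone. This illustration rests on stated assumptions: a pooled-variance (Student) t test, matching "independent-sample t-tests", for which only the statistic and degrees of freedom are computed (a p-value needs the t distribution, for example via `scipy.stats.ttest_ind`), and a Pearson chi-square test without continuity correction for a 2 × 2 table, whose 1-df p-value reduces to a complementary error function. Note that `scipy.stats.chi2_contingency` applies Yates' correction to 2 × 2 tables by default, so results can differ. Function names and example data are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance (Student) t statistic and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 denominator (sample variance)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, na + nb - 2


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]], df = 1."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df: P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical data, not study values (e.g. T-scores in PJK vs non-PJK patients)
t_stat, df = student_t([-2.1, -2.8, -3.0, -2.5], [-1.2, -0.9, -1.6, -1.1])
chi2, p = chi_square_2x2(20, 5, 5, 20)
print(t_stat, df, chi2, p)
```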
Machine Learning
To identify predictive features for PJK, least absolute shrinkage and selection operator (LASSO) regression was applied to all 20 candidate variables, with the optimal penalty parameter (λ) determined through 10-fold cross-validation. Variables with coefficients reduced to zero were excluded, and the remaining features were used for model development.
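A standard-library sketch of LASSO with λ chosen by 10-fold cross-validation may help make this step concrete. Assumptions: squared-error loss with the objective (1/2n)·RSS + λ·Σ|b_j| (consistent with the MSE criterion reported in the Results, though the software and loss used are not stated), features standardized within each training fold, cyclic coordinate descent with soft-thresholding, and a user-supplied λ grid. All names and the toy data are hypothetical; in practice a tested implementation such as glmnet's `cv.glmnet` or scikit-learn's `LassoCV` would be used.

```python
import math
import random


def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - b0 - Zb||^2 + lam*||b||_1, Z = standardized X."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / n) or 1.0 for j in range(p)]
    Z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in X]
    b0 = sum(y) / n                      # unpenalized intercept; Z columns are centered
    b = [0.0] * p
    resid = [yi - b0 for yi in y]
    for _ in range(n_iter):
        for j in range(p):
            # For non-constant columns (1/n)*sum(z_ij^2) = 1, so the update is a soft-threshold
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + b[j]
            delta = soft_threshold(rho, lam) - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
                b[j] += delta
    return b0, b, means, sds


def lasso_predict(model, X):
    b0, b, means, sds = model
    return [b0 + sum(bj * (r[j] - means[j]) / sds[j] for j, bj in enumerate(b)) for r in X]


def cv_select_lambda(X, y, lambdas, k=10, seed=0):
    """Return the lambda with the lowest k-fold cross-validated MSE, and that MSE."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_mse = None, float("inf")
    for lam in lambdas:
        sq_err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            preds = lasso_predict(model, [X[i] for i in fold])
            sq_err += sum((pred - y[i]) ** 2 for pred, i in zip(preds, fold))
        mse = sq_err / len(y)            # each sample is held out exactly once
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam, best_mse


# Hypothetical toy data: feature 0 carries the signal, features 1-2 are filler
X = [[i, (i * 7) % 5, (i * 3) % 4] for i in range(20)]
y = [1.0 if i >= 10 else 0.0 for i in range(20)]
lam, cv_mse = cv_select_lambda(X, y, [0.3, 0.1, 0.03, 0.01])
_, coefs, _, _ = lasso_fit(X, y, lam)
selected = [j for j, c in enumerate(coefs) if c != 0.0]   # features kept by LASSO
```

Features whose coefficients are exactly zero at the selected λ are dropped, which corresponds to the exclusion step described above.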
Six supervised learning algorithms were trained using both the selected features and the full feature set: Naïve Bayes, support vector machine (SVM), random forest, k-nearest neighbors (KNN), back-propagation neural network, and LightGBM. Each model was trained with 10-fold cross-validation on the training cohort (center A) and subsequently evaluated on the external validation cohort (center B). Model performance was assessed in terms of accuracy, precision, recall, F1 score, and area under the curve (AUC), with the best-performing model chosen based on external validation results.
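To make the evaluation metrics concrete, the sketch below computes them from predicted labels and scores with the standard library only. Names are hypothetical, and a 0.5 probability threshold is assumed for converting scores into labels because the text does not state one. It reports both positive-class (PJK) metrics and support-weighted averages over the two classes. The text does not say which averaging was used; support-weighted recall is mathematically identical to accuracy, which would be consistent with the identical accuracy and recall (0.824) reported for the logistic regression model in the Results. AUC is computed as the probability that a randomly chosen PJK case receives a higher score than a randomly chosen non-PJK case, with ties counted as one half.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy plus positive-class and support-weighted precision/recall/F1."""
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    per_class = {}
    for c in (0, 1):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        pred_c = sum(p == c for p in y_pred)
        true_c = sum(t == c for t in y_true)
        prec = tp / pred_c if pred_c else 0.0
        rec = tp / true_c if true_c else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[c] = (prec, rec, f1, true_c)
    # Weight each class's metric by its number of true cases (support)
    weighted = tuple(sum(m[k] * m[3] for m in per_class.values()) / n for k in range(3))
    return {"accuracy": acc, "positive": per_class[1][:3], "weighted": weighted}


def roc_auc(y_true, scores):
    """P(score of random positive > score of random negative), ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions, not study data
y_true = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
y_pred = [int(s >= 0.5) for s in scores]
print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))  # 0.75
```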
To improve interpretability, Local Interpretable Model-agnostic Explanations (LIME) was applied to the optimal model. Global feature importance was quantified using mean weight statistics, and local interpretability was explored to assess individual variable contributions to case-level predictions.
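The idea behind LIME can be shown in a few lines: perturb one case, query the trained model, weight the perturbed samples by their closeness to the original case, and fit a weighted linear surrogate whose coefficients serve as local feature weights. The sketch below is a simplified LIME-style procedure, not the `lime` package and not the authors' configuration: perturbations are Gaussian in standard-deviation units, the kernel is an exponential of squared distance, the width default of 0.75·√p follows the `lime` tabular explainer, and global importance is taken as the mean absolute local weight across cases (one possible reading of "mean weight statistics"). All names are hypothetical.

```python
import math
import random


def _solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [list(row) + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def lime_weights(predict_proba, x, feature_sds, n_samples=500,
                 kernel_width=None, ridge=1e-3, seed=0):
    """Local surrogate weights (per 1-SD change in each feature) for one case."""
    rng = random.Random(seed)
    p = len(x)
    width = kernel_width or 0.75 * math.sqrt(p)
    A, t, w = [], [], []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]            # perturbation in SD units
        xp = [xi + zi * sd for xi, zi, sd in zip(x, z, feature_sds)]
        A.append([1.0] + z)
        t.append(predict_proba(xp))
        w.append(math.exp(-sum(zi * zi for zi in z) / width ** 2))  # nearer samples count more
    # Weighted ridge normal equations: (A^T W A + ridge*I') beta = A^T W t
    m = p + 1
    M = [[sum(wi * a[r] * a[c] for wi, a in zip(w, A)) for c in range(m)] for r in range(m)]
    for r in range(1, m):
        M[r][r] += ridge                                          # intercept left unpenalized
    v = [sum(wi * a[r] * ti for wi, a, ti in zip(w, A, t)) for r in range(m)]
    return _solve(M, v)[1:]


def global_importance(predict_proba, X, feature_sds, **kw):
    """Mean absolute local weight per feature across the explained cases."""
    per_case = [lime_weights(predict_proba, x, feature_sds, **kw) for x in X]
    return [sum(abs(wc[j]) for wc in per_case) / len(per_case) for j in range(len(feature_sds))]


# Hypothetical black-box model, for illustration only
model = lambda v: 0.5 + 0.3 * v[0] + 0.1 * v[1]
print(global_importance(model, [[0.0, 0.0], [1.0, 2.0]], [1.0, 1.0]))
```

For a model that is exactly linear, the surrogate recovers its slopes, which is a convenient sanity check before applying the procedure to a nonlinear network.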
Results
Patient Characteristics
Baseline Characteristics of Patients With and Without PJK
Note. ***, **, and * indicate significance levels of 1%, 5%, and 10%, respectively.
Comparison of Demographic and Clinical Variables Between Patients With and Without PJK
Logistic Regression Analysis
Multivariate logistic regression achieved good predictive performance, with accuracy of 0.824, recall of 0.824, precision of 0.781, F1 score of 0.789, and AUC of 0.827 (Figure 1). Independent predictors of PJK were lower T score (P = .036) and higher T1PA (P = .044); female sex was of borderline significance (P = .050). Specifically, a higher T score was associated with decreased PJK risk, whereas a higher T1PA increased risk; female sex appeared protective. ASA grade III showed a trend toward significance (OR = 3.258, 95% CI: 0.930-11.412, P = .065), suggesting it may represent a potential risk factor. Other variables, including PI, fusion length, SVA, Cobb angle, BMI, PT, SS, and comorbidities, were not significantly associated with PJK (Table 3).
Figure 1. ROC curve of the multivariate logistic regression model predicting proximal junctional kyphosis (AUC = 0.827).
Table 3. Multivariate Logistic Regression Analysis of Risk Factors for Proximal Junctional Kyphosis (PJK) After Long-Segment Fusion in Adult Degenerative Scoliosis
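In logistic regression an odds ratio is OR = exp(β), and a 95% Wald interval is exp(β ± 1.96·SE), so β, SE, and the two-sided P value can be recovered from a reported OR and CI. The worked example below assumes the ASA grade III interval is a Wald interval on the log-odds scale (the text does not say); under that assumption it gives β ≈ 1.18, SE ≈ 0.64, and a two-sided p of about 0.065, matching the reported P = .065, and it shows why an interval spanning 1 is not significant at the 5% level. The function name is hypothetical.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-odds coefficient, its SE, z, and a two-sided p
    from an odds ratio and its 95% Wald confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided standard-normal p-value
    return beta, se, z, p


beta, se, z, p = wald_from_or_ci(3.258, 0.930, 11.412)   # ASA grade III, from the text
print(f"beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.3f}")
```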
Feature Selection by LASSO Regression
A total of 20 candidate variables were entered into LASSO regression with 10-fold cross-validation to reduce overfitting. The coefficient path is shown in Figure 2: as the regularization parameter (λ) increased, most coefficients shrank toward zero. The optimal λ of 0.03441239, corresponding to the minimum cross-validated mean squared error (MSE; Figure 3), was selected. Five predictors were retained (Figure 4), and all others, whose coefficients were shrunk to zero, were excluded. Feature weight visualization showed that T score and ASA grade were the most influential predictors, followed by T1PA; SS and PI were also retained.
Figure 2. Coefficient path of variables in the LASSO regression model.
Figure 3. Ten-fold cross-validation curve for LASSO regression showing mean squared error across different λ values.
Figure 4. Predictors retained by LASSO regression: T score, ASA grade, T1 pelvic angle (T1PA), sacral slope (SS), and pelvic incidence (PI).
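The coefficient path described above starts from a known point. For a squared-error LASSO on standardized features with the objective (1/2n)·RSS + λ·Σ|b_j|, every coefficient is exactly zero once λ reaches λ_max = max_j |(1/n)·Σ_i z_ij (y_i − ȳ)|, where z_ij are the standardized feature values. Path-fitting software therefore typically evaluates a log-spaced grid of λ values descending from λ_max, and cross-validation picks the value with the lowest MSE, which is presumably how a value such as the reported 0.03441239 arose. The helpers below are hypothetical sketches, assuming squared-error loss and population-SD standardization; the grid length and minimum ratio are arbitrary choices.

```python
import math
from statistics import fmean, pstdev


def lambda_max(X, y):
    """Smallest penalty at which all LASSO coefficients are zero, for
    (1/2n)||y - b0 - Zb||^2 + lam*||b||_1 with Z standardized (population SD)."""
    n, p = len(X), len(X[0])
    ybar = fmean(y)
    best = 0.0
    for j in range(p):
        col = [row[j] for row in X]
        mu, sd = fmean(col), pstdev(col)
        if sd == 0:
            continue  # a constant column never enters the model
        score = abs(sum((v - mu) / sd * (yi - ybar) for v, yi in zip(col, y))) / n
        best = max(best, score)
    return best


def lambda_grid(lam_max, n_lambdas=100, min_ratio=1e-3):
    """Log-spaced grid from lam_max down to min_ratio * lam_max."""
    if n_lambdas < 2:
        return [lam_max]
    step = math.log(min_ratio) / (n_lambdas - 1)
    return [lam_max * math.exp(step * k) for k in range(n_lambdas)]


# Hypothetical toy data
lm = lambda_max([[1.0], [2.0], [3.0]], [0.0, 0.0, 1.0])
grid = lambda_grid(lm)
```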


Machine Learning Model Performance
The predictive performance of the six supervised algorithms before and after LASSO feature selection is shown in Figure 5 and summarized in Tables 4 and 5. LASSO generally improved model discrimination by removing redundant variables. The back-propagation neural network achieved the highest performance on the external validation cohort after feature selection (AUC = 0.882), followed by SVM and random forest (Figures 6 and 7). Naïve Bayes, however, showed markedly reduced accuracy after LASSO. On the basis of these results, the back-propagation neural network was selected as the final prediction model for PJK risk.
Figure 5. Receiver operating characteristic (ROC) curves of six supervised machine learning algorithms before and after LASSO feature selection, illustrating performance changes within each model.
Table 4. Predictive Performance of Six Supervised Machine Learning Algorithms on the External Validation Cohort Using the Full Set of Variables
Table 5. Predictive Performance of Six Supervised Machine Learning Algorithms on the External Validation Cohort after LASSO Feature Selection
Figure 6. Comparative ROC curves of the back-propagation neural network trained with the full variable set vs the LASSO-selected subset, showing improved performance after feature reduction.
Figure 7. Summary comparison of all machine learning models: (A) test set performance with the full variable set; (B) test set performance with LASSO-selected variables. The back-propagation neural network achieved the highest AUC among all algorithms.


Model Interpretability
To enhance interpretability, the LIME method was applied to the final model. Global feature ranking identified PI, T score, and T1PA as the three most influential predictors, followed by ASA and SS. Notably, T score consistently emerged as a strong predictor across logistic regression, LASSO regression, and machine learning models, underscoring its stable importance in PJK risk stratification (Figure 8).
Figure 8. Global feature importance ranking in the final model based on local interpretable model-agnostic explanations (LIME) analysis.
Discussion
In this retrospective study of 142 patients with adult degenerative scoliosis (ADS) who underwent long-segment posterior spinal fusion, the overall incidence of proximal junctional kyphosis (PJK) was 16.9%. This rate is consistent with previous studies, supporting the external validity and clinical representativeness of our findings. Through multivariate regression and machine learning modeling, we identified several parameters closely associated with postoperative PJK, including pelvic incidence (PI), femoral neck T-score, T1–pelvic angle (T1PA), ASA grade, and sacral slope (SS). Among these, reduced T-score and increased T1PA, reflecting compromised bone quality and insufficient sagittal balance, respectively, emerged as the most significant risk factors. ASA grade III showed a trend toward higher PJK risk, pointing to a possible role of systemic frailty.
Our analysis highlights the importance of sagittal alignment parameters in PJK development. PI, a fixed morphologic parameter, determines the fundamental sagittal balance of an individual. Abnormal PI may predispose to mismatch following correction, leading to compensatory loads at the proximal junction. Elevated T1PA indicated insufficient sagittal compensation, integrating thoracic kyphosis and pelvic retroversion, and was associated with significantly higher PJK risk in this cohort. These findings align with prior reports that maintaining T1PA within 10°-30° may reduce junctional complications. 16 Similarly, decreased SS further reflected limited pelvic compensatory reserve. Taken together, these results echo the view of Xu et al. 17 that patients with impaired pelvic compensation are more susceptible to PJK, and suggest that modest PT overcorrection may be beneficial in such individuals.
Low bone density has repeatedly been linked to proximal junctional complications. Chen et al. 18 demonstrated in a systematic review that reduced T-score and Hounsfield units significantly increase PJK risk. Our findings corroborate this association: patients with lower femoral neck T-scores were more prone to PJK, consistent with the concept that osteoporosis impairs screw–bone anchorage and predisposes the proximal vertebrae to stress fractures. Both logistic regression and machine learning models confirmed the stable predictive value of bone mineral density. Du et al. 19 similarly identified BMD <−3.5 SD as a powerful risk factor for collapse in chronic osteoporotic vertebral fracture patients, achieving an AUC of 0.921. Our results extend these observations to the ADS population, emphasizing low bone quality as a central contributor to junctional failure.
All patients in our cohort were classified as ASA grade II or III. Patients with ASA grade III showed a trend toward a higher likelihood of PJK (OR = 3.258, P = .065), suggesting an influence of systemic comorbidities and reduced physiological reserve. This is consistent with Hackett et al., 20 who reported ASA grade as an independent predictor of postoperative complications and mortality in spine surgery. Our findings suggest that ASA classification may serve as a simple clinical indicator for risk prediction and that ASA III patients may be considered high-risk candidates who could benefit from perioperative optimization and closer postoperative monitoring.
The observed risk factors likely reflect multiple interacting mechanisms. Low BMD weakens the screw–vertebra interface and predisposes to proximal junctional fracture18,19,21; paraspinal muscle degeneration compromises local biomechanical stability1,3; impaired pelvic compensation shifts excessive loads to the thoracic spine17; and poor systemic reserve in ASA III patients limits healing and compensatory capacity. 20 Together, these interrelated mechanisms may explain the multifactorial etiology of PJK.
Interestingly, the regression model indicated that female sex was associated with a lower risk of PJK (OR <1), which appears counterintuitive given the higher prevalence of osteoporosis among women. This finding should be interpreted cautiously. One possible explanation is that male patients in our cohort tended to present with more severe baseline deformity and a higher proportion of ASA grade III, reflecting greater systemic comorbidity and frailty. Alternatively, the apparent protective effect may reflect sampling variability due to the relatively small cohort size. Larger multicenter studies are warranted to clarify the true direction and magnitude of the sex-related association with PJK risk.
We compared six supervised machine learning algorithms to assess predictive performance. The back-propagation (BP) neural network combined with LASSO feature selection achieved the best performance (test set AUC = 0.882), outperforming SVM, random forest, LightGBM, and the other models. BP networks can capture complex nonlinear relationships, while LASSO mitigated overfitting by excluding redundant features. The relatively shallow BP architecture used in this study, combined with LASSO-based dimensionality reduction, may have allowed the model to retain adequate representational power while limiting overfitting, a common problem in small-sample datasets. By comparison, SVM performance depends on kernel and regularization choices that are difficult to tune reliably with limited observations, and random forests, despite bootstrap aggregation, can still fit noise in small samples. The BP network, whose parameters were adjusted iteratively by gradient descent, may have achieved a better bias–variance balance in this setting, which could explain its superior external performance despite the restricted sample size.
By contrast, Naïve Bayes relies on strong independence assumptions, KNN is more vulnerable to noise, and ensemble models such as random forest and LightGBM tended to overfit in the small-sample setting. Although SVM performed relatively well, its accuracy and stability were inferior to those of the BP network.
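To illustrate what "back-propagation" training of a shallow network involves, the sketch below implements a one-hidden-layer network trained by full-batch gradient descent on binary cross-entropy, with gradients propagated backward through the tanh hidden layer. It is a hypothetical illustration only: the study does not report its architecture or hyperparameters, so the hidden size, learning rate, weight decay, epoch count, and toy data here are assumptions, and inputs would normally be standardized first.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyBPNet:
    """One tanh hidden layer and a sigmoid output, trained by full-batch gradient
    descent (back-propagation) on mean binary cross-entropy with L2 weight decay."""

    def __init__(self, n_in, n_hidden=4, lr=0.5, weight_decay=1e-3, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr, self.wd = lr, weight_decay

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.b2)

    def predict_proba(self, x):
        return self._forward(x)[1]

    def fit(self, X, y, epochs=2000):
        n, n_in, H = len(X), len(X[0]), len(self.W2)
        for _ in range(epochs):
            gW1 = [[0.0] * n_in for _ in range(H)]
            gb1, gW2, gb2 = [0.0] * H, [0.0] * H, 0.0
            for x, t in zip(X, y):
                h, out = self._forward(x)
                d_out = out - t                       # dLoss/dlogit for sigmoid + cross-entropy
                gb2 += d_out
                for j in range(H):
                    gW2[j] += d_out * h[j]
                    d_h = d_out * self.W2[j] * (1.0 - h[j] ** 2)   # back through tanh
                    gb1[j] += d_h
                    for k in range(n_in):
                        gW1[j][k] += d_h * x[k]
            # Gradient step on the mean loss; weight decay applies to weights, not biases
            self.b2 -= self.lr * gb2 / n
            for j in range(H):
                self.W2[j] -= self.lr * (gW2[j] / n + self.wd * self.W2[j])
                self.b1[j] -= self.lr * gb1[j] / n
                for k in range(n_in):
                    self.W1[j][k] -= self.lr * (gW1[j][k] / n + self.wd * self.W1[j][k])
        return self


# Hypothetical toy data: feature 0 separates the classes, feature 1 is filler
X = [[i / 10, (i % 3) / 3] for i in range(-10, 11)]
y = [1 if i > 0 else 0 for i in range(-10, 11)]
net = TinyBPNet(n_in=2).fit(X, y)
```

Keeping the hidden layer small and adding weight decay are the kinds of constraints that help such a network avoid memorizing a small training set.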
Our results suggest that, in small datasets, combining feature selection with a moderately complex model may maximize predictive performance. Compared with previously reported models, such as Tian’s logistic regression based on radiographic parameters (AUC = 0.806) 22 and Wang’s random forest model (AUC = 0.847), 23 our BP + LASSO model achieved a higher AUC, although comparisons across different cohorts and validation designs should be interpreted with caution. These findings suggest that neural networks, combined with appropriate dimensionality reduction, hold promise for individualized PJK risk prediction.
Importantly, this study used a multicenter design, with training and testing performed on independent institutional datasets. External validation supported the robustness of the BP + LASSO model (AUC = 0.882) and its potential generalizability to broader clinical populations.
This study has several limitations. First, although a multicenter design with external validation was adopted, the overall sample size remained relatively small, which may limit generalizability; larger prospective studies are required for further validation. Second, only preoperative demographic and radiographic variables were included, without intraoperative or postoperative management factors such as upper instrumented vertebra selection or prophylactic measures. The exclusion of these variables may limit the comprehensiveness of the model, and future work should consider incorporating them to better reflect the multifactorial nature of PJK development. Third, the median follow-up was less than 3 years, restricting conclusions regarding long-term outcomes. Because mechanical and biological complications of adult degenerative scoliosis surgery may continue to accumulate over time, a longer follow-up period (ideally ≥5 years) would provide a more comprehensive understanding of the temporal evolution and durability of surgical outcomes. Future prospective studies with extended follow-up are therefore warranted to validate the long-term predictive accuracy of the proposed model. Fourth, although this study incorporated an external validation cohort, the small size of the validation set (n = 37) may limit the stability and generalizability of the reported AUC values. Small test sets can lead to greater variability in performance metrics, particularly for machine learning models trained on limited data. Accordingly, future studies involving larger multicenter datasets are necessary to further validate and calibrate the model’s robustness across diverse populations.
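The instability of performance metrics on a small test set can be quantified with a bootstrap confidence interval for the AUC. The sketch below uses a percentile bootstrap over patients, skipping resamples that contain only one class; it is a hypothetical illustration with assumed names, resample count, and toy data, not an analysis performed in the study. With only a few PJK cases in a cohort of this size, such an interval would typically be wide.

```python
import random


def roc_auc(y_true, scores):
    """P(score of random positive > score of random negative), ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC and percentile bootstrap CI; single-class resamples are skipped.
    Labels are assumed to be coded 0/1."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample patients with replacement
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:                          # need both classes to define AUC
            aucs.append(roc_auc(yb, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int(alpha / 2 * (len(aucs) - 1))]
    hi = aucs[int((1 - alpha / 2) * (len(aucs) - 1))]
    return roc_auc(y_true, scores), lo, hi


# Hypothetical validation-set labels and model scores, not study data
y_val = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]
s_val = [0.1, 0.3, 0.6, 0.4, 0.8, 0.7, 0.2, 0.9, 0.5, 0.35]
auc, ci_low, ci_high = bootstrap_auc_ci(y_val, s_val)
```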
Future research should focus on prospective cohorts with larger sample sizes to validate and refine predictive models. Incorporating multimodal data—including radiomics, muscle quality, and advanced imaging biomarkers—with deep learning techniques may further enhance predictive power. The development of clinician-friendly, interpretable prediction tools will also be essential to facilitate real-world adoption, enabling rapid preoperative risk stratification and personalized surgical planning. 24
Footnotes
Author Note
Present or permanent address: Beijing Anzhen Hospital, Capital Medical University, 2 Anzhen Road, Chaoyang District, Beijing, China.
Acknowledgments
We thank the National Natural Science Foundation of China and the R&D Program of Beijing Municipal Education Commission for their financial support. We also acknowledge the contributions of the Chinese Institutes for Medical Research, Beijing, for providing essential resources. The Manuscript submitted does not contain information about medical device(s)/drug(s).
Ethical Considerations
As this study used retrospective data, a waiver of informed consent was granted by the ethics committee in accordance with national regulations and the Declaration of Helsinki. This study was approved by the Ethics Committee of Beijing Anzhen Hospital (Ethical Approval No. 2025228x) and by the ethics committees of the participating hospitals. All procedures performed involving human participants were conducted in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki Declaration and its later amendments.
Consent to Participate
Owing to the retrospective nature of the analysis, the requirement for individual informed consent was waived.
Author Contributions
All authors contributed to the study conception and design. Yong Hai supervised the study. Xinglin Liu performed material preparation and data collection. Xianglong Meng and Sheyang Xu performed the machine learning, analysis, and manuscript writing. Zhiheng Zhao prepared the visualizations. All authors commented on previous versions of the manuscript and read and approved the final manuscript.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: the National Natural Science Foundation of China and the R&D Program of Beijing Municipal Education Commission.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data availability is not applicable to this study.
