Abstract
Study design
Retrospective study.
Objective
Osteoporotic vertebral compression fractures (OVCF) affect postmenopausal women, with 30-40% requiring surgical intervention after conservative treatment failure. This study developed a CT and MRI radiomics-based model to predict conservative treatment failure risk.
Methods
We retrospectively analyzed 154 postmenopausal women with OVCF (2016-2024), divided into successful (n = 86) and failed (n = 68) conservative treatment groups. Three-dimensional regions of interest were delineated, and quantitative features extracted using PyRadiomics. Feature selection employed Mann-Whitney U test, Spearman correlation, and LASSO regression. Clinical, radiomics, and combined models were constructed using eight machine learning algorithms with 5-fold cross-validation.
Results
Age and vertebral CT Hounsfield units were significant clinical predictors. From 3668 initial features, 16 key radiomics features were selected. LightGBM performed best for clinical models, while k-nearest neighbors excelled for radiomics models. In testing, the clinical model achieved AUC 0.684 (accuracy 0.71), radiomics model AUC 0.812 (accuracy 0.71), and combined model AUC 0.859 (accuracy 0.806). The combined model significantly outperformed individual models.
Conclusion
The comprehensive CT and MRI radiomics-based model accurately predicts conservative treatment failure risk in postmenopausal women with OVCF. This tool enables early identification of high-risk patients and supports individualized treatment decisions, potentially guiding early surgical intervention for predicted high-risk cases.
Keywords
Introduction
Osteoporotic vertebral compression fracture (OVCF) is a fragility fracture resulting from decreased vertebral bone density and quality due to osteoporosis, which can occur under minimal or even no apparent external force. OVCF is the most common type of osteoporotic fracture, clinically manifesting primarily as thoracolumbar pain.1-3 Postmenopausal women are particularly susceptible to OVCF due to bone loss secondary to estrogen deficiency. 4 OVCF causes severe harm to patients: vertebral compression and collapse can lead to height loss and kyphotic deformity, subsequently causing chronic back pain, cardiopulmonary dysfunction, and gastrointestinal disorders. Due to delayed bone healing processes, OVCF patients have increased rates of delayed union or pseudarthrosis; prolonged bed rest can lead to further bone loss, exacerbating osteoporosis and inducing complications such as thrombosis, pneumonia, and urinary infections.5-7
Treatment for OVCF includes conservative and surgical approaches. For acute OVCF, clinicians typically first choose non-surgical conservative measures, including bed rest, analgesics, external bracing, and anti-osteoporotic medications. 8 However, conservative treatment has notable limitations: it cannot correct vertebral deformity, often leaving patients with persistent back pain 9 ; additionally, prolonged bed rest may cause cardiopulmonary decline, venous thrombosis, and other complications, increasing patient suffering and family burden. More importantly, in patients with severe osteoporosis, conservative treatment outcomes are often unpredictable. Some patients experience nonunion, continued vertebral collapse, progressive kyphosis, or even neural compression despite standard conservative treatment, ultimately requiring surgery. Studies show that approximately 40-45% of OVCF patients fail conservative treatment and require subsequent surgical intervention.10,11 Therefore, early identification of postmenopausal OVCF patients likely to fail conservative treatment and require aggressive surgery is a pressing clinical issue.
Previous literature has identified several risk factors associated with conservative treatment failure. Clinically, advanced age and low bone mineral density (BMD) are significantly associated with OVCF conservative treatment failure. Systematic reviews indicate that MRI findings are often closely related to conservative treatment failure; if imaging features could predict prognosis early in the fracture course, more aggressive treatments such as early surgery could be considered.10,12,13 However, single clinical or imaging indicators often cannot accurately predict individual patient conservative treatment outcomes.
The development of radiomics technology provides new solutions to this challenge. Radiomics quantitatively extracts massive features from medical images, mining potential patterns in image texture, intensity, and shape information, serving as a powerful tool for disease diagnosis and prognosis assessment. Unlike traditional subjective image interpretation, radiomics can extract hundreds to thousands of features from CT, MRI, and other images, including one-, two-, and three-dimensional features, maximizing the exploration of deep information in imaging data.14-16 Recent orthopedic research has also demonstrated unique advantages of radiomics: one study used MRI radiomics models to predict new vertebral fracture risk after vertebroplasty, proving the feasibility of radiomics in fracture risk assessment. 17 Another study showed that CT radiomics could identify high-risk vertebrae more accurately than traditional HU models, predicting OVCF risk in postmenopausal women. 18 These findings suggest great potential for applying radiomics to OVCF outcome prediction.
Based on this background, this study proposes combining CT and MRI radiomics to construct a predictive model for identifying postmenopausal women with OVCF at risk of conservative treatment failure at initial presentation through quantitative analysis of fractured vertebral CT and MRI images. We developed a radiomics signature and established a comprehensive predictive model (nomogram) combining clinical risk factors to evaluate its predictive efficacy and clinical utility for OVCF conservative treatment failure.
Materials and Methods
Patient Selection and Grouping
This study included postmenopausal women with osteoporotic vertebral compression fractures treated at our institution from January 2016 to December 2024. The research was performed in accordance with the Declaration of Helsinki. The study was approved by Peking University People’s Hospital, and all participants signed informed consent forms. The patient selection process is shown in Figure 1.
Figure 1. Flowchart of Patient Selection
Inclusion Criteria: 1. Postmenopausal women aged >50 years. 2. Newly diagnosed, single-level OVCF confirmed by X-ray, CT, and MRI. 3. Symptom onset and imaging diagnosis within 4 weeks to ensure fracture acuity and comparability. 4. First-time conservative treatment, including bed rest, analgesics, bracing, and anti-osteoporotic medication.
Exclusion Criteria: 1. Secondary osteoporosis due to other causes, such as long-term glucocorticoid use, endocrine or metabolic disorders (eg, hyperparathyroidism, thyroid disease, Cushing’s syndrome), chronic kidney disease, or malignancy-related bone loss. 2. Pathological fractures (tumor, infection, or metastatic disease). 3. Fractures associated with spinal cord or nerve injury requiring emergency surgical intervention. 4. Previous spinal surgery. 5. Multiple-level vertebral compression fractures at initial diagnosis. 6. Incomplete imaging data (CT or MRI not available). 7. Loss to follow-up.
Patients were divided into two groups based on conservative treatment outcomes: successful conservative treatment group (satisfactory symptom relief, gradual fracture healing, no surgery required) and failed conservative treatment group (converted to surgery within 3 months due to uncontrollable pain or fracture nonunion progression). Criteria for conservative treatment failure included: intolerable pain despite adequate analgesia, or fracture nonunion with pseudarthrosis after bed rest, further vertebral collapse, progressive kyphosis, or even neural compression symptoms, resulting in severe limitation of daily activities. All patients provided informed consent at enrollment.
Imaging Examination and Image Acquisition
The radiomics model construction workflow is shown in Figure 2.
Figure 2. Radiomics Model Construction Workflow. Schematic Diagram Illustrating the Comprehensive Radiomics Analysis Pipeline, including Image Acquisition (CT and MRI), Three-Dimensional Region of Interest (ROI) Segmentation Using ITK-SNAP, High-Throughput Feature Extraction via PyRadiomics, Feature Selection Through Statistical Filtering and LASSO Regression, and Model Development Using Machine Learning Algorithms With 5-Fold Cross-Validation
All patients underwent CT and MRI examinations of the fractured vertebra at initial presentation. CT was performed using multi-slice spiral CT with thin-slice scanning. MRI was completed on 3T scanners, including sagittal T1-weighted (T1WI), T2-weighted (T2WI), and short tau inversion recovery (STIR) sequences.
To ensure consistency of imaging data across different scanners and acquisition parameters, all CT and MRI images underwent three-dimensional resampling and intensity normalization before radiomics feature extraction: (1) CT images: All CT scans were resampled to an isotropic voxel size of 1.0 × 1.0 × 1.0 mm³ using bilinear interpolation to minimize the effects of slice thickness and pixel spacing variability. Gray values were clipped to the range of [−200, 1200] Hounsfield units (HU) and z-score normalized within the region of interest (ROI) to reduce scanner- and kernel-related intensity differences. (2) MRI images (T2-weighted): All T2-weighted images first underwent N4 bias field correction to reduce intensity non-uniformity. They were then resampled to an isotropic voxel size of 1.0 × 1.0 × 1.0 mm³ using cubic spline interpolation. Finally, gray-level intensity values were normalized (mean = 0, standard deviation = 1) to ensure comparability of signal distributions across patients and scanners.
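The CT intensity step described above (clipping to [−200, 1200] HU, then z-scoring within the ROI) can be sketched in a few lines. This is only an illustrative sketch using the Python standard library: the function name `clip_and_zscore`, the use of the population standard deviation, and the example HU values are assumptions, not details reported by the study.

```python
from statistics import mean, pstdev

def clip_and_zscore(roi_values, lower=-200.0, upper=1200.0):
    """Clip intensities to [lower, upper], then z-score them within the ROI."""
    clipped = [min(max(v, lower), upper) for v in roi_values]
    mu = mean(clipped)
    sigma = pstdev(clipped)  # population SD; an assumption, the paper does not specify
    if sigma == 0:
        # A constant ROI carries no intensity contrast; return zeros.
        return [0.0 for _ in clipped]
    return [(v - mu) / sigma for v in clipped]

# Hypothetical HU values sampled inside a vertebral ROI (not study data).
example_roi = [-350.0, 80.0, 150.0, 220.0, 1500.0]
normalized = clip_and_zscore(example_roi)
```

After this step the ROI intensities have mean 0 and standard deviation 1, so values outside the clipping window (such as −350 and 1500 above) no longer dominate the intensity distribution.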
Subsequently, two experienced spine radiologists used ITK-SNAP to delineate three-dimensional regions of interest (ROI) of the fractured vertebra. The ROI encompassed the entire vertebral body cancellous bone region (from superior to inferior endplate), including fracture clefts and surrounding bone marrow edema areas where possible, but excluding posterior vertebral bony appendages. To assess segmentation consistency, 20 patients were randomly selected for ROI re-segmentation by a third radiologist to calculate feature reproducibility (eg, ICC consistency evaluation); results showed most features had ICC>0.9, indicating good reproducibility, with only highly reproducible features retained for subsequent analysis.
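The reproducibility check can be illustrated with a small intraclass correlation calculation. The text does not state which ICC form was used; the sketch below assumes the two-way consistency form, ICC(3,1), applied to one feature measured on the original and the repeated segmentation. The function name and the toy ratings are hypothetical.

```python
def icc_consistency(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    ratings: one row per patient, one column per segmentation.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error
    if denom == 0:
        raise ValueError("ratings have no variance")
    return (ms_rows - ms_error) / denom

# Toy feature values (not study data): original vs repeated segmentation.
stable = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]     # constant offset between readers
unstable = [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]]   # patient ordering not preserved
icc_stable = icc_consistency(stable)
icc_unstable = icc_consistency(unstable)
```

Under this assumed form, a feature that differs between readers only by a constant offset still scores 1, whereas a feature whose patient ordering changes between segmentations scores low and would be dropped by a cutoff such as 0.9.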
Radiomics Feature Extraction
After ROI delineation, we used the Python PyRadiomics toolkit to extract high-throughput quantitative features from each patient’s CT and MRI images. The handcrafted features can be divided into three groups: (I) geometry, (II) intensity, and (III) texture. The geometry features describe the three-dimensional shape characteristics of the segmented vertebral ROI. The intensity features describe the first-order statistical distribution of the voxel intensities within the ROI. The texture features describe the patterns, or the second- and higher-order spatial distributions of the intensities. Here the texture features were extracted using several different methods, including the gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), and neighborhood gray-tone difference matrix (NGTDM) methods.
Feature Selection
Statistics: Each radiomics feature was compared between the two outcome groups using the Mann-Whitney U test, and only features with P-value<0.05 were retained.
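A minimal sketch of this univariate filter is given below, using only the Python standard library. It assumes a two-sided test with the normal approximation, tie correction, and continuity correction; the study does not report these details. The function names and toy feature values are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def mann_whitney_p(group_a, group_b):
    """Two-sided p-value from the normal approximation to the U statistic."""
    n1, n2 = len(group_a), len(group_b)
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0
    # Continuity correction of 0.5, floored at zero.
    z = max((abs(u1 - n1 * n2 / 2.0) - 0.5) / math.sqrt(variance), 0.0)
    return math.erfc(z / math.sqrt(2.0))

def filter_features(features, labels, alpha=0.05):
    """Keep feature names whose values differ between label 1 and label 0."""
    kept = []
    for name, values in features.items():
        failed = [v for v, y in zip(values, labels) if y == 1]
        succeeded = [v for v, y in zip(values, labels) if y == 0]
        if mann_whitney_p(failed, succeeded) < alpha:
            kept.append(name)
    return kept

# Toy data (not study data): 8 successes (0) followed by 8 failures (1).
toy_labels = [0] * 8 + [1] * 8
toy_features = {
    "separating": [float(v) for v in range(1, 17)],
    "noise": [1.0, 2.0] * 8,
}
selected = filter_features(toy_features, toy_labels)
```

With thousands of candidate features, an uncorrected 0.05 threshold is expected to pass some features by chance, which is one reason further correlation and LASSO filtering follow this step.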
Correlation: For features with high repeatability, Spearman’s rank correlation coefficient was used to calculate correlations between features (Figure 1. Spearman correlation of each feature), retaining one feature when the correlation coefficient between any two features exceeded 0.9. To maximize feature representation capability, we used a greedy recursive deletion strategy for feature filtering, deleting the most redundant feature in the current set each time. After this process, 23 features were retained.
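The greedy redundancy filter can be sketched as follows. The text says only that the most redundant feature is deleted each round; this sketch assumes "most redundant" means the highest mean absolute Spearman correlation with the remaining features, with ties broken alphabetically. All names and toy values are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def drop_redundant(features, threshold=0.9):
    """Greedily delete features until no pair has |rho| above the threshold."""
    names = sorted(features)
    rho = {(a, b): abs(spearman(features[a], features[b]))
           for a in names for b in names if a != b}
    kept = list(names)
    while True:
        flagged = sorted({a for a in kept for b in kept
                          if a != b and rho[(a, b)] > threshold})
        if not flagged:
            return kept

        def mean_abs_rho(a):
            others = [b for b in kept if b != a]
            return sum(rho[(a, b)] for b in others) / len(others)

        # Remove the flagged feature most correlated with everything else.
        kept.remove(max(flagged, key=mean_abs_rho))

# Toy features (not study data): "a" and "b" are monotonically related.
toy = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 13],
    "c": [3, 1, 4, 1, 5, 9],
}
selected = drop_redundant(toy)
```

Here one of the two monotonically related features is removed and the weakly correlated feature `c` is kept.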
LASSO: The least absolute shrinkage and selection operator (LASSO) regression model was used on the discovery dataset for signature construction. Depending on the regularization weight λ, LASSO shrinks all regression coefficients toward zero and sets the coefficients of many irrelevant features to exactly zero. To find the optimal λ, 10-fold cross-validation with the minimum criterion was employed, where the final λ value yielded the minimum cross-validation error. The retained features with nonzero coefficients were used for regression model fitting and combined into a radiomics signature. Subsequently, we obtained a radiomics score for each patient through a linear combination of the retained features weighted by their model coefficients. The Python scikit-learn package was used for LASSO regression modeling.
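The radiomics score described above is a weighted sum of the retained features. The sketch below illustrates the arithmetic only: the feature names, coefficient values, and the optional intercept are hypothetical placeholders, not the fitted values from the study.

```python
def rad_score(feature_values, coefficients, intercept=0.0):
    """Linear combination of retained features weighted by model coefficients.

    Features whose LASSO coefficient is zero are simply absent from
    `coefficients` and therefore do not contribute.
    """
    return intercept + sum(
        weight * feature_values[name] for name, weight in coefficients.items()
    )

# Hypothetical normalized feature values and nonzero coefficients.
patient_features = {"ct_feature": 2.0, "mr_feature": -1.0, "dropped_feature": 7.0}
nonzero_coefficients = {"ct_feature": 0.5, "mr_feature": 0.25}
score = rad_score(patient_features, nonzero_coefficients)
```

In this toy case the score is 0.5 × 2.0 + 0.25 × (−1.0) = 0.75, and `dropped_feature` is ignored because it has no coefficient.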
Radiomics Signature
After LASSO feature screening, we input the final features into machine learning models including logistic regression, SVM, random forest, XGBoost, and others for risk model construction. We adopted 5-fold cross-validation to obtain the final radiomics signature.
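As an illustration of how 5-fold cross-validation partitions patients, the sketch below builds five folds that each preserve the ratio of successful to failed cases. Stratification and the fixed random seed are assumptions, since the text does not say whether the folds were stratified. The label vector reuses the reported group sizes (86 and 68) purely as toy input.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return [sorted(fold) for fold in folds]

# Toy labels: 0 = successful, 1 = failed conservative treatment.
labels = [0] * 86 + [1] * 68
folds = stratified_kfold(labels)

for held_out in range(len(folds)):
    validation_idx = folds[held_out]
    training_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
    # A model would be fitted on training_idx and scored on validation_idx here.
```

Each patient appears in exactly one validation fold, so every model is scored on cases it was not fitted on.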
Furthermore, to intuitively and efficiently assess the incremental prognostic value of the radiomics signature to clinical risk factors, a radiomics nomogram was presented on the validation dataset. The nomogram combined the radiomics signature and clinical risk factors based on logistic regression analysis.
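A nomogram of this kind is a graphical reading of a logistic regression, so the underlying calculation can be sketched directly. The intercept and weights below are hypothetical placeholders; the fitted coefficients are not reported in the text.

```python
import math

def nomogram_probability(clinic_sig, rad_sig, intercept, w_clinic, w_rad):
    """Predicted failure probability from a two-predictor logistic model."""
    linear_predictor = intercept + w_clinic * clinic_sig + w_rad * rad_sig
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients, for illustration only.
params = {"intercept": -2.0, "w_clinic": 1.5, "w_rad": 3.0}
low_risk = nomogram_probability(clinic_sig=0.2, rad_sig=0.1, **params)
high_risk = nomogram_probability(clinic_sig=0.7, rad_sig=0.8, **params)
```

On a printed nomogram, each predictor's axis length is proportional to its weight times its range, and the total points map back to this probability.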
Clinical Signature
The clinical signature building process was nearly identical to the radiomics signature. Features for building the clinical signature were selected by baseline statistics with P-value<0.05. We used the same machine learning models as in the radiomics signature building process. 5-Fold cross-validation and test cohort were fixed for fair comparison.
Radiomics Nomogram
The radiomics nomogram was established combining the radiomics signature and clinical signature. Diagnostic efficacy of the radiomics nomogram was tested in the test cohort, with ROC curves drawn to evaluate diagnostic efficacy.
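For reference, the area under the ROC curve equals the probability that a randomly chosen failed case receives a higher predicted score than a randomly chosen successful case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are illustrative and are not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = failed conservative treatment, 0 = successful.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(toy_labels, toy_scores)
```

In the toy example three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75.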
Results
Clinical Baseline Characteristics
The Comparison of Two Groups’ Characteristics (Success and Failure)
The Comparison of Train and Test Set
Clinical Model Development and Results
Comparisons of the Prediction Performance of the Eight Machine Learning Algorithms in Clinical Models

Clinical Model Performance Comparison. Receiver Operating Characteristic (ROC) Curves Comparing the Performance of Eight Machine Learning Algorithms for the Clinical Model in the Test Set
Radiomics Feature Selection Results
Initially, approximately 3668 features were extracted from each patient’s CT and MRI images, with feature distribution shown in Figure 4A. After initial filtering based on P-values, 213 features were identified (Figure 4B). These were subsequently reduced to 94 based on Spearman correlation coefficients. The LASSO regression process is shown in Figure 5A and B. The LASSO regression procedure ultimately reduced the feature count to only 16 radiomics features with independent predictive significance (5 CT, 11 MRI). Feature weights for the 16 features are shown in Figure 6.
Figure 4. Distribution of Extracted Radiomics Features. (A) Pie Chart Showing the Proportion of Different Radiomics Feature Categories Extracted From CT and MRI Images, including First-Order Statistics (19.6%), Shape Features (0.8%), and Texture Features: GLCM (24.0%), GLRLM (17.4%), GLSZM (17.4%), GLDM (15.3%), and NGTDM (5.5%). (B) Violin Plots Displaying the P-value Distribution of Each Feature Category after Mann-Whitney U Test Statistical Screening, With Dashed Line Indicating P = 0.05 Significance Threshold
Figure 5. LASSO Regression Feature Selection Process. (A) LASSO Coefficient Paths Showing Feature Coefficient Changes as Lambda Increases. The Vertical Dashed Line Indicates the Optimal Lambda Value (λ = 0.0450) Selected by 10-Fold Cross-Validation. (B) Mean Squared Error (MSE) Plot From 10-Fold Cross-Validation for Lambda Selection, With Error Bars Representing Standard Deviation. The Optimal Lambda Minimizes Cross-Validation Error while Maintaining Model Parsimony
Figure 6. Feature Importance of the 16 Selected Radiomics Features. Horizontal Bar Chart Displaying the Coefficients of Radiomics Features Retained after LASSO Regression. Features Are Derived From Both CT (n = 5) and MRI (n = 11) Images, including Wavelet Transforms, Texture Features (GLCM, GLRLM, GLSZM), and Shape Descriptors. Higher Coefficient Values Indicate Greater Contribution to Predicting Conservative Treatment Failure


Radiomics Model Development
Comparisons of the Prediction Performance of the Eight Machine Learning Algorithms in Radiomics Models

Radiomics Model Performance Comparison. ROC Curves Comparing Eight Machine Learning Algorithms for the Radiomics Model in the Test Set. K-Nearest Neighbors (KNN) Achieved the Best Performance With an AUC of 0.812
Combined Model Development
Comparative Performance of Clinical, Radiomics, and Combined Models

Comparison of Clinical, Radiomics, and Combined Model Performance. (A) ROC Curves in the Training Cohort Showing the Combined Nomogram Model (AUC 0.935, 95% CI 0.892-0.978) Outperforming Both the Radiomics Model (AUC 0.844, 95% CI 0.778-0.909) and Clinical Model (AUC 0.818, 95% CI 0.743-0.893). (B) ROC Curves in the Test Cohort Demonstrating Sustained Superior Performance of the Combined Model (AUC 0.859, 95% CI 0.717-1.000) Compared to Individual Models

Nomogram for Predicting Conservative Treatment Failure Risk. Clinical Decision Support Tool Integrating Clinical Signature (Clinic_Sig) and Radiomics Signature (Rad_Sig) to Calculate Individualized Risk Probability
Discussion
In this study, we constructed clinical feature, radiomics, and combined models to predict conservative treatment failure risk in postmenopausal women with osteoporotic vertebral compression fractures (OVCF). The results demonstrated that the combined model showed optimal discriminative performance in the training set with an AUC of 0.935, significantly superior to the radiomics model (AUC 0.844) and clinical model (AUC 0.818). In the test set, the combined model similarly exhibited the highest predictive capability (AUC 0.859), significantly exceeding the radiomics model (0.812) and clinical model (0.684).
In our clinical model, vertebral CT Hounsfield units (HU) and age played key roles together. Vertebral CT Hounsfield units, as a direct quantitative indicator of bone density, demonstrated significant predictive value in our model. Low HU values indicate significantly decreased trabecular density and cortical thinning within the vertebra, directly reflecting osteoporosis severity. 19 From a biomechanical perspective, low HU values suggest severely insufficient mechanical reserves in the vertebra. 20 After compression fracture occurs, the internal trabecular structure is severely damaged, lacking sufficient bone matrix to support the healing process. 21 Advanced age, as another important predictor, has multifaceted effects. First, with aging, the duration of estrogen deficiency in postmenopausal women extends, exacerbating bone loss. Second, elderly patients have decreased systemic physiological reserves, affecting multiple aspects of fracture healing. More importantly, elderly patients have significantly reduced tolerance for prolonged bed rest. 22 Conservative treatment typically requires strict bed rest for 2-3 months, but prolonged bed rest in elderly patients easily leads to complications including deep vein thrombosis, pneumonia, urinary tract infections, and muscle atrophy. These complications not only increase patient suffering but may further impair fracture healing, creating a vicious cycle. 23
Among CT-derived predictors, Small Area Low Gray Level Emphasis (SALGLE) and Low Gray Level Zone Emphasis (LGLZE) carried the strongest positive weights (Figure 6). These features indicate clustering of low-attenuation trabecular regions, which pathologically correspond to disrupted trabecular networks, poor mineralized matrix, and replacement of bone marrow space with fibrous tissue. Such microstructural fragility impairs mechanical stability and predisposes to nonunion or progressive collapse. Similar associations between CT-derived low-density texture patterns and reduced bone strength have been demonstrated in previous studies, where radiomics captured trabecular heterogeneity and correlated with histomorphometric indices of osteoporosis.24,25 Conversely, protective CT features with negative coefficients (eg, Gray Level Nonuniformity) reflect more uniform high-density trabecular distribution, histologically paralleling preserved bone connectivity and mature callus consolidation, which favor fracture healing. 26
MRI-derived predictors, particularly High Gray Level Run Emphasis (HGLRE) and Zone Variance from T2-weighted images, showed strong positive contributions to failure. These features represent heterogeneous hyperintense marrow signals, pathologically reflecting persistent bone marrow edema, inflammatory infiltration, and impaired microvascularization. Such changes delay osteoblast recruitment and mineralized callus formation, consistent with poor prognosis. Prior MRI-histology correlation studies confirm that extensive marrow edema corresponds to increased osteoclastic activity, incomplete bone bridging, and delayed healing. 27 Two MRI-derived radiomics features, square_glrlm_ShortRunEmphasis_MR and lbp_3D_m1_gldm_SmallDependenceHighGrayLevelEmphasis_MR, exhibited negative coefficients in our model, suggesting a protective effect. Short Run Emphasis (SRE) quantifies the predominance of short gray-level runs within the ROI; higher SRE values indicate more fine, fragmented texture. In T2-weighted images, this pattern may correspond to localized, patchy but organized marrow signal changes, reflecting reversible edema or active reparative processes rather than destructive pathology. Similarly, Small Dependence High Gray Level Emphasis (SDHGLE) measures the occurrence of small-scale, high-intensity dependence clusters. Elevated SDHGLE values suggest that hyperintense marrow signals are confined to limited foci rather than diffusely distributed, which may indicate a controlled and spatially restricted inflammatory response.28-30 Prior studies have shown that bone marrow lesions can present with heterogeneous imaging features, where localized edema or fibrosis may coexist with ongoing remodeling.31,32
Taken together, the Rad-score histogram in Figure 6 reflects this dual process: CT features quantify structural fragility of trabeculae (“hard tissue”), while MRI features capture marrow and vascular pathology (“soft tissue”). High Rad-scores correspond to histological patterns of trabecular disruption and persistent inflammatory marrow changes, whereas low Rad-scores reflect intact trabecular continuity and reparative marrow remodeling. These findings support the concept that radiomics signatures act as non-invasive surrogates of the underlying histopathological determinants of fracture healing vs failure.
Our findings align with trends reported in previous literature, demonstrating the important value of radiomics in fracture risk assessment and treatment decision-making. Yu et al 18 showed that lumbar CT-based radiomics models could more accurately identify high-risk vertebrae in postmenopausal women, with OVCF risk prediction superior to traditional models (test set AUC 0.914). Additionally, Wang et al constructed machine learning models integrating CT radiomics features and clinical factors to predict subsequent fracture risk after vertebroplasty, with their final radiomics-clinical nomogram achieving satisfactory predictive performance in external validation (AUC approximately 0.88). 20 These studies indicate that radiomics can mine bone structural information that is difficult to quantify visually in medical images, enhancing prediction of fracture occurrence and outcomes. In contrast, radiomics analyses for OVCF conservative treatment prognosis remain relatively limited. This study helps fill this gap, with our results demonstrating the feasibility and potential clinical value of applying radiomics to the prediction of conservative treatment outcomes after fracture. Radiomics models promise to complement clinical assessment by identifying high-risk patients likely to fail conservative therapy early, thereby helping physicians optimize treatment decisions and improve individualized patient management.
Despite encouraging results, this study has several limitations. First, this was a single-center retrospective analysis with relatively limited sample size (n = 154), potentially subject to selection bias, with model parameter generalizability to other populations yet to be tested. Second, radiomics features are high-dimensional and numerous; although we employed LASSO and other methods for dimension reduction, the model still has potential for overfitting. Performance differences between training and test sets for some machine learning algorithms suggest possible overfitting to training data. Third, this study lacks independent validation from external cohorts; model generalizability and stability have not been tested with multicenter data. Applicability across different hospitals and scanning equipment requires further evaluation. Fourth, model construction relies on patients undergoing both CT and MRI examinations, increasing clinical implementation difficulty and cost, as not all patients routinely undergo dual-modality imaging in practice, which may limit feasibility and accessibility in resource-limited settings. Future studies should explore simplified single-modality models or more widely applicable alternatives to enhance clinical utility and adoption. Finally, the predictive framework is not yet integrated in real time with the hospital PACS system, requiring additional data import and processing steps that may hinder clinical usability.
Future research should focus on validating the proposed model in large-scale, multicenter prospective studies to ensure its stability and generalizability across different institutions, imaging devices, and patient populations. In parallel, efforts should be made to integrate the model seamlessly into hospital PACS or electronic medical record systems, enabling automated risk scoring that generates real-time clinical alerts without additional physician workload. Furthermore, the visualization tool we developed in the form of a nomogram can be further optimized and transformed into a clinical decision support application, allowing physicians to rapidly and intuitively estimate the risk of conservative treatment failure, thereby enhancing the model’s clinical applicability and facilitating its translation into individualized patient care.
Conclusion
In conclusion, this study demonstrates that a comprehensive model based on CT and MRI radiomics can accurately predict conservative treatment failure risk in postmenopausal women with OVCF. This model fully utilizes multimodal information including CT and MRI, combined with clinical risk factors, enabling early identification of patients with poor prognosis. Through nomogram presentation, the model transforms complex radiomics analysis results into a clear and intuitive clinical decision tool, helping physicians develop individualized treatment plans.
Footnotes
Ethical Considerations
This study was conducted in accordance with the ethical principles of the Declaration of Helsinki. The study was reviewed and approved by the Ethics Committee of Peking University People’s Hospital.
Consent to Participate
Informed consent to participate was obtained from all patients.
Author contribution
Bin Zheng and Panfeng Yu contributed equally to this work. Conception and design: Bin Zheng; Acquisition of data: Panfeng Yu and Ke Ma; Analysis and interpretation of data: Zhenqi Zhu and Yan Liang; Drafting the article: Bin Zheng and Haiying Liu; Critically revising the article: Haiying Liu. All authors have read and agreed to the published version of the manuscript.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the fund of Peking University People's Hospital (RDY2021-09), the fund of Peking University People's Hospital (2023HQ05), Peking University People's Hospital (X021107), and the fund of National Orthopedics and Sports Rehabilitation Clinical Research Center, cultivation project “Application and promotion of pressure sensor assisted craniopelvic spinal traction device” (2021-NCRC-CXJJ-PY-38), Horizontal Project of Peking University People's Hospital (grant number 2022-Z-09), Major Health Special Project of the Ministry of Finance of China (Grant number 2127000432), and Major Health Special Project of the Ministry of Finance of China (Grant number 2127000349).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
