Sage Journals: Discover world-class research

Abstract

Study design

Retrospective study.

Objective

Osteoporotic vertebral compression fractures (OVCF) commonly affect postmenopausal women, and 30-40% of patients require surgical intervention after conservative treatment fails. This study developed a CT- and MRI-based radiomics model to predict the risk of conservative treatment failure.

Methods

We retrospectively analyzed 154 postmenopausal women with OVCF (2016-2024), divided into successful (n = 86) and failed (n = 68) conservative treatment groups. Three-dimensional regions of interest were delineated, and quantitative features were extracted using PyRadiomics. Feature selection used the Mann-Whitney U test, Spearman correlation, and LASSO regression. Clinical, radiomics, and combined models were constructed using eight machine learning algorithms with 5-fold cross-validation.

Results

Age and vertebral CT Hounsfield units were significant clinical predictors. From 3668 initial features, 16 key radiomics features were selected. LightGBM performed best for the clinical model, while k-nearest neighbors performed best for the radiomics model. In the test set, the clinical model achieved an AUC of 0.684 (accuracy 0.710), the radiomics model an AUC of 0.812 (accuracy 0.710), and the combined model an AUC of 0.859 (accuracy 0.806). The combined model significantly outperformed both individual models.

Conclusion

The comprehensive CT and MRI radiomics-based model accurately predicts conservative treatment failure risk in postmenopausal women with OVCF. This tool enables early identification of high-risk patients and supports individualized treatment decisions, potentially guiding early surgical intervention for predicted high-risk cases.

Keywords

osteoporotic vertebral compression fracture; radiomics; conservative treatment; postmenopausal women; machine learning

Introduction

Osteoporotic vertebral compression fracture (OVCF) is a fragility fracture caused by the reduced vertebral bone density and quality of osteoporosis, and it can occur with minimal or even no apparent external force. OVCF is the most common type of osteoporotic fracture and presents clinically mainly as thoracolumbar pain.^1-3 Postmenopausal women are particularly susceptible to OVCF because of bone loss secondary to estrogen deficiency.⁴ OVCF causes serious harm: vertebral compression and collapse can lead to height loss and kyphotic deformity, which in turn cause chronic back pain, cardiopulmonary dysfunction, and gastrointestinal disorders. Because bone healing is delayed, OVCF patients have increased rates of delayed union or pseudarthrosis, and prolonged bed rest can cause further bone loss, exacerbating osteoporosis and inducing complications such as thrombosis, pneumonia, and urinary tract infections.^5-7

Treatment for OVCF includes conservative and surgical approaches. For acute OVCF, clinicians typically begin with conservative measures, including bed rest, analgesics, external bracing, and anti-osteoporotic medications.⁸ However, conservative treatment has notable limitations: it cannot correct vertebral deformity and often leaves patients with persistent back pain⁹; in addition, prolonged bed rest may cause cardiopulmonary decline, venous thrombosis, and other complications, increasing patient suffering and family burden. More importantly, in patients with severe osteoporosis, the outcome of conservative treatment is often unpredictable. Some patients develop nonunion, continued vertebral collapse, progressive kyphosis, or even neural compression despite standard conservative treatment and ultimately require surgery. Studies show that approximately 40-45% of OVCF patients fail conservative treatment and require subsequent surgical intervention.^10,11 Early identification of postmenopausal OVCF patients who are likely to fail conservative treatment and need surgery is therefore a pressing clinical issue.

Previous studies have identified several risk factors for conservative treatment failure. Clinically, advanced age and low bone mineral density (BMD) are significantly associated with failure of conservative treatment for OVCF. Systematic reviews indicate that MRI findings are often closely related to conservative treatment failure. If imaging features could predict prognosis early in the fracture course, more aggressive treatment such as early surgery could be considered.^10,12,13 However, single clinical or imaging indicators often cannot accurately predict the outcome of conservative treatment in individual patients.

Radiomics offers a new approach to this challenge. It extracts large numbers of quantitative features from medical images, capturing patterns in image texture, intensity, and shape, and has become a powerful tool for disease diagnosis and prognostic assessment. Unlike traditional subjective image interpretation, radiomics can extract hundreds to thousands of features from CT, MRI, and other images. These include one-, two-, and three-dimensional features, so the information in imaging data is exploited as fully as possible.^14-16 Recent orthopedic research has also demonstrated the advantages of radiomics. One study used MRI radiomics models to predict the risk of new vertebral fractures after vertebroplasty, demonstrating the feasibility of radiomics for fracture risk assessment.¹⁷ Another study showed that CT radiomics could identify high-risk vertebrae more accurately than traditional HU models, predicting OVCF risk in postmenopausal women.¹⁸ These findings suggest great potential for applying radiomics to OVCF outcome prediction.

Against this background, this study combined CT and MRI radiomics to construct a predictive model. The model identifies, at initial presentation, postmenopausal women with OVCF who are at risk of conservative treatment failure, based on quantitative analysis of CT and MRI images of the fractured vertebra. We developed a radiomics signature and established a combined predictive model (nomogram) incorporating clinical risk factors. We then evaluated its predictive performance and clinical utility for OVCF conservative treatment failure.

Materials and Methods

Patient Selection and Grouping

This study included postmenopausal women with osteoporotic vertebral compression fractures treated at our institution from January 2016 to December 2024. The study was conducted in accordance with the Declaration of Helsinki and was approved by Peking University People’s Hospital, and all participants provided written informed consent. The patient selection process is shown in Figure 1.

Figure 1.

Flowchart of Patient Selection

Inclusion Criteria: 1. Postmenopausal women aged >50 years. 2. Newly diagnosed, single-level OVCF confirmed by X-ray, CT, and MRI. 3. Symptom onset and imaging diagnosis within 4 weeks, to ensure fracture acuity and comparability. 4. First-time conservative treatment, including bed rest, analgesics, bracing, and anti-osteoporotic medication.

Exclusion Criteria: 1. Secondary osteoporosis due to other causes, such as long-term glucocorticoid use, endocrine or metabolic disorders (eg, hyperparathyroidism, thyroid disease, Cushing’s syndrome), chronic kidney disease, or malignancy-related bone loss. 2. Pathological fractures (tumor, infection, or metastatic disease). 3. Fractures associated with spinal cord or nerve injury requiring emergency surgical intervention. 4. Previous spinal surgery. 5. Multiple-level vertebral compression fractures at initial diagnosis. 6. Incomplete imaging data (CT or MRI not available). 7. Loss to follow-up.

Patients were divided into two groups based on conservative treatment outcome. The successful conservative treatment group had satisfactory symptom relief, gradual fracture healing, and no need for surgery. The failed conservative treatment group was converted to surgery within 3 months because of uncontrollable pain or progressive fracture nonunion. Conservative treatment was considered to have failed in either of two situations, each resulting in severe limitation of daily activities. The first was intolerable pain despite adequate analgesia. The second was fracture nonunion with pseudarthrosis after bed rest, further vertebral collapse, progressive kyphosis, or neural compression symptoms. All patients provided informed consent at enrollment.

Imaging Examination and Image Acquisition

The workflow for radiomics model construction is shown in Figure 2.

Figure 2.

Radiomics Model Construction Workflow. Schematic Diagram Illustrating the Comprehensive Radiomics Analysis Pipeline, including Image Acquisition (CT and MRI), Three-Dimensional Region of Interest (ROI) Segmentation Using ITK-SNAP, High-Throughput Feature Extraction via PyRadiomics, Feature Selection Through Statistical Filtering and LASSO Regression, and Model Development Using Machine Learning Algorithms With 5-Fold Cross-Validation

All patients underwent CT and MRI examinations of the fractured vertebra at initial presentation. CT was performed on multi-slice spiral CT scanners with thin-slice scanning. MRI was performed on 3T scanners and included sagittal T1-weighted (T1WI), T2-weighted (T2WI), and short tau inversion recovery (STIR) sequences.

To ensure consistency of imaging data across different scanners and acquisition parameters, all CT and MRI images underwent three-dimensional resampling and intensity normalization before radiomics feature extraction.

(1) CT images: all CT scans were resampled to an isotropic voxel size of 1.0 × 1.0 × 1.0 mm³ using bilinear interpolation, to minimize the effects of variability in slice thickness and pixel spacing. Gray values were clipped to the range [−200, 1200] Hounsfield units (HU) and z-score normalized within the region of interest (ROI), to reduce scanner- and kernel-related intensity differences.

(2) MRI images (T2-weighted): all T2-weighted images first underwent N4 bias field correction to reduce intensity non-uniformity. They were then resampled to an isotropic voxel size of 1.0 × 1.0 × 1.0 mm³ using cubic spline interpolation. Finally, gray-level intensities were normalized (mean = 0, standard deviation = 1) to ensure comparability of signal distributions across patients and scanners.
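The intensity side of this preprocessing can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code:

- The function names are hypothetical.
- Voxel spacing is assumed to be given in the same axis order as the array.
- N4 bias field correction is assumed to have been applied to the T2-weighted image beforehand. This is commonly done with SimpleITK's `N4BiasFieldCorrectionImageFilter`.

The zoom factors are the per-axis values a resampler such as `scipy.ndimage.zoom` would take to reach 1.0 mm isotropic voxels.

```python
import numpy as np


def isotropic_zoom_factors(spacing_mm, target_mm=1.0):
    """Per-axis zoom factor from native voxel spacing to target_mm.

    A factor > 1 upsamples that axis (e.g. 3 mm slices -> 1 mm gives 3.0).
    """
    return tuple(float(s) / target_mm for s in spacing_mm)


def normalize_ct(hu, roi_mask, hu_min=-200.0, hu_max=1200.0):
    """Clip HU to [hu_min, hu_max], then z-score using statistics of ROI voxels only."""
    clipped = np.clip(np.asarray(hu, dtype=np.float64), hu_min, hu_max)
    roi_values = clipped[np.asarray(roi_mask) > 0]
    mean, std = roi_values.mean(), roi_values.std()
    return (clipped - mean) / (std if std > 0 else 1.0)


def normalize_mri(t2, eps=1e-8):
    """Whole-image z-score (mean 0, SD 1); assumes N4 correction was already applied."""
    arr = np.asarray(t2, dtype=np.float64)
    return (arr - arr.mean()) / (arr.std() + eps)
```

For resampling, `scipy.ndimage.zoom(volume, isotropic_zoom_factors(spacing), order=1)` gives linear interpolation, which is trilinear in 3D. `order=3` gives cubic spline interpolation. Applying the ROI-derived CT statistics to the whole volume, rather than only to ROI voxels, is an assumption of this sketch.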

Two experienced spine radiologists then used ITK-SNAP to delineate three-dimensional regions of interest (ROIs) of the fractured vertebra. The ROI encompassed the entire cancellous bone region of the vertebral body, from the superior to the inferior endplate. Where possible it included fracture clefts and surrounding bone marrow edema, but it excluded the posterior elements. To assess segmentation consistency, 20 randomly selected patients were re-segmented by a third radiologist, and feature reproducibility was evaluated with the intraclass correlation coefficient (ICC). Most features had ICC > 0.9, indicating good reproducibility, and only highly reproducible features were retained for subsequent analysis.
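The reproducibility screen can be illustrated with an ICC computation. The text does not state which ICC form was used. The sketch below assumes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater). It uses hypothetical helper names and arranges each feature's values from the original and repeated segmentations as two columns.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) for one feature; ratings has shape (n_subjects, k_raters)."""
    y = np.asarray(ratings, dtype=np.float64)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()  # between-subject
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()  # between-rater
    ss_error = ((y - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def reproducible_features(seg_a, seg_b, names, threshold=0.9):
    """Names of features whose ICC between two segmentations exceeds threshold.

    seg_a, seg_b: arrays of shape (n_patients, n_features), same column order.
    """
    a = np.asarray(seg_a, dtype=np.float64)
    b = np.asarray(seg_b, dtype=np.float64)
    return [name for j, name in enumerate(names)
            if icc_2_1(np.column_stack([a[:, j], b[:, j]])) > threshold]
```

On the classic six-subject, four-judge example from Shrout and Fleiss, `icc_2_1` returns about 0.29.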

Radiomics Feature Extraction

After ROI delineation, we used the Python PyRadiomics toolkit to extract high-throughput quantitative features from each patient’s CT and MRI images. The handcrafted features fall into three groups:

(I) Geometry features describe the three-dimensional shape of the ROI.
(II) Intensity features describe the first-order statistical distribution of voxel intensities within the ROI.
(III) Texture features describe intensity patterns, that is, the second- and higher-order spatial distributions of intensities.

Texture features were extracted using several methods: the gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), gray-level dependence matrix (GLDM), and neighborhood gray-tone difference matrix (NGTDM).
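A minimal sketch of configuring PyRadiomics for these feature classes follows. The dictionary uses the layout of a PyRadiomics parameter file and is passed directly to `RadiomicsFeatureExtractor`. Several details are assumptions rather than the authors' settings:

- The image-type list (`Original`, `Wavelet`, `Square`, `LBP3D`) is inferred from the wavelet, square, and LBP-3D feature names mentioned later in the paper.
- The `binWidth` value is illustrative.
- The `_CT`/`_MR` name suffix mirrors the feature names in the Discussion.

`pyradiomics` must be installed for `extract` to run. The helper functions work without it.

```python
# Feature classes named in the Methods and Figure 4; an empty list enables all features of a class.
FEATURE_CLASSES = ["shape", "firstorder", "glcm", "glrlm", "glszm", "gldm", "ngtdm"]


def build_params(bin_width=25.0, image_types=("Original", "Wavelet", "Square", "LBP3D")):
    """Return a dict laid out like a PyRadiomics parameter file (assumed settings)."""
    return {
        "imageType": {name: {} for name in image_types},
        "featureClass": {cls: [] for cls in FEATURE_CLASSES},
        "setting": {"binWidth": bin_width},
    }


def strip_diagnostics(result, suffix):
    """Drop PyRadiomics 'diagnostics_*' entries and tag each feature with its modality."""
    return {
        f"{name}_{suffix}": float(value)
        for name, value in result.items()
        if not name.startswith("diagnostics_")
    }


def extract(image_path, mask_path, suffix, params=None):
    """Run PyRadiomics on one image/mask pair (requires `pip install pyradiomics`)."""
    from radiomics import featureextractor  # lazy import: helpers above work without it

    extractor = featureextractor.RadiomicsFeatureExtractor(params or build_params())
    return strip_diagnostics(extractor.execute(image_path, mask_path), suffix)
```

Calling `extract(ct_path, roi_path, "CT")` and `extract(t2_path, roi_path, "MR")` for each patient and merging the two dictionaries gives one combined feature row per patient.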

Feature Selection

Statistics: All radiomic features were screened with the Mann-Whitney U test, and only features with P < 0.05 were retained.
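This univariate screen amounts to one two-sided Mann-Whitney U test per feature. The self-contained NumPy sketch below uses the large-sample normal approximation with tie and continuity corrections. In practice `scipy.stats.mannwhitneyu` would typically be used instead. The function names and the 0/1 outcome coding (1 = failure) are assumptions.

```python
import math

import numpy as np


def _average_ranks(x):
    """1-based ranks; tied values share the average of the positions they occupy."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    return ((2 * ends - counts + 1) / 2.0)[inverse]


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value via the normal approximation
    (tie-corrected variance, 0.5 continuity correction)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    n1, n2 = a.size, b.size
    pooled = np.concatenate([a, b])
    n = n1 + n2
    u1 = _average_ranks(pooled)[:n1].sum() - n1 * (n1 + 1) / 2.0
    _, counts = np.unique(pooled, return_counts=True)
    tie_term = (counts ** 3 - counts).sum() / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term))
    if sigma == 0.0:  # all values identical: no evidence of a difference
        return 1.0
    z = max(abs(u1 - n1 * n2 / 2.0) - 0.5, 0.0) / sigma
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


def select_by_mann_whitney(X, y, alpha=0.05):
    """Indices of feature columns whose P-value between outcome groups is < alpha."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    return [j for j in range(X.shape[1])
            if mann_whitney_p(X[y == 0, j], X[y == 1, j]) < alpha]
```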

Correlation: For features with high repeatability, Spearman’s rank correlation coefficient was calculated between each pair of features. When the absolute correlation between two features exceeded 0.9, only one of them was retained. To maximize feature representation capability, a greedy recursive deletion strategy was used, removing the most redundant feature in the current set at each step. After this process, 23 features were retained.
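One way to implement the greedy redundancy filter is sketched below. The paper does not define "most redundant". Here it is assumed to be the feature with the most partners above the |ρ| > 0.9 threshold, with ties broken by mean absolute correlation. Names are hypothetical. Constant features, which would give undefined correlations, are assumed to have been removed already.

```python
import numpy as np


def _average_ranks(x):
    """1-based ranks with ties sharing their average rank."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    return ((2 * ends - counts + 1) / 2.0)[inverse]


def spearman_matrix(X):
    """Spearman correlation = Pearson correlation of the column-wise ranks."""
    X = np.asarray(X, dtype=np.float64)
    ranks = np.column_stack([_average_ranks(col) for col in X.T])
    return np.corrcoef(ranks, rowvar=False)


def greedy_decorrelate(X, names, threshold=0.9):
    """Repeatedly drop the most redundant feature until no |rho| exceeds threshold."""
    rho = np.abs(spearman_matrix(X))
    np.fill_diagonal(rho, 0.0)
    keep = list(range(rho.shape[0]))
    while len(keep) > 1:
        sub = rho[np.ix_(keep, keep)]
        high = sub > threshold
        if not high.any():
            break
        partners = high.sum(axis=1)
        mean_rho = sub.mean(axis=1)
        # Most partners above threshold first; mean |rho| breaks ties.
        worst = max(range(len(keep)), key=lambda i: (partners[i], mean_rho[i]))
        del keep[worst]
    return [names[i] for i in keep]
```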

LASSO: The least absolute shrinkage and selection operator (LASSO) regression model was applied to the discovery dataset for signature construction. Depending on the regularization weight λ, LASSO shrinks all regression coefficients toward zero and sets the coefficients of many irrelevant features to exactly zero. The optimal λ was chosen by 10-fold cross-validation using the minimum-error criterion, that is, the λ value yielding the minimum cross-validation error. The features with nonzero coefficients were retained and combined into a radiomics signature. Each patient’s radiomics score was computed as a linear combination of the retained features weighted by their model coefficients. The Python scikit-learn package was used for LASSO regression modeling.
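In equation form, the LASSO step and the resulting score can be written as follows. Here x_ij is the value of feature j for patient i, and y_i is the outcome. Coding y_i as 1 for failure and 0 for success is an assumption, consistent with the mean-squared-error cross-validation curve in Figure 5B. In scikit-learn's `Lasso` and `LassoCV`, λ corresponds to the `alpha` parameter. The 1/(2n) scaling of the squared-error term matches that implementation, and the intercept is not penalized. S is the set of retained features, 16 in this study.

```latex
\bigl(\hat{\beta}_0, \hat{\boldsymbol{\beta}}\bigr)
  = \operatorname*{arg\,min}_{\beta_0,\,\boldsymbol{\beta}}
    \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - \mathbf{x}_i^{\top}\boldsymbol{\beta}\bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert,
\qquad
\mathrm{Rad\text{-}score}_i = \hat{\beta}_0 + \sum_{j \in S} \hat{\beta}_j\, x_{ij},
\quad
S = \{\, j : \hat{\beta}_j \neq 0 \,\}
```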

Radiomics Signature

After LASSO feature selection, the final features were input into eight machine learning models to construct risk models. These were logistic regression (LR), support vector machine (SVM), k-nearest neighbors (KNN), random forest, extra trees, XGBoost, LightGBM, and multilayer perceptron (MLP). We used 5-fold cross-validation to obtain the final radiomics signature.
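The 5-fold cross-validation can be sketched as a stratified split that keeps the success/failure ratio similar in every fold. Stratification and the fixed random seed are assumptions, since the text states neither. The classifier itself is left out. Any model with a fit/predict interface, such as scikit-learn's `KNeighborsClassifier`, could be trained on each training split and evaluated on the held-out fold.

```python
import numpy as np


def stratified_fold_ids(y, n_splits=5, seed=0):
    """Give every sample a fold id in [0, n_splits) so each fold keeps the class ratio."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_ids = np.empty(y.size, dtype=int)
    for label in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == label))  # shuffle within class
        fold_ids[members] = np.arange(members.size) % n_splits  # deal out round-robin
    return fold_ids


def cv_splits(y, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs for stratified k-fold cross-validation."""
    fold_ids = stratified_fold_ids(y, n_splits, seed)
    for k in range(n_splits):
        yield np.flatnonzero(fold_ids != k), np.flatnonzero(fold_ids == k)
```

Each classifier is fit on `train_idx` and scored on `val_idx`, and averaging the five validation scores gives the cross-validated estimate. The 31-patient test set stays outside this loop.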

Furthermore, a radiomics nomogram was presented on the validation dataset to give an intuitive assessment of the incremental prognostic value of the radiomics signature over clinical risk factors. The nomogram combined the radiomics signature and clinical risk factors using logistic regression.
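The nomogram is a graphical form of a logistic model, P(failure) = 1 / (1 + exp(−(b0 + b1·Clinic_Sig + b2·Rad_Sig))). The NumPy sketch below fits such a model by Newton-Raphson, also known as iteratively reweighted least squares. A tiny ridge term keeps it numerically stable. It stands in for whatever logistic regression routine the authors used, and the function names are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, ridge=1e-6):
    """Fit P(y=1 | x) = sigmoid(b0 + x @ b) by Newton-Raphson (IRLS).

    A tiny ridge penalty keeps the Hessian invertible; returns [b0, b1, ...].
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=np.float64)])
    y = np.asarray(y, dtype=np.float64)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p) - ridge * beta  # gradient of penalized log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def nomogram_probability(beta, clinic_sig, rad_sig):
    """Predicted failure probability from the two signatures (what a nomogram reads off)."""
    linear = beta[0] + beta[1] * np.asarray(clinic_sig) + beta[2] * np.asarray(rad_sig)
    return 1.0 / (1.0 + np.exp(-linear))
```

A nomogram draws one axis per input scaled by its coefficient, so the points read off for Clinic_Sig and Rad_Sig sum to the linear predictor above.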

Clinical Signature

The clinical signature was built in nearly the same way as the radiomics signature. Features for the clinical signature were selected by baseline statistics with P < 0.05, and the same machine learning models were used. The same 5-fold cross-validation splits and test cohort were used for a fair comparison.

Radiomics Nomogram

The radiomics nomogram was established by combining the radiomics signature and the clinical signature. Its diagnostic performance was evaluated in the test cohort using ROC curves.
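The metrics reported in Tables 3 to 5 can be recomputed from predicted probabilities as sketched below. The AUC uses its rank (Mann-Whitney) interpretation, and the remaining columns come from the confusion matrix at a chosen threshold. Classifying a case as failure when its score is at or above the threshold is an assumption. In those tables, the Precision and Recall columns are the same quantities as PPV and Sensitivity.

```python
import numpy as np


def _ratio(num, den):
    return num / den if den else float("nan")


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=np.float64)
    pos, neg = s[y], s[~y]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)


def threshold_metrics(y_true, scores, threshold):
    """Confusion-matrix metrics, predicting failure when score >= threshold."""
    y = np.asarray(y_true).astype(bool)
    pred = np.asarray(scores, dtype=np.float64) >= threshold
    tp = int(np.sum(pred & y))
    tn = int(np.sum(~pred & ~y))
    fp = int(np.sum(pred & ~y))
    fn = int(np.sum(~pred & y))
    sens = _ratio(tp, tp + fn)
    ppv = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, y.size),
        "sensitivity": sens,  # same as Recall
        "specificity": _ratio(tn, tn + fp),
        "ppv": ppv,  # same as Precision
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * ppv * sens, ppv + sens),
    }
```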

Results

Clinical Baseline Characteristics

This study included 154 postmenopausal women with OVCF (83 lumbar and 71 thoracic fractures): 68 in the conservative treatment failure group and 86 in the success group. Age and CT Hounsfield units (HU) differed significantly between groups (P < 0.05); the failure group was older on average and had lower HU. Detailed comparisons of clinical characteristics are shown in Table 1.

Table 1.

Comparison of Characteristics Between the Success and Failure Groups

Characteristic	Success	Failure	P
Sample size	86	68
VAS	5.76 ± 1.084	5.57 ± 1.273	0.17
Age (years)	70.14 ± 3.717	74.35 ± 6.662	<0.001
BMI (kg/m²)	20.87 ± 1.80	20.89 ± 1.68	0.46
Hounsfield units (HU)	102.64 ± 4.43	96.38 ± 9.44	<0.001
Fracture segments (lumbar/thoracic)	50/36	33/35	0.235

Table 2 summarizes the characteristics of patients in the training and test sets. None of the characteristics differed significantly between the two sets (all P > 0.05), indicating that the training and test sets were comparable.

Table 2.

Comparison of the Training and Test Sets

Characteristic	Train	Test	P
Sample size	123	31
Outcomes (success/failure)	70/53	16/15	0.596
VAS	5.70 ± 1.14	5.61 ± 1.283	0.61
Age (years)	72.15 ± 5.761	71.10 ± 4.756	0.2
BMI (kg/m²)	20.88 ± 1.74	20.91 ± 1.80	0.378
Hounsfield units (HU)	99.89 ± 7.20	100.33 ± 9.34	0.091
Fracture segments (lumbar/thoracic)	66/57	17/14	0.906

Clinical Model Development and Results

The two variables that differed significantly between groups, age and CT Hounsfield units of the fractured vertebra, were incorporated. Machine learning techniques were then applied to build a multifactor clinical predictive model. Table 3 summarizes the performance of the eight machine learning algorithms in the training and test sets, and Figure 3 compares their performance in the test set. Among the eight algorithms, LightGBM performed best, with an AUC of 0.684 and accuracy of 0.710 in the test set.

Table 3.

Comparisons of the Prediction Performance of the Eight Machine Learning Algorithms in Clinical Models

Model	Accuracy	AUC	Sensitivity	Specificity	PPV	NPV	Precision	Recall	F1	Threshold	Set
LR	0.756	0.790	0.473	0.985	0.963	0.698	0.963	0.473	0.634	0.657	Train
LR	0.581	0.613	1.000	0.278	0.500	1.000	0.500	1.000	0.667	0.124	Test
SVM	0.764	0.805	0.655	0.853	0.783	0.753	0.783	0.655	0.713	0.414	Train
SVM	0.710	0.528	0.385	0.944	0.833	0.680	0.833	0.385	0.526	0.478	Test
KNN	0.748	0.868	0.909	0.618	0.658	0.894	0.658	0.909	0.763	0.400	Train
KNN	0.710	0.643	0.462	0.889	0.750	0.696	0.750	0.462	0.571	0.600	Test
RandomForest	0.976	0.997	0.964	0.985	0.981	0.971	0.981	0.964	0.972	0.600	Train
RandomForest	0.645	0.671	0.692	0.611	0.562	0.733	0.562	0.692	0.621	0.500	Test
ExtraTrees	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	Train
ExtraTrees	0.677	0.650	0.769	0.611	0.588	0.786	0.588	0.769	0.667	0.300	Test
XGBoost	0.894	0.959	0.855	0.926	0.904	0.887	0.904	0.855	0.879	0.437	Train
XGBoost	0.710	0.667	0.385	0.944	0.833	0.680	0.833	0.385	0.526	0.702	Test
LightGBM	0.756	0.818	0.727	0.779	0.727	0.779	0.727	0.727	0.727	0.446	Train
LightGBM	0.710	0.684	0.385	0.944	0.833	0.680	0.833	0.385	0.526	0.602	Test
MLP	0.748	0.790	0.618	0.853	0.773	0.734	0.773	0.618	0.687	0.226	Train
MLP	0.581	0.618	1.000	0.278	0.500	1.000	0.500	1.000	0.667	0.118	Test

Figure 3.

Clinical Model Performance Comparison. Receiver Operating Characteristic (ROC) Curves Comparing the Performance of Eight Machine Learning Algorithms for the Clinical Model in the Test Set

Radiomics Feature Selection Results

A total of 3668 features were initially extracted from each patient’s CT and MRI images; their distribution is shown in Figure 4A. Initial filtering based on P-values identified 213 features (Figure 4B), which were then reduced to 94 based on Spearman correlation coefficients. The LASSO regression process is shown in Figure 5A and B. LASSO ultimately retained 16 radiomics features with nonzero coefficients (5 CT, 11 MRI). The weights of the 16 features are shown in Figure 6.

Figure 4.

Distribution of Extracted Radiomics Features. (A) Pie Chart Showing the Proportion of Different Radiomics Feature Categories Extracted From CT and MRI Images, including First-Order Statistics (19.6%), Shape Features (0.8%), and Texture Features: GLCM (24.0%), GLRLM (17.4%), GLSZM (17.4%), GLDM (15.3%), and NGTDM (5.5%). (B) Violin Plots Displaying the P-value Distribution of Each Feature Category after Mann-Whitney U Test Statistical Screening, With Dashed Line Indicating P = 0.05 Significance Threshold

Figure 5.

LASSO Regression Feature Selection Process. (A) LASSO Coefficient Paths Showing Feature Coefficient Changes as Lambda Increases. The Vertical Dashed Line Indicates the Optimal Lambda Value (λ = 0.0450) Selected by 10-Fold Cross-Validation. (B) Mean Squared Error (MSE) Plot From 10-Fold Cross-Validation for Lambda Selection, With Error Bars Representing Standard Deviation. The Optimal Lambda Minimizes Cross-Validation Error while Maintaining Model Parsimony

Figure 6.

Feature Importance of the 16 Selected Radiomics Features. Horizontal bar Chart Displaying the Coefficients of Radiomics Features Retained after LASSO Regression. Features Are Derived From Both CT (n = 5) and MRI (n = 11) Images, including Wavelet Transforms, Texture Features (GLCM, GLRLM, GLSZM), and Shape Descriptors. Higher Coefficient Values Indicate Greater Contribution to Predicting Conservative Treatment Failure

Radiomics Model Development

Using the 16 radiomics features with machine learning techniques, a multifactor composite predictive model was established. Table 4 summarizes the performance of eight machine learning algorithms in training and test sets. Figure 7 summarizes the performance of eight algorithms in the test set. Among the eight machine learning algorithms, KNN performed best with an AUC of 0.812 and accuracy of 0.71 in the test set.

Table 4.

Comparisons of the Prediction Performance of the Eight Machine Learning Algorithms in Radiomics Models

Model	Accuracy	AUC	Sensitivity	Specificity	PPV	NPV	Precision	Recall	F1	Threshold	Set
SVM	0.862	0.897	0.891	0.838	0.817	0.905	0.817	0.891	0.852	0.406	Train
SVM	0.710	0.731	0.692	0.722	0.643	0.765	0.643	0.692	0.667	0.467	Test
KNN	0.748	0.844	0.818	0.691	0.682	0.825	0.682	0.818	0.744	0.600	Train
KNN	0.710	0.812	0.846	0.611	0.611	0.846	0.611	0.846	0.710	0.600	Test
RandomForest	0.854	0.922	0.782	0.912	0.878	0.838	0.878	0.782	0.827	0.460	Train
RandomForest	0.677	0.598	0.692	0.667	0.600	0.750	0.600	0.692	0.643	0.377	Test
ExtraTrees	0.797	0.864	0.909	0.706	0.714	0.906	0.714	0.909	0.800	0.437	Train
ExtraTrees	0.742	0.714	0.692	0.778	0.692	0.778	0.692	0.692	0.692	0.456	Test
XGBoost	0.951	0.992	0.982	0.926	0.915	0.984	0.915	0.982	0.947	0.383	Train
XGBoost	0.710	0.632	0.538	0.833	0.700	0.714	0.700	0.538	0.609	0.542	Test
LightGBM	0.829	0.903	0.818	0.838	0.804	0.851	0.804	0.818	0.811	0.434	Train
LightGBM	0.645	0.543	0.385	0.833	0.625	0.652	0.625	0.385	0.476	0.612	Test
MLP	0.797	0.865	0.855	0.750	0.734	0.864	0.734	0.855	0.790	0.406	Train
MLP	0.710	0.791	1.000	0.500	0.591	1.000	0.591	1.000	0.743	0.327	Test

Figure 7.

Radiomics Model Performance Comparison. ROC Curves Comparing Eight Machine Learning Algorithms for the Radiomics Model in the Test Set. K-Nearest Neighbors (KNN) Achieved the Best Performance With an AUC of 0.812

Combined Model Development

Subsequently, the two models were integrated to establish a nomogram model. In the training cohort, the nomogram model achieved an accuracy of 0.894 and AUC of 0.935. In the test cohort, the nomogram model achieved an accuracy of 0.806 and AUC of 0.859. Table 5 and Figure 8 summarize the performance of clinical, radiomics, and combined models in training and test sets. A nomogram was constructed based on the radiomics-clinical model (Figure 9).

Table 5.

Comparative Performance of Clinical, Radiomics, and Combined Models

Model	Accuracy	AUC	Sensitivity	Specificity	PPV	NPV	Precision	Recall	F1	Threshold	Set
Clinic signature	0.756	0.818	0.727	0.779	0.727	0.779	0.727	0.727	0.727	0.446	Train
Rad signature	0.748	0.844	0.818	0.691	0.682	0.825	0.682	0.818	0.744	0.600	Train
Nomogram	0.894	0.935	0.818	0.956	0.937	0.867	0.937	0.818	0.874	0.545	Train
Clinic signature	0.710	0.684	0.385	0.944	0.833	0.680	0.833	0.385	0.526	0.602	Test
Rad signature	0.710	0.812	0.846	0.611	0.611	0.846	0.611	0.846	0.710	0.600	Test
Nomogram	0.806	0.859	0.846	0.778	0.733	0.875	0.733	0.846	0.786	0.429	Test

Figure 8.

Comparison of Clinical, Radiomics, and Combined Model Performance. (A) ROC Curves in the Training Cohort Showing the Combined Nomogram Model (AUC 0.935, 95% CI 0.892-0.978) Outperforming Both the Radiomics Model (AUC 0.844, 95% CI 0.778-0.909) and Clinical Model (AUC 0.818, 95% CI 0.743-0.893). (B) ROC Curves in the Test Cohort Demonstrating Sustained Superior Performance of the Combined Model (AUC 0.859, 95% CI 0.717-1.000) Compared to Individual Models

Figure 9.

Nomogram for Predicting Conservative Treatment Failure Risk. Clinical Decision Support Tool Integrating Clinical Signature (Clinic_Sig) and Radiomics Signature (Rad_Sig) to Calculate Individualized Risk Probability

Discussion

In this study, we constructed clinical feature, radiomics, and combined models to predict conservative treatment failure risk in postmenopausal women with osteoporotic vertebral compression fractures (OVCF). The results demonstrated that the combined model showed optimal discriminative performance in the training set with an AUC of 0.935, significantly superior to the radiomics model (AUC 0.844) and clinical model (AUC 0.818). In the test set, the combined model similarly exhibited the highest predictive capability (AUC 0.859), significantly exceeding the radiomics model (0.812) and clinical model (0.684).

In our clinical model, vertebral CT Hounsfield units (HU) and age were the two key predictors. Vertebral CT Hounsfield units, as a direct quantitative indicator of bone density, showed significant predictive value in our model. Low HU values indicate significantly decreased trabecular density and cortical thinning within the vertebra, directly reflecting osteoporosis severity.¹⁹ From a biomechanical perspective, low HU values suggest severely insufficient mechanical reserve in the vertebra.²⁰ After a compression fracture, the internal trabecular structure is severely damaged and lacks sufficient bone matrix to support healing.²¹ Advanced age, the other important predictor, has multiple effects. First, with aging, the duration of estrogen deficiency in postmenopausal women extends, exacerbating bone loss. Second, elderly patients have reduced systemic physiological reserves, which affects multiple aspects of fracture healing. More importantly, elderly patients tolerate prolonged bed rest poorly.²² Conservative treatment typically requires strict bed rest for 2-3 months. In elderly patients, such prolonged bed rest easily leads to complications including deep vein thrombosis, pneumonia, urinary tract infections, and muscle atrophy. These complications not only increase patient suffering but may further impair fracture healing, creating a vicious cycle.²³

Among CT-derived predictors, Small Area Low Gray Level Emphasis (SALGLE) and Low Gray Level Zone Emphasis (LGLZE) carried the strongest positive weights (Figure 6). These features indicate clustering of low-attenuation trabecular regions, which pathologically correspond to disrupted trabecular networks, poor mineralized matrix, and replacement of bone marrow space with fibrous tissue. Such microstructural fragility impairs mechanical stability and predisposes to nonunion or progressive collapse. Similar associations between CT-derived low-density texture patterns and reduced bone strength have been demonstrated in previous studies, where radiomics captured trabecular heterogeneity and correlated with histomorphometric indices of osteoporosis.^24,25 Conversely, protective CT features with negative coefficients (eg, Gray Level Nonuniformity) reflect more uniform high-density trabecular distribution, histologically paralleling preserved bone connectivity and mature callus consolidation, which favor fracture healing.²⁶

MRI-derived predictors, particularly High Gray Level Run Emphasis (HGLRE) and Zone Variance from T2-weighted images, showed strong positive contributions to failure. These features represent heterogeneous hyperintense marrow signals, pathologically reflecting persistent bone marrow edema, inflammatory infiltration, and impaired microvascularization. Such changes delay osteoblast recruitment and mineralized callus formation, consistent with a poor prognosis. Prior MRI-histology correlation studies confirm that extensive marrow edema corresponds to increased osteoclastic activity, incomplete bone bridging, and delayed healing.²⁷

Two MRI-derived radiomics features, square_glrlm_ShortRunEmphasis_MR and lbp_3D_m1_gldm_SmallDependenceHighGrayLevelEmphasis_MR, had negative coefficients in our model, suggesting a protective effect. Short Run Emphasis (SRE) quantifies the predominance of short gray-level runs within the ROI; higher SRE values indicate a finer, more fragmented texture. In T2-weighted images, this pattern may correspond to localized, patchy but organized marrow signal changes, reflecting reversible edema or active reparative processes rather than destructive pathology. Similarly, Small Dependence High Gray Level Emphasis (SDHGLE) measures the occurrence of small-scale, high-intensity dependence clusters. Elevated SDHGLE values suggest that hyperintense marrow signals are confined to limited foci rather than diffusely distributed, which may indicate a controlled and spatially restricted inflammatory response.^28-30 Prior studies have shown that bone marrow lesions can present with heterogeneous imaging features, in which localized edema or fibrosis may coexist with ongoing remodeling.^31,32

Taken together, the feature coefficients shown in Figure 6 reflect this dual process: CT features quantify the structural fragility of trabeculae (“hard tissue”), while MRI features capture marrow and vascular pathology (“soft tissue”). High Rad-scores correspond to histological patterns of trabecular disruption and persistent inflammatory marrow changes, whereas low Rad-scores reflect intact trabecular continuity and reparative marrow remodeling. These findings suggest that radiomics signatures may act as non-invasive surrogates of the histopathological determinants of fracture healing vs failure.

Our findings are consistent with previous literature demonstrating the value of radiomics in fracture risk assessment and treatment decision-making. Yu et al¹⁸ showed that lumbar CT-based radiomics models could identify high-risk vertebrae in postmenopausal women more accurately, with OVCF risk prediction superior to that of traditional models (test set AUC 0.914). Wang et al constructed machine learning models integrating CT radiomics features and clinical factors to predict subsequent fracture risk after vertebroplasty. Their final radiomics-clinical nomogram achieved satisfactory predictive performance in external validation (AUC approximately 0.88).²⁰ These studies indicate that radiomics can mine bone structural information that is difficult to quantify visually, improving prediction of fracture occurrence and outcomes. By contrast, radiomics analyses of OVCF conservative treatment prognosis remain limited. This study addresses this gap, and our results demonstrate the feasibility and potential clinical value of applying radiomics to predict the outcome of conservative fracture treatment. Radiomics models may complement clinical assessment by identifying, early on, high-risk patients likely to fail conservative therapy. This would help physicians optimize treatment decisions and individualize patient management.

Despite encouraging results, this study has several limitations.

First, this was a single-center retrospective analysis with a relatively limited sample size (n = 154). It may be subject to selection bias, and the generalizability of the model parameters to other populations remains to be tested.

Second, radiomics features are high-dimensional and numerous. Although we used LASSO and other methods for dimension reduction, the model may still overfit; the performance differences between training and test sets for some algorithms suggest possible overfitting to the training data.

Third, this study lacks independent validation in external cohorts, so model generalizability and stability have not been tested with multicenter data. Applicability across different hospitals and scanning equipment requires further evaluation.

Fourth, the model requires both CT and MRI examinations. This increases the difficulty and cost of clinical implementation, because not all patients routinely undergo dual-modality imaging in practice, and may limit feasibility and accessibility in resource-limited settings. Future studies should explore simplified single-modality models or more widely applicable alternatives to enhance clinical utility and adoption.

Finally, the predictive framework is not yet integrated in real time with the hospital PACS system. It therefore requires additional data import and processing steps that may hinder clinical usability.

Future research should focus on validating the proposed model in large-scale, multicenter prospective studies to confirm its stability and generalizability across institutions, imaging devices, and patient populations. In parallel, the model should be integrated seamlessly into hospital PACS or electronic medical record systems to enable automated risk scoring that generates real-time clinical alerts without adding to physician workload. Furthermore, the nomogram we developed as a visualization tool could be optimized and turned into a clinical decision support application that allows physicians to estimate the risk of conservative treatment failure rapidly and intuitively, enhancing the model’s clinical applicability and facilitating its translation into individualized patient care.
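Both the proposed decision support application and automated risk scoring come down to evaluating the model behind the nomogram for a new patient. The following is a minimal sketch, assuming (as is typical for nomograms) a logistic regression form; the study's fitted coefficients are not reproduced here. The predictor names `age`, `vertebral_hu`, and `rad_score` (a radiomics score), all numeric values, and the 0.5 alert threshold are hypothetical placeholders. A deployed tool would use the fitted coefficients and a threshold chosen by weighing missed high-risk patients against unnecessary surgical referrals.

```python
import math
from dataclasses import dataclass


@dataclass
class LogisticNomogram:
    """Logistic risk model of the kind a nomogram visualizes.

    A nomogram's total points are a linear rescaling of the linear predictor,
    so evaluating the linear predictor directly yields the same probability.
    """
    intercept: float
    coefficients: dict  # predictor name -> regression coefficient (hypothetical here)

    def linear_predictor(self, patient):
        return self.intercept + sum(
            beta * patient[name] for name, beta in self.coefficients.items()
        )

    def risk(self, patient):
        """Predicted probability of conservative treatment failure.

        Uses a numerically stable logistic function, so extreme linear
        predictors do not overflow math.exp.
        """
        lp = self.linear_predictor(patient)
        if lp >= 0:
            return 1.0 / (1.0 + math.exp(-lp))
        z = math.exp(lp)
        return z / (1.0 + z)


def risk_alert(model, patient, threshold=0.5):
    """Return (probability, alert flag); the threshold is a placeholder choice."""
    p = model.risk(patient)
    return p, p >= threshold


# HYPOTHETICAL coefficients and patient values, not the fitted model from this study.
model = LogisticNomogram(
    intercept=-2.0,
    coefficients={"age": 0.03, "vertebral_hu": -0.01, "rad_score": 1.5},
)
patient = {"age": 72, "vertebral_hu": 80, "rad_score": 1.2}
prob, alert = risk_alert(model, patient)
print(f"predicted failure risk {prob:.2f}, alert={alert}")
```

Keeping the model as plain coefficients makes it easy to embed in a PACS or EMR hook: the integration only needs to supply the predictor values, and the same numbers can drive both the printed nomogram and the automated alert.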

Conclusion

In conclusion, this study demonstrates that a comprehensive model based on CT and MRI radiomics can accurately predict the risk of conservative treatment failure in postmenopausal women with OVCF. By combining multimodal CT and MRI information with clinical risk factors, the model enables early identification of patients with a poor prognosis. Presented as a nomogram, it turns complex radiomics results into a clear, intuitive clinical decision tool that helps physicians develop individualized treatment plans.

Footnotes

ORCID iDs

Zhenqi Zhu

Yan Liang

Haiying Liu

Ethical Considerations

This study was conducted in accordance with the ethical principles of the Declaration of Helsinki. The study was reviewed and approved by the Ethics Committee of Peking University People’s Hospital.

Consent to Participate

Consent to participate was obtained from the patients.

Author contribution

Bin Zheng and Panfeng Yu contributed equally to this work. Conception and design: Bin Zheng; acquisition of data: Panfeng Yu and Ke Ma; analysis and interpretation of data: Zhenqi Zhu and Yan Liang; drafting the article: Bin Zheng and Haiying Liu; critically revising the article: Haiying Liu. All authors have read and agreed to the published version of the manuscript.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by funds from Peking University People's Hospital (RDY2021-09, 2023HQ05, and X021107); the National Orthopedics and Sports Rehabilitation Clinical Research Center cultivation project “Application and promotion of pressure sensor assisted craniopelvic spinal traction device” (2021-NCRC-CXJJ-PY-38); the Horizontal Project of Peking University People's Hospital (grant number 2022-Z-09); and the Major Health Special Project of the Ministry of Finance of China (grant numbers 2127000432 and 2127000349).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Dai, Liang, Zhang, Dong, Zhou. Risk factors of vertebral re-fracture after PVP or PKP for osteoporotic vertebral compression fractures, especially in Eastern Asia: a systematic review and meta-analysis. J Orthop Surg Res. 2022;17(1):161. doi:10.1186/s13018-022-03038-z

2. Patel, Liu, Ebraheim. Managements of osteoporotic vertebral compression fractures: a narrative review. World J Orthop. 2022;13(6):564-573. doi:10.5312/wjo.v13.i6.564

3. Prost, Pesenti, Fuentes, Tropiano, Blondel. Treatment of osteoporotic vertebral fractures. Orthop Traumatol Surg Res. 2021;107(1s):102779. doi:10.1016/j.otsr.2020.102779

4. Bhatnagar, Kekatpure. Postmenopausal osteoporosis: a literature review. Cureus. 2022;14(9):e29367.

5. Bowers, Anderson. Delayed union and nonunion: current concepts, prevention, and correction: a review. Bioengineering. 2024;11(6):525.

6. Liao, Liu, Jiang, Dai, Wang. Risk factors for nonunion of osteoporotic vertebral compression fracture: a case-control study. BMC Musculoskelet Disord. 2024;25(1):295.

7. Adamska, Modzelewski, Stolarczyk, Kseniuk. Is Kummell’s disease a misdiagnosed and/or an underreported complication of osteoporotic vertebral compression fractures? A pattern of the condition and available treatment modalities. J Clin Med. 2021;10(12):2584.

8. Alimy A-R, Anastasilakis, Carey, et al. Conservative treatments in the management of acute painful vertebral compression fractures: a systematic review and network meta-analysis. JAMA Netw Open. 2024;7(9):e2432041.

9. Nicol, Verdaguer, Daste, et al. Chronic low back pain: a narrative review of recent international guidelines for diagnosis and conservative treatment. J Clin Med. 2023;12(4):1685.

10. Zhang, Fan, Hao. Risk factors for conservative treatment failure in acute osteoporotic vertebral compression fractures (OVCFs). Arch Osteoporos. 2019;14(1):24. doi:10.1007/s11657-019-0563-8

11. Lee, Park, Lee, Suh, Hong. Comparative analysis of clinical outcomes in patients with osteoporotic vertebral compression fractures (OVCFs): conservative treatment versus balloon kyphoplasty. Spine J. 2012;12(11):998-1005. doi:10.1016/j.spinee.2012.08.024

12. McCarthy, Davis. Diagnosis and management of vertebral compression fractures. Am Fam Physician. 2016;94(1):44-50.

13. Muratore, Ferrera, Masse, Bistolfi. Osteoporotic vertebral fractures: predictive factors for conservative treatment failure. A systematic review. Eur Spine J. 2018;27(10):2565-2576. doi:10.1007/s00586-017-5340-z

14. Rastegar, Vaziri, Qasempour, et al. Radiomics for classification of bone mineral loss: a machine learning study. Diagn Interv Imaging. 2020;101(9):599-610.

15. Gitto, Cuocolo, Albano, et al. CT and MRI radiomics of bone and soft-tissue sarcomas: a systematic review of reproducibility and validation strategies. Insights Imaging. 2021;12(1):68.

16. Gitto, Cuocolo, Huisman, et al. CT and MRI radiomics of bone and soft-tissue sarcomas: an updated systematic review of reproducibility and validation strategies. Insights Imaging. 2024;15(1):54.

17. Cai, Shen, Yang, et al. MRI-based radiomics assessment of the imminent new vertebral fracture after vertebral augmentation. Eur Spine J. 2023;32(11):3892-3905.

18. Chen, et al. Radiomics based on lumbar CT to identify high-risk patients for OVCF in postmenopausal women. Front Aging. 2025;6:1472060.

19. Zou, Deng. The use of CT Hounsfield unit values to identify the undiagnosed spinal osteoporosis in patients with lumbar degenerative diseases. Eur Spine J. 2019;28(8):1758-1766.

20. Wang, Zou, Sun, Wang, Ding. Hounsfield unit for assessing vertebral bone quality and asymmetrical vertebral degeneration in degenerative lumbar scoliosis. Spine. 2020;45(22):1559-1566.

21. Bigham-Sadegh, Oryan. Basic concepts regarding fracture healing and the current options and future directions in managing bone fractures. Int Wound J. 2015;12(3):238-247.

22. Zhang, Fan, Hao. Risk factors for conservative treatment failure in acute osteoporotic vertebral compression fractures (OVCFs). Arch Osteoporos. 2019;14(1):24.

23. Chen, Wang, Qian, Hua. Correlation between common postoperative complications of prolonged bed rest and quality of life in hospitalized elderly hip fracture patients. Ann Palliat Med. 2020;9(3):1125-1133.

24. Shirvaikar, Huang, Dong. The measurement of bone quality using gray level co-occurrence matrix textural features. J Med Imaging Health Inform. 2016;6(6):1357-1362. doi:10.1166/jmihi.2016.1812

25. Rastegar, Vaziri, Qasempour, et al. Radiomics for classification of bone mineral loss: a machine learning study. Diagn Interv Imaging. 2020;101(9):599-610. doi:10.1016/j.diii.2020.01.008

26. Adarve-Castro, Soria-Utrilla, Castro-García, et al. Advanced radiomic prediction of osteoporosis in primary hyperparathyroidism: a machine learning-based analysis of CT images. Radiol Med. 2025;130(7):1084-1091. doi:10.1007/s11547-025-02004-z

27. Kosaka, Maeyama, Nishio, Nabeshima, Yamamoto. Histopathologic evaluation of bone marrow lesions in early stage subchondral insufficiency fracture of the medial femoral condyle. Int J Clin Exp Pathol. 2021;14(7):819-826.

28. PyRadiomics GLRLM feature class. Accessed 2025/09/26. https://github.com/Radiomics/pyradiomics/blob/master/radiomics/GLRLM.py

29. PyRadiomics GLDM feature class. Accessed 2025/09/26. https://github.com/Radiomics/pyradiomics/blob/master/radiomics/GLDM.py

30. van Griethuysen JJM, Fedorov, Parmar, et al. PyRadiomics: open-source platform for the extraction of radiomic features from medical images. Documentation release 3.0.

31. Kostopoulos, Boci, Cavouras, et al. Radiomics texture analysis of bone marrow alterations in MRI knee examinations. J Imaging. 2023;9(11):252. doi:10.3390/jimaging9110252

32. Chen, Liu, et al. Radiomics analysis using magnetic resonance imaging of bone marrow edema for diagnosing knee osteoarthritis. Front Bioeng Biotechnol. 2024;12:1368188. doi:10.3389/fbioe.2024.1368188

Predicting Conservative Treatment Failure in Postmenopausal Women With Osteoporotic Vertebral Compression Fractures: A CT and MRI-Based Radiomics Machine Learning Approach
