Abstract
Objective
To improve the S.T.O.N.E. nephrolithometry scoring system by applying a factor weighting method derived from odds ratios (ORs) in logistic regression analysis for a more scientific prediction of stone-free rate (SFR) after percutaneous nephrolithotomy (PCNL).
Methods
We conducted a retrospective cohort study of 283 patients undergoing PCNL. The SFR was the primary outcome. Binary logistic regression identified independent predictors among the S.T.O.N.E. components. Statistically significant factors were assigned weighted points based on their ORs. The improved system's performance was compared to the traditional S.T.O.N.E. score using receiver operating characteristic analysis, Net Reclassification Improvement (NRI), and risk stratification based on Youden index-optimized cutoffs.
Results
Stone size, obstruction, and stone density were independent predictors of SFR (all p < .05) and were assigned weighted scores of 2, 3, and 3, respectively, creating a revised score (range 0–11). The improved system demonstrated comparable discriminative ability to the traditional system (AUC: 0.804 vs 0.814, p > .05) but provided significantly superior risk reclassification (NRI = 0.233, p = .007). Using the Youden index-derived cutoff of 5.5, the improved system stratified patients into low-risk (scores 0–5, n = 123) and high-risk (scores 6–11, n = 160) groups. The SFR was 91.9% in the low-risk group versus 51.9% in the high-risk group (p < .001). This low-risk group had a significantly higher SFR than the low-risk group defined by the traditional system (91.9% vs 81.4%, p = .011).
Conclusion
The factor-weighted S.T.O.N.E. scoring system provides enhanced clinical risk stratification for PCNL outcomes. It effectively identifies a distinct cohort of low-risk patients with a > 90% probability of stone-free success, offering improved utility for preoperative patient counseling and surgical planning.
Introduction
Percutaneous nephrolithotomy (PCNL) is currently the most commonly used method for treating renal calculi, a common urological disease. In their guidelines, the European Association of Urology, the American Urological Association, and the Urological Association of Asia recommend PCNL as the standard treatment for renal calculi larger than 2 cm.1–3 Given the advances in technology, even large dehiscent stones can now be cleared using PCNL. Achieving stone-free status with minimal complications is the ideal surgical outcome of PCNL 4; therefore, the stone-free rate (SFR) following PCNL is the most important efficacy index.
Several preoperative stone scoring systems for PCNL have been proposed, including the S.T.O.N.E. nephrolithometry score (stone size, tract length, obstruction, number of involved calyces, and essence/stone density), the SHA.LIN. (Shanghai Lithometry Index) score, and Guy's grading.5–7 These systems are mainly used to predict the SFR. Among them, the S.T.O.N.E. score is widely accepted and is supported by a large number of studies.8,9
Although the S.T.O.N.E. score is widely validated, it assumes that its components contribute equally, which may limit its ability to predict stone-free status. We propose optimizing it through data-driven weighting. This study therefore aimed to refine the S.T.O.N.E. scoring system by applying a factor weighting method, using odds ratios (ORs) derived from logistic regression analysis, to improve its predictive value for SFR after PCNL.
Materials and methods
This study was conducted in accordance with the ethical standards of the Helsinki Declaration of 1975, as revised in 2024. This retrospective study was approved by the Institutional Review Board of Huai'an First Affiliated Hospital of Nanjing Medical University. The requirement for informed consent was waived due to the retrospective nature of the study. All patient data were fully deidentified to ensure patient confidentiality and anonymity. The reporting of this study conforms to the Strengthening the Reporting of Observational Studies in Epidemiology guideline. 10
A total of 354 consecutive patients who underwent PCNL at our hospital between January 2021 and December 2022 were screened, and 283 eligible patients were included.
Inclusion criteria: (1) Adult patients (≥18 years old) who underwent PCNL for renal calculi; (2) Availability of both preoperative and postoperative noncontrast computed tomography (NCCT) scans; (3) Complete clinical and radiological data for S.T.O.N.E. score calculation.
Exclusion criteria: (1) Patients without either preoperative or postoperative NCCT imaging; (2) Pediatric patients (<18 years old); (3) Patients with concurrent urinary tract anomalies (e.g. horseshoe kidney, ectopic kidney); (4) Patients undergoing simultaneous procedures for other urological conditions; (5) Cases with incomplete medical records or missing essential data for analysis.
Observation indices
These indices were (1) the general characteristics of age, sex, height, weight, BMI, and operator; (2) the maximum cross-sectional area of stones, tract length, obstruction, number of involved calyces, and stone density; (3) the stone clearance and occurrence of complications.
Measurement and grading of indicators
The following were the five indicators from the S.T.O.N.E. system. 6
Stone size (S): the maximum cross-sectional area of stones was defined as the area of the largest cross-section of the stone in the NCCT images. The area was classified into four grades: 0–399 mm², 400–799 mm², 800–1599 mm², and ≥ 1600 mm².
Tract length (T): defined as the shortest distance between the dome of the target calyx and the skin, measured on the CT cross-section, and divided into two grades: ≤ 100 mm and > 100 mm.
Obstruction (O): two grades were established: no or mild hydronephrosis, and moderate or severe hydronephrosis. Mild hydronephrosis was defined as dilated renal calyces with normal renal papilla morphology. Moderate hydronephrosis was defined as dilated renal calyces with loss of renal papilla morphology, and severe hydronephrosis was defined as a balloon-like expansion of the renal calyces with thinning of the renal cortex.
Number of involved calyces (N): the three grades were involvement of one to two calyces, involvement of three calyces, and complete staghorn-shaped stones.
Stone density (E): the area of interest within the largest cross-section of the stone was selected on the CT image, and its average CT value was read as the CT value of the stone. Stone density was classified into two grades: ≤ 950 HU and > 950 HU.
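The grading rules above can be sketched as a small function. This is only an illustration: the function name and argument names are hypothetical, and it assumes that grade k of each component contributes k points, which is consistent with the traditional score range of 5–13 reported in this study but is not spelled out in this section.

```python
def traditional_stone_score(area_mm2, tract_mm, moderate_or_severe_hydro,
                            calyces_involved, complete_staghorn, density_hu):
    """Sum the five S.T.O.N.E. component grades (assumed 1 point per grade step)."""
    # S: four grades by maximum cross-sectional area
    if area_mm2 < 400:
        s = 1
    elif area_mm2 < 800:
        s = 2
    elif area_mm2 < 1600:
        s = 3
    else:
        s = 4
    # T: two grades by tract length
    t = 1 if tract_mm <= 100 else 2
    # O: no/mild versus moderate/severe hydronephrosis
    o = 2 if moderate_or_severe_hydro else 1
    # N: one to two calyces, three calyces, or complete staghorn
    if complete_staghorn:
        n = 3
    elif calyces_involved >= 3:
        n = 2
    else:
        n = 1
    # E: two grades by mean CT value
    e = 1 if density_hu <= 950 else 2
    return s + t + o + n + e


# Hypothetical patients at the two extremes of the scale
lowest = traditional_stone_score(350, 90, False, 1, False, 900)
highest = traditional_stone_score(1700, 120, True, 4, True, 1200)
```

The two hypothetical patients land on the lower and upper bounds of the traditional scale, which is a quick sanity check on the grade boundaries.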
Outcome assessment
The primary outcome was the SFR. Stone-free status was rigorously assessed by postoperative NCCT four to six weeks after surgery. It was defined as the absence of any residual fragments or the presence of only clinically insignificant residual fragments < 4 mm in diameter. 2 The presence of fragments ≥4 mm was classified as residual stones. Secondary outcomes included complications, graded according to the Clavien-Dindo classification system. 11
Surgical technique
All PCNL puncture channels were 18F; 265 patients had single-channel punctures and 18 had double-channel punctures. Double J and nephrostomy tubes were retained after surgery. A constant-speed pressure-limiting pump was used as the water source during surgery, with a pump speed of 680 mL/min and a pressure limit of 700 mmHg. The Auriga XL holmium laser surgery system (StarMedTec GmbH, Starnberg, Germany; manufactured on 19 August 2016) was used. During lithotripsy, the fiber core diameter was 600 µm, the lithotripsy mode was used, the frequency was 12 Hz, and the energy of a single pulse was 3500 mJ. All patients underwent NCCT scans preoperatively and four to six weeks postoperatively.
Statistical methods and score development
Statistical analyses were performed using SPSS version 26.0 (IBM Corp., Armonk, NY, USA) and R software (R Foundation for Statistical Computing, Vienna, Austria). Continuous variables are expressed as mean ± standard deviation and categorical variables as frequencies and percentages. Differences were considered statistically significant at p < .05.
Binary logistic regression analysis was employed to identify independent predictors of stone-free status among the five S.T.O.N.E. components. Variables with a significance level of p < .05 in univariate analysis were included in the multivariate model. ORs with 95% confidence intervals were calculated for significant predictors.
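For readers less familiar with logistic regression output, the relationship between a regression coefficient and its OR can be sketched as follows. This is a minimal illustration, assuming the usual Wald-type interval (OR = exp(B), limits = exp(B ± 1.96 × S.E.)); the function name and the example coefficient are hypothetical and are not values from this study.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Convert a logistic regression coefficient into an OR and a Wald-type 95% CI."""
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    return odds_ratio, lower, upper


# Hypothetical coefficient and standard error, for illustration only
or_value, ci_low, ci_high = odds_ratio_with_ci(b=1.2, se=0.4)
print(f"OR = {or_value:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that excludes 1 corresponds to a coefficient that differs from zero at roughly the .05 level under the same Wald assumption.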
For the revised scoring system, statistically significant factors from the multivariate analysis were assigned weighted scores based on their OR values according to established epidemiological strata 12 (Table 1): OR 1.20–1.49 (1 point), 1.50–2.99 (2 points), 3.00–9.99 (3 points), and ≥10.00 (4 points).
Table 1. Assignment of scores in the factor weighting system.
OR            Points
1.20–1.49     1
1.50–2.99     2
3.00–9.99     3
≥10.00        4
OR: odds ratio.
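The OR-to-points mapping can be written as a simple lookup. This is a sketch under stated assumptions: the function name is hypothetical, an OR below 1.20 is assumed to receive 0 points (the strata above do not define this case), and ORs are assumed to be expressed in the direction of increased risk (OR > 1).

```python
def or_to_points(odds_ratio):
    """Map an odds ratio to weighted points using the strata 1.20/1.50/3.00/10.00."""
    if odds_ratio >= 10.0:
        return 4
    if odds_ratio >= 3.0:
        return 3
    if odds_ratio >= 1.5:
        return 2
    if odds_ratio >= 1.2:
        return 1
    return 0  # assumption: ORs below 1.20 add no points


# Hypothetical ORs, one per stratum
points = [or_to_points(x) for x in (1.1, 1.3, 2.0, 3.0, 12.0)]
```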
The total scores of the two systems were calculated. Receiver operating characteristic (ROC) curve analysis was conducted to evaluate the predictive performance of both scoring systems for stone-free status. The optimal cutoff points for risk stratification were determined using the Youden index (J = sensitivity + specificity − 1), which identifies the threshold that maximizes the sum of sensitivity and specificity. Patients were then stratified into low-risk and high-risk groups using these cutoff points, and the performance of the new system was compared to that of the traditional S.T.O.N.E. system using chi-square tests. A p-value < .05 was considered statistically significant.
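The Youden-index step can be illustrated with a short sketch. It assumes that a patient is predicted to have residual stones when the score exceeds the cutoff, and that candidate cutoffs are midpoints between consecutive observed scores (which is why half-integer cutoffs arise for integer scores). The function name and the toy data are hypothetical; both outcome classes must be present in the input.

```python
def youden_cutoff(scores, has_residual):
    """Return (cutoff, J, sensitivity, specificity) for the cutoff maximizing J."""
    values = sorted(set(scores))
    # Candidate cutoffs: midpoints between consecutive distinct scores
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    n_pos = sum(1 for y in has_residual if y)
    n_neg = len(has_residual) - n_pos
    best = None
    for c in candidates:
        # True positives: residual stones with a score above the cutoff
        tp = sum(1 for s, y in zip(scores, has_residual) if y and s > c)
        # True negatives: stone-free with a score at or below the cutoff
        tn = sum(1 for s, y in zip(scores, has_residual) if not y and s <= c)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best


# Toy data (not from the study): integer scores and residual-stone flags
toy_scores = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
toy_residual = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j, sens, spec = youden_cutoff(toy_scores, toy_residual)
```

Ties in J are resolved in favor of the lowest cutoff here; statistical packages may break ties differently.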
The net reclassification improvement (NRI) was calculated to assess the improvement in risk prediction offered by the revised scoring system compared to the traditional S.T.O.N.E. score. Statistical significance of the NRI was tested using z-tests.
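A two-category NRI of this kind can be sketched as below. This is an illustration rather than the authors' exact computation: it assumes "event" means residual stones, that each system classifies a patient as high-risk or low-risk, and that the z-test uses the simple asymptotic standard error commonly paired with the categorical NRI. The function name and toy data are hypothetical.

```python
import math


def two_category_nri(old_high, new_high, event):
    """Categorical NRI for a low/high split, with a two-sided z-test."""
    n_e = sum(1 for e in event if e)
    n_ne = len(event) - n_e
    up_e = down_e = up_ne = down_ne = 0
    for o, n, e in zip(old_high, new_high, event):
        if n and not o:        # reclassified low -> high
            if e:
                up_e += 1
            else:
                up_ne += 1
        elif o and not n:      # reclassified high -> low
            if e:
                down_e += 1
            else:
                down_ne += 1
    event_nri = (up_e - down_e) / n_e
    nonevent_nri = (down_ne - up_ne) / n_ne
    nri = event_nri + nonevent_nri
    # Asymptotic standard error under the null of no net reclassification
    se = math.sqrt((up_e + down_e) / n_e ** 2 + (up_ne + down_ne) / n_ne ** 2)
    z = nri / se if se > 0 else float("nan")
    p = math.erfc(abs(z) / math.sqrt(2)) if se > 0 else float("nan")
    return {"event_nri": event_nri, "nonevent_nri": nonevent_nri,
            "nri": nri, "z": z, "p": p}


# Toy data (not from the study): 4 events followed by 6 non-events
toy_event = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_old = [0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
toy_new = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
result = two_category_nri(toy_old, toy_new, toy_event)
```

In the toy data, two events move up and the non-event movements cancel out, so the overall NRI equals the event NRI.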
Results
Clinical data
The baseline characteristics of the 283 patients are summarized in Table 2. The mean age was 49.3 ± 10.4 years, and the cohort was predominantly male (66.4%). The majority of patients had a stone area <400 mm2 (66.4%), a tract length ≤100 mm (69.6%), and a stone density >950 HU (88.0%).
Table 2. Baseline patient and stone characteristics (n = 283).
Operation outcomes
The overall SFR was 69.3% (196/283). Complications occurred in 28.3% (80/283) of patients. The distribution of complications according to the Clavien-Dindo classification is detailed in Table 3.
Table 3. Postoperative complications (Clavien-Dindo classification).
Logistic regression and score development
Stone clearance status was set as the dependent variable, and the maximum cross-sectional area of stones (S), tract length (T), obstruction (O), number of involved calyces (N), and stone density (E) were used as independent variables for the binary logistic regression. The results are shown in Table 4. Stone size (S), obstruction (O), and stone density (E) were significant independent predictors of stone-free status (p < .05). Tract length (T) and number of involved calyces (N) were not significant predictors (p > .05).
Table 4. Binary logistic regression analysis of factors associated with stone-free status.
B: regression coefficient; S.E.: standard error; Wald: Wald test; CI: confidence interval; OR: odds ratio.
* p < .05; ** p < .01; *** p < .001.
Based on the results of the logistic regression, we concluded that the maximum cross-sectional area of stones, obstruction, and stone density significantly affected the SFR. Therefore, we assigned modified weights to these three indices in the traditional S.T.O.N.E. scoring system according to their OR values. Specifically, Grades 2 and 3 of the maximum cross-sectional area of stones were combined without adding points, because Grade 3 (OR = 1.159, S.E. = 19173.742) showed the Hauck-Donner effect. The final improved S.T.O.N.E. scoring system is shown in Table 5.
Table 5. Improved S.T.O.N.E. score system.
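The instability reported for Grade 3 can be made concrete by computing the Wald statistic from the reported OR and standard error. This is a minimal sketch, assuming the standard error refers to the regression coefficient B = ln(OR) and that a two-sided normal approximation is used; the function name is hypothetical.

```python
import math


def wald_test(odds_ratio, se):
    """Return (B, Wald z, two-sided p) for a reported OR and coefficient S.E."""
    b = math.log(odds_ratio)
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return b, z, p


# Values reported for stone area Grade 3 in this study
b, z, p = wald_test(1.159, 19173.742)
print(f"B = {b:.3f}, z = {z:.2e}, p = {p:.4f}")
```

With a standard error this large relative to the coefficient, the Wald statistic is essentially zero and carries no usable information about the grade, which is the pattern the authors attribute to the Hauck-Donner effect.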
Comparison of two scoring systems in predicting SFR
ROC curve analysis revealed that the improved scoring system demonstrated good discriminative ability for predicting stone-free status (AUC = 0.804, 95% CI [0.753–0.854]) (Figure 1). The Youden index identified an optimal cutoff point of 5.5 for risk stratification, with corresponding sensitivity of 88.5% and specificity of 42.3%.
For the traditional S.T.O.N.E. scoring system, the AUC was 0.814 (95% CI [0.764–0.864]). The Youden index determined an optimal cutoff of 8.5 for this system, with sensitivity of 93.1% and specificity of 49.0%.
A comparison of the AUCs using DeLong's test is presented in Table 6. There was no statistically significant difference between the two systems (p > .05); that is, the improved system demonstrated discriminative ability comparable to that of the traditional system.
Table 6. Paired-sample AUC differences.
Z: Z-score; S.E.: standard error; CI: confidence interval.
* p < .05; ** p < .01; *** p < .001.
Validation of the improved scoring system
The total score for each patient was calculated (range: 0–11). Using the optimal cutoff point of 5.5, patients were divided into a low-risk group (scores 0–5, n = 123, 43.5%) and a high-risk group (scores 6–11, n = 160, 56.5%). The SFR was 91.9% (113/123) in the low-risk group and 51.9% (83/160) in the high-risk group (p < .001). The results are shown in Table 7.
Table 7. Comparison of the postoperative stone-free rates between the two groups.
Comparison with the traditional S.T.O.N.E. system
Using the traditional S.T.O.N.E. system (score range 5–13), 183 patients (64.7%) were classified as low-risk (5–8) and 100 (35.3%) as high-risk (9–13). The SFR was 81.4% (149/183) in the traditional low-risk group and 47.0% (47/100) in the high-risk group. The results are shown in Table 8.
Table 8. Comparison of the stone-free rates between the low-risk groups of the two scoring systems.
The improved system identified a low-risk group with a significantly higher SFR than the traditional system (91.9% vs 81.4%, p = .011), while the SFR in the high-risk groups was not significantly different (51.9% vs 47.0%, p = .473).
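The low-risk comparison can be checked with a 2 × 2 chi-square calculation on the counts reported above (113/123 versus 149/183 stone-free). This is a sketch under assumptions: it uses an uncorrected Pearson chi-square, since the text does not state whether a continuity correction was applied, and it treats the two low-risk groups as independent samples even though they are drawn from the same cohort. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: improved low-risk, traditional low-risk; columns: stone-free, residual
chi2, p = chi_square_2x2(113, 10, 149, 34)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

Under these assumptions the p-value comes out close to the reported .011.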
NRI analysis confirmed that the revised scoring system provided significant improvement in risk stratification compared to the traditional system (NRI = 0.233, p = .007), particularly in correctly reclassifying high-risk patients (event NRI = 0.391, p < .001).
Discussion
This study presents a statistically refined version of the S.T.O.N.E. nephrolithometry score, which demonstrates superior clinical utility in preoperative risk stratification for PCNL. The principal finding is that our factor-weighted scoring system, utilizing cutoff points derived from the Youden index, provides a more clinically relevant stratification compared to the traditional system. Specifically, it identifies a low-risk cohort with a remarkably high SFR (91.9%), thereby offering surgeons a more reliable tool for predicting a successful outcome and enhancing preoperative patient counseling.
Our analysis confirmed that stone size (S) is a paramount predictor, consistent with all existing nephrolithometry systems.5–7,13–15 Larger stones require more time and effort to fragment and clear. Although most scholars consider the measurement of volume using three-dimensional imaging software to be the gold standard for assessing stone size, 16 we employed the cross-sectional area to maintain consistency with the original S.T.O.N.E. scoring system. Obstruction (O), reflecting hydronephrosis, was another strong predictor. A dilated collecting system can make stone fragment clearance more challenging and may be associated with impaired renal drainage, contributing to residual fragments. Stone density (E), measured in Hounsfield Units, was the third significant factor. Denser stones are often harder to fragment with laser lithotripsy, potentially leading to larger residual fragments that are difficult to evacuate. 9
The methodological framework established in this study—employing multivariate regression to identify independent predictors, assigning weights based on effect sizes (ORs), and determining optimal cutoff points using the Youden index—has broader implications beyond the S.T.O.N.E. scoring system. This approach can be readily applied to refine other surgical prediction models in urology, such as the Guy's stone score or SHA.LIN. scoring system, and indeed to prognostic tools across medical specialties. Many existing scoring systems, e.g. the Caprini score for venous thromboembolism, 17 were developed with expert consensus-based or arbitrarily assigned weights. Our data-driven methodology provides a standardized, statistically robust alternative to optimize these tools, potentially improving their predictive accuracy and clinical utility. This represents a pathway toward more personalized risk assessment, where scoring systems are continuously refined and validated using contemporary patient data and rigorous statistical methods.
Interestingly, tract length (T) and number of involved calyces (N) were not significant predictors in our multivariate model. This contrasts with the original S.T.O.N.E. derivation study. 6 For tract length, this could be because all our procedures were performed by experienced surgeons who can effectively manage longer tracts. The loss of significance for the number of calyces may be due to its collinearity with stone size; a large stone will inevitably fill multiple calyces. Furthermore, the original S.T.O.N.E. score for “N” incorporates stone burden within the definition (“complete staghorn”), which might be better captured by the “S” factor alone. Our small sample size, particularly the low number of patients with staghorn stones (n = 21), which led to statistical instability (Hauck-Donner effect), limits definitive conclusions for this parameter.
The most significant advancement of our revised system is its enhanced performance in risk stratification, as evidenced by the NRI. Although the overall discriminative capacity (AUC) did not differ significantly between the two systems, the NRI of 0.233 (p = .007) indicates that the revised model achieves a statistically significant improvement in correctly reclassifying patients, particularly those at high risk for residual stones (event NRI = 0.391, p < .001). This underscores a critical concept: the clinical value of a predictive tool is not solely dependent on its overall ranking ability (AUC) but also on its accuracy in categorizing patients into clinically actionable risk groups. The use of the Youden index to define the optimal cutoff (5.5 for the improved score) further strengthens the statistical rigor of our risk stratification, moving beyond arbitrary median splits to a criterion that maximizes the sum of sensitivity and specificity.
Our study has limitations. Its retrospective, single-center design introduces potential for selection bias. The sample size was insufficient for some rarer categories (e.g. very large stones, staghorns), affecting the stability of those estimates. External validation in a larger, multicenter prospective cohort is essential to confirm these findings and further refine the weights.
Conclusion
The data-driven, factor-weighted revision of the S.T.O.N.E. score presented here improves risk reclassification for stone-free status after PCNL, while its overall discriminative ability (AUC) is comparable to that of the traditional score. By employing odds ratio-based weighting and Youden index-derived cutoffs, this tool may provide enhanced risk stratification. It is particularly effective in identifying a low-risk patient cohort with a > 90% probability of success, thereby offering valuable clinical utility for preoperative planning and patient counseling. Future prospective, multicenter studies are warranted to validate and potentially refine these findings.
Figure 1. ROC curves for the scores in the two systems.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
