Sage Journals: Discover world-class research

Abstract

Objective

To improve the S.T.O.N.E. nephrolithometry scoring system by applying a factor weighting method derived from odds ratios (ORs) in logistic regression analysis, with the aim of better predicting the stone-free rate (SFR) after percutaneous nephrolithotomy (PCNL).

Methods

We conducted a retrospective cohort study of 283 patients undergoing PCNL. The SFR was the primary outcome. Binary logistic regression identified independent predictors among the S.T.O.N.E. components. Statistically significant factors were assigned weighted points based on their ORs. The improved system's performance was compared to the traditional S.T.O.N.E. score using receiver operating characteristic analysis, Net Reclassification Improvement (NRI), and risk stratification based on Youden index-optimized cutoffs.

Results

Stone size, obstruction, and stone density were independent predictors of SFR (all p < .05) and were assigned weighted scores of 2, 3, and 3, respectively, creating a revised score (range 0–11). The improved system demonstrated comparable discriminative ability to the traditional system (AUC: 0.804 vs 0.814, p > .05) but provided significantly superior risk reclassification (NRI = 0.233, p = .007). Using the Youden index-derived cutoff of 5.5, the improved system stratified patients into low-risk (scores 0–5, n = 123) and high-risk (scores 6–11, n = 160) groups. The SFR was 91.9% in the low-risk group versus 51.9% in the high-risk group (p < .001). This low-risk group had a significantly higher SFR than the low-risk group defined by the traditional system (91.9% vs 81.4%, p = .011).

Conclusion

The factor-weighted S.T.O.N.E. scoring system provides enhanced clinical risk stratification for PCNL outcomes. It effectively identifies a distinct cohort of low-risk patients with a > 90% probability of stone-free success, offering improved utility for preoperative patient counseling and surgical planning.

Keywords

S.T.O.N.E. scoring system; improved S.T.O.N.E. scoring system; factor weighting method; percutaneous nephrolithotomy; stone-free rate

Introduction

Percutaneous nephrolithotomy (PCNL) is currently the most commonly used method for treating renal calculi, a common urological disease. The guidelines of the European Association of Urology, the American Urological Association, and the Urological Association of Asia recommend PCNL as the standard treatment for renal calculi larger than 2 cm.^1–3 With advances in technology, even large, complex stones can now be cleared using PCNL. Achieving stone-free status with minimal complications is the ideal surgical outcome of PCNL⁴; therefore, the stone-free rate (SFR) after PCNL is the most important efficacy index.

Several preoperative stone scoring systems have been proposed for PCNL, including the S.T.O.N.E. nephrolithometry score (size, tract length, obstruction, number of involved calyces, and essence/stone density), the Shanghai Lithometry Index (SHA.LIN.) score, and the Guy's stone score.^5–7 These systems are mainly used to predict the SFR. Among them, the S.T.O.N.E. score is widely accepted and supported by a large number of studies.^8,9

Although the S.T.O.N.E. score is widely validated, it implicitly assumes that each of its components contributes comparably to the outcome, an assumption that may limit its ability to predict stone-free status. Data-driven weighting of the components offers one way to address this limitation. Therefore, this study aimed to refine the S.T.O.N.E. scoring system by applying a factor weighting method, using ORs derived from logistic regression analysis, to improve its predictive value for SFR after PCNL.

Materials and methods

This study was conducted in accordance with the ethical standards of the Helsinki Declaration of 1975, as revised in 2024. This retrospective study was approved by the Institutional Review Board of Huai'an First Affiliated Hospital of Nanjing Medical University. The requirement for informed consent was waived due to the retrospective nature of the study. All patient data were fully deidentified to ensure patient confidentiality and anonymity. The reporting of this study conforms to the Strengthening the Reporting of Observational Studies in Epidemiology guideline.¹⁰

A total of 354 consecutive patients who underwent PCNL at our hospital between January 2021 and December 2022 were screened, and 283 eligible patients were included.

Inclusion criteria: (1) Adult patients (≥18 years old) who underwent PCNL for renal calculi; (2) Availability of both preoperative and postoperative noncontrast computed tomography (NCCT) scans; (3) Complete clinical and radiological data for S.T.O.N.E. score calculation.

Exclusion criteria: (1) Patients without either preoperative or postoperative NCCT imaging; (2) Pediatric patients (<18 years old); (3) Patients with concurrent urinary tract anomalies (e.g. horseshoe kidney, ectopic kidney); (4) Patients undergoing simultaneous procedures for other urological conditions; (5) Cases with incomplete medical records or missing essential data for analysis.

Observation indices

The following indices were recorded: (1) general characteristics, including age, sex, height, weight, BMI, and operator; (2) the maximum cross-sectional area of stones, tract length, obstruction, number of involved calyces, and stone density; and (3) stone clearance and the occurrence of complications.

Measurement and grading of indicators

The five S.T.O.N.E. indicators were measured and graded as follows:⁶

The maximum cross-sectional area of stones (S) was defined as the area of the largest cross-section of the stone on NCCT images and was classified into four grades: 0–399 mm², 400–799 mm², 800–1599 mm², and ≥1600 mm².

Tract length (T) was defined as the shortest distance between the dome of the target calyx and the skin, measured on the axial CT image, and was divided into two grades: ≤100 mm and >100 mm.

Obstruction (O) was divided into two grades: none/mild hydronephrosis and moderate/severe hydronephrosis. Mild hydronephrosis was defined as dilated renal calyces with preserved renal papilla morphology; patients without hydronephrosis were included in this grade. Moderate hydronephrosis was defined as dilated renal calyces with loss of renal papilla morphology, and severe hydronephrosis was defined as balloon-like expansion of the renal calyces with thinning of the renal cortex.

Number of involved calyces (N) was classified into three grades: involvement of one to two calyces, involvement of three calyces, and complete staghorn stones.

Stone density (E) was measured by placing a region of interest within the largest cross-section of the stone on the CT image; the mean CT value of this region was taken as the stone density. Stone density was classified into two grades: ≤950 HU and >950 HU.
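To make the grading rules above concrete, the following Python sketch converts raw NCCT measurements into S.T.O.N.E. grades and a traditional total score. It assumes that each grade earns as many points as its rank (1–4 for S; 1–2 for T, O, and E; 1–3 for N), as in the original S.T.O.N.E. description; this assumption reproduces the 5–13 range used later in this article. The `StoneMeasurements` container and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoneMeasurements:
    """Hypothetical container for one patient's preoperative NCCT measurements."""
    area_mm2: float                 # maximum cross-sectional stone area
    tract_mm: float                 # tract length
    moderate_or_severe_hydro: bool  # obstruction: moderate/severe vs none/mild
    calyces: int                    # number of involved calyces
    staghorn: bool                  # complete staghorn stone
    density_hu: float               # mean stone attenuation in HU


def stone_grades(m: StoneMeasurements) -> dict:
    """Return the grade (1 = lowest) of each S.T.O.N.E. component."""
    if m.area_mm2 < 400:
        s = 1
    elif m.area_mm2 < 800:
        s = 2
    elif m.area_mm2 < 1600:
        s = 3
    else:
        s = 4
    t = 1 if m.tract_mm <= 100 else 2
    o = 2 if m.moderate_or_severe_hydro else 1
    if m.staghorn:
        n = 3
    elif m.calyces >= 3:  # assumption: >3 non-staghorn calyces grouped with 3
        n = 2
    else:
        n = 1
    e = 1 if m.density_hu <= 950 else 2
    return {"S": s, "T": t, "O": o, "N": n, "E": e}


def traditional_stone_score(m: StoneMeasurements) -> int:
    """Traditional total, assuming points equal grades (range 5-13)."""
    return sum(stone_grades(m).values())


smallest = StoneMeasurements(350, 80, False, 1, False, 900)
largest = StoneMeasurements(2000, 120, True, 4, True, 1200)
print(traditional_stone_score(smallest), traditional_stone_score(largest))  # 5 13
```

For example, a 350 mm² stone with an 80 mm tract, no or mild hydronephrosis, one involved calyx, and a density of 900 HU receives the minimum traditional score of 5.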

Outcome assessment

The primary outcome was the SFR. Stone-free status was assessed by postoperative NCCT four to six weeks after surgery and was defined as the absence of any residual fragments or the presence of only clinically insignificant residual fragments <4 mm in diameter.² Fragments ≥4 mm were classified as residual stones. Secondary outcomes included complications, graded according to the Clavien-Dindo classification system.¹¹
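The stone-free definition can be expressed as a simple rule. In this sketch, each patient is represented by the list of residual fragment diameters (mm) read from the postoperative NCCT; this data layout and the function names are assumptions for illustration. A patient with no fragments, or with only fragments below 4 mm, is classified as stone-free.

```python
def is_stone_free(fragment_diameters_mm: list, threshold_mm: float = 4.0) -> bool:
    """Stone-free if there are no fragments or every fragment is < threshold_mm."""
    # all() of an empty list is True, so a patient without fragments is stone-free
    return all(d < threshold_mm for d in fragment_diameters_mm)


def stone_free_rate(patients: list) -> float:
    """Proportion of patients classified as stone-free."""
    return sum(is_stone_free(frags) for frags in patients) / len(patients)


# Four hypothetical patients: no fragments, 3 mm, 4 mm, and 6.5 mm fragments
print(stone_free_rate([[], [3.0], [4.0], [6.5]]))  # 0.5
```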

Surgical technique

All PCNL tracts were 18 Fr; 265 patients underwent single-tract and 18 patients double-tract access. Double-J stents and nephrostomy tubes were left in place after surgery. A constant-speed, pressure-limiting pump was used for irrigation during surgery, with a flow rate of 680 mL/min and a pressure limit of 700 mmHg. Lithotripsy was performed with an Auriga XL holmium laser system (StarMedTec GmbH, Starnberg, Germany; manufactured 19 August 2016) using a fiber with a 600-µm core diameter in lithotripsy mode, at a frequency of 12 Hz and a single-pulse energy of 3500 mJ. All patients underwent NCCT scans preoperatively and four to six weeks postoperatively.

Statistical methods and score development

Statistical analyses were performed using SPSS version 26.0 (IBM Corp., Armonk, NY, USA) and R software (R Foundation for Statistical Computing, Vienna, Austria). Continuous variables are expressed as mean ± standard deviation and categorical variables as frequencies and percentages. Differences were considered statistically significant at p < .05.

Binary logistic regression analysis was employed to identify independent predictors of stone-free status among the five S.T.O.N.E. components. Variables with a significance level of p < .05 in univariate analysis were included in the multivariate model. ORs with 95% confidence intervals were calculated for significant predictors.

For the revised scoring system, statistically significant factors from the multivariate analysis were assigned weighted scores based on their OR values according to established epidemiological strata¹² (Table 1): OR 1.20–1.49 (1 point), 1.50–2.99 (2 points), 3.00–9.99 (3 points), and ≥10.00 (4 points).

Table 1.

Assignment of scores in factor weighting system.

Range of OR	Assigned score
1.20–1.49	1
1.50–2.99	2
3.00–9.99	3
≥10.00	4

OR: odds ratio.
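The OR strata in Table 1 translate directly into a lookup function. This is a sketch with a hypothetical function name; returning 0 points for ORs below 1.20 is an assumption, because Table 1 does not cover that range. Applied to the significant ORs in Table 4 (2.887 for a stone area of 400–799 mm², 3.441 for obstruction, and 5.288 for stone density), it returns 2, 3, and 3 points, matching the weights used in the improved score.

```python
def points_from_or(odds_ratio: float) -> int:
    """Weighted points for an odds ratio, following the strata in Table 1."""
    if odds_ratio >= 10.0:
        return 4
    if odds_ratio >= 3.0:
        return 3
    if odds_ratio >= 1.5:
        return 2
    if odds_ratio >= 1.2:
        return 1
    return 0  # assumption: ORs below 1.20 receive no weighted points


# Significant ORs from Table 4
for name, value in [("S (400-799 mm2)", 2.887), ("O", 3.441), ("E", 5.288)]:
    print(name, points_from_or(value))
```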

The total scores of both systems were calculated. Receiver operating characteristic (ROC) curve analysis was conducted to evaluate the predictive performance of both scoring systems for stone-free status. The optimal cutoff points for risk stratification were determined using the Youden index (J = sensitivity + specificity − 1), which identifies the threshold that maximizes the sum of sensitivity and specificity. Patients were then stratified into low-risk and high-risk groups using these cutoffs, and the performance of the new system was compared with that of the traditional S.T.O.N.E. system using chi-square tests. A p-value < .05 was considered statistically significant.
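A minimal sketch of Youden-index cutoff selection for an integer-valued score follows. It assumes that a higher score predicts residual stones (the event, coded 1) and, as a further assumption, takes candidate cutoffs at midpoints between consecutive observed scores, which yields half-integer thresholds like those reported in the Results. The function name is hypothetical, and at least two distinct scores are required.

```python
def youden_cutoff(scores: list, outcomes: list):
    """Return (cutoff, sensitivity, specificity, J) maximizing J = sens + spec - 1.

    outcomes: 1 = residual stones (event), 0 = stone-free.
    A score above the cutoff predicts residual stones.
    Returns None if fewer than two distinct scores are observed.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    distinct = sorted(set(scores))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        cutoff = (low + high) / 2  # midpoint between consecutive observed scores
        tp = sum(1 for s, y in zip(scores, outcomes) if s > cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and y == 0)
        sens, spec = tp / positives, tn / negatives
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


# Toy data (not study data): perfectly separated at 3.5
print(youden_cutoff([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # (3.5, 1.0, 1.0, 1.0)
```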

The net reclassification improvement (NRI) was calculated to assess the improvement in risk prediction offered by the revised scoring system compared to the traditional S.T.O.N.E. score. Statistical significance of the NRI was tested using z-tests.
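The NRI has several variants, and the text does not specify which one was used. The sketch below shows one common form, the categorical NRI for two risk categories with the asymptotic standard error proposed by Pencina and colleagues. It assumes that the event is residual stones and that upward reclassification means moving from low to high risk. The function and argument names are hypothetical, and the example uses toy data rather than study data.

```python
import math


def two_category_nri(outcomes: list, old_high: list, new_high: list):
    """Categorical NRI for a low/high risk split (sketch).

    outcomes: 1 = event (residual stones), 0 = non-event (stone-free).
    old_high, new_high: truthy if high risk under the old / new score.
    Returns (nri, event_nri, nonevent_nri, z, two_sided_p).
    """
    up_e = down_e = up_ne = down_ne = 0
    n_e = sum(outcomes)
    n_ne = len(outcomes) - n_e
    for y, old, new in zip(outcomes, old_high, new_high):
        up = bool(new) and not bool(old)    # reclassified low -> high
        down = bool(old) and not bool(new)  # reclassified high -> low
        if y == 1:
            up_e += up
            down_e += down
        else:
            up_ne += up
            down_ne += down
    event_nri = (up_e - down_e) / n_e        # events should move up
    nonevent_nri = (down_ne - up_ne) / n_ne  # non-events should move down
    nri = event_nri + nonevent_nri
    # Asymptotic variance in the form given by Pencina et al. (2008)
    var = (up_e + down_e) / n_e ** 2 + (up_ne + down_ne) / n_ne ** 2
    if var == 0:  # nobody reclassified: the z-test is undefined
        return nri, event_nri, nonevent_nri, math.nan, math.nan
    z = nri / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))
    return nri, event_nri, nonevent_nri, z, p


# Toy data: 4 events, 4 non-events
print(two_category_nri([1, 1, 1, 1, 0, 0, 0, 0],
                       [0, 0, 1, 1, 1, 1, 0, 0],
                       [1, 1, 1, 1, 0, 1, 0, 0]))
```

In this toy example, two of four events move up (event NRI 0.5) and one of four non-events moves down (non-event NRI 0.25), giving an NRI of 0.75.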

Results

Clinical data

The baseline characteristics of the 283 patients are summarized in Table 2. The mean age was 49.3 ± 10.4 years, and the cohort was predominantly male (66.4%). The majority of patients had a stone area <400 mm² (66.4%), a tract length ≤100 mm (69.6%), and a stone density >950 HU (88.0%).

Table 2.

Baseline patient and stone characteristics (n = 283).

Characteristic	Value
Age (years), mean ± SD	49.3 ± 10.4
Sex, n (%)
Male	188 (66.4)
Female	95 (33.6)
BMI (kg/m²), mean ± SD	25.2 ± 3.1
Laterality, n (%)
Left	160 (56.5)
Right	123 (43.5)
Stone size (S), n (%)
<400 mm²	188 (66.4)
400–799 mm²	89 (31.4)
800–1599 mm²	6 (2.1)
≥1600 mm²	0 (0)
Tract length (T), n (%)
≤100 mm	197 (69.6)
>100 mm	86 (30.4)
Obstruction (O), n (%)
None/mild	139 (49.1)
Moderate/severe	144 (50.9)
Number of calyces (N), n (%)
1–2	59 (20.8)
3	203 (71.7)
Complete staghorn	21 (7.4)
Stone density (E), n (%)
≤950 HU	34 (12.0)
>950 HU	249 (88.0)

Operation outcomes

The overall SFR was 69.3% (196/283). Complications occurred in 28.3% (80/283) of patients. The distribution of complications according to the Clavien-Dindo classification is detailed in Table 3.

Table 3.

Postoperative complications (Clavien-Dindo classification).

Complications
Grade	n (%)	Description
Grade I	38 (13.4)	Hematuria (35), bleeding from nephrostomy (3)
Grade II	40 (14.1)	Postoperative fever requiring antibiotics (34), blood transfusion (6)
Grade III	0 (0)	-
Grade IV	2 (0.7)	Septic shock requiring ICU care
Grade V	0 (0)	-
Total	80 (28.3)

Logistic regression and score development

Stone clearance status was set as the dependent variable, and the maximum cross-sectional area of stones (S), tract length (T), obstruction (O), number of involved calyces (N), and stone density (E) were entered as independent variables in the binary logistic regression. The results are shown in Table 4. Stone size (S), obstruction (O), and stone density (E) were significant independent predictors of stone-free status (p < .05), whereas tract length (T) and number of involved calyces (N) were not (p > .05).

Table 4.

Binary logistic regression analysis of factors associated with stone-free status.

Variable	Category	B	SE	Wald	p	OR
Maximum cross-sectional area of stones (S)	<400 mm² (reference)			8.989	.011*
	400–799 mm²	1.06	0.354	8.989	.003**	2.887
	800–1599 mm²	0.148	19173.742	0	1	1.159
Tract length (T)	>100 mm	0.173	0.346	0.249	.618	1.188
Obstruction (O)	Moderate/severe	1.236	0.362	11.658	.001**	3.441
Number of involved calyces (N)	1–2 calyces (reference)			1.467	.48
	3 calyces	0.732	0.605	1.467	.226	2.08
	Complete staghorn	22.359	9932.978	0	.998	5.13 × 10⁹
Stone density (E)	>950 HU	1.665	0.586	8.063	.005**	5.288

B: regression coefficient; SE: standard error; Wald: Wald chi-square statistic; OR: odds ratio.

* p < .05; ** p < .01; *** p < .001.
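Each OR in Table 4 is the exponentiated regression coefficient, and the Wald statistic is the squared ratio of B to its standard error. The helper below (a hypothetical name) recomputes these quantities, plus a Wald 95% CI, from B and SE. For the obstruction row (B = 1.236, SE = 0.362) it gives an OR of about 3.44 and a Wald statistic of about 11.66, in line with the table; for the complete-staghorn row, the very large SE drives the Wald statistic toward 0 and p toward 1 while the CI becomes unbounded, which is the signature of the Hauck-Donner effect.

```python
import math


def _safe_exp(x: float) -> float:
    """exp() that returns infinity instead of raising OverflowError."""
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def wald_summary(b: float, se: float, z_crit: float = 1.959964):
    """Return OR, 95% Wald CI, Wald chi-square, and two-sided p for one coefficient."""
    odds_ratio = _safe_exp(b)
    ci = (_safe_exp(b - z_crit * se), _safe_exp(b + z_crit * se))
    wald = (b / se) ** 2
    p = math.erfc(math.sqrt(wald / 2))  # upper tail of chi-square with 1 df
    return odds_ratio, ci, wald, p


print(wald_summary(1.236, 0.362))      # obstruction row
print(wald_summary(22.359, 9932.978))  # complete-staghorn row
```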

Based on these logistic regression results, the maximum cross-sectional area of stones, obstruction, and stone density significantly affected the SFR. We therefore reweighted these three indices in the traditional S.T.O.N.E. scoring system according to their OR values (Table 1). For the maximum cross-sectional area of stones, the 800–1599 mm² grade (OR = 1.159, SE = 19173.742) showed the Hauck-Donner effect, in which an inflated standard error makes the Wald test unreliable; grades 2 and 3 were therefore combined into a single ≥400 mm² category without additional points, scored 2 points on the basis of the OR of the 400–799 mm² grade (2.887). The final improved S.T.O.N.E. scoring system is shown in Table 5.

Table 5.

Improved S.T.O.N.E. score system.

Component	Category	Score
Maximum cross-sectional area of stones (S)	<400 mm²	0
	≥400 mm²	2
Tract length (T)	≤100 mm	0
	>100 mm	1
Obstruction (O)	Mild hydronephrosis	0
	Moderate or severe hydronephrosis	3
Number of involved calyces (N)	1–2 calyces	0
	3 calyces	1
	Complete staghorn-shaped stones	2
Stone density (E)	≤950 HU	0
	>950 HU	3
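The improved scoring rules in Table 5, together with the 5.5 cutoff reported in the Results, can be sketched as follows. Inputs mirror the raw measurements defined in the Methods; the function names are hypothetical, and treating more than three non-staghorn calyces as the "3 calyces" grade is an assumption.

```python
def improved_stone_score(area_mm2: float, tract_mm: float,
                         moderate_or_severe_hydro: bool, calyces: int,
                         staghorn: bool, density_hu: float) -> int:
    """Improved S.T.O.N.E. score (0-11) using the weights in Table 5."""
    s = 2 if area_mm2 >= 400 else 0  # grades >= 400 mm2 merged into one category
    t = 1 if tract_mm > 100 else 0
    o = 3 if moderate_or_severe_hydro else 0
    if staghorn:
        n = 2
    elif calyces >= 3:  # assumption: >3 non-staghorn calyces scored as "3 calyces"
        n = 1
    else:
        n = 0
    e = 3 if density_hu > 950 else 0
    return s + t + o + n + e


def risk_group(score: int, cutoff: float = 5.5) -> str:
    """Low risk (0-5) below the Youden-derived cutoff, high risk (6-11) above it."""
    return "low" if score < cutoff else "high"


example = improved_stone_score(500, 90, True, 2, False, 1000)
print(example, risk_group(example))  # 8 high
```

For instance, a 500 mm² stone with moderate hydronephrosis, a 90 mm tract, two involved calyces, and a density of 1000 HU scores 2 + 0 + 3 + 0 + 3 = 8 and falls in the high-risk group.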

Comparison of two scoring systems in predicting SFR

ROC curve analysis revealed that the improved scoring system demonstrated good discriminative ability for predicting stone-free status (AUC = 0.804, 95% CI [0.753–0.854]) (Figure 1). The Youden index identified an optimal cutoff point of 5.5 for risk stratification, with corresponding sensitivity of 88.5% and specificity of 42.3%.

For the traditional S.T.O.N.E. scoring system, the AUC was 0.814 (95% CI [0.764–0.864]). The Youden index determined an optimal cutoff of 8.5 for this system, with sensitivity of 93.1% and specificity of 49.0%.

A comparison of the AUCs using DeLong's test is presented in Table 6. The AUCs did not differ significantly between the two systems (p = .502), indicating that the improved system has discriminative ability comparable to that of the traditional system.

Table 6.

Paired comparison of AUCs (DeLong's test).

Pair	Z	p	AUC difference	SE	95% CI
Improved score – traditional score	−.672	.502	−.010	.224	−0.040 to 0.020

Z: Z-score; S.E.: standard error; CI: confidence interval.

* p < .05; ** p < .01; *** p < .001.
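DeLong's test compares two correlated AUCs estimated on the same patients. The following standard-library sketch computes each AUC with the Mann-Whitney kernel and the variance of their difference from DeLong's structural components. Higher scores are assumed to indicate residual stones (coded 1). The function names are hypothetical, and the toy call illustrates usage only; it does not reproduce the study data.

```python
import math
from statistics import mean


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the event case scores higher, 0.5 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos: list, neg: list):
    """AUC plus DeLong structural components V10 (events) and V01 (non-events)."""
    v10 = [mean(_psi(x, y) for y in neg) for x in pos]
    v01 = [mean(_psi(x, y) for x in pos) for y in neg]
    return mean(v10), v10, v01


def _sample_var(values: list) -> float:
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def delong_paired(scores_a: list, scores_b: list, outcomes: list):
    """Two-sided DeLong test for the difference between two paired AUCs."""
    pos = [i for i, y in enumerate(outcomes) if y == 1]
    neg = [i for i, y in enumerate(outcomes) if y == 0]
    auc_a, v10_a, v01_a = _components([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    auc_b, v10_b, v01_b = _components([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    # The variance of a difference of paired components equals
    # var_a + var_b - 2 * cov_ab, so it is computed on the differences directly.
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    se = math.sqrt(_sample_var(d10) / len(d10) + _sample_var(d01) / len(d01))
    z = (auc_a - auc_b) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return auc_a, auc_b, auc_a - auc_b, se, z, p


print(delong_paired([3, 4, 5, 1, 2, 4], [2, 5, 6, 1, 3, 4], [1, 1, 1, 0, 0, 0]))
```

With these six toy patients (three events), the two AUCs are about 0.833 and 0.778. In this formulation, z is simply the AUC difference divided by its standard error.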

Validation of the improved scoring system

The total score for each patient was calculated (range: 0–11). Using the optimal cutoff of 5.5, patients were divided into a low-risk group (scores 0–5, n = 123, 43.5%) and a high-risk group (scores 6–11, n = 160, 56.5%). The SFR was 91.9% (113/123) in the low-risk group and 51.9% (83/160) in the high-risk group (p < .001); the results are shown in Table 7.

Table 7.

Comparison of postoperative stone-free rates between the low-risk and high-risk groups of the improved scoring system.

Outcome	Low risk (0–5)	High risk (6–11)	χ²	p
Stone-free	113	83	52.245	< .001
Residual stones	10	77

Comparison with the traditional S.T.O.N.E. system

Using the traditional S.T.O.N.E. system (score range 5–13), 183 patients (64.7%) were classified as low-risk (5–8) and 100 (35.3%) as high-risk (9–13). The SFR was 81.4% (149/183) in the traditional low-risk group and 47.0% (47/100) in the high-risk group. The results are shown in Table 8.

Table 8.

Comparison of stone-free rates between the low-risk groups of the two scoring systems.

Outcome	Improved S.T.O.N.E. low risk (0–5)	Traditional S.T.O.N.E. low risk (5–8)	χ²	p
Stone-free	113	149	6.524	.011
Residual stones	10	34

The improved system identified a low-risk group with a significantly higher SFR than the traditional system (91.9% vs 81.4%, p = .011), while the SFR in the high-risk groups was not significantly different (51.9% vs 47.0%, p = .473).
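The chi-square statistics in Tables 7 and 8 can be checked with a 2 × 2 Pearson test. This sketch applies no continuity correction, which reproduces the reported values (about 52.25 for Table 7 and 6.52 for Table 8). It treats the two groups being compared as independent samples, and the function name is hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Rows: groups; columns: stone-free, residual stones
print(chi2_2x2(113, 10, 83, 77))   # Table 7: improved low risk vs high risk
print(chi2_2x2(113, 10, 149, 34))  # Table 8: improved vs traditional low risk
```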

NRI analysis confirmed that the revised scoring system provided significant improvement in risk stratification compared to the traditional system (NRI = 0.233, p = .007), particularly in correctly reclassifying high-risk patients (event NRI = 0.391, p < .001).

Discussion

This study presents a statistically refined version of the S.T.O.N.E. nephrolithometry score, which demonstrates superior clinical utility in preoperative risk stratification for PCNL. The principal finding is that our factor-weighted scoring system, utilizing cutoff points derived from the Youden index, provides a more clinically relevant stratification compared to the traditional system. Specifically, it identifies a low-risk cohort with a remarkably high SFR (91.9%), thereby offering surgeons a more reliable tool for predicting a successful outcome and enhancing preoperative patient counseling.

Our analysis confirmed that stone size (S) is a major predictor, consistent with all existing nephrolithometry systems.^5–7,13–15 Larger stones require more time and effort to fragment and clear. Although volumetric measurement with three-dimensional imaging software is widely regarded as the gold standard for assessing stone size,¹⁶ we used the cross-sectional area to remain consistent with the original S.T.O.N.E. scoring system. Obstruction (O), reflecting hydronephrosis, was another strong predictor. A dilated collecting system can make clearance of stone fragments more difficult and may be associated with impaired renal drainage, contributing to residual fragments. Stone density (E), measured in Hounsfield units, was the third significant factor. Denser stones are often harder to fragment with laser lithotripsy, potentially leaving larger residual fragments that are difficult to evacuate.⁹

The methodological framework established in this study—employing multivariate regression to identify independent predictors, assigning weights based on effect sizes (ORs), and determining optimal cutoff points using the Youden index—has broader implications beyond the S.T.O.N.E. scoring system. This approach can be readily applied to refine other surgical prediction models in urology, such as the Guy's stone score or SHA.LIN. scoring system, and indeed to prognostic tools across medical specialties. Many existing scoring systems, e.g. the Caprini score for venous thromboembolism,¹⁷ were developed with expert consensus-based or arbitrarily assigned weights. Our data-driven methodology provides a standardized, statistically robust alternative to optimize these tools, potentially improving their predictive accuracy and clinical utility. This represents a pathway toward more personalized risk assessment, where scoring systems are continuously refined and validated using contemporary patient data and rigorous statistical methods.

Interestingly, tract length (T) and number of involved calyces (N) were not significant predictors in our multivariate model, in contrast with the original S.T.O.N.E. derivation study.⁶ For tract length, this may be because all our procedures were performed by experienced surgeons who can effectively manage longer tracts. The loss of significance for the number of calyces may reflect its collinearity with stone size, because a large stone inevitably fills multiple calyces. Furthermore, the original S.T.O.N.E. definition of “N” incorporates stone burden (“complete staghorn”), which might be better captured by the “S” factor alone. Our relatively small sample, particularly the low number of patients with staghorn stones (n = 21), led to statistical instability (the Hauck-Donner effect) and precludes definitive conclusions for this parameter.

The most significant advancement of our revised system is its enhanced performance in risk stratification, as evidenced by the NRI. Although the overall discriminative capacity (AUC) was comparable between the two systems, the NRI of 0.233 (p = .007) confirms that the revised model achieves a statistically significant improvement in correctly reclassifying patients, particularly those at high risk for residual stones (event NRI = 0.391, p < .001). This underscores a critical concept: the clinical value of a predictive tool depends not only on its overall ranking ability (AUC) but also on its accuracy in categorizing patients into clinically actionable risk groups. The use of the Youden index to define the optimal cutoff (5.5 for the improved score) further strengthens the statistical rigor of our risk stratification, replacing arbitrary median splits with a criterion that maximizes the sum of sensitivity and specificity.

Our study has limitations. Its retrospective, single-center design introduces potential for selection bias. The sample size was insufficient for some rarer categories (e.g. very large stones, staghorns), affecting the stability of those estimates. External validation in a larger, multicenter prospective cohort is essential to confirm these findings and further refine the weights.

Conclusion

The data-driven, factor-weighted revision of the S.T.O.N.E. score presented here significantly improves the prediction of stone-free status after PCNL. By employing rigorous statistical methods, including odds ratio-based weighting and Youden index-derived cutoffs, this tool may provide enhanced risk stratification. It is particularly effective in identifying a low-risk patient cohort with a > 90% probability of success, thereby offering valuable clinical utility for preoperative planning and patient counseling. Future prospective, multicenter studies are warranted to validate and potentially refine these findings.

Figure 1.

ROC curves for the scores in the two systems.

Footnotes

Acknowledgments

We thank Anahid Pinchis from Liwen Bianji (Edanz) for editing the English text of a draft of this manuscript.

ORCID iD

Bingjian Wei

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Skolarikos, Geraghty, Somani, et al. European association of urology guidelines on the diagnosis and treatment of urolithiasis. Eur Urol 2025; 88: 64–75.

2. American Urological Association. Kidney stones: surgical management guideline, https://www.auanet.org/guidelines-and-quality/guidelines/kidney-stones-surgical-management-guideline (accessed 21 December 2023).

3. Taguchi, Cho, et al. The Urological Association of Asia clinical guideline for urinary stone disease. Int J Urol 2019; 26: 688–709.

4. El-Nahas, Eraky, Shokeir, et al. Factors affecting stone-free rate and complications of percutaneous nephrolithotomy for treatment of staghorn stone. Urology 2012; 79: 1236–1241.

5. Guohui, Hanzhong, et al. The establishment and evaluation of SHA.LIN nephrolithometry scoring system for predicting the stone-free rate of percutaneous nephrolithotomy. Chin J Urol 2015; 36: 746–751.

6. Okhunov, Friedlander, et al. S.T.O.N.E. nephrolithometry: novel surgical classification system for kidney calculi. Urology 2013; 81: 1154–1160.

7. Thomas, Smith, Hegarty, et al. The Guy's stone score - grading the complexity of percutaneous nephrolithotomy procedures. Urology 2011; 78: 277–281.

8. Sirirak, Sangkum, et al. External validation of the S.T.O.N.E. score in predicting stone-free status after rigid ureteroscopic lithotripsy. Res Rep Urol 2021; 13: 147–154.

9. Danis, Polat, Bozkurt, et al. Application of S.T.O.N.E. nephrolithometry score for prediction of stone-free status and complication rates in patients who underwent percutaneous nephrolitotomy for renal stone. J Laparoendosc Adv Surg Tech A 2022; 32: 372–377.

10. Vandenbroucke, von Elm, Altman, et al. Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. Epidemiology (Cambridge, Mass.) 2007; 18: 805–835.

11. De la Rosette, Opondo, Daels, et al. Categorisation of complications and validation of the Clavien score for percutaneous nephrolithotomy. Eur Urol 2012; 62: 246–255.

12. Monson. Dose-response relationships for environmental agents: a methodological review. Am J Public Health 1980; 70: 229–235.

13. Xing, Jianming, et al. Clinical study of the application of S.T.O.N.E. nephrolithometry scoring system for the percutaneous nephrolithotomy. Chin J Urol 2014; 35: 40–44.

14. Guohui, Hanzhong, et al. Evaluation and comparison of SHA.LIN, S.T.O.N.E nephrolithometry scoring system and Guy's stone score in predicting the accuracy of the percutaneous nephrolithotomy (PCNL) surgical outcomes. Chin J Urol 2016; 37: 199–205.

15. Khan, Nazim, Farhan, et al. Validation of S.T.O.N.E. nephrolithometry and Guy's stone score for predicting surgical outcome after percutaneous nephrolithotomy. Urol Ann 2020; 12: 324–330.

16. De Coninck, Traxer. The time has come to report stone burden in terms of volume instead of largest diameter. J Endourol 2018; 32: 265–266.

17. Caprini, Arcelus, et al. Clinical assessment of venous thromboembolic risk in surgical patients. Semin Thromb Hemost 1991; 17: 304–312.

A data-driven revision of the S.T.O.N.E. nephrolithometry score: Improved predictive accuracy for stone-free status after PCNL
