Abstract

Study Design

A retrospective study.

Objective

To develop a prognostic score for mortality and treatment failure in Spinal epidural abscess (SEA), based on simplicity and multidimensional assessment principles.

Methods

One-hundred-fifty patients were reviewed. Variables assessed included comorbidities, functional status, clinical presentation, Frankel classification, and biochemical and radiological parameters. The main outcomes were the 90-day mortality and treatment failure, corresponding to any intensification of the initial treatment plan. Variables were sorted out with a factorial analysis. Logistic regressions were performed, and the new score was derived from the coefficients. ROC curves with Area Under Curve, calibration plots, and cross-validation were performed.

Results

Forty-three patients (29%) had treatment failure, and 15 (10%) died by 90 days. Factorization created 3 groups: Comorbidities (C), Severity (S), and Function (F). For 90-day mortality, odds ratios (OR) were 1.20 (P = .0002), 1.15 (P = .03), and 1.36 (P < 10⁻⁴) for C, S, and F, respectively. The new score, ‘CSF’, had 1 point per item, ranging from zero to 3. The OR increased by 1.2 per point for 90-day mortality (P < 10⁻⁴), and the AUC was .86. For failures, the OR increased by 1.15 per point (P = .014); the AUC was .58 and increased to .64 for patients who survived beyond 90 days, probably due to competing risks.

Conclusions

The Comorbidities, Severity, and Function (CSF) score is a new, simple tool that is easy to use in daily practice; its performance was excellent for 90-day mortality and acceptable for failures. Simple tools are more likely to be adopted into practice. External validation of this score is desirable.

Keywords

spinal epidural abscess, prognostic score, mortality, treatment failure

Introduction

Spinal epidural abscess (SEA) often occurs in the context of hematogenous seeding and is potentially life-threatening.¹ The mortality risk is significant, with 90-day mortality approximately 15%.^2-4 SEA is associated with a neurological deficit in up to 50%, and survivors may suffer lifelong impairment.² Globally, the incidence is increasing due to a combination of greater prevalence of comorbid conditions, increased life expectancy, and improved diagnosis.² Both medical and surgical treatment play a role, although medical treatment alone has gained in popularity over the last decades. However, the risk of failure of medical treatment is high, up to 40%.^2-4

Surgical management is not without risk of failure, as some patients require revision or unplanned secondary surgery for recurrent abscesses or subsequent spinal instability. SEA often occurs in the context of physiologic frailty and comorbidity. An accurate assessment of the risks associated with the disease and treatment is crucial to help clinicians and patients in their management and decision-making.

Patient-based risk assessment remains a challenge. Diabetes, age over 65, antibiotic resistance, elevated inflammatory markers, intravenous drug use (IVDU), and radiological instability are common prognostic factors.^3,5-7 An assessment based on a single variable is not sufficiently discriminant to guide clinicians. Several scores or prognostic tools have been proposed, amongst them the 11-item modified Frailty Index (mFI-11),^8,9 mFI-5,¹⁰ Mortality in Spinal Infection (MSI),¹¹ Charlson Comorbidity Index (CCI),^8,12,13 SORG Orthopaedic Research Group (SORG),^14,15 Spinal Infection Treatment Evaluation Score (SITE),¹⁶ and the postoperative decrease in CRP.^7,17

However, the performances of these tools come with numerous weaknesses, methodological issues, and clinical paradoxes. The first issue is the simplicity of the score, which is directly linked to its ability to be deployed in daily practice. All the scores mentioned above have more than 5 items, and 4 have more than ten items. The SITE score includes the Charlson and American Society of Anaesthesiologists (ASA) scores. In addition, SEA is a relatively low-incidence disease compared to hip fractures, for example. All these scores were developed from cohorts recruited over several years or across multiple centers. In daily practice, it is unlikely that the volume of patients allows clinicians to explore the full range of these scores: there are too few events relative to the wide range of possible predictions. Therefore, there is a need for a new simple score that can be easily deployed in daily practice.

Moreover, collinearities, meaning correlations between items,¹⁸ are a silent but potentially deleterious issue when the aim is to develop a discriminative score. There are at least 2 main consequences: a decrease in the multidimensional assessment capability and the impossibility of relying on the score’s ponderation (weighting). Even if mFI-11^8,9 does not have ponderation, half of its items are related to cardiac disease, meaning they assess the same dimension. For MSI, a pondered score, the ASA and CCI scores are probably associated, and the septic status might also increase the ASA score. For the SITE score, another pondered score, the neurological status is linked to canal stenosis and probably to the ability to ambulate. For SORG, there is collinearity between the multiple inflammatory markers, including platelets,^19,20 and between albumin and age.^14,21

The baseline value of the score, also called intercept,^18,22,23 represents the value when all the parameters are at their baseline and may lead to a clinical paradox. For SORG,¹⁴ a patient with all parameters at their minimum is unlikely to exist. The SITE score is “upside down” and the lower figures indicate the worst conditions. Consequently, the figures for the healthiest patients depend on the score ponderation, which, again, may be unreliable. Also, the baseline for SITE corresponds to patients with an S2-S5 infection, with no pain and no specific radiological features; these particular radiological features are not compatible with an S2-S5 infection (spinal column deformity and disc erosion). It is unlikely a patient could reach the baseline with SITE.

Classifying variables according to their interdependence allows for determining the dimensions of the dataset; this process is called factorial analysis.²⁴ The new variables created after the factorization allow a multidimensional assessment.²⁵ We hypothesized that a factorial analysis could help develop a new, well-performing prognostic score, addressing several of the issues mentioned above.

Therefore, this study aimed to develop a prognostic score for mortality and failure in SEA, focusing on simplicity, multidimensional assessment, and clinical coherence principles.

Material and Methods

Ethics

This study received the Hospital Review Board approval (Waikato District Health Board, Clinical Audit Support Unit; registration Number 4341PDOR220726, demand 4341P). As data were analyzed in retrospect, patient consent was not required.

Setting and Participants

This study was conducted at a tertiary referral spine center, servicing over 900,000 people. The hospital coding provided a list of patients over 18 years old diagnosed with spinal epidural abscess from January 2010 to November 2022. Patients included in the study had their electronic medical records checked to confirm a radiological diagnosis of SEA.

Variables

The demographic data collected were the age at admission, gender, ethnicity, comorbidities included in the mFI-11 and CCI, and functional status (independent or dependent). The clinical information comprised the clinical presentation (particularly septic shock), the location of the infection (cervical, thoracic, lumbar, multilevel, and extra-spinal locations), and the neurological status (using the Frankel classification: from E, no neurological deficit, to A, complete motor and sensory deficit).²⁶ The laboratory values were the C-reactive protein (CRP, mg/L), Haemoglobin (g/L), White Blood Cell Count (WBC, 10⁹/L), Platelets, and renal profile. The microbiological information comprised the type of bacteria and the resistance/sensitivity profile. The radiological information was the presence of osteomyelitis or discitis. The SORG, mFI-11, and CCI were subsequently calculated.^8,9,27

Outcomes

The risk of death was assessed at 90 days as a primary outcome of interest. Failure was defined as any intensification of the treatment, for example, a failure of initial medical management with antibiotics requiring surgery (that is, the initial treatment plan was nonoperative, but the patient subsequently required surgery), or the failure of surgical treatment (that is, requiring a second surgery for either a recurrent abscess collection or stabilization of ensuing instability).

Bias

When developing a score, there is a risk of overfitting the models. K-fold cross-validation was performed to test the score's performance.²⁸

Statistics

When describing the data, we report the median, first, and third quartiles for quantitative variables. We report counts for categorical variables. The associations between all the variables and the outcome were tested using a Generalized Linear Regression.

The main objective was to develop a prognostic score for the risk of death and treatment failure. It was a four-step process.

The first step was a factorial analysis using a correlation matrix and Pearson correlations. The factorization aimed to identify all the correlations between the variables. After mapping all the correlations between all the variables (these correlations can be visualized in a colored matrix), the variables were regrouped in clusters of similar clinical meaning. For instance, in the case of inflammation, there is a raised CRP, white cells, and a decrease in hemoglobin; these 3 variables are correlated and can be regrouped in a cluster called “inflammation”. A similar process can lead to a cluster “comorbidity,” with the variables “medical history of a heart attack,” “chronic kidney failure,” and “chronic obstructive pulmonary disease.” After the process of factorization, the second step was to test its quality. A good factorization meant a low correlation between the clusters, allowing a multidimensional assessment. The correlations between the clusters were tested using Pearson correlations. Similarly, the internal consistency between the clusters was tested with a Cronbach’s alpha score; a low internal consistency (<.70) was expected, reflecting the multidimensional assessment.
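The first two steps can be illustrated with a minimal sketch, assuming binary cluster indicators per patient. The toy data, the function names, and the use of the sample variance are illustrative assumptions and do not reproduce the authors' analysis.

```python
import math
from statistics import variance


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of columns, one value per patient."""
    k = len(items)
    totals = [sum(values) for values in zip(*items)]
    item_variance = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical cluster indicators for 8 patients (1 = criterion present).
toy_clusters = {
    "comorbidities": [1, 0, 1, 0, 1, 0, 0, 1],
    "severity": [1, 1, 0, 0, 1, 0, 1, 0],
    "function": [0, 0, 1, 0, 0, 1, 0, 0],
}

matrix = correlation_matrix(toy_clusters)
alpha = cronbach_alpha(list(toy_clusters.values()))
print(matrix["comorbidities"]["severity"], alpha)
```

In this framing, low off-diagonal correlations and a low alpha indicate that the clusters capture different dimensions, which is the property the factorization aims for.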

After assessing the quality of the factorization, the third step was to produce a point-based score. The clusters became the items of the score, and the points attributed to each item were calculated using logistic regressions. Regressions were performed to test the association between the items, the 90-day mortality, and treatment failure. The regression coefficients were used to determine the points attributed to the items.
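As a reminder of how a logistic regression coefficient relates to an odds ratio, the sketch below computes a crude odds ratio from a 2×2 table; for a single binary predictor, the fitted coefficient equals the natural logarithm of this ratio. The counts are derived from the functional status rows of Table 1 (the "independent" cells are obtained by subtraction, which is an assumption), and the result is illustrative: it need not match the estimates produced by the authors' models.

```python
import math


def odds_ratio_2x2(exposed_events, exposed_nonevents,
                   unexposed_events, unexposed_nonevents):
    """Crude odds ratio (cross-product ratio) of a 2x2 table."""
    return (exposed_events * unexposed_nonevents) / (
        exposed_nonevents * unexposed_events
    )


# Functional status vs 90-day mortality, as read from Table 1:
# not independent: 7 deaths, 8 survivors
# independent (derived by subtraction): 15 - 7 = 8 deaths, 135 - 8 = 127 survivors
or_function = odds_ratio_2x2(7, 8, 8, 127)

# Coefficient of a logistic regression with this single binary predictor.
beta_function = math.log(or_function)

print(round(or_function, 2), round(beta_function, 2))
```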

The last step was the assessment of the score performance and the cross-validation. C statistics were performed, and the Area Under the Curve (AUC) was calculated to assess discrimination. Calibration was assessed graphically with calibration plots.²³ These plots compare the actual and predicted probabilities: scores with a good calibration had a nonparametric curve (dashed curve) close to the ideal curve (grey line). K-fold cross-validation was performed with test datasets, with k = 4 for failures (37 observations each) and k = 3 for mortality (50 observations each). The subdivision of the primary dataset into 3 or 4 smaller datasets for the cross-validation inevitably led to a decrease in power, and smaller counts were expected for each score figure. The primary alpha risk was set at .05.
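The fold sizes quoted above can be checked with a minimal k-fold splitting sketch. The shuffling seed, the function names, and the absence of stratification by outcome are assumptions; the text does not specify how the folds were drawn.

```python
import random


def k_fold_indices(n_observations, k, seed=0):
    """Shuffle observation indices and deal them into k folds of near-equal size."""
    indices = list(range(n_observations))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_observations, k, seed=0):
    """Yield (train, test) index lists, each fold serving once as the test set."""
    folds = k_fold_indices(n_observations, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 150 patients: k = 3 for mortality, k = 4 for failures.
fold_sizes_mortality = [len(fold) for fold in k_fold_indices(150, 3)]
fold_sizes_failure = [len(fold) for fold in k_fold_indices(150, 4)]
print(fold_sizes_mortality, fold_sizes_failure)
```

With 150 observations, k = 4 cannot give four equal folds, so the test sets hold 37 or 38 patients.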

Results

Cohort Description

A total of 150 patients were analyzed. The median age was 63 years (range 17 to 93; Q1: 53; Q3: 73). There were 58 females (38.67%). The characteristics of the population are detailed in Table 1. A total of 69 patients (46%) had a primary medical treatment. Amongst them, 16/69 patients had a failure and required surgery. Eighty-one patients had surgery as the initial treatment; 27/81 had revision surgery. In total, 43/150 patients (29%) had a failure of the primary treatment and required either surgery or a second surgery. There was a total of 9 deaths at 30 days (6%), 15 deaths at 90 days (10%), and 21 deaths at 1 year (14.6%).

Table 1.

Descriptive data. Quantitative Variables: Median (Q1-Q3). Categorical Variables: Number (%).

Variables	All Cohort	90-day Mortality: No	90-day Mortality: Yes	P (90-day Mortality)	Failure: No	Failure: Yes	P (Failure)
Age	63 (53 - 73)	62 (53 - 70)	75 (68 - 82)	.0004**	64 (54 - 75)	62 (53 - 68)	.29
CCI	3 (1 - 4)	3 (1 - 4)	2 (1 - 4)	.56	2 (1 - 4)	3 (2 - 3)	.88
mFI-11	1 (0 - 2)	1 (0 - 2)	3 (2 - 4)	<.0001***	1 (0 - 2)	1 (1 - 2)	.59
SORG score	9 (4 - 19)	8 (4 - 17)	27 (17 - 33)	<.0001***	8 (4 - 20)	10 (5 - 18)	.82
CSF	1 (1 - 2)	1 (1 - 2)	2 (2 - 3)	<.0001***	1 (1 - 2)	1 (1 - 2)	.10
CS₃F	2 (2 - 3)	2 (2 - 3)	4 (3 - 5)	<.0001***	2 (2 - 3)	3 (2 - 3)	.009**
CRP (mg/L)	170 (95 - 270)	177 (100 - 270)	130 (86 - 270)	.77	166 (94 - 270)	195 (98 - 318)	.19
WBC (10⁹/L)	12 (9 - 16)	12 (9 - 16)	14 (10 - 16)	.15	12 (9 - 15)	13 (10 - 16)	.20
Platelet count (10⁹/L)	259 (191 - 396)	258 (191 - 398)	293 (170 - 326)	.35	270 (190 - 408)	253 (198 - 332)	.16
Hb (g/L)	117 (106 - 131)	118 (108 - 132)	108 (96 - 120)	.26	116 (106 - 130)	118 (104 - 132)	.81
Albumin (g/L)	28 (23-32)	27 (23-32)	26 (22-29)	.11	29 (23-33)	28 (23-31)	.36
Gender				.31			.66
Males	92 (61)	81 (60)	11 (73)		65 (62)	27 (59)
Females	58 (39)	54 (40)	4 (27)		39 (38)	19 (41)
Level				.32			.37
Cervical	21 (14)	21 (16)	0 (0)		15 (14)	6 (13)
Thoracic	32 (21)	28 (21)	4 (27)		26 (25)	6 (13)
Lumbar	92 (61)	81 (60)	11 (73)		60 (58)	32 (70)
Bacteria				.46			.25
Staphylococcus	103 (69)	91 (67)	12 (80)		68 (65)	35 (76)
Streptococcus	10 (7)	10 (7)	0 (0)		9 (9)	1 (2)
Other	37 (25)	34 (25)	3 (20)		27 (26)	10 (22)
Resistance to antibiotics				.051			.94
Yes	22 (16)	17 (14)	5 (33)		15 (16)	7 (16)
Multifocal				.013*			.005**
Yes	96 (64)	82 (61)	14 (93)		59 (57)	37 (80)
Radiological severity				.30			.059
No	45 (31)	43 (33)	2 (13)		36 (35)	9 (20)
Bone	84 (57)	73 (55)	11 (73)		57 (56)	27 (60)
Disc	18 (12)	16 (12)	2 (13)		9 (9)	9 (20)
Location				.71			.76
Dorsal	84 (62)	76 (62)	8 (57)		56 (61)	28 (64)
Ventral	52 (38)	46 (38)	6 (43)		36 (39)	16 (36)
Frankel at presentation				.20			.36
A	3 (2)	2 (1)	1 (7)		3 (3)	0 (0)
B	4 (3)	3 (2)	1 (7)		2 (2)	2 (4)
C	16 (11)	16 (12)	0 (0)		11 (11)	5 (11)
D	49 (33)	42 (31)	7 (47)		30 (29)	19 (41)
E	77 (52)	71 (53)	6 (40)		57 (55)	20 (43)
Altered sensation				.28			.66
Yes	42 (28)	36 (27)	6 (40)		28 (27)	14 (30)
Functional status				<.0001***			.78
Not independent	15 (10)	8 (6)	7 (47)		10 (10)	5 (11)
Myocardial infarction				.21			.57
Yes	9 (6)	7 (5)	2 (13)		7 (7)	2 (4)
Chronic heart failure				.052			.45
Yes	6 (4)	4 (3)	2 (13)		5 (5)	1 (2)
Previous vascular disease				.17			.24
Yes	3 (2)	2 (1)	1 (7)		3 (3)	0 (0)
Stroke				.001**			.029*
Yes	10 (7)	6 (4)	4 (27)		10 (10)	0 (0)
COPD				.047*			.074
Yes	11 (7)	8 (6)	3 (20)		5 (5)	6 (13)
Diabetes				.40			.85
Yes	28 (19)	24 (18)	4 (27)		19 (18)	9 (20)
Chronic renal failure				.0002**			.50
Yes	17 (11)	11 (8)	6 (40)		13 (12)	4 (9)
Active malignancy				.058			.55
Yes	2 (1)	1 (1)	1 (7)		1 (1)	1 (2)

The variables associated with the risk of failure were multifocal infection (OR = 3.15, 95% CI 1.4-7.5, P = .002) and history of stroke (OR = 7, 95% CI 1.78-31, P = .02). The involvement of the disc or the bone had some association with failure (OR = 1.97, 95% CI .98-4.65, P = .059) (Table 1).

None of the CCI, mFI-11, and SORG scores were associated with the risk of failure (P = .87, .58, and .81, respectively).

The variables associated with the risk of death were age (OR = 1.07, 95% CI [1.03-1.12], P = .0002), history of stroke (OR = 7, 95% CI [1.9-30], P = .002), congestive heart failure (OR = 5, 95% CI [1.2-40], P = .02), a functional status not independent (OR = 13, 95% CI [2.3-24], P = .0006), chronic obstructive pulmonary disease (COPD) (OR = 2.5, 95% CI [1-5], P = .04), chronic renal failure (CRF) (OR = 7, 95% CI [2-25]), and multifocal infection (OR = 6.5, 95% CI [1.7-41], P = .01) (Table 1). The mFI-11 and SORG scores were associated with both the 30-day and 90-day mortality; at 90 days, the ORs were 3.1 (95% CI [1.9-5.5], P < 10⁻⁴) and 1.07 (95% CI [1.03-1.12], P = .0006) for mFI-11 and SORG, respectively.

Factorial Analysis

Age and hemoglobin were the variables with the maximum number of significant correlations (Figure 1).

Figure 1.

Pre-factorisation correlation matrix. Numerous correlations are found between variables. Age and haemoglobin were the 2 variables with the maximum number of correlations. They were excluded from the score because of the risk of collinearity. Functional status was kept as a separate cluster because it had fewer correlations and because it brought relevant clinical information. The other clusters were the comorbidities and the severity parameters.

There were several significant positive or negative correlations between the comorbidities, especially between myocardial infarction, chronic renal failure, congestive cardiac failure, previous vascular disease, diabetes, and COPD (Heatmap). These variables were grouped in an entity called Comorbidities, and all the patients with 1 or more criteria were considered the same entity.

Functional status was correlated with active malignancy (r = .15, P = .05), age (r = .3, P < 10⁻⁴), diabetes (r = .16, P = .04), and chronic renal failure (r = .24, P = .002). Functional status was considered a proper cluster called Function, because it was the variable with the strongest association with 90-day mortality and because diabetes and chronic renal failure were already in Comorbidities. Moreover, Function brought a piece of relevant clinical information, different from Comorbidities.

CRP >300 mg/L was significantly correlated with Haemoglobin <100 g/L (r = −.17, P = .05), WBC (r = .18, P = .02), and platelets (r = −.39, P < 10⁻⁴). Albumin was correlated with WBC (r = −.15, P = .05), hemoglobin (r = .35, P < 10⁻⁴), and CRP (r = −.17, P = .04). These variables gave information about the biochemical severity of the infection and were included in a subgroup called Biochemical Severity. Involvement of either disc or bone represents the Radiological Severity and was correlated with Biochemical Severity (r = .21, P = .04).

There was a correlation between altered sensation and motor weakness Frankel < D (r = .35, P = .0009). Moreover, multifocal infection had some correlation with sensation (r = .12, P = .04). All these variables gave information about the clinical Severity of the infection and were included in a subgroup called Clinical Severity. Biochemical Severity and Clinical Severity were correlated (r = .3, P = .002), mainly due to correlations between multifocal infection, neurological weakness, and inflammatory markers (heatmap). Thus, Clinical, Radiological, and Biochemical Severity were gathered in the same Severity group.

After factorization (Figure 2), there was some correlation between Comorbidities and Function (r = .15, P = .04). Cronbach’s alpha was .12. The product of this factorization was named CSF (standing for Comorbidities, Severity, Function). The patients could be C₀S₀F₀, C₁S₀F₀, C₀S₁F₀, C₀S₀F₁, C₁S₁F₀, C₁S₀F₁, C₀S₁F₁, or C₁S₁F₁ (Table 2).

Figure 2.

Post-factorisation correlation matrix. There was only a small correlation between Function and Comorbidities. This correlation was expected as Function was associated with some comorbidities.

Table 2.

Items of CSF Score.

Item	Points
Comorbidities:	1
Any of
• Cardiac condition (includes hypertension, IHD, CHF, previous cardiac arrest, PPM or IVD insertion, valve replacement etc.)
• Renal condition
• COPD
• Active malignancy
• Diabetes
• Immunosuppression
• Systemic inflammatory disease
Severity	1
Any of
Clinical severity
• Multifocal infection (including outside spine and several spine locations)
• Septic shock/Hemodynamic instability
• Motor weakness Frankel < D
• Sensation dysfunction
Biological severity	(1)
• CRP >300 mg/L
• WBC >15 × 10⁹/L
• Hb <80 g/L
• Albumin <20 g/L
Radiological severity	(1)
• Disc or bone involvement	(1)
Function: Functional status not independent	1
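The items of Table 2 can be expressed as a short scoring sketch. The `Patient` fields and function names are hypothetical, the Comorbidities item is reduced to a single flag, and "Frankel < D" is assumed to mean grades A, B, or C. The `cs3f_score` variant counts Clinical, Biological, and Radiological severity separately, as described for CS₃F in the Results.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    has_comorbidity: bool           # any Comorbidities criterion from Table 2
    multifocal: bool
    septic_shock: bool
    frankel: str                    # "A" (complete deficit) to "E" (no deficit)
    altered_sensation: bool
    crp_mg_l: float
    wbc_10e9_l: float
    hb_g_l: float
    albumin_g_l: float
    disc_or_bone_involvement: bool
    independent: bool


def clinical_severity(p):
    # Assumption: "Frankel < D" means grade A, B, or C.
    return (p.multifocal or p.septic_shock
            or p.frankel in {"A", "B", "C"} or p.altered_sensation)


def biological_severity(p):
    return (p.crp_mg_l > 300 or p.wbc_10e9_l > 15
            or p.hb_g_l < 80 or p.albumin_g_l < 20)


def radiological_severity(p):
    return p.disc_or_bone_involvement


def csf_score(p):
    """CSF: 1 point each for Comorbidities, Severity (any criterion), Function."""
    severity = (clinical_severity(p) or biological_severity(p)
                or radiological_severity(p))
    return int(p.has_comorbidity) + int(severity) + int(not p.independent)


def cs3f_score(p):
    """CS3F: the three severity sub-groups each count for 1 point (range 0 to 5)."""
    return (int(p.has_comorbidity) + int(clinical_severity(p))
            + int(biological_severity(p)) + int(radiological_severity(p))
            + int(not p.independent))


# Hypothetical patient: one comorbidity, CRP 320 mg/L, vertebral involvement,
# no neurological deficit, independent.
example = Patient(True, False, False, "E", False, 320, 12, 110, 28, True, True)
print(csf_score(example), cs3f_score(example))
```

Within each item, the criteria are combined with "or" rather than summed, so several criteria of the same dimension still count for a single point.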

Point-Based CSF Score

Comorbidities, Severity, and Function were significantly associated with the 90-day and 1-year mortality. For the 90-day mortality, the ORs were 1.20 (95% CI [1.1-1.31], P = .0002), 1.15 (95% CI [1.01-1.27], P = .03), and 1.36 (95% CI [1.18-1.58], P < 10⁻⁴) for Comorbidities, Severity, and Function, respectively. Because the ORs were relatively similar, Comorbidities, Severity, and Function were attributed 1 point each.

Severity was significantly associated with the risk of treatment failure (OR = 1.38, 95% CI [1.15-1.65], P = .007). Comorbidities had some association with the risk of failure (OR = 2.17, P = .08). There was no association with Function.

Score Performances

The CSF score could vary between zero and 3 points. There were 14, 76, 51, and 8 patients with a CSF score of 0, 1, 2, and 3 points, respectively. The CSF score was significantly associated with the risk of death at 90 days and 1 year; the OR increased by 1.2 per point, on average, for both endpoints (95% CI [1.1-1.3], P < 10⁻⁴). The observed mortality rates were 0%, 1%, 15%, and 75% at 90 days, and 0%, 2%, 23%, and 75% at 1 year. The AUCs were .86 and .83 for 90-day and 1-year mortality, respectively. For the 90-day mortality, the AUCs decreased from .86 to .74, .81, and .79 when removing the weight of C, S, and F, respectively. The calibration plots showed good calibration (Figure 3).
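To show how an AUC follows from a score with only four possible values, the sketch below applies the rank-based (Mann-Whitney) definition to grouped counts. The death counts are not the authors' data: they are reconstructed by rounding the reported 90-day mortality rates against the reported group sizes, which is an assumption, so the result only approximates the published figure.

```python
def auc_from_grouped_counts(events_by_score, nonevents_by_score):
    """AUC as the probability that a patient with the event has a higher score
    than a patient without it, counting ties as one half."""
    concordant = 0.0
    for score_event, n_event in events_by_score.items():
        for score_non, n_non in nonevents_by_score.items():
            if score_event > score_non:
                concordant += n_event * n_non
            elif score_event == score_non:
                concordant += 0.5 * n_event * n_non
    total_pairs = sum(events_by_score.values()) * sum(nonevents_by_score.values())
    return concordant / total_pairs


# Reported group sizes and 90-day mortality rates per CSF score.
patients_by_score = {0: 14, 1: 76, 2: 51, 3: 8}
mortality_rate = {0: 0.00, 1: 0.01, 2: 0.15, 3: 0.75}

# Reconstructed counts (assumption: rounding the rate times the group size).
deaths = {s: round(n * mortality_rate[s]) for s, n in patients_by_score.items()}
survivors = {s: n - deaths[s] for s, n in patients_by_score.items()}

auc_90_day = auc_from_grouped_counts(deaths, survivors)
print(deaths, round(auc_90_day, 2))
```

Under these reconstructed counts, the deaths sum to 15 and the AUC comes out close to the reported .86.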

Figure 3.

Cross-validation of CSF for 90-day mortality. The area under the curve was stable around .80. The calibration plots showed a steady increase of the observed risks with the estimated risks. The slopes and intercepts were somewhat labile, which may be explained by the small samples.

The CSF score was significantly associated with the risk of failure, and the OR increased by 1.15 per point on average (95% CI [1.02-1.28], P = .014). The observed rates of failure were 0%, 48%, 44%, and 4%, with an AUC of .58 when considering the whole cohort. When considering only the patients who survived beyond 90 days, the AUC improved to .64. The AUC further improved to .69 when the score was calculated by attributing 1 point to each of Clinical, Radiological, and Biochemical Severity (CS₃F), and the observed failure rates were 0%, 8%, 28%, 47%, and 33% for 0, 1, 2, 3, and ≥4 points, respectively. The cross-validation showed stable AUCs, but the calibration slopes were labile and tended to be higher than 1 (Figure 4).

Figure 4.

Cross-validation of CSF for failures. The area under the curve was stable around .70. The calibration plots showed that the observed risks were labile for the 2 highest estimations, which may be explained by small samples and competition between mortality and failures. Patients with a high score were also at risk of death, and some were “removed” from the failure group.

Discussion

This study demonstrated the extent of the cross-correlations amongst candidate predictive variables for death or treatment failure in SEA. Age and hemoglobin were the 2 variables most frequently correlated with others. Considered separately, they were some of the very few variables correlated with the risk of treatment failure. The factorization process created 3 group variables: Comorbidities, Severity, and Functional status; this process transformed a highly correlated set of variables into very informative items with absence or very low cross-correlation. The ‘CSF’ score, elaborated from the factorization, had high performance and discrimination to predict the 90-day mortality. When the Severity component is considered separately by Clinical, Radiological, and Biochemical criteria (CS₃F), the score was also correlated to the risk of failure, with reasonably good discrimination but labile calibration.

The CSF score is innovative because, compared to the other scores, it is the simplest (only 3 items, no ponderation, and does not involve other scores), and does not present any massive collinearities (allowing a better multidimensional assessment). Also, the baseline score (CSF = 0) is credible and does not lead to a clinical paradox (Table 3). The CSF score represents statistical proof of “clinical common sense”; it helps to investigate several independent dimensions of the disease, which may explain better performances, especially for failures. Collinearities between items are frequent in scores (Table 3); for example, in the mFI-11, 5 of the 11 items are related to the cardiovascular system. Several comorbidities or inflammatory markers are often found as prognostic factors.^14,29-31 Collinearities alter the regression coefficients, and may be responsible for decreased performances because of a lack of dimensions and discrimination. The process of targeting the lowest internal consistency between items is the opposite of the approach used in questionnaires for psychology, for instance. In the latter situation, the highest consistency is suitable for assessing unobservable information.²⁴ In the context of a score, items should differentiate the patients to create a panel of different situations. Strayhorn et al. emphasized the need to understand better the underlying patterns between variables and the global situation.³² They detailed the impact of mediator variables and interaction effects. Our study only focuses on correlations between variables and does not explore mediators.

Table 3.

Comparison With the Other Scores.

Score	Number of Items	Ponderation	Intercept (Baseline Value)	Contains Other Scores	Multidimensional	Collinearities	Predicts Mortality	Predicts Medical Treatment Failure
mFI-11⁸	11	No	Plausible but zero doesn’t necessarily mean healthy patient	No	Yes but mainly oriented toward comorbidities (8/11 items)	Yes: congestive cardiac failure with myocardial infarction, hypertension, and angina and peripheral vascular disease; cerebrovascular accident and neurological deficit after cerebrovascular accident	Yes	Yes
mFI-5¹⁰	5	No	Plausible but zero doesn’t necessarily mean healthy patient	No	Yes but mainly oriented toward comorbidities (4/5 items)	Yes: congestive cardiac failure with diabetes and hypertension	Yes	Marginal effect
MSI¹¹	8 (with categorical variables, total 23 items)	Yes	Plausible but unlikely: Non-septic patient with low CRP	Yes: ASA, Charlson	Yes	Yes: CCI with hepatopathy, renal insufficiency, and probably ASA. Septic patient state with CRP and probably ASA >2	Yes	Unknown
CCI¹³	19	Yes	Plausible but doesn't preclude disease severity	No	No	Very likely between the comorbidities	Yes	No
SORG¹⁴	7	Yes (requires the online tool)	Unlikely: 18 yo patient with very low neutrophil/lymphocyte ratio, very low albumin and platelet	No	Doesn’t assess function	Yes: between albumin and age; probably between the biological inflammatory markers; probably between diabetes and haemodialysis (as diabetes is a cause of kidney failure)	Yes	Unknown (no in our study)
SITE¹⁶	5 (with categorical variables, total 16 items)	Yes	Unlikely: The location S2-S5 isn’t compatible with the radiological findings (disc erosion and deformity)	No	Yes	Probably not	Yes	Unknown
CSF	3	No	Plausible (functional patient with no comorbidities and no severity criteria)	No	Yes	Low	Yes	Yes

The calibration of the prognostic score determines its performance. However, in the context of external validation, the variables' weight in scores may vary, and the calibration may decrease in quality. In extreme cases, the variables contained in scores are not even associated with the outcome. Kim et al found diabetes and Methicillin-resistant Staphylococcus aureus (MRSA) as prognostic factors for failure,²⁹ but these 2 variables did not have an association with failure in our cohort. The same conclusion applies to Page et al., who found an active malignancy, organism identification, and gender as predictive factors of failure.^33,34 Clustering the variables in patterns with a clinical meaning helps to “catch” the information through the internal cross-correlations of the variables within the cluster. For instance, diabetes, a significant factor for Kim et al and Patel et al,^7,29 was correlated with almost all the variables in the cluster Comorbidities.

The performances of CSF were excellent when predicting mortality but were less consistent for failure. Amongst all the scores published, only the mFI-11 has shown a significant association with the risk of surgical treatment.⁸ Dominguez et al have found a marginal association between the mFI-5 and the risk of failure¹⁰ (Table 3). Severity was the main item associated with treatment failure, while all 3 items of CSF were meaningful for mortality. One of the hypotheses is the competition³⁵ between failure and death at the early stage. The patients with high CSF scores died earlier, so they could no longer have a failure; this is why the AUC improved when considering only the patients who survived beyond 90 days. This is also why only 4% of the patients with a score CSF = 3 had a treatment failure: most patients with CSF = 3 had died instead.

This series presents some differences with the literature data that are worth mentioning. The location of the abscess is a parameter often reported as a predictive variable. The location could be the segment of the spine involved (cervical, thoracic, or lumbar), or the position ventral or dorsal to the thecal sac.^5,13 None of these parameters was associated with the risk of treatment failure in our cohort. Diabetes is a predictive factor commonly found in the literature.^5,7,29 It is unclear why diabetes was not found significant in this cohort; it is not a matter of power because the P-value of .8 for diabetes was far from significant. Ethnicity was a prognostic factor found in a previous study in our center but was not significant in this cohort.¹³ The involvement of discs and bone was a strong predictive factor in the SITE score.¹⁶ The association was attenuated in our cohort.

This study has several limitations. The surgical delay was not considered, although it has been shown to be a crucial parameter.^2,36 The delay to the first antibiotic therapy is also a key parameter that was not assessed. The main reason is that the population of this study is mainly from a remote area, and the care in the community was not easily standardized and accessible. Also, the cross-validation lacked power. We chose a k-fold number of 4 for failures and 3 for death, which allowed us to observe up to 15 events for death in each test dataset. A higher k-fold would have led to unobserved events. The slopes were labile, which can be explained by the small size of the test samples.²³ We grouped the highest figures of the score for failure (4 and 5) to decrease the likelihood of non-events. Despite this grouping, no event was observed for score = 3 in the bottom-left calibration plot. Another limitation is the imprecision of some parameters, such as the radiological characteristics. Moreover, although failures corresponded to an escalation in the treatment, there was a lack of detail on the reasons for treatment failure and on why the treating surgeon decided to modify treatment plans. For instance, it could be an increase in instability, cord compression, or a second new abscess location. Finally, the factorization may have been excessive for failures, leaving a narrow range of figures. This is probably why CSF performed better for failures when differentiating Clinical, Radiological, and Biological Severity.

Conclusion

CSF is a three-item score that is easy to memorize and use in daily practice. Its performance was excellent and robust to predict 90-day mortality. CS₃F, which detailed the clinical, biological, and radiological criteria of disease severity, was associated with the risk of failure, but the performances were not as well calibrated as those for mortality. A competition between treatment failure and mortality was responsible for an overall decrease in the score performance when assessing the risk of failure. In daily practice, the multidimensional assessment of the patients leads to as good discrimination as more refined scores. The numerous cross-correlations between the variables helped to catch information. They led to a change in paradigm when developing the score: not using “AND” anymore, but “OR” when considering variables of the same dimension.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Ethical Statement

ORCID iDs

Baptiste Boukebous

Joseph F. Baker

References

1. Batson. The function of the vertebral veins and their rôle in the spread of metastases. Ann Surg. 1940;112(1):138-149. doi:10.1097/00000658-194007000-00016

2. Schwab, Shah. Spinal epidural abscess: diagnosis, management, and outcomes. J Am Acad Orthop Surg. 2020;28(21):e929-e938. doi:10.5435/JAAOS-D-19-00685

3. Tetsuka, Suzuki, Ogawa, Hashimoto, Kato. Spinal epidural abscess: a review highlighting early diagnosis and management. JMA J. 2020;3(1):29-40. doi:10.31662/jmaj.2019-0038

4. Ameer, Knorr, Munakomi, Mesfin. Spinal epidural abscess. StatPearls. St. Petersburg, FL: StatPearls Publishing; 2022. https://www.ncbi.nlm.nih.gov/books/NBK441890/. Accessed January 31, 2023.

5. Shah, Ogink, Nelson, Harris, Schwab. Nonoperative management of spinal epidural abscess: development of a predictive algorithm for failure. J Bone Joint Surg Am. 2018;100(7):546-555. doi:10.2106/JBJS.17.00629

6. Baum, Viljoen, Gifford, et al. Baseline parameters and the prediction of treatment failure in patients with intravenous drug use-associated spinal epidural abscesses. J Neurosurg Spine. 2022;36(4):660-669. doi:10.3171/2021.7.SPINE21689

7. Patel, Alton, Bransford, Lee, Bellabarba, Chapman. Spinal epidural abscesses: risk factors, medical versus surgical management, a retrospective review of 128 cases. Spine J. 2014;14(2):326-330. doi:10.1016/j.spinee.2013.10.046

8. Vettivel, Bortz, Passias, Baker. Pyogenic vertebral column osteomyelitis in adults: analysis of risk factors for 30-day and 1-year mortality in a single center cohort study. Asian Spine J. 2019;13(4):608-614. doi:10.31616/asj.2018.0295

9. Yagi, Fujita, Okada, et al. Impact of frailty and comorbidities on surgical outcomes and complications in adult spinal disorders. Spine. 2018;43(18):1259-1267. doi:10.1097/BRS.0000000000002596

10. Dominguez, Shah, Ampie, et al. Spinal epidural abscess patients have higher modified frailty indexes than back pain patients on emergency room presentation: a single-center retrospective case-control study. World Neurosurg. 2021;152:e610-e616. doi:10.1016/j.wneu.2021.06.035

11. Lener, Wipplinger, Lang, Hartmann, Abramovic, Thomé. A scoring system for the preoperative evaluation of prognosis in spinal infection: the MSI-20 score. Spine J. 2022;22(5):827-834. doi:10.1016/j.spinee.2021.12.015

12. Lindsey, Xiong, Lightsey, et al. C-Reactive protein-to-albumin ratio in spinal epidural abscess: association with post-treatment complications. J Am Acad Orthop Surg. 2022;30(17):851-857. doi:10.5435/JAAOS-D-22-00172

13. Hunter, Cussen, Baker. Predictors of failure for nonoperative management of spinal epidural abscess. Global Spine J. 2021;11(1):6-12. doi:10.1177/2192568219887915

14. Karhade, Shah, Bono, et al. Development of machine learning algorithms for prediction of mortality in spinal epidural abscess. Spine J. 2019;19(12):1950-1959. doi:10.1016/j.spinee.2019.06.024

15. Shah, Karhade, Groot, et al. External validation of a predictive algorithm for in-hospital and 90-day mortality after spinal epidural abscess. Spine J. 2023;23(5):760-765. doi:10.1016/j.spinee.2023.01.013

16. Pluemer, Freyvert, Pratt, et al. A novel scoring system concept for de novo spinal infection treatment, the Spinal Infection Treatment Evaluation Score (SITE Score): a proof-of-concept study. J Neurosurg Spine. 2023;38:396-404. doi:10.3171/2022.11.SPINE22719

17. Hunter, Baker. Early reduction in C-reactive protein following treatment for spinal epidural abscess: a potential treatment guide. Global Spine J. 2023;18:219256822211398. doi:10.1177/21925682221139801

18. Harrell. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Berlin: Springer International Publishing; 2015. doi:10.1007/978-3-319-19425-7

19. Faix. Biomarkers of sepsis. Crit Rev Clin Lab Sci. 2013;50(1):23-36. doi:10.3109/10408363.2013.764490

20. Thomas, Storey. The role of platelets in inflammation. Thromb Haemostasis. 2015;114(3):449-458. doi:10.1160/TH14-12-1067

21. Weaving, Batstone, Jones. Age and sex variation in serum albumin concentration: an observational study. Ann Clin Biochem. 2016;53(Pt 1):106-111. doi:10.1177/0004563215593561

22. Van Calster, McLernon, van Smeden, Wynants, Steyerberg; Topic Group ‘Evaluating diagnostic tests and prediction models’ of the STRATOS initiative. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17(1):230. doi:10.1186/s12916-019-1466-7

23. Steyerberg, Vergouwe. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925-1931. doi:10.1093/eurheartj/ehu207

24. Devellis, Thorpe. Scale Development Theory and Applications. Vol. 26. 2nd ed. Thousand Oaks, CA: Sage Publications; 2021.

25. Cronbach. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16(3):297-334. doi:10.1007/BF02310555

26. Frankel, Hancock, Hyslop, et al. The value of postural reduction in the initial management of closed injuries of the spine with paraplegia and tetraplegia. I. Paraplegia. 1969;7(3):179-192. doi:10.1038/sc.1969.30

27. Predictive Algorithms – SORG Orthopaedic Research Group. https://sorg.mgh.harvard.edu/predictive-algorithms/. Accessed May 18, 2023.

28. Justice, Covinsky, Berlin. Assessing the generalizability of prognostic information. Ann Intern Med. 1999;130(6):515-524. doi:10.7326/0003-4819-130-6-199903160-00016

29. Kim, Melikian, et al. Independent predictors of failure of nonoperative management of spinal epidural abscesses. Spine J. 2014;14(8):1673-1679. doi:10.1016/j.spinee.2013.10.011

30. Schell, Kim, Trivedi, Ahn. 30-day mortality following surgery for spinal epidural abscess: incidence, risk factors, predictive algorithm, and associated complications. Spine. 2019;44(8):E500-E509. doi:10.1097/BRS.0000000000002875

31. Schoenfeld, Wahlquist. Mortality, complication risk, and total charges after the treatment of epidural abscess. Spine J. 2015;15(2):249-255. doi:10.1016/j.spinee.2014.09.003

32. Strayhorn, Collins, Brick, et al. Using factorial mediation analysis to better understand the effects of interventions. Transl Behav Med. 2022;12(1):ibab137. doi:10.1093/tbm/ibab137

33. Page, Gui, Steiner, Ammanuel, Greeneway, Brooks. External review and validation of a spinal epidural abscess predictive score for clinical failure. World Neurosurg. 2022;163:e673-e677. doi:10.1016/j.wneu.2022.04.068

34. Page, Greeneway, Ammanuel, Brooks. Development and validation of a predictive model for failure of medical management in spinal epidural abscesses. Neurosurgery. 2022;91(3):422-426. doi:10.1227/neu.0000000000002043

35. Biau, Ferguson, Chung, et al. Local recurrence of localized soft tissue sarcoma: a new look at old predictors. Cancer. 2012;118(23):5867-5877. doi:10.1002/cncr.27639

36. Xiong, Crawford, Striano, Lightsey, Nelson, Schwab. The NIMS framework: an approach to the evaluation and management of epidural abscesses. Spine J. 2021;21(12):1965-1972. doi:10.1016/j.spinee.2021.05.012

Keeping It Simple: Developing a Prognostic Tool for Spinal Epidural Abscess
