Abstract
Study Design:
Narrative review.
Objectives:
To review the relevant literature regarding scoring systems for vertebral metastases and quantify their role in contemporary orthopedic practice.
Methods:
A literature search of PubMed, Google Scholar, and Embase was performed on February 7, 2017. Eight scoring systems were selected for detailed review, 7 of which focused solely on patient prognosis (Tokuhashi, Tomita, Bauer, Oswestry Spinal Risk Index, Van der Linden, Rades, and Katagiri). The eighth system reviewed was the Spinal Instability Neoplastic Score, which assesses impending spinal instability in patients with vertebral metastases and represents a novel approach compared with earlier scoring systems.
Results:
The Bauer and Oswestry Spinal Risk Index have the most accurate prognostic predictive ability, with the newer Oswestry Spinal Risk Index being favored by the contemporary literature as it demands less investigation and is therefore more readily accessible. There was a growing trend in studies designed to customize scoring systems for individual cancer pathological subtypes. The Spinal Instability Neoplastic Score shows good reliability for predicting instability among surgeons and oncologists.
Conclusions:
The increased understanding of cancer pathology and subsequent development of customized treatments has led to prolonged survival. For patients with vertebral metastases, this affects surgical candidacy not only on the basis of prognosis but also by providing a prolonged opportunity for the development of spinal instability. Scoring systems have a useful guidance role in deciding for or against surgical intervention, but in order to remain contemporary, ongoing review, development, and revalidation are mandatory.
Introduction
With the advancement in treatments and consequent prolonged survival across a myriad of cancer diagnoses, the dilemma of whether to offer surgery for spinal metastases is an increasingly common clinical scenario. The true incidence of spinal metastases is unknown. 1 Estimates of prevalence based on autopsy examination of vertebral bodies have ranged from 36% to 70%. 2 –4 What is known is that the incidence of surgical intervention for spinal metastases is increasing. 5 Potential benefits from surgical intervention must be balanced against the associated morbidity, risks, and the envisaged length of postoperative survival. To aid this decision, numerous prognostic scoring systems for patients with vertebral metastases have been designed. This article aims to review contemporary vertebral metastasis scoring systems of most clinical relevance.
Search Methodology
A comprehensive search of PubMed, Embase, and Google Scholar was performed to include literature until February 7, 2017. Search terms included “scoring system” or “score” and “metastatic” or “metastases” and “spine” or “vertebral.” All scoring systems designed to aid assessment of surgical candidacy for patients with vertebral metastases were considered. The search limits were English language and full text available. Abstracts and presentations were not considered. Reference lists from the identified articles were further scrutinized to identify any additional studies of interest.
Overview of Scoring Systems and Selection Criteria for Detailed Review
There have been a number of approaches adopted when addressing patients with vertebral metastases. These include, but are not limited to, studies designed for the entire spectrum of vertebral metastasis or those designed to more specifically target early or late stages, scoring systems designed to help decipher surgical candidates as opposed to those designed to identify radiotherapy candidates, scores designed to consider all cancer types in contrast to those that exclude certain primary tumor types, and finally scores designed to predict survival/prognosis versus those to evaluate for immediate risk of instability. 6 –9 In the interest of clarity, this article focuses on 5 scoring systems most likely to be of clinical use to orthopedists: the Tokuhashi score (TS), the Tomita score, the Bauer score, the Oswestry Spinal Risk Index (OSRI), and finally the Spinal Instability Neoplastic Score (SINS). 6,7,10,11 The Van der Linden scoring system, the Rades score, and the Katagiri score are considered in less detail. While these 3 scores have each made valuable contributions, they have not been considered as comprehensively by the literature nor are they as widely utilized clinically. Scores were considered for inclusion within this review based on the selection criteria outlined in Table 1.
Selection Criteria for Scoring Systems in Vertebral Metastases.
Tokuhashi Score
First published in 1990 and updated in 2005, this is perhaps the most widely recognized prognostic scoring system for spinal metastases. 6,12 The prototype TS consisted of 6 parameters: general condition, number of extraspinal bony metastases, number of vertebral body bony metastases, metastases to other internal organs, primary site of cancer, and the presence of palsy (as classified by Frankel et al 13 ). The original 6 parameters were retained in the updated version; however, important changes were made to the calculation and interpretation of the score, most notably an increase of the maximum score from 12 to 15. The additional 3 points came from adding 3 new facets to the primary tumor component of the score. Table 2 provides the updated means by which each parameter is allocated value and the newly defined prognostic categories for score interpretation. 6
The Revised Tokuhashi Score.
Abbreviations: PS, performance score; Frankel, Frankel score.
Since the introduction of the revised TS, numerous authors have attempted to examine its external validity, and these studies are summarized in Table 3. The reported predictive accuracy of the TS ranged from 51% to 88%, with multiple investigators concluding it to be a suboptimal prognostic predictor. 14 –20 Other authors have questioned the importance of certain parameters (for example, primary site), asking whether these have any significant impact on prognosis and therefore whether they should feature in prognostic scoring systems at all. 21 However, despite limitations in predictive value, the TS has enjoyed relative acceptance among authors who concluded that it nevertheless makes a useful contribution to the decision-making process. 17,19,22 –24 On the other hand, the relative infrequency of prospective studies along with the heterogeneous methodology and low numbers seen in the retrospective studies must be considered. A review by Zoccali et al, which examined 10 studies from the period 2007 to 2013, concluded that the TS was more useful for patients with a good prognosis but less helpful in patients expected to survive less than 1 year. 25 Additionally, other authors have examined the TS with respect to specific tumor subtypes (eg, lung/myeloma) and found the TS to be inaccurate. 26,27 Within this context, caution is advised to any clinician who may either disregard or readily adopt the current TS based on the literature to date.
Studies That Have Examined the Prognostic Performance of the Revised Tokuhashi Score.
Abbreviations: TS, Tokuhashi score; MSCC, metastatic spinal cord compression.
aWhere provided, overall accuracy is preferentially quoted. Where accuracy was assessed at specified follow-up dates, this is also given alongside the result.
Tomita Score
The Tomita score was first proposed in 2001. 7 It was developed retrospectively based on 67 patients from a single center. Following on from the work of Tokuhashi et al, the Tomita score was streamlined to 3 parameters: tumor growth, visceral metastases, and the number of bony metastatic lesions. These factors had been selected using Cox regression and hazard ratios resulting in, in theory, a scoring system in which the components are weighted according to their prognostic importance. The Tomita score is outlined in Table 4.
The Tomita Score.
Multiple studies have reported a significant association between survival and the Tomita score. 11,23,28 –31 The simple design of the Tomita score as well as its patient-centered approach has found favor as a clinical score to be used independently, or often in combination with the TS. 32,33 Other authors have favored the use of the Tomita score due to the emphasis it places on the biology of the primary tumor. 23,30 This emphasis on primary tumor pathology may be the reason it has been shown to maintain predictive value even when specific pathological subtypes (eg, prostate cancer) are examined. 34
While the Tomita score is more user friendly than the TS, it has also been reported to have suboptimal reliability when predicting survival. 31 Like the Tokuhashi score, the Tomita has also been found to be inaccurate with regard to specific tumor subtypes. 26,27 Furthermore, the methodology behind the creation of the Tomita score has been criticized for lack of supporting data in the original paper and poor specificity in predicting prognosis. 10 Several authors have directly compared the performance of the Tokuhashi and Tomita scores and found the TS to be superior. 28,32,35 Nevertheless, the Tomita score continues to play a role in clinical decision making, most likely because the data required is readily accessible to the majority of clinicians.
Modified Bauer Score
The original Bauer score was designed to address both spinal and extremity metastases. First published in 1995, it arrived after the original iteration of the Tokuhashi score and predates the Tomita score. 36 The Bauer scoring system is based on 2 consecutive, prospective series of cancer patients and used Cox regression analysis to identify weighted prognostic variables. Bauer et al identified 5 key criteria from their analysis: absence of visceral metastases, absence of pathological fracture, solitary skeletal metastasis, not primary lung cancer, and primary tumor is breast, kidney, lymphoma, or myeloma. In their analysis of 7 preoperative scoring systems for spinal metastases, Leithner et al modified the Bauer score by excluding scoring for pathological fractures (Table 5). 30 Leithner et al reported the modified Bauer to have the best correlation with survival period across all 7 spinal prognostic scores reviewed. 30 This has been replicated in 2 other studies reviewing 7 and 6 prognostic scores, respectively. 37,38 Consequently, the modified Bauer score is considered superior when assessing heterogeneous patient populations. 39 Furthermore, like the Tomita score, it is also a relatively simple and user-friendly scoring system. 38 It is not known which scoring system is most favored in clinical practice; however, the modified Bauer score is currently the most favored prognostic scoring system in the literature. Nevertheless, the modified Bauer score only recognizes 4 primary cancer types within its calculation and delivers a relatively simplistic model with broad survival categories. While this may help the modified Bauer model to appear statistically precise, clinically there is likely to be a large spectrum of prognoses within these subgroups.
This may become increasingly apparent within the context of prolonged survival, particularly in the case of breast cancer, which is at present classified in the same way as renal cancer by the Bauer score, but has been recommended to be downgraded from “fast growth” to “moderate growth” in the Tomita score, and to obtain a score of “3” instead of “5” in the TS. Unlike the Tomita score and TS, the Bauer score is not so easily adaptable.
The Modified Bauer Score.
Other authors have identified the lack of consideration of general health and nutritional status as a limitation of the modified Bauer score. 40 Ghori et al further refined the modified Bauer to include preoperative ambulatory status and serum albumin in an attempt to correct for this. 40 Ghori et al proposed this improved the accuracy from 64% to 74%. Goodwin et al subsequently examined the accuracy of this modification on an independent cohort of 161 patients and found this new modification to be 80% accurate. 41 While this may be a statistical improvement to explain variance in 1-year survival, it removes the advantage of the Bauer score’s user-friendliness. Work to further hone the Bauer score continues.
Oswestry Spinal Risk Index
Balain et al undertook a prospective cohort analysis of 199 patients presenting with spinal metastases and compared the revised Tokuhashi, the Tomita, and the modified Bauer scores. 11 In so doing, a new model was developed and named the Oswestry Spinal Risk Index:
OSRI = PTP + (2 − GC)
where OSRI is the Oswestry Spinal Risk Index; PTP the primary tumor pathology; and GC the general condition (as graded using the Karnofsky Performance Status).
A more detailed outline of how the score is calculated is available in Table 6. Of note, scoring of general condition is reversed compared with the revised TS in order to obtain an “index of risk.” The developers of the OSRI reported similar concordance of the OSRI when compared with the Tomita, revised Tokuhashi, and modified Bauer. The authors further report that the OSRI has a larger coefficient of determination than any of the 3 other scores. The coefficient of determination is a measure of how well a given model explains variation within the sample. It is worth noting, however, that the differences in these coefficients were not large; for example, revised Tokuhashi coefficient = 0.18; OSRI coefficient = 0.28. This implies that the OSRI was marginally superior in explaining variation within the sample (and therefore a more accurate predictor of survival). While this superior explanation of sample variation may be statistically significant, it remains to be seen whether or not it is clinically significant.
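As a rough illustration of what a coefficient of determination measures, the sketch below computes the classical form, 1 minus the ratio of residual to total sum of squares, for paired observed and predicted values. This is an assumption made for illustration only: survival models usually report a pseudo-R² variant, and the text above does not state which definition was used. The function name and the survival times are hypothetical and are not study data.

```python
def coefficient_of_determination(observed, predicted):
    """Return R^2 = 1 - SS_res / SS_tot for paired observed/predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observed values around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values show no variation")
    return 1 - ss_res / ss_tot


# Hypothetical survival times in months (illustrative only).
observed = [3.0, 6.0, 9.0, 12.0]
perfect_model = [3.0, 6.0, 9.0, 12.0]   # explains all variation
mean_only_model = [7.5, 7.5, 7.5, 7.5]  # explains none of it

r2_perfect = coefficient_of_determination(observed, perfect_model)
r2_mean_only = coefficient_of_determination(observed, mean_only_model)
```

On this scale, coefficients of 0.18 and 0.28 both leave most of the variation in the sample unexplained, which is consistent with the caution expressed above about clinical significance.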
The Oswestry Spinal Risk Index.
Abbreviation: KPS, Karnofsky Performance Status.
aTotal score is the sum of the 2 subscores.
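The reversal of the general condition subscore can be sketched as follows. The sketch assumes that general condition is graded 0 to 2 as in the revised TS, so that reversing it gives (2 − GC), and that the index is the sum of the two subscores. The function name and the PTP values passed in are placeholders; Table 6 remains the source for the actual point allocations.

```python
def osri(ptp: int, gc: int) -> int:
    """Sketch of the index under the assumed form OSRI = PTP + (2 - GC).

    ptp: primary tumor pathology subscore (placeholder values; see Table 6).
    gc:  general condition grade, assumed to be 0 (poor) to 2 (good),
         as in the revised Tokuhashi score.
    """
    if gc not in (0, 1, 2):
        raise ValueError("gc is assumed to be graded 0, 1, or 2")
    if ptp < 0:
        raise ValueError("ptp must be non-negative")
    # Reversing GC means a poorer general condition raises the risk index.
    return ptp + (2 - gc)


# Same hypothetical tumor subscore, different general condition.
risk_good_condition = osri(ptp=2, gc=2)
risk_poor_condition = osri(ptp=2, gc=0)
```

Because the general condition term is reversed, a higher total indicates higher risk, which is the sense in which the score is an "index of risk."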
The analysis performed by Balain et al is well designed, using appropriate and relatively sophisticated statistical techniques. Rather than investigating for a difference in survival between individual prognostic subgroups defined by each given scoring system, Balain et al tested for the overall calibration and discriminatory power of each scoring system. This allowed the investigators to distinguish if any system outperformed another for the sample as a whole. The result was somewhat surprising; no one scoring system was found to be superior.
On this basis, the authors undertook to develop a streamlined system to be useful in clinical practice without the need for extensive investigations, thus avoiding what may ultimately be unnecessary delay to decision making. The authors also suggest that mitotic index may have increased the accuracy of the model, but the contribution of this variable was not examined by Balain et al.
Recently, the OSRI has been externally validated by 2 author groups. Whitehouse et al and Fleming et al applied the OSRI to 100 and 121 patients, respectively, and found that the newly established risk index held true when tested on independent cohorts. 42,43 The attraction of the OSRI is its simplicity: only 2 key factors are considered, and it does not rely on a full imaging workup.
Spinal Instability Neoplastic Score
Developed by the Spine Oncology Study Group (SOSG) in 2010, the SINS is composed of 6 parameters (spine neoplasia location, mechanical pain, type of bone lesion, spinal alignment, presence of vertebral body collapse, involvement of the posterolateral elements). The SINS is unique when compared with other established scoring systems used in decision making in patients with vertebral metastases, as it is designed to estimate the degree of instability (and therefore indicate the necessity for immediate intervention) rather than the traditional approach of estimating overall prognosis. This is an ambitious goal given that, by the authors’ own admission, spinal instability has not been clearly defined in the literature. The SOSG consequently defined neoplastic instability as “loss of spinal integrity as a result of a neoplastic process that is associated with movement-related pain, symptomatic or progressive deformity, and/or neurological compromise under physiological loads.” 44
The SINS was devised using a combination of the “best available” literature and expert opinion. The utilized literature was derived from 2 systematic reviews performed by members of the SOSG panel. The first systematic review addressed cervical instability within the context of neoplasia and the second examined what defines instability within the context of metastatic disease in the thoracolumbar spine. The review addressing cervical instability has not been published to date while the review addressing the thoracolumbar spine was made available in the literature in 2011. 45 According to the SOSG, these 2 systematic reviews provided a framework for the development of the SINS via a systematic method of distillation of expert opinion known as the Delphi technique. 46 The 6 different parameters were assigned differing numerical grades with a minimum score being 0 and a maximum score being 18. The authors then classified 0 to 6 as stable, 7 to 12 as impending instability, and 13 to 18 as unstable. The authors recommend surgical consultation for any patient with a score of 7 or more.
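The interpretation bands described above can be expressed as a short sketch. It covers only the mapping from a total score to a category and the consultation threshold of 7; the allocation of points to each of the 6 parameters is not reproduced here, and the function names are illustrative.

```python
def classify_sins(total: int) -> str:
    """Map a total SINS (0 to 18) to the category described in the text."""
    if not 0 <= total <= 18:
        raise ValueError("total SINS must lie between 0 and 18")
    if total <= 6:
        return "stable"
    if total <= 12:
        return "impending instability"
    return "unstable"


def surgical_consultation_recommended(total: int) -> bool:
    """Consultation is recommended for any score of 7 or more."""
    return classify_sins(total) != "stable"


categories = [classify_sins(score) for score in (6, 7, 12, 13)]
```

The boundary values 6, 7, 12, and 13 are used deliberately, since these are the points at which the category changes.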
Unlike prognostic scoring systems, the SINS estimates a risk of an event that is preventable (and ideally should not occur), which makes statistical validation difficult. This is further clouded by the fact that “spinal instability” is a concept that is not only without a clear, consensus definition but is furthermore a clinical concept rather than a physical finding. This explains the use of expert opinion rather than data for the derivation of the SINS. Without the availability of data to examine the accuracy of the SINS, investigators have targeted interrater reliability as a means of assessing validity.
In 2016, Arana et al examined the use of the SINS by 132 clinicians using a database of 90 patients and concluded that the SINS enjoys relative consistency between differing specialties (including oncologists, radiologists, surgeons, etc). 47 This comes with 2 significant caveats, however: first (as the editors of the relevant journal acknowledge), reviewer familiarity with the SINS within the cohort analyzed is likely to be higher than in the control population, potentially biasing the study; and second (somewhat surprisingly), interobserver agreement was only “fair” among orthopedic surgeons, radiologists, and physicians with ≥14 years of clinical experience. On the other hand, Fourney et al reported near-perfect agreement among members of the SOSG; however (as coauthors of the SINS), once again this sample would have increased familiarity with the SINS. 48
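Descriptors such as “fair” and “near-perfect” are conventionally attached to ranges of a chance-corrected agreement statistic. As a minimal sketch, the code below computes Cohen's kappa for 2 raters assigning SINS categories to the same cases. This is an assumption made for illustration: the cited studies involved many raters and may have used a multi-rater kappa or an intraclass correlation instead. The ratings shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set of cases")
    # Proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical category assignments for 4 cases (illustrative only).
surgeon = ["stable", "stable", "unstable", "unstable"]
same_ratings = ["stable", "stable", "unstable", "unstable"]
chance_level = ["stable", "unstable", "stable", "unstable"]

kappa_identical = cohens_kappa(surgeon, same_ratings)
kappa_chance = cohens_kappa(surgeon, chance_level)
```

In the second comparison the raters agree on half of the cases, yet kappa is 0 because that level of agreement is what chance alone would produce. This is why raw percentage agreement and kappa-type statistics cannot be compared directly across studies.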
Campos et al performed a similar study using a smaller sample size totaling 6 clinicians (3 spinal surgeons, a general orthopedist, a radiotherapist, and a palliative care physician), reporting excellent agreement, while Fisher et al compared the reliability of the SINS between 37 radiologists and also reported excellent agreement. 49,50 Somewhat contrary to the findings of Arana et al, Teixeira et al found decreased agreement for clinicians with limited experience with spinal metastatic disease. 51 One should note, however, that a key difference between the studies performed by Arana et al and Teixeira et al is the composition of their reviewers; for Arana et al the reviewers comprised radiologists (23), radiation oncologists (22), orthopedic surgeons (16), neurosurgeons (14), and oncologists (8), all of whom would have a baseline experience with the scenario of vertebral metastases, while Teixeira et al compared 6 orthopedic surgeons, 3 non-orthopedic surgeons, and 4 general practitioners.
This dichotomy between Arana and Teixeira is important, as one of the intended goals of the SINS was to produce a score that would be useful not only to surgeons but also to all specialties when attempting to estimate instability risk. The SOSG suggest that the SINS can be used as a triaging tool to help physicians decide when surgical input is warranted. While, on balance, the literature would support the conclusion of strong agreement for the SINS between clinicians who routinely encounter vertebral metastases, Teixeira et al’s work suggests SINS reliability among practitioners where this scenario is outside the scope of routine practice remains in doubt. This area requires further examination.
However, Versteeg et al examined over 300 cases in 2 centers between 2009 and 2013 and concluded that the SINS had resulted in increased awareness of instability among practitioners and consequent earlier referral for assessment. 52 Following on from this Versteeg et al performed a systematic review of the literature examining the impact of the SINS on oncologic decision making and concluded the SINS had provided a more uniform framework for reporting of spinal neoplastic instability in the literature. 53 There is also a growing trend within the literature of exploring the use of the SINS as a marker for risk of other adverse skeletal events, for example, as a predictor for compression/pathological fractures (Aiba et al 54 ) and radiotherapy failure (Huisman et al 55 ). The SINS is therefore likely to gain increasingly widespread familiarity and use among all clinicians.
Van der Linden Score
In 2005, Van der Linden developed a prognostic scoring system based on a randomized controlled trial of 342 consecutive patients managed with radiotherapy (Table 7). 8 These patients were derived from a larger database maintained by the Dutch Bone Metastasis Study (DBMS) group. Although the sample size was relatively large, there were a number of inclusion and exclusion criteria that may have skewed the sample. First, none of the 342 patients had neurological impairment at the time of entry to the trial. Second, patients with renal cell carcinoma and multiple myeloma were excluded, as were patients with cervical metastases. Finally, only patients who could be managed within a single radiation treatment field were included. This resulted in only 30% of all the randomized patients within the DBMS data set being included within the study by Van der Linden et al.
The Van der Linden Score.
Abbreviation: KPS, Karnofsky Performance Status.
aGroup A median survival = 3.0 months. Group B median survival = 9.0 months. Group C median survival = 18.7 months.
Similarly to the OSRI, the Van der Linden score included the performance status and the primary tumor type, with the addition of the presence or absence of visceral metastases. The Van der Linden score was further validated on 231 Canadian patients with spinal metastases, but again this was based on a sample with no significant neurological involvement or bony instability. 56 Given that the system was designed and validated to select radiotherapy candidates in patients without spinal cord compression or vertebral bone instability, as well as the above-mentioned restrictions on the original sample, it may be considered to be of limited value to orthopedic surgeons. Leithner et al concurred with this viewpoint when they examined the value of 7 prognostic scoring systems for vertebral metastases. 30
Rades Score
The Rades score was designed to estimate the survival of patients with metastatic spinal cord compression (MSCC). 9 Published in 2008, the Rades score is derived from Cox multivariate survival analysis of 1852 patients, all of whom had MSCC managed with radiotherapy. The Rades score is outlined in Table 8. Using this scoring system, Rades et al successfully differentiated which patients benefitted from longer courses of radiotherapy. The Rades score considers the type of primary tumor, the presence of bone and/or visceral metastases, the time lag between diagnosis and development of MSCC, the ambulatory status before radiotherapy, and the duration of motor deficits before radiotherapy. The Rades score was validated on a sample of 439 patients in 2010. 57 The Rades score is designed only for patients with advanced vertebral metastases and ongoing MSCC. Its purpose is to identify those patients most suited to a short rather than long course of radiotherapy. This is perhaps why it has better predictive power in patients with short survival and poorer predictive power in longer surviving patients. 58 Interestingly, this is in direct contrast to the Van der Linden score, which excludes patients with active spinal cord compression and better predicts prognosis in longer surviving patients. 58 Ultimately, however, the Rades score (like the Van der Linden score) adds limited value to orthopedic practitioners.
The Rades Score.
Abbreviations: MSCC, metastatic spinal cord compression; RT, radiotherapy.
Katagiri Score
The Katagiri score was designed as a prognostic score for patients with bony metastasis at any site. 59 Katagiri et al performed a retrospective review of 350 patients who had received treatment (either surgical or nonsurgical) and identified 5 significant prognostic factors for their sample. These factors included the primary lesion, visceral metastases, performance status, multiple skeletal metastases, and perhaps, most notably, a history of chemotherapy. An updated version of the Katagiri score was published in 2014, with the addition of abnormal laboratory data as a further prognostic factor and reported increased accuracy. 60 The updated Katagiri score is outlined in Table 9. The score ranges from 0 to 10; for interpretation purposes, Katagiri et al categorized the scores as follows: ≤3 = low-risk group with a survival rate of >80% at 12 months; 4 to 6 = intermediate-risk group with a survival rate of 30% to 80% at 12 months; and 7 to 10 = high-risk group with a survival rate of <10% at 12 months.
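The risk grouping quoted above can be sketched as a simple lookup. Only the score bands are encoded; the point values for the individual prognostic factors are those of Table 9 and are not reproduced here, and the function name is illustrative.

```python
def katagiri_risk_group(score: int) -> str:
    """Map a revised Katagiri score (0 to 10) to the risk group in the text."""
    if not 0 <= score <= 10:
        raise ValueError("revised Katagiri score must lie between 0 and 10")
    if score <= 3:
        return "low risk"
    if score <= 6:
        return "intermediate risk"
    return "high risk"


groups = [katagiri_risk_group(score) for score in (3, 4, 6, 7)]
```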
The Revised Katagiri Score.
Abbreviations: CRP, C-reactive protein; LDH, lactate dehydrogenase.
aDisseminated metastases: pleural, peritoneal, or leptomeningeal.
bAbnormal laboratory values: CRP ≥ 4 mg/dL; LDH > 250 IU/L; albumin < 3.7 g/dL.
cCritical laboratory values: platelets < 100 000/µL; serum calcium ≥ 10.3 mg/dL; or total bilirubin ≥ 1.4.
In the description of the original sample, Katagiri et al report that 309 patients had metastases to axial bone, but do not state if these involved the vertebral column. This data is also not provided with the updated score. This perhaps calls into question the usefulness of this scoring system for the uniquely complex scenario arising around vertebral metastases. Furthermore, in the data set used to generate the updated Katagiri score, only 59 underwent surgical intervention. The Katagiri score therefore needs validation in a patient group in which a significant number of patients underwent surgery in order to be applicable to potential surgical candidates. However, perhaps the most contentious issue with the Katagiri score is its inclusion of prior chemotherapy. How well a scoring system can account for the degree of intervention and sensitivity of a tumor to chemotherapy is unclear. 61
Discussion
Modern practice demands scoring systems that can be used not only as a clinical adjunct in decision making but also for comparative audits and multicenter research. The large number of scoring systems for a relatively specific pathology reflects the desire for such an instrument and also the failure of any one scoring system to meet these requirements. 39 Predicting cancer prognoses, in any context, is difficult. It is therefore to be expected that any system designed to predict the prognosis of cancer patients with the additional complication of vertebral metastases faces numerous inherent challenges. As noted by Chen et al, up until the advent of the OSRI, all major spinal metastases scoring systems were composed of 3 central factors: the primary tumor type, the patients’ general medical condition, and the metastatic burden. 62 Each of these components has unique weaknesses, regardless of the permutation or combination used to create a scoring system.
With regard to the incorporation of primary tumor subtype within scoring systems, each of these primary tumors will have their own natural history. Lung cancer is classically considered very aggressive with a poor prognosis across scoring systems, while other primary tumor types display several variations in natural history depending on unique characteristics, for example, hormone receptor status in breast cancer. Wang et al demonstrated that depending on breast cancer hormone status both the Tokuhashi and the Tomita scores should be altered. 63 While this seems like a sensible progression, it may add another layer of complexity without any real-life benefit. Tan et al subsequently compared this modification in a cohort of 132 patients and reported a slight increase in accuracy but this difference was neither clinically nor statistically significant. 64 The challenge is for a scoring system to be both easily computable (and therefore clinically user-friendly) and also account for the finer nuances of each cancer subtype. Several authors have addressed this issue by developing customized scoring systems for specific tumor pathologies including prostate, gastric, and nasopharyngeal carcinomas. 65 –67
Oncologic treatments and investigations continue to evolve dynamically. The majority of prognostic scoring systems to date have incorporated in some way the patients’ general medical condition. In comparison to the presence or absence of other sites of metastasis or indeed the pathology of the primary tumor, a patient’s general condition is much more difficult to define, and there is no consensus on the most effective method of doing so. Without question, the most popular method used by scoring systems to date has been the Karnofsky Performance Status. 68 Other systems have used alternative methods; for example, the Eastern Cooperative Oncology Group status is used by the Katagiri score. 7,60,69 However, neither the modified Bauer score, perhaps the most robustly validated score to date, nor the Tomita score accounts for the general condition. 7,30,37,38 The contribution of general medical condition to date is varied, although in the most recently developed system, the OSRI, it is 1 of the 2 components.
Tokuhashi et al noted that many patients with vertebral metastases are not suitable candidates for surgical intervention. 69 It is therefore understandable that many scoring systems incorporated the presence or absence of either other bony metastases or visceral metastases, as their presence may negate a patient’s ability to tolerate surgery. The Tokuhashi, Tomita, and the modified Bauer scores all account for at least one of these variables. When Balain et al developed the OSRI they also found the presence or absence of visceral metastasis to be significantly associated with survival but nevertheless excluded it from their model. The authors explain that this is because it had similar predictive ability to other factors already incorporated within the OSRI, but required significantly more investigation. A difficulty with any system that relies on extensive imaging is that this is not always readily available. A patient presenting out-of-hours or in a state of neurological compromise can sometimes mandate urgent surgical intervention as opposed to urgent investigation, thereby rendering some of the scoring systems less useful.
Another challenge with the use of scoring systems, even those developed within the last decade, is that the evolution of oncologic treatment and the ever-improving ability to investigate disease burden have rapidly outdated them. Disease characteristics that correlated well or poorly in the past may not necessarily do so in the present day if radiotherapy, chemotherapy, or surgical regimens have improved. The development of tyrosine kinase inhibitors has significantly improved survival time for both metastatic renal cell carcinoma and lung cancer and is but one example of the dramatic effect new forms of treatment can have on patients’ outlook. 70,71 On the surgical side, percutaneous techniques for spinal stabilization have meant, in theory at least, that patients can undergo a surgical procedure with a lower risk of blood loss and with less morbidity than previously possible.
Prognostic scoring systems have successfully aided difficult decisions for decades across a wide range of clinical scenarios: trauma, burns, congestive cardiac failure, and more. Bone metastases most commonly affect the axial skeleton. 72 Many patients with vertebral metastases are not operative candidates, precluding the surgeon from providing pain-relieving and/or quality of life improving procedures. However, in many cases metastatic bone disease is a chronic, progressive condition with expected survival ranging from months to years. 72 Therefore, accurately predicting survival is the key factor when selecting treatment modality for this cohort. 30 This then raises the question: “How accurate/reliable is any given scoring system for vertebral metastases?” As seen from Table 10, not only is this not definitively answered by the literature, it is also not addressed in any consistent manner. Scoring system accuracies continue to be quoted in a myriad of different ways (eg, correlation coefficients, kappa and risk indices, % accuracy at specified follow-up dates, c/r² statistics estimating the accuracy of fit of the prediction algorithm to the actual data sample, etc). These represent a broad spectrum of conceptual approaches (many of which use differing methods of calculation) to evaluating the accuracy of any system, ranging from simple comparison of actual versus predicted survival to estimating how well the score accounts for variation within a given patient cohort, and this often confuses the matter further. This level of statistical heterogeneity makes it almost impossible for the clinician to conclude which system is theoretically superior.
Table 10. Quoted Statistical Methods and Results for Evaluating Accuracy of Survival Prediction Scoring Systems.
a Paper does not address a specific primary tumor subtype, for example, renal cell carcinoma.
b Within the Survival Prediction Accuracy and Presence of Statistically Significant Association categories, numerous methods of analysis are quoted in the literature, including the following: Cox regression analysis, 15,22 Fisher exact test, 16,69,86 log rank, 24,29,30,74,87,88 chi square, 19 Spearman’s rank, 88 McNemar, 28 Spearman’s test, 42 and direct comparison of hazard ratios.
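The difficulty of comparing accuracies quoted in different ways can be illustrated with a minimal sketch. It assumes a purely hypothetical cohort of 6 patients (the predicted and observed survival times below are invented for illustration and are not drawn from any study reviewed here) and evaluates the same predictions with two of the measures mentioned above: a c-statistic (computed here as Harrell's concordance index) and % accuracy at a specified follow-up date. For simplicity, the accuracy function assumes that every patient was followed until death.

```python
from itertools import combinations


def concordance_index(predicted, observed, died):
    """Harrell's c-statistic: the share of comparable patient pairs in which
    the patient who died sooner was also predicted to survive for less time.
    Tied predictions count as half-concordant."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(observed)), 2):
        if observed[i] == observed[j]:
            continue  # tied observed times are not comparable
        first, second = (i, j) if observed[i] < observed[j] else (j, i)
        if not died[first]:
            continue  # shorter follow-up was censored, so ordering is unknown
        comparable += 1
        if predicted[first] < predicted[second]:
            concordant += 1.0
        elif predicted[first] == predicted[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in this cohort")
    return concordant / comparable


def accuracy_at(threshold, predicted, observed):
    """Share of patients correctly classified as surviving (or not surviving)
    to the threshold. Assumes complete follow-up, ie, no censoring."""
    hits = sum((p >= threshold) == (o >= threshold)
               for p, o in zip(predicted, observed))
    return hits / len(observed)


# Hypothetical cohort: survival in months (illustrative values only).
predicted_months = [3, 3, 9, 9, 15, 15]
observed_months = [2, 7, 5, 14, 10, 20]
died = [True] * 6

c_stat = concordance_index(predicted_months, observed_months, died)
acc_6m = accuracy_at(6, predicted_months, observed_months)
print(f"c-statistic: {c_stat:.2f}")
print(f"accuracy at 6 months: {acc_6m:.0%}")
```

The two figures describe the same predictions yet answer different questions: the c-statistic measures how well patients are ranked relative to one another, whereas accuracy at a follow-up date measures classification around a single cutoff. Neither can be converted into the other, which is one reason results quoted in different forms resist direct comparison.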
While the clinical usefulness of scoring systems is beyond doubt, they must be considered for what they are: imperfect prognostication estimates. Contemporary practice is aided by an ever-expanding knowledge and understanding of tumor biology as well as advances in surgical approaches and instrumentation. 73 While in the wider context of patient care this represents welcome progress, it also means that individual scoring systems cannot be considered contemporary for long and need constant updating. Since 2007, and in particular within the past 4 years, many authors have reviewed the established prognostic systems (primarily the Tokuhashi and Tomita scores) and have concluded them to be suboptimal. 16 –18,20,28,31,74 Consequently, multiple authors have suggested modifications of existing scoring systems either by adding new variables, for example, laboratory values, or by customizing existing systems to individual primary tumor pathologies. 24,28,31,62,75,76 The potential need to tailor scoring systems to include tumor-specific as well as tumor-nonspecific factors was most recently echoed by Luksanapruksa et al in a systematic review and meta-analysis of prognostic factors in patients with spinal metastases. 77
Furthermore, disagreements over the reliability of scoring systems may be due not only to advancements in treatment but also to differences in both patient groups and approaches to management; for example, the usefulness of the Tokuhashi score varies between patient populations (Canada vs Iran vs South Korea). 16,32,78 Multiple studies have been performed comparing scoring systems, yet no consensus has been reached. 30,37,38 Each attempt at survival prediction has its own strengths and weaknesses. Tomita and Tokuhashi (perhaps the 2 most established scoring systems) fail to differentiate between the medium- and long-term prognostic groups (as outlined by timescales provided by each scoring system): the Tomita score failed to significantly differentiate those expected to survive 2 to 3 years from those expected to survive 4 to 5 years. 38 This may be seen as a realization of the problems discussed above. Therefore, scoring systems must be taken within the context of the health care system and resources available. Ultimately, as noted by Pointillart et al, when selecting surgical candidates the potential for rapid and maintained improvement in clinical outcome and quality of life takes precedence over prognostic factors or scoring systems. 74 This sentiment has been echoed by numerous authors advocating individualized treatment and the involvement of multidisciplinary teams. 20,38,76,79 –82
This therefore broadens the circumstances for surgical intervention in the context of vertebral metastases. It may be the case that prognostic scoring systems will be unable to stay current with rapid progress in treatment across the spectrum of primary tumor diagnoses (all of which do not advance at the same pace). The literature has therefore sought alternative information to aid the decision-making process, and the most obvious response has been the development of the SINS, which identifies instability as a key factor when considering surgical intervention. Other variables now considered within the literature include functional status, cost, and improvements in pain/quality of life. 83 –85
Conclusion
Despite a growing body of literature examining the role of scoring systems in the management of vertebral metastases, few conclusions may be drawn. Individual studies have examined the performance of scoring systems for varied primary tumor pathologies, differing indications for intervention, and across multiple patient populations, all within the context of local health resources and practices. Given this spectrum of individual cancer pathologies and clinical scenarios, the lack of clear conclusions is perhaps unsurprising. The most consistent finding in the literature is that while individual scoring systems may not enjoy high accuracy across the entirety of this spectrum, they are an invaluable resource when considering candidacy for surgical intervention. 17,19,22 –24,30,53 No one system has demonstrated superiority over any other; however, with regard to survival prediction the Tokuhashi, Tomita, and modified Bauer scores have the most robust validation data and all have comparable predictive performance. The future role of the SINS has yet to be determined. Ongoing research is required to continually update and validate scoring systems so that they remain contemporary to modern treatments and practices, and the authors would furthermore recommend streamlining of statistical evaluation methods to facilitate comparison between scoring systems. For now, prognostic scoring systems act as an aid to decision making and no more than a guide to clinical practice.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
