Abstract
Study Design
Retrospective Cohort Study.
Objectives
The purpose of this study was to identify the role of lumbar paraspinal muscle fatty infiltration using the Goutallier classification in predicting perioperative outcomes following elective lumbar surgery.
Methods
A retrospective review was conducted on patients who underwent elective one- or two-level lumbar decompressions or instrumented fusions for degenerative pathology at a single institution over a 3 year period. Patients were stratified by procedure type. Data included demographics, perioperative outcomes, and the 5-item Modified Frailty Index (MFI-5). Fatty infiltration was graded at L4-5 using the Goutallier classification (intraclass correlation coefficient = 0.908). Opportunistic osteoporosis screening used computed tomography-based Hounsfield units (HU) at L1-2. The relationships between Goutallier grade, demographics, MFI-5 score, and postoperative outcomes were analyzed using Chi-squared analyses, Fisher’s exact test, Analysis of Variance, and multivariable logistic and linear regression.
Results
In total, 314 patients met the inclusion criteria. Mean age was 68.9 ± 8.6 years; mean Goutallier score was 2.2 ± 1.1 and MFI-5 was 1.3 ± 1.0. Goutallier score significantly correlated with age, American Society of Anesthesiologists grade, steroid use, MFI-5, discharge disposition, and 180 day complications and reoperation. Subgroup analyses revealed differing associations between Goutallier score and comorbidities/outcomes across procedure types. Multivariable regression confirmed Goutallier score as predictive of 180 day complications, reoperation, non-home discharge, and frailty.
Conclusion
Goutallier score is a predictive marker of frailty and postoperative outcomes in lumbar spine surgery. Goutallier classification is an effective tool that can aid in risk stratification for patients undergoing lumbar spinal surgery.
Introduction
The American population over the age of 65 years is among the fastest-growing in the world and is expected to reach 88 million by 2050, a 105% increase from 2015.1,2 As the elderly population grows, chronological age has been an important risk factor in assessing postoperative outcomes.3,4 More recently, however, frailty has been studied as a measure of perioperative risk in the elderly because it captures patient characteristics that other risk-stratification tools fail to consider, allowing for assessment of physiologic reserve in the face of a significant stressor or intervention.4–6
Frailty, defined as a decrease in strength, endurance, and physiologic reserve, can increase patient susceptibility to stressors and can be due to physical or cognitive causes.7 Frailty has been shown to lead to an increased risk of adverse outcomes (such as falls, delirium, disability, and death) and has been used to predict the likelihood of negative postoperative outcomes in patients undergoing surgery.8 The psoas muscle area has been used as a general measure of sarcopenia and a prognostic tool to identify patients with a high risk of adverse postoperative outcomes, since loss of muscle mass is a component of assessing sarcopenia in the elderly.9–12 However, the psoas muscle area is only a measure for determining the severity of sarcopenia and there is no consensus on the use of psoas muscle area as an overall marker of sarcopenia.13,14 Furthermore, there are sex-specific and population-specific benchmarks for psoas size when used to diagnose sarcopenia, which makes its use less generalizable to other populations.15 Overall, the relationship between psoas muscle area and frailty is unclear, as this relationship has been previously studied with conflicting results.16–22
The Goutallier classification is a five-stage grading system based on the extent of fatty infiltration of muscle using magnetic resonance imaging (MRI). Goutallier et al 23 originally described this classification based on the degree of presurgical fatty infiltration of the rotator cuff muscles.
The purpose of this paper was to assess the degree of fatty infiltration of the paraspinal muscles using the Goutallier classification and to determine if the Goutallier score correlates with frailty and perioperative outcomes in patients undergoing lumbar spinal surgery. It was our hypothesis that the Goutallier scoring system can be applied to the paraspinal musculature of the lumbar spine and correlates with frailty and bone density as noted on opportunistic computed tomography (CT) and clinical outcomes. Degeneration of the lumbar paraspinal muscles has been associated with low back pain symptoms, and fatty degeneration associated with these muscles, notably the multifidus, longissimus and iliocostalis, has been observed in degenerative lumbar kyphosis and scoliosis.24,25 The Goutallier classification has been used with great intra- and inter-observer reliability in assessing the degree of fatty degeneration in the lumbar multifidus muscle at both the L4/5 and L5/S1 vertebral levels.24 Additionally, the Goutallier grade has been shown to correlate with lumbar lordosis and severe disc degeneration.26
Materials and Methods
Demographics
Inclusion and Exclusion Criteria
Surgical Procedures
This was a single-institution retrospective cohort analysis of adult patients that underwent single or double-level open lumbar decompression, single or double-level instrumented lumbar posterolateral fusion through a mid-line incision, or single or double-level open lumbar fusion with pedicle screws and posterior-based interbody device with a mid-line incision at the L4-5 levels.
Data Collection
Patients undergoing the procedures of interest for degenerative pathologies during the study period were identified using the institution’s electronic medical record system (Epic Electronic Health Record, Verona, Wisconsin, United States). In addition to the entire cohort, 2 subgroup cohorts were created: one of patients who underwent only 1- or 2-level decompression and another of patients who underwent 1- or 2-level fusion with or without decompression. Demographic and baseline characteristic data included age, sex, body mass index (BMI), smoking status, diabetic status through preoperative diagnosis, hemoglobin A1c (HbA1c), osteoporotic status through preoperative diagnosis, chronic steroid use, psychiatric illnesses, alcohol use, dependent health status (determined as less than 4 metabolic equivalents), and preoperative American Society of Anesthesiologists Physical Status Classification System (ASA grade). ASA grades 1 and 2 were combined, as were grades 3 and 4. Perioperative outcome measures of interest included length of hospital stay (LOS) as well as complications/adverse events, reoperation rates, readmission rates, and discharge disposition within 180 days of surgery.
Qualitative assessment of paraspinal muscle fatty infiltration was performed using the Goutallier classification system.27 We applied this classification as follows: the fatty composition of the paraspinal muscles, including the multifidus, longissimus, and iliocostalis muscles, was classified independently by 2 medically trained reviewers into 5 grades (grades 0-4) based on the adipose-to-muscle ratio on MRI.26,28,29 The grades of the Goutallier classification are as follows: grade 0, no fatty deposits; grade 1, some fatty streaks; grade 2, more muscle than fat; grade 3, as much fat as muscle; and grade 4, less muscle than fat.23
The grade was measured on the axial T2-weighted MRI sequence at the L4-5 level. L4-5 was selected as the location of measurement for purposes of standardization, independent of the surgical level (Figure 1). The overall qualitative measurement of fatty infiltration was calculated as an average Goutallier score for each segment. The reviewers were blinded to the individual patients. The images were randomized and provided again to the reviewers at a later date after initial review to assess the intra-reviewer reliability. The intra- and inter-reviewer reliability was assessed using the intraclass correlation coefficient (ICC). In cases when a disagreement occurred, consensus was reached by a third reader after calculating the ICC. The frailty scores in study subjects were calculated using the 5-item Modified Frailty Index (MFI-5).30,31
Figure 1. T2-weighted magnetic resonance imaging of lumbar paraspinal muscles at the L4-5 level. Goutallier score 0 demonstrates no fatty infiltration, Goutallier score 1 demonstrates few fatty streaks within the paraspinal muscles, Goutallier score 2 demonstrates less than 50% fat within the paraspinal muscles, Goutallier score 3 demonstrates 50% fat within the paraspinal muscles, and Goutallier score 4 demonstrates more than 50% fat within the paraspinal muscles.
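As an illustration of the grading scheme described above, the following minimal Python sketch encodes the 5 Goutallier grades and averages the grades assigned by the reviewers. The names `GOUTALLIER_GRADES` and `mean_goutallier` are hypothetical, and the averaging rule is an assumption based on the description of an average Goutallier score; it is not a reproduction of the workflow used in this study.

```python
from statistics import mean

# Grade descriptions as given in the text (grades 0-4).
GOUTALLIER_GRADES = {
    0: "no fatty deposits",
    1: "some fatty streaks",
    2: "more muscle than fat",
    3: "as much fat as muscle",
    4: "less muscle than fat",
}


def mean_goutallier(reviewer_grades):
    """Average the grades given by the reviewers (assumed pooling rule)."""
    if not reviewer_grades:
        raise ValueError("at least one grade is required")
    for grade in reviewer_grades:
        if grade not in GOUTALLIER_GRADES:
            raise ValueError(f"invalid Goutallier grade: {grade!r}")
    return mean(reviewer_grades)


# Hypothetical example: two reviewers grade the same L4-5 axial image.
print(mean_goutallier([2, 3]))  # 2.5
```

Validating each grade against the lookup table keeps out-of-range values from silently entering the average.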
Quantitative osteoporosis screening was conducted using the opportunistic CT protocol.32 Using Sectra Workstation IDS7 Version 26.1.2.7434, an elliptical region of interest (ROI) was placed at the central portion of the L1 vertebral body on the most recent CT scan of the spine prior to surgery. Areas of sclerosis or degeneration were avoided. If the L1 vertebral body contained sclerosis or degeneration, the L2 vertebral body was used for analysis. The elliptical ROI was enlarged to measure 200 mm². The mean value of the reading, corresponding to the Hounsfield units (HU) of the vertebral body, was adjusted based on the scanner kilovoltage (kV). Readings from CT scans acquired at a kV other than 80 kV, 100 kV, or 140 kV were excluded from the study because the protocol included only those kV settings. Additionally, if intravenous contrast was used, the HU was adjusted. An HU of <100 corresponded to osteoporosis, an HU between 100 and 150 corresponded to osteopenia, and an HU >150 was considered normal. We examined the relationship between Goutallier classification scores and HU values in a dataset of 61 paired observations. To evaluate the potential influence of Goutallier grade 0 samples, parallel analyses were conducted on both the complete dataset (n = 61) and a filtered dataset excluding Goutallier grade 0 samples (n = 58).
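The HU thresholds above can be expressed as a short Python sketch. The function name `classify_hu` is hypothetical, the boundary values 100 and 150 are assumed to fall in the osteopenia category, and the input is assumed to be an HU value that has already been adjusted for scanner kV and contrast, since the adjustment factors are not given here.

```python
def classify_hu(adjusted_hu):
    """Map an adjusted vertebral HU value to a bone-density category.

    Thresholds follow the text: <100 osteoporosis, 100-150 osteopenia,
    >150 normal. Treating 100 and 150 as osteopenia is an assumption.
    """
    if adjusted_hu < 100:
        return "osteoporosis"
    if adjusted_hu <= 150:
        return "osteopenia"
    return "normal"


# Hypothetical readings, one per category.
for hu in (85.0, 120.0, 175.0):
    print(hu, classify_hu(hu))
```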
Data Analysis
Inter-reviewer reliability of the Goutallier classification for lumbar paraspinal muscle grading was calculated using the ICC. Our sample of 314 patients was sufficient to detect moderate-sized, clinically relevant effects (power = 0.80 at a 0.05 significance level) in key outcomes such as discharge destination and complication rates. The relationships between the rated Goutallier grade, frailty index score, perioperative outcomes, and demographics were analyzed using bivariate Chi-squared analyses, Fisher’s exact test, ICC, and multivariable logistic and linear regression to control for selected demographic variables. For analyses regarding CT HU values, data were analyzed using Python (v3.12). Statistical analyses included descriptive statistics, Shapiro-Wilk tests for normality, Pearson and Spearman correlation coefficients, one-way ANOVA, and linear regression. Advanced analyses employed K-means clustering (k = 3 determined via elbow method) to identify natural groupings in the data and multinomial logistic regression to assess predictive capability. For all tests, P ≤ 0.05 was defined as significant. All other statistical analyses were performed using SPSS Statistics (v29).
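For readers unfamiliar with the ICC, the following Python sketch shows one common form of the coefficient. The ICC model used in this study is not specified above, so the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), is an assumption, and the function name `icc_2_1` and the example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of ratings: one row per subject, one column per rater."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical Goutallier grades from 2 reviewers for 6 patients.
example = [[2, 2], [1, 1], [3, 4], [0, 0], [4, 4], [2, 3]]
print(round(icc_2_1(example), 3))
```

The absolute-agreement form penalizes a reviewer who grades systematically higher or lower than the other, which a consistency-only ICC would not.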
Results
The final cohort consisted of 314 patients. The cohort had a mean age of 68.9 ± 8.6 years, was 53.1% female, and had a mean BMI of 29.4 ± 5.9 kg/m². The mean ASA grade was 2.7 ± 0.5, mean HbA1c was 5.6 ± 0.8, mean MFI-5 score was 1.3 ± 1.0, mean length of stay was 4.6 ± 17.1 days, mean surgical time was 213.6 ± 83.1 min, and mean Goutallier score was 2.2 ± 1.1. An ICC of 0.908 was calculated for the Goutallier scores between 2 raters.
The decompression-only group (N = 96) had a mean age of 70.6 ± 7.6 years, was 40.4% female, and had a mean BMI of 30.1 ± 6.2 kg/m². The mean ASA grade was 2.7 ± 0.5, mean HbA1c was 5.7 ± 0.8, mean MFI-5 score was 1.3 ± 0.9, mean length of stay was 2.3 ± 2.2 days, mean surgical time was 143.7 ± 67.8 min, and mean Goutallier score was 2.1 ± 1.0.
The cohort of patients who underwent fusion with or without decompression (N = 218) had a mean age of 68.2 ± 8.9 years, was 58.3% female, and had a mean BMI of 29.1 ± 5.9 kg/m². The mean ASA grade was 2.6 ± 0.5, mean HbA1c was 5.6 ± 0.9, mean MFI-5 score was 1.3 ± 1.0, mean length of stay was 5.5 ± 20.4 days, mean surgical time was 244.9 ± 69.3 min, and mean Goutallier score was 2.24 ± 1.1.
Patient Demographics and Comorbidities by Goutallier Score for the Entire Cohort
Bold values indicate significant P-values.
Outcomes Following Lumbar Spinal Surgery by Goutallier Score for the Entire Cohort
Binary Logistic Regression Analyses and Linear Regression Analysis for Goutallier Score for the Entire Cohort
Patient Demographics and Comorbidities by Goutallier Score for the Decompression-Only Cohort
Outcomes Following Lumbar Spinal Surgery by Goutallier Score for the Decompression-Only Cohort
Patient Demographics and Comorbidities by Goutallier Score for the Fusion-with-or-without-Decompression Cohort
Outcomes Following Lumbar Spinal Surgery by Goutallier Score for the Fusion-with-or-without-Decompression Cohort
Analysis of the complete dataset for paired HU values (n = 61) revealed no significant correlation between Goutallier scores and HU values (Pearson r = −0.0102, P = 0.9381; Spearman r = −0.0053, P = 0.9676). One-way ANOVA confirmed no significant differences in mean HU scores across Goutallier grades (F = 0.7503, P = 0.5620). Linear regression demonstrated negligible predictive value (R² = 0.0001, RMSE = 61.13). Descriptive statistics showed considerable overlap in HU distributions across all Goutallier grades, with mean HU values of 92.12 ± 8.79 for grade 0, 135.92 ± 63.27 for grade 1, 144.60 ± 65.15 for grade 2, 114.96 ± 52.06 for grade 3, and 131.09 ± 71.32 for grade 4. K-means clustering identified 3 distinct clusters that did not align with sequential Goutallier grading: 1 primarily containing high Goutallier grades (3-4) with moderate HU scores (mean = 102.63), another with low Goutallier grades (0-2) with moderate HU scores (mean = 104.11), and a third with mixed Goutallier grades but notably high HU scores (mean = 220.86). Logistic regression models attempting to predict Goutallier categories from HU scores produced poor results (accuracy = 47%). After removing Goutallier grade 0 samples, the findings remained consistent, with no significant correlation (Pearson r = −0.0834, P = 0.5339), negligible predictive value (R² = 0.0069), and similar clustering patterns.
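To make the two correlation coefficients reported above concrete, the following Python sketch computes Pearson and Spearman coefficients using only the standard library. It is a simplified illustration rather than the analysis code used in this study: the function names and the example values are hypothetical, and P-values are not computed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical paired observations: Goutallier grade and vertebral HU.
grades = [0, 1, 1, 2, 3, 3, 4]
hu = [95.0, 140.0, 120.0, 150.0, 110.0, 125.0, 130.0]
print(round(pearson_r(grades, hu), 3), round(spearman_rho(grades, hu), 3))
```

Average ranks are used because Goutallier grades take only 5 values, so ties are unavoidable in any realistic sample.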
Discussion
This study demonstrates that the Goutallier classification of the lumbar paraspinal muscles in patients over 50 years of age undergoing elective surgery for lumbar degenerative pathologies is significantly associated with age, preoperative health status, preoperative steroid use, complications within 180 days, reoperation within 180 days, discharge destination following surgery, and MFI-5 score. Moreover, after controlling for confounding variables, Goutallier score was found to be predictive of complications within 180 days, reoperation within 180 days, non-home discharge destination, and frailty as measured by the MFI-5 score. However, there was no observed association between Goutallier score and postoperative outcomes such as length of hospital stay, length of surgery, or 180 day readmission. For patients who only underwent decompression, Goutallier score was significantly associated with age, preoperative health status, and postoperative kidney failure. In the fusion-with-or-without-decompression group, Goutallier score was significantly associated with ASA grade, preoperative health status, preoperative osteoporosis, preoperative steroid use, and 180 day reoperation and readmission. In addition, binary logistic regression analyses showed that Goutallier score was predictive of 180 day readmission and reoperation in this group.
Various studies have examined the association between Goutallier scores and lumbar pathologies or surgeries.25,26,29,33–44 In a study of patients from the Swiss Lumbar Stenosis Outcome Study, Getzmann et al 35 used the Goutallier score to assess paraspinal fatty infiltration at L3 in 416 patients who were treated nonsurgically or surgically for lumbar spinal canal stenosis. Getzmann et al found that paraspinal fatty infiltration was associated with Spinal Stenosis Measure (SSM) function scores, correlating to disability, and the European Quality of Life 5 Dimensions 3 Level Version, measuring quality of life, but not SSM symptoms scores, correlating to pain, over 3 years. Our study similarly suggests that paraspinal fatty infiltration is associated with preoperative dependent health status and is a predictor of non-home discharge destination, which is associated with lower quality of life and disability compared to home discharge. As noted by several studies, fatty muscle infiltration of the lumbar paraspinal muscles is associated with functional disability.35,45,46 In a retrospective analysis of 184 patients who underwent lumbar surgery, Lee et al 36 found that sarcopenic patients had significantly greater Goutallier scores, measured at the L4-5 levels, than non-sarcopenic patients and that Goutallier score was significantly associated with age. This supports the findings in our study that Goutallier score is age-related and significantly associated with frailty based on the MFI-5 score. Additionally, we found that Goutallier score is predictive of MFI-5 score and therefore may serve as a proxy for frailty. Along with our findings that Goutallier score is predictive of non-home discharge, we believe that preoperative assessment of Goutallier score can be predictive of frailty before and after elective lumbar surgery. Additionally, we found that Goutallier score was predictive of 180 day complications and 180 day reoperation.
As healthcare in the U.S. shifts towards a more value-based model, it is important to consider preoperative frailty in patients undergoing elective lumbar surgery to decrease costs related to postoperative complication and reoperation.47
In our study, we found that patients who only underwent decompression had Goutallier scores that were significantly associated with age. We also found that Goutallier score was not associated with postoperative outcomes except for kidney failure despite this group being older and having a higher BMI, HbA1c, MFI-5 score, Goutallier score, and ASA grade than the entire cohort. In a retrospective cohort analysis of 163 patients who underwent 1-level lumbar microdiscectomy at a single institution, Song et al 38 found that low paralumbar muscle health, which included the Goutallier score, was associated with older age, female sex, higher BMI, and higher Charlson Comorbidity Index. However, the authors did not find differences in the minimal clinically important difference between poor and good paralumbar groups for any patient reported outcomes, which included the Oswestry Disability Index (ODI), visual analog scale for back pain (VAS back), VAS leg, Short Form 12 Physical Component Summary (SF-12 PCS), SF-12 Mental Component Summary (SF-12 MCS), and Patient Reported Outcomes Measurement Information System Physical Function (PROMIS PF). Taken together with our results, which do not show an association between Goutallier score and postoperative outcomes, these findings suggest that Goutallier score may not be an appropriate proxy for frailty in this population. One possible explanation for this observation is that decompression procedures are less physiologically taxing than fusion procedures, as reflected in the shorter surgical time and length of stay that we observed in this group. Thus, preoperative frailty may not be a significant factor for postoperative outcomes in this group. Moreover, this group was smaller (30.6% of the entire cohort) than the fusion-with-or-without-decompression group. Therefore, this group may be underpowered to observe significant differences in postoperative outcomes.
However, the higher comorbidity measures seen in this population may explain the association with preoperative dependent health status and postoperative kidney failure.
In the cohort that underwent fusion surgery, we found that Goutallier score was predictive of 180 day reoperation and readmission. In a retrospective analysis of 46 patients who underwent L4-5 TLIF, which included classification of preoperative lumbar paraspinal fatty infiltration using the Goutallier classification, Duan et al 40 showed that Goutallier score was significantly higher in patients who experienced postoperative adjacent-segment degeneration requiring reoperation. Although we did not characterize the reason for reoperation, the results from Duan et al strengthen our findings that Goutallier score can be used to predict reoperation and readmission, which may be due to adjacent segment degeneration, even though this group was overall less comorbid than the entire group. However, this group had greater mean length of stay and total operation time than the entire cohort, which may contribute to postoperative complications that may explain the increased readmission observed in this group. Additionally, Goutallier score was significantly associated with a preoperative diagnosis of osteoporosis as indicated in the electronic medical record. In a retrospective analysis of postmenopausal women who experienced back pain, Ozer and Guler 39 found that higher Goutallier score of the lumbar paraspinal muscles was significantly associated with lower lumbar vertebrae L1-4 total T-score and bone mineral density. In addition, there was a higher distribution of Goutallier scores in patients with osteoporosis or osteopenia at the L1-2, L2-3, and L3-4 levels. This supports the finding in our study that Goutallier score was significantly associated with a preoperative diagnosis of osteoporosis, which strengthens the relationship between osteoporosis and frailty. However, we did not observe this finding in the total cohort and decompression-only group, which may be due to a difference in the reason for surgery between the groups.
Our findings demonstrate that Goutallier scores and HU values represent largely independent measures with minimal correlation, suggesting they capture different aspects of tissue characteristics. The lack of a clear linear relationship between these measures was consistent regardless of whether Goutallier grade 0 samples were included or excluded from the analysis. The identification of 3 distinct clusters through K-means analysis, particularly the existence of a cluster with high HU values across multiple Goutallier grades, indicates heterogeneity in the data that is not captured by the sequential Goutallier grading system alone. This heterogeneity may represent distinct pathophysiological entities or reflect measurement variability. The poor performance of logistic regression models and the low discriminative power of HU thresholds (Youden’s J statistic: 0.14 for full dataset, 0.03 for filtered dataset) further reinforce that these measures should not be used interchangeably for clinical assessment. Rather, our results suggest that Goutallier scores and HU values provide complementary information about tissue quality and should be interpreted in conjunction with one another rather than as substitutes.
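Youden’s J statistic, cited above as a measure of discriminative power, equals sensitivity plus specificity minus 1. The following Python sketch shows how it could be evaluated for an HU threshold. How the outcome was dichotomized is not stated above, so the binary label (for example, a high Goutallier grade), the rule that a low HU predicts a positive label, the function names, and the example data are all assumptions made for illustration.

```python
def youden_j(values, labels, threshold):
    """Youden's J = sensitivity + specificity - 1 for the rule value < threshold."""
    tp = fn = tn = fp = 0
    for value, positive in zip(values, labels):
        predicted_positive = value < threshold
        if positive and predicted_positive:
            tp += 1
        elif positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn) + tn / (tn + fp) - 1


def best_threshold(values, labels):
    """Return (J, threshold) for the observed value that maximizes J."""
    candidates = sorted(set(values))
    return max(((youden_j(values, labels, t), t) for t in candidates),
               key=lambda pair: pair[0])


# Hypothetical data: HU readings and whether the Goutallier grade was high.
hu = [80.0, 90.0, 160.0, 170.0]
high_grade = [True, True, False, False]
print(youden_j(hu, high_grade, 100.0))  # 1.0
print(best_threshold(hu, high_grade))
```

A J near 0, as reported for both datasets, means the threshold separates the groups little better than chance, whereas the perfectly separated toy data here give J = 1.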
There were several limitations of this study. This study involves a retrospective analysis of patients at 1 tertiary care institution who were treated with elective spinal surgery for lumbar degenerative pathologies. Because the selected cohort includes only single- and double-level fusions and decompressions, the results we report apply to a specific subgroup of patients. The results of the study may be different if different exclusion criteria are applied. Moreover, patient reported outcome measures were not collected or analyzed in the study. Including these data may be useful in assessing the impact of the Goutallier score on postoperative outcomes such as patient satisfaction following surgery. Future research should also investigate the biological significance of the identified clusters, particularly the high-HU cluster, and explore whether additional imaging or clinical parameters might improve the correlation between these measures or better explain the observed variability.
Conclusion
This study demonstrates the ability of the Goutallier score to predict postoperative complications and reoperation within 180 days of elective surgery for lumbar degenerative pathologies. The Goutallier score may also be useful in counseling patients on the type of discharge destination following surgery and thus the level of disability and quality of life they may have. The Goutallier score can be used as a proxy for frailty for patients undergoing elective lumbar surgery. Spine surgeons can use the Goutallier score in assessing patients with lumbar degenerative pathology in order to guide patient expectations and type of rehabilitation following surgery.
Footnotes
Ethical Considerations
This study received ethical approval from the Medical University of South Carolina Institutional Review Board (IRB) (approval #Pro00122496) on 01/17/2023. This is an IRB-approved retrospective study; all patient information was de-identified, and patient consent was not required. Patient data will not be shared with third parties.
Author Contributions
RK, JS, and RR were involved in the development of the project idea. RK, CW, CD, MB, GR, JL, and RR were involved in data collection, analysis, and interpretation. RK, JS, SL, CN, JG, CR, JL, and RR were involved in article development and review and approved the final version of the article for publication.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
