Abstract
Study design:
Systematic review.
Objectives:
To systematically review, critically appraise and synthesize evidence on the use of autologous stem cell sources for fusion in the lumbar spine.
Methods:
A systematic search of PubMed/MEDLINE, EMBASE and ClinicalTrials.gov through February 20, 2020 was conducted for studies comparing autologous cell grafts to other biologics for lumbar spine fusion. The focus was on studies comparing distinct patient groups.
Results:
From 343 potentially relevant citations, 15 studies met the inclusion criteria set a priori. Seven studies compared distinct patient groups; in all of them BMA was used in combination with allograft or autograft rather than as a standalone material. No economic evaluations were identified. Most observational studies were at moderately high risk of bias. When used for primary lumbar fusion, no statistical differences in outcomes or complications were seen between BMA+autograft or BMA+allograft and autograft/allograft alone. Compared with allograft alone, data from one RCT suggested statistically better fusion and lower complication rates with concentrated BMA+allograft. When used in revisions, no differences in outcomes were seen between BMA+allograft and either autograft or rh-BMP-2, but fusion rates were lower with BMA+allograft, leading to additional revision surgery.
Conclusions:
There was substantial heterogeneity across studies in patient populations, sample size, biologic combinations, and surgical characteristics, making direct comparisons difficult. The overall quality of evidence for fusion rates and the safety of BMA in lumbar fusion procedures was considered very low, with studies being at moderately high or high risk of bias.
Introduction
A total of 2.1 million elective lumbar spinal fusion procedures were performed in the United States between 2004 and 2015, with a 62.3% increase over that period. 1 The costs of lumbar fusions increased in those years as well, from 3.7 billion dollars in 2004 to 10.2 billion dollars in 2015. 1 Various cost optimization and reduction strategies have been implemented, including the Medicare/Medicaid pay-for-performance program, which compares the quality of care to the incurred costs. 2 For spine fusion, the choice of graft material is crucial for fusion success and patient reported outcomes. While autologous iliac crest bone grafts (ICBG) remain the “gold standard,” concerns regarding sufficient quantity and quality have led to the development of various osteobiologics.3-5
Given that the majority of traditional osteobiologics lack a cell component, newer cell-based graft materials have been gaining attention in the past decade. 6 In addition to mature osteoblasts, osteogenic progenitor cells play a crucial role in bone formation and remodeling. Sources of progenitor cells for spine fusion include the iliac crest bone, the vertebral body, and adipose tissues.7-9 Bone marrow aspirates contain bone-specific progenitor cells and growth factors, priming them for osteogenesis. This allows bone marrow aspirate (BMA) to provide osteogenic and osteoinductive properties similar to ICBG without the morbidity of harvesting ICBG. The main disadvantage of BMA is the impact that donor age and comorbidities have on mesenchymal stem cell (MSC) numbers and their differentiation potential. In newborns the ratio of MSCs to marrow cells is 1/10,000, but it drops to 1/2,000,000 by the age of 80 years. 10 Due to the low number of viable stem cells in unfractionated BMA, various techniques and instrumentation have been developed to concentrate MSCs, although these still lack consistency.11,12 This has led to BMA becoming a common osteobiologic for cervical and lumbar spine fusion over the last decade.
Despite the common use of autologous BMA, it is still unclear whether BMA can provide spine fusion rates and clinical outcomes similar to those of ICBG. The aim of the current systematic review was to critically appraise and synthesize evidence on the use of autologous stem cells from bone marrow aspirate, adipose, or any other autologous sources for lumbar fusion in patients with degenerative spinal disorders, and to compare the effectiveness of autologous stem cells to that of autograft and common allografts. The key PICO questions included (Figure 1):

An overview of the patients, interventions and outcomes considered for these questions.
Key Question 1: Is use of autologous stem-cells for fusion as effective as fusion with standard autograft or other graft materials in the lumbar spine?
Key Question 2: What complications are associated with autologous stem cell use in lumbar fusion? Is use of stem-cells safer than fusion with standard allograft or autograft in the lumbar spine?
Key Question 3: Do patient factors (e.g. age, smoking, comorbid conditions, revision status, presence of deformity), number of levels treated, cell type or preparation modify fusion rates in patients undergoing autologous stem cell-based lumbar fusion?
Key Question 4: Is autologous stem-cell use for fusion in the lumbar spine cost-effective compared with other graft materials?
Materials and Methods
The methods for this systematic review followed accepted standards for systematic review/comparative effectiveness reviews for rigor, quality, and transparency including those described by the Agency for Healthcare Research and Quality (AHRQ), 13 IOM Standards for Systematic Reviews 14 and the PCORI Methods Guide. 15
Electronic Literature Search
A systematic search of the PubMed/MEDLINE database was conducted for literature published through October 31, 2018, and of the EMBASE and ClinicalTrials.gov databases through April 13, 2018. An updated search of the PubMed and ClinicalTrials.gov databases was conducted for new studies published between October 1, 2018 and February 20, 2020. Only studies in humans with abstracts, written in English, were considered for inclusion; no other limits were placed on the search. A priori inclusion and exclusion criteria are detailed in Appendix A. Briefly, we sought to identify comparative studies of autologous stem cell use versus more commonly used biologics (allograft or autograft) in patients with degenerative disease of the lumbar spine. The search strategy included use of controlled vocabulary (MeSH terms) as well as keywords (Appendix B). Bibliographies of included studies and relevant systematic reviews were reviewed to identify pertinent studies. Citations were dual reviewed for inclusion at both title/abstract and full text stages (Appendix C, studies excluded at full text stage). ClinicalTrials.gov was searched to identify studies which may have new publications (Appendix D).
Data Extraction
In addition to results, data abstraction included patient characteristics, demographics, lifestyle choices (e.g. smoking), comorbidities (e.g. obesity), cointerventions (e.g. pharmaceutical, physical therapy, etc.), and intervention and comparator details (e.g. spinal levels treated, use of anesthetic, cell preparation and concentration, delivery, etc.). A senior methodologist checked data abstractions. Detailed data abstraction is found in the appendices.
Study Quality
Each included study was independently assessed for risk of bias and methodological quality by 2 reviewers (ACS, EB) using pre-set criteria based on criteria and methods delineated in the Cochrane Handbook for Systematic Reviews of Interventions, 16 The Journal of Bone and Joint Surgery, 17 and the Agency for Healthcare Research and Quality 3 with adaptations focusing on criteria associated with methodological quality (Appendix E and F). Economic studies were evaluated according to The Quality of Health Economic Studies (QHES) instrument developed by Ofman et al. 18 Where feasible, the focus was on studies with the least potential for bias and the fewest limitations.
Data Analysis
For continuous measures from RCTs, mean differences and corresponding confidence intervals were calculated, with unpaired t-tests used for statistical testing when applicable and when data were available. Statistical testing was not performed for observational studies due to the size, quality, and high risk of bias of such studies and the attempt to focus the review on studies with the least potential for bias. Risk ratios were calculated for dichotomous outcomes from RCTs if differences between groups reached or approached statistical significance. For continuous outcomes, standard error of the mean was converted to standard deviation where applicable using GraphPad. 19 Study design, heterogeneity across studies and variation in reporting precluded meaningful pooling of data (i.e. meta-analysis). The specific reasons were: insufficient numbers of high-quality studies (only 2 RCTs were identified), substantial variability in study designs used (e.g. studies comparing sides within the same patient, prospective and retrospective studies), study quality (most were at moderately high risk of bias), small sample sizes, differences in enrolled patient populations, variability in cell sources and preparations for interventions and comparators, use of different surgical methods and variability in outcomes reported (e.g. use of different measures of function).
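The two calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow (the text reports that GraphPad was used): `se_to_sd` applies SD = SE × √n, and `risk_ratio` attaches a log-normal (Katz) 95% confidence interval, which is an assumed method because the text does not state how intervals were derived. The function names and the example counts are hypothetical and are not taken from any included study.

```python
import math
from statistics import NormalDist


def se_to_sd(se: float, n: int) -> float:
    """Convert a standard error of the mean to a standard deviation: SD = SE * sqrt(n)."""
    if n < 1:
        raise ValueError("n must be a positive sample size")
    return se * math.sqrt(n)


def risk_ratio(events_a: int, n_a: int, events_b: int, n_b: int, level: float = 0.95):
    """Risk ratio of group A versus group B with a log-normal (Katz) confidence interval.

    The interval method is an assumption made for this sketch.
    """
    if min(events_a, events_b) <= 0 or events_a > n_a or events_b > n_b:
        raise ValueError("this simple form needs 0 < events <= n in both groups")
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Hypothetical counts for illustration only (not data from the review).
rr, lower, upper = risk_ratio(events_a=16, n_a=20, events_b=8, n_b=20)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Hypothetical SE and n: 0.5 * sqrt(16) = 2.0
print(f"SD = {se_to_sd(se=0.5, n=16):.1f}")
```

With small groups, as in most of the included studies, the interval from this approach is wide, which is one reason imprecision is a recurring downgrade in the evidence tables.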
Overall Strength of Body of Evidence
For the outcomes of function, pain, fusion and adverse events, the overall strength of evidence across included studies was assessed using the precepts outlined by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group 20 and recommendations made by the Agency for Healthcare Research and Quality (AHRQ)13,21,22 and is further described in Appendices D and E. The overall quality of evidence was based on studies at least risk for bias. In determining the quality (strength) of a body of evidence regarding a given outcome, the overall quality may be downgraded 1 or 2 levels based on the following domains: 1) risk of bias due to study limitations; 2) consistency (heterogeneity) of results; 3) directness of evidence (e.g. hard clinical outcomes); 4) precision of effect size estimates (e.g. width of confidence intervals); 5) publication or reporting bias. Publication and reporting bias are difficult to assess, particularly with fewer than 10 RCTs. 13 Publication bias was unknown in all studies and thus this domain was eliminated from the strength of evidence tables. The initial quality of the overall body of evidence begins as HIGH for RCTs and LOW for observational studies. The body of evidence for methodologically strong observational studies may be upgraded 1 or 2 levels if there are no downgrades in the primary domains above and 1 or more of the following are met: 1) large magnitude of effect; 2) dose-response gradient; 3) all plausible biases would decrease the magnitude of an apparent effect. The final overall quality (strength) of the body of literature expresses the confidence in the estimate of effect and the impact that further research may have on the results as follows:
Results
Study Selection
From 381 potentially relevant citations, 334 were excluded based on title and/or abstract review; a total of 47 studies in the lumbar spine were selected for full-text review, of which 15 met the inclusion criteria (Figure 2). No additional studies were identified from hand searching bibliographies of included studies or identified systematic reviews. Additionally, 8 ongoing clinical trials were identified (Appendix F).

PRISMA Chart.
Seven of the included studies were comparative, 5 of which provided data for efficacy and effectiveness (Key Questions 1 and 2) and are the primary evidence base for this review [2 RCTs (N = 28 and N = 80),23,24 1 prospective observational study (N = 73) 25 and 2 retrospective observational studies (N = 62 and N = 100)26,27]. Two additional retrospective observational studies provided data on subgroups and were included for Key Question 3 only.28,29
The remaining 8 studies conducted within-patient comparisons (i.e., treated 1 side with autologous cells and the other side with the comparator graft) of which 6 provided data for effectiveness. Five were prospective comparative cohort studies30-34 and one was a retrospective cohort study 35 (Appendix G). Two cohort studies (one prospective, one retrospective) provided data on subgroups and were included for Key Question 3 only.36,37 Given that the relative benefit of autologous cells compared to other biologics for function, pain or procedural adverse events could not be determined, outcomes and determination of strength of evidence for those 8 studies were not included in the main body of this review (Appendix G).
Key Question 1: Effectiveness of Autologous Cells for Arthrodesis in the Lumbar Spine
All included studies evaluating autologous cells for lumbar fusion used autologous bone marrow aspirate (BMA) combined with various scaffolds and/or graft extenders (Table 1). Four studies evaluated BMA+allograft or autograft in primary lumbar fusion; comparators included autograft (alone or with other graft materials) in 3 studies (1 RCT and 2 cohort studies)24-26 and allograft in 1 RCT. 23 The RCT comparing BMA+allograft to allograft alone was the only study that used concentrated BMA; all others used un-concentrated BMA. One study evaluated BMA+allograft in revision surgery and compared it with both autograft and rhBMP-2 (Table 1). 27 Six studies that made within-patient comparisons used BMA on 1 side and autograft (alone or with other graft materials) on the other side.30-35 Details on cell preparation, sources and intervention specifics are in Appendix H.
Summary of Patient Demographics and Surgical Characteristics for Comparative Studies Evaluating the Use of Autologous Stem Cells for Primary or Revision Lumbar Fusion.
BMA = bone marrow aspirate; CBM = cancellous bone marrow; DBM = demineralized bone matrix; DDD = degenerative disc disease; HA = hydroxyapatite; MSCs = mesenchymal stem cells; NR = not reported; PLIF = posterior lumbar interbody fusion; PLF = posterolateral fusion.
* Unconcentrated BMA unless otherwise indicated.
† Leukocytes (with residual platelets) were concentrated, approximately 10 times, together with MSCs; the achieved MSC concentration was 0.01% to 0.02% (1.74 × 10⁴/L on average; range: 1.06–1.98 × 10⁴/L) of all nucleated bone marrow elements (1–10 × 10⁶/L) in all specimens.
‡Concomitant surgeries included discectomy (38%), laminectomy (73%), posterior lumbar interbody fusion (PLIF) (38%), anterior lumbar interbody fusion (ALIF) (6%), and other (21%).
§With subsets exhibiting degenerative spondylolisthesis (82.9%), synovial cysts (20.9%), and disc disease (43.4%).
** All procedures involved pedicle screw instrumentation and posterolateral fusion plus or minus the use of interbody cages.
††With subsets exhibiting degenerative spondylolisthesis (82.9%), synovial cysts (20.9%), and disc disease (43.4%).
Regarding study quality, the RCT with allograft comparison was considered at moderately low risk of bias 23 given that the majority of methodological criteria were met, with the exception of concealed allocation (Appendix J). The RCT with autograft+allograft comparison was at moderately high risk of bias 24 owing to unclear information on random sequence generation, concealed allocation, complete follow-up of >80% and <10% difference in follow-up between groups (Appendix J). All comparative observational studies (including those conducting within-patient comparisons) were at moderately high risk of bias (Appendix I and J). Common methodological concerns across studies included unclear randomization and/or allocation concealment methods (RCTs), unclear loss-to-follow-up, and the inability to blind patients to clinical outcomes (all cohort studies).
Function and Pain
There were no statistical differences in measures of function or pain (or secondary outcomes of patient satisfaction or work status) at 24 months for any BMA/scaffold/graft extender combination compared with autograft based on one small RCT 24 and 2 moderate-sized observational studies25,26 in patients undergoing primary lumbar fusion (Table 2). Similarly, in patients undergoing revision surgery, no differences in leg or back pain were seen at any time point between patients receiving BMA and those receiving autograft or rhBMP-2 (Table 3). 27
Summary of Primary Clinical Outcomes From Comparative Studies of Autologous BMA Versus Autograft for Primary Lumbar Fusion.
BMA = bone marrow aspirate; ICBG: iliac crest bone graft (autograft); LBOS = Low Back Outcome Score; NC-BMA = nonconcentrated bone marrow aspirate; ODI = Oswestry Disability Index; Pro = prospective study design; Retro = retrospective study design; RoB = risk of bias; NA = not applicable.
* As reported by authors.
† LBOS (0-75, higher scores = better function) is graded as follows:
• Success = excellent (≥65) or good (50-64); for the intervention and comparison groups, the proportion of excellent results was 42% (21/50) vs. 44% (22/50) and good results was 46% (23/50) vs. 48% (24/50).
• Failure = fair (25-49) or poor (<25); for the intervention and comparison groups, the proportion of fair results was 8% (4/50) vs. 6% (3/50) and poor results was 4% (2/50) vs. 2% (1/50).
‡Authors did not specify whether this was back or leg pain VAS.
§Where standard errors (SE) were reported, values were used to estimate standard deviation (SD): SD = SE*SQRT(n).
Summary of Clinical and Safety Outcomes From the Retrospective Cohort Study by Taghavi et al. 2010 Comparing Autologous BMA Versus Autograft and Versus rh-BMP-2 for Revision Posterolateral Lumbar Fusion.
BMA = bone marrow aspirate; ICBG: iliac crest bone graft (autograft); LBOS = Low Back Outcome Score; NC-BMA = nonconcentrated bone marrow aspirate; ODI = Oswestry Disability Index; Pro = prospective study design; Retro = retrospective study design; RoB= risk of bias; NA = not applicable.
* Unconcentrated.
† For BMA versus both comparators (autograft and rh-BMP-2), as reported by authors.
Fusion
Definitions of fusion success and methods for assessing it varied across studies. Across the comparative studies for primary lumbar fusion (one small RCT 24 and 2 moderate-sized observational studies25,26), no statistically significant differences between any combination of BMA/scaffold/graft extender and autograft were seen at 24 months; however, rates of fusion varied substantially across the 3 studies (Table 4). The small RCT reported similarly high rates of fusion for BMA+autograft+Healos and autograft+allograft groups (92% and 94%) but noted significantly longer time to fusion in the BMA group. Although not statistically significant, slightly lower fusion rates at 24 months were noted for BMA (63% to 84%) compared to autograft (67% to 94%) in the observational studies (Table 4). Fusion rates varied between the observational studies, which may be due to differences in patient populations, surgical approaches and/or BMA+graft combinations and preparation. In patients undergoing revision surgery, radiographic fusion was significantly less common in patients who received BMA with DBM and allograft (78%) compared with either autograft (100%) or rh-BMP-2 (100%) at a mean of 28 months (Table 3). 27
Summary of Fusion Rates From Comparative Studies of Autologous BMA for Primary Lumbar Fusion.
ALIF = anterior lumbar interbody fusion; BMA = bone marrow aspirate; C-BMA = concentrated BMA; CT = computed tomography; DBM = demineralized bone matrix; NA = Not applicable; NC-BMA = nonconcentrated BMA; NR = not reported; PLF = posterolateral fusion; PLIF = posterior lumbar interbody fusion; ROB= risk of bias; TLIF = transforaminal lumbar interbody fusion
* Fusion was assessed on radiograph unless otherwise indicated. See Appendix Table G7 for fusion criteria across studies.
†p-value as reported by the authors.
Patients receiving concentrated BMA with allograft had significantly higher fusion rates (on both radiograph and computed tomography) at 24 months compared with allograft alone (35% vs 10% on X-ray, 80% vs. 40% on CT, Table 4). 23 There were no differences between single- and multilevel PLFs in either group (data not provided by authors).
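As a worked illustration, the relative and absolute effects implied by the percentages reported above can be written out as follows. The labels RR (risk ratio) and RD (absolute difference) are notation introduced here, and the figures are plain arithmetic on the reported rates; no confidence intervals are shown because they would require the group sizes.

```latex
\begin{align*}
\mathrm{RR}_{\text{CT}} &= \frac{0.80}{0.40} = 2.0, &
\mathrm{RD}_{\text{CT}} &= 0.80 - 0.40 = 0.40,\\
\mathrm{RR}_{\text{X-ray}} &= \frac{0.35}{0.10} = 3.5, &
\mathrm{RD}_{\text{X-ray}} &= 0.35 - 0.10 = 0.25.
\end{align*}
```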
Across the studies that conducted within-patient comparisons, BMA products, autograft source and methods of assessing fusion varied. At 24 months, fusion rates for sides with BMA were fairly similar to sides with autograft across 4 prospective cohorts,31-34 with no statistical differences between sides reported (range 80% to 94% for sides with BMA products and 80% to 91% for sides containing autograft, Appendix K). At 24 months, 2 small studies reported lower fusion rates in sides where BMA plus Healos (30%) 35 or BMA plus Osteoset (41%) 33 were used compared with sides containing autograft (63% and 91% respectively for the 2 studies). Three PLF studies compared BMA with un-concentrated bone marrow 27 or different graft extenders.34,35 Two studies29,36 reported similar rates of fusion and complications between the groups. The third study, which compared BMA with 2 synthetic graft extenders, a hybrid biomaterial (InQu) versus beta-tricalcium phosphate (β-TCP), found statistically superior fusion success for sides receiving InQu compared to the contralateral sides receiving β-TCP (Appendix L).
Key Question 2: Safety
Harms and adverse events were variably reported across included studies. Small sample sizes likely precluded detection of rare events or observation of differences between BMA and comparators for most complications.
Across studies of primary lumbar fusion, 1 RCT and 2 observational studies comparing patients receiving BMA (combined with various scaffold/graft extenders) with those receiving autograft (alone or with other graft materials) found no statistical differences in pseudarthrosis (Table 5).24-26 In the RCT by Hart et al., patients receiving concentrated BMA plus allograft had significantly fewer cases of pseudarthrosis than those receiving allograft alone. 23 The frequency of donor site pain was generally lower in patients receiving BMA products (0% to 67%) compared with those receiving autograft (14% to 82%) across 2 small observational studies in patients receiving primary fusion25,26 (Table 5) and was 0% versus 20% in one study of patients having revision surgery (Table 3). 27
Summary of Complications From Comparative Studies of Autologous BMA for Primary Lumbar Fusion.
† As reported by authors.
Across the studies that conducted within-patient comparisons, complications were rarely observed and generally cannot be ascribed to a given intervention side. Reported rates of pseudarthrosis by 24 months ranged from 6% to 59% for sides where BMA+graft was used vs. 11% to 19% for autograft. One study reported donor site pain in 28% of patients (Appendix M).
Key Question 3: Modification of Treatment Effect
Two observational studies provided subgroup analyses on the number of levels fused (Appendix Table N). Studies comparing BMA versus autograft for primary fusion found similar fusion rates for single and 2-level fusions.25,26 The retrospective study comparing BMA+allograft to autograft and rh-BMP-2 in revision lumbar fusion reported 100% fusion rates for single level for all 3 grafts, and for multi-level fusion 100% for autograft and rh-BMP-2 and 60% for BMA (Appendix Table O). 27 Vaccaro et al. compared fusion rates based on number of levels and spine pathology (degenerative disc disease vs. spondylolisthesis). 25 The BMA group had lower fusion rates for single level and higher fusion rates for 2-level compared to ICBG or DBM+autograft groups (Appendix Table P). Degenerative disc disease patients had lower fusion rates than spondylolisthesis patients when BMA was used as a graft material (58.3% vs. 71.4%). However, the small sample size precluded detection of a statistically significant difference. Ajiboye et al. evaluated the effect of age on fusion rates in patients treated with BMA+DBM versus autograft for primary fusion and reported statistically lower rates of fusion in age ≥65 years versus <65 years both within the BMA+DBM group (p = 0.03) and between the BMA+DBM and the ICBG groups (p = 0.01, Appendix Table Q). 28 Neen and co-authors reported differences in fusion rates within the BMA group based on the approach: 77.3% in 360° fusion, 93.3% for posterolateral and 84.6% for posterior lumbar interbody fusion (Appendix Table R). 26 Failure to reach statistical significance for some factors is likely a function of small sample size and study design. In a retrospective cohort study of multilevel laminectomy and PLF, similar fusion rates and time to fusion were reported between Vitoss+BMA and Nanoss+BMA; 27 however, the sample sizes were not comparable. Additionally, the study did not control for confounding, leading to moderately high risk of bias.
The included studies did not allow for effective evaluation of treatment effect modification. RCTs with appropriate sample sizes are needed.
Key Question 4: Economic Studies
No studies were identified.
Evidence Summary, Overall Quality (strength) of Evidence
The majority of evidence for the benefits and safety of autologous BMA versus autograft+allograft for primary lumbar fusion comes from 2 cohort studies (Table 6). The quality (strength) of evidence was very low that BMA provides similar results for function, pain, frequency of fusion and frequency of adverse events compared with autograft. These findings are consistent with the one small RCT comparing BMA with autograft+allograft. There was moderate strength of evidence from one RCT that concentrated autologous BMA+allograft was associated with greater likelihood of fusion and lower risk of pseudarthrosis versus allograft. For revision lumbar fusion (Table 7), very low strength evidence from one small retrospective cohort study suggested lower fusion rates and correspondingly higher rates of pseudarthrosis and revision surgery for BMA compared with autograft, but no difference in pain.
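The grading rules described under Overall Strength of Body of Evidence can be sketched as a small function. This is an illustrative reading of those rules, not the review team's actual tool: the function name, the domain keys and the example downgrade counts are hypothetical, and the domains shown in the examples are not the ones the authors necessarily applied to any outcome.

```python
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]


def grade_strength(design: str, downgrades: dict[str, int], upgrades: int = 0) -> str:
    """Return the overall strength of evidence for one outcome.

    design: "rct" starts at HIGH, "observational" starts at LOW.
    downgrades: domain name -> levels removed for that domain (0, 1 or 2).
    upgrades: levels added; applied only to observational evidence
              that has no downgrades in any domain.
    """
    start = {"rct": 3, "observational": 1}[design]
    for domain, steps in downgrades.items():
        if steps not in (0, 1, 2):
            raise ValueError(f"{domain}: downgrade must be 0, 1 or 2 levels")
    total_down = sum(downgrades.values())
    if design != "observational" or total_down > 0:
        upgrades = 0  # upgrading requires observational design and no downgrades
    index = start - total_down + upgrades
    # Clamp to the four available levels.
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]


# Hypothetical examples: the domains chosen here are for illustration only.
print(grade_strength("rct", {"risk_of_bias": 1}))                            # MODERATE
print(grade_strength("observational", {"risk_of_bias": 1, "precision": 1}))  # VERY LOW
```

Under these rules a single RCT can end at moderate strength after one downgrade, while cohort evidence that starts at LOW reaches the floor of VERY LOW after any downgrade.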
Evidence Summary: Overall Quality (strength) of Evidence for Effectiveness and Safety of Autologous Cells for Primary Lumbar Fusion.
* In this trial, baseline pain was reported as VAS back pain (mean ± SD), 4.2 ± 0.7 (BMA) vs. 4.2 ± 0.8 (autograft), p = NS, and VAS leg pain, 7.8 ± 0.7 (BMA) vs. 7.6 ± 0.4 (autograft), p = NS; however, it is unclear if the VAS score at 24 months represents back or leg pain or an average of both.
† In Vaccaro et al., 45% of patients had had previous spinal surgery (unclear if at same level as current surgery) and underwent various concomitant surgeries including discectomy (38%), laminectomy (73%), posterior lumbar interbody fusion (PLIF) (38%), anterior lumbar interbody fusion (ALIF) (6%), and other (21%).
‡ Leukocytes (with residual platelets) were concentrated, approximately 10 times, together with MSCs; the achieved MSC concentration was 0.01% to 0.02% (1.74 × 10⁴/L on average; range: 1.06–1.98 × 10⁴/L) of all nucleated bone marrow elements (1–10 × 10⁶/L) in all specimens.
Reasons for downgrading quality of evidence (general):
* Serious risk of bias: the majority of studies did not meet one or more criteria of a good quality RCT (see Appendix for details)
† Serious risk of bias: the majority of studies did not meet 2 or more criteria of a good quality cohort or are case series (see Appendix for details)
‡ Small sample size/insufficient power and/or significant variation in estimates (e.g. wide confidence intervals, large standard deviations, etc.) Imprecise effect estimate for an outcome: small sample size and/or confidence interval includes both negligible effect and appreciable benefit or harm with the intervention; If sample size is likely too small to detect rare outcomes, evidence may be downgraded twice. If the estimate is statistically significant, it is imprecise if the CI ranges from “mild” to “substantial.” If the estimate is not statistically significant, it is imprecise if the CI crosses the threshold for “mild/small” effects. Wide (or unknown) confidence interval and/or small sample size may result in downgrade.
§ Unknown consistency; single study or different measures used across studies. Inconsistency: differing estimates of effects across trials; If point estimates across trials are in the same direction, do not vary substantially or heterogeneity can be explained, results may not be downgraded for inconsistency
** Indirect, intermediate or surrogate outcomes may be downgraded.
Evidence Summary: Overall Quality (strength) of Evidence for Effectiveness and Safety of Autologous Cells for Revision Lumbar Fusion.
Reasons for downgrading quality of evidence (general):
* Serious risk of bias: the majority of studies did not meet one or more criteria of a good quality RCT (see Appendix for details).
† Serious risk of bias: the majority of studies did not meet 2 or more criteria of a good quality cohort or are case series (see Appendix for details).
‡ Small sample size/insufficient power and/or significant variation in estimates (e.g. wide confidence intervals, large standard deviations, etc.) Imprecise effect estimate for an outcome: small sample size and/or confidence interval includes both negligible effect and appreciable benefit or harm with the intervention; If sample size is likely too small to detect rare outcomes, evidence may be downgraded twice. If the estimate is statistically significant, it is imprecise if the CI ranges from “mild” to “substantial.” If the estimate is not statistically significant, it is imprecise if the CI crosses the threshold for “mild/small” effects. Wide (or unknown) confidence interval and/or small sample size may result in downgrade.
§ Unknown consistency; single study or different measures used across studies. Inconsistency: differing estimates of effects across trials; if point estimates across trials are in the same direction, do not vary substantially or heterogeneity can be explained, results may not be downgraded for inconsistency.
** Indirect, intermediate or surrogate outcomes may be downgraded.
Discussion
The overall quality of evidence for fusion rates and the safety of BMA in lumbar fusion procedures was considered very low, meaning we have very little confidence that the reported effects represent the true effects, with studies being at moderately high or high risk of bias. The outcomes of spine fusion procedures are heavily dictated by the choice of graft material. Patient comorbidities, proper bone or endplate decortication and bleeding are key elements of graft incorporation and bone remodeling.
Several studies have documented an increase in the annual incidence of spine fusion procedures.1,38 Martin et al. reported that lumbar spine fusions accounted for 41.2% of elective spine fusions in the United States between 2004 and 2015. 1 They also reported that 35.2% of the patient population were ≥ 65 years of age and that 23.1% had one and 8.1% had more than 2 comorbidities. This increase in the number of fusion procedures has been followed by an increase in the number of available osteobiologics. Given that most of the standard allograft materials lack a cellular component, BMA concentrates have been used more extensively in the past decade, and there has been a recent surge in costly cell-based osteobiologics.
In the present systematic review, no study compared BMA as a stand-alone graft material to allograft or autograft. In all 7 comparative studies BMA was mixed with an allograft or autograft. Although studies reported similar fusion rates between combinations of BMA+allograft or BMA+autograft and autograft/allograft only groups, the level of evidence was very low and all studies were at moderately high or high risk of bias.23-29 Ploumis et al. used BMA in combination with Healos and a local autograft, while the comparator group received cancellous allograft mixed with local autograft, leading to 92% vs 94% fusion rates, respectively. 24 Vaccaro and co-workers combined BMA with DBM putty and autograft lamina, achieving 63% fusion rates at 24 months compared to 70% fusion rates in patients who received DBM putty combined with iliac crest autograft. 25 In addition, some of the patients received PLF in combination with PLIF or ALIF, which could have contributed to lower fusion rates than in the other 2 studies. In an RCT using BMA concentrates+allograft, 80% of the patients had a confirmed fusion by CT, compared to 40% of patients who received spongious allograft chips only. 23 In revision surgeries, Taghavi et al. reported 100% fusion rates for BMA+allograft, rh-BMP2 or autograft groups for single level PLF. 27 In cases with multi-level fusion, however, BMA+allograft underperformed compared to the rh-BMP2 and autograft groups. This could suggest that the lack of osteoconductivity puts BMA at a disadvantage for multi-level fusions, in particular if risk factors such as smoking or osteoporosis are present. In addition, time to fusion was significantly longer in the BMA+allograft group for single level PLFs compared to the rh-BMP2 group (313.3 vs. 199.8 days). The osteogenic potency of rh-BMP2 causing early fusion has been well documented previously and could explain the difference in time to fusion.
Although the fusion rates were similar between BMA+allograft or BMA+autograft and other graft materials, the definition of fusion and the method of assessment varied widely among the 7 comparative studies. In addition, BMA preparation and volume differed among the studies. BMA was collected from multiple sites, initial volumes and centrifugation times varied, and the final volumes used ranged from 2 ml to 10 ml. The soaking time also varied (10-20 minutes) or was not specified, making it difficult to draw conclusions on the effectiveness of BMA in lumbar spine fusion.
Although the fusion rates varied among studies and multi-level revisions with BMA+autograft or BMA+allograft had lower fusion rates, patient reported outcomes were similar between the groups and studies. There was on average a 40%-50% reduction in ODI and VAS in both the BMA and graft alone comparator groups at 12 or 24 months.24,27 Among the reported complications, pseudarthrosis was reported in the majority of studies. The incidence of pseudarthrosis after lumbar fusion has been reported to range between 5% and 35%. 39 In studies using un-concentrated BMA with autograft or allograft, the rates of pseudarthrosis at 24 months were higher, although not significantly, in the BMA+autograft or BMA+allograft groups compared to autograft/allograft alone.24,26 Taghavi et al. reported pseudarthrosis in 2-level PLFs in patients receiving BMA and allograft, but not for single level procedures. 27 Autograft and rh-BMP2 groups had 100% fusion rates for both single and 2-level PLFs. In contrast, Vaccaro et al. reported higher non-union rates after single level PLF than after 2-level PLF (41.7% vs. 28.6%) when BMA was used together with DBM and lamina. 25 These conflicting rates might be attributed to the small sample size in each of the groups, the inclusion of both primary and revision surgeries, and the different graft materials used in conjunction with BMA. While un-concentrated BMA had higher pseudarthrosis rates, the use of concentrated BMA with allograft led to lower pseudarthrosis rates at 24 months: 20% in the BMA+allograft group and 60% in patients who received allograft alone. 23 There is a discrepancy between patient reported outcomes and pseudarthrosis rates. Kornblum et al. reported worse VAS scores and overall satisfaction in patients who were diagnosed with pseudarthrosis compared to patients with solid fusion. 40 Poor satisfaction was reported by 32% of the patients with pseudarthrosis compared to 5% in the solid fusion group. They also reported significantly lower VAS leg pain scores at follow-up in the solid fusion group than in the pseudarthrosis group (0.5 vs. 2.1).
In the present review, nerve related complications were similar between the groups, and the sample size did not allow for further analysis of complications in general or their relationship with the choice of graft.
The results from the subgroup analyses addressing key question 3, modifiable risk factors, should be interpreted cautiously. The high risk of bias and small sample sizes of these studies, and the fact that they were not designed to evaluate heterogeneity of treatment effect, prevented firm conclusions. Vaccaro et al. stratified patients based on spinal pathology: degenerative disc disease and spondylolisthesis. 25 At 24 months, the BMA+DBM+autograft cohort had higher pseudarthrosis rates in the degenerative disc group than in the spondylolisthesis group (41.7% vs. 28.6%). In addition, BMA groups had higher pseudarthrosis rates compared to autograft and allograft groups. None of the comparisons were significant, probably due to the very small sample size. The influence of age was addressed by Ajiboye et al., who compared patients ≥65 and <65 years of age, but the small sample size prevented any clinically meaningful conclusions. 28
None of the studies in the present systematic review addressed the financial aspects of osteobiologics or the impact of graft material on overall cost-effectiveness. Given that the osteobiologics market is growing at a fast pace, it is crucial to understand the cost of osteobiologics and how they affect the overall costs of primary and potential revision surgeries.
The evidence for the effectiveness and safety of autologous cell sources for lumbar fusion compared to autograft in patients undergoing primary fusion was very low overall; the majority of evidence is from 2 observational studies at moderately high risk of bias that used different BMA preparations in very different patient populations. While there is moderate evidence that concentrated autologous BMA combined with allograft may be associated with higher fusion rates compared to allograft alone, it is based on a single RCT. 23
There are significant limitations in the currently available literature evaluating the efficacy of BMA in the setting of lumbar spinal fusion. Firstly, the quality of existing evidence was poor, with only 7 studies having a comparator group, many of which were retrospective. Although 2 RCTs were identified, the overall sample size across all studies was too small to support clinically meaningful conclusions. Secondly, significant variability in the time point and method of fusion assessment was observed across studies. These variations may have affected observed fusion rates. A lack of control for patient factors that result in known derangements in fusion potential (smoking, metabolic bone diseases, steroid use, endocrinopathies, renal pathologies, etc.) may have further confounded fusion outcomes. There was substantial heterogeneity with regard to how studies reported preparation, processing, cell marker characteristics, culture conditions, composition, true stem cell concentration, dose, purity, and delivery of bone marrow aspirate products or other MSC sources, making it difficult to draw conclusions across studies. Future studies should follow proposed minimum reporting standards for clinical studies of cell-based therapy.41,42 In addition, RCTs with sufficient power are needed to effectively evaluate modification of treatment effect.
Conclusions
There was substantial heterogeneity across studies in patient populations, BMA and graft combinations, cell preparations and surgical characteristics, making direct comparisons difficult. For comparisons of BMA to autograft or rhBMP-2 in primary or revision surgery, the strength of evidence was very low and based primarily on observational studies. There was a moderate strength of evidence from one RCT that concentrated autologous BMA+allograft was associated with greater likelihood of fusion and lower risk of pseudarthrosis versus allograft. Given the large number of osteobiologics and the limited evidence, studies with adequately powered sample sizes and comparator groups, detailed information on biologics preparation, and measures to mitigate bias are urgently needed.
Supplemental Material
Supplemental Material, autologous_sc_lumbar_spinal_fusion_final_appendices_03-24-2020 for Use of Autologous Stem Cells in Lumbar Spinal Fusion: A Systematic Review of Current Clinical Evidence by Zorica Buser, Patrick Hsieh, Hans-Joerg Meisel, Andrea C. Skelly, Erika D. Brodt, Darrel S. Brodke, Jong-Beom Park, S. Tim Yoon, Jeffrey Wang and AO KF Degenerative in Global Spine Journal
Footnotes
Authors’ Note
Zorica Buser and Patrick Hsieh contributed equally.
Acknowledgments
The authors gratefully acknowledge Aaron John Robarts Ferguson for his contributions in performing literature searches, managing citations, data abstraction and manuscript and results table editing. This study was funded by AO Spine International through the AO Spine Knowledge Forum Degenerative, a focused group of international spine experts acting on behalf of AO Spine.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Disclosures outside of submitted work:
DB- consulting: Vallum; Royalties – America, Medtronic, DePuy Synthes.
PH - royalties for Medtronic, NuVasive, and Summer Boomer.
STY- Dr Yoon owns stock in Phygen, Meditech Spine; royalties Meditech Spine, Stryker Spine (Paid directly to institution), research support from Empiric Spine (Paid directly to institution/employer), AO Spine, International Society for the Study of the Lumbar Spine (non-financial support), International Society for the Study of the Lumbar Spine (other).
HJM –consultancy (money paid to institution) - DiFusion (ongoing), Co.don (past); royalties: Medtronic, Fehling Aesculap (past); stocks (money paid to institution) - Regenerate Life Sciences GmbH in DiFusion.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Aggregate Analytics, Inc. received funding from AO Foundation to perform the methodological and analytical aspects of this review.
Supplemental Material
Supplemental material for this article is available online.
