Abstract
Study Design
Prospectively Enrolled Cohort Study.
Objective
To compare the time to return to baseline ambulatory function after minimally invasive transforaminal lumbar interbody fusion (MIS-TLIF) vs traditional open posterolateral fusion (OF).
Methods
Patients with an iPhone undergoing MIS-TLIF or OF were prospectively enrolled. Participants voluntarily shared information from the pre-installed Apple HealthKit package, which provided baseline activity data. Daily steps and distance were tracked until a patient returned to at least 90% of their pre-operative baseline for 2 consecutive days. Patient-reported outcome measures (PROMs) were collected at the pre-operative and subsequent follow-up visits.
Results
A total of 23 MIS-TLIF and 25 OF patients were enrolled. Patients undergoing MIS-TLIF had an average preoperative baseline of 3576 steps (SD 2185); these patients returned to 90% of baseline steps at an average of 10.57 days. Those undergoing OF had an average preoperative baseline of 2280 steps (SD 1295) and required 15.32 days to return to 90% of pre-operative step count. There were no significant correlations between pre-operative demographic factors or PROMs with time to return to 90% of baseline ambulation. After matched analysis was performed, the average treatment effect of MIS vs OF operation was estimated, though this was not statistically significant.
Conclusions
This study quantifies pre- and post-operative ambulatory function for 2 cohorts of patients undergoing lumbar surgery. This work further builds on the existing uses of Apple HealthKit data to establish ambulatory baseline in the lumbar spine surgery population, as well as comparison of objective ambulation data with PROMs.
Introduction
A variety of outcome measures have been used to track patient recovery after spinal surgery. Survey-based assessments of functional ability and activity capacity have been used to improve physician understanding of patient progress, with notable limitations due to respondent subjectivity and the possible impact of recall bias.1,2 As an alternative, there has been emphasis on the use of objective outcome measures throughout the recovery period, via data collected from devices such as activity trackers and pedometers. 3 These tools provide quantitative data to gauge a patient’s post-operative progress, 4 though they are limited by the use of specialized monitoring devices. Consumer-grade wearable devices (such as accelerometer-equipped smart watches) and smartphone-based monitoring appear to be underutilized data sources, especially given the capability of these devices to collect pre- and post-operative step data using a device potentially already in use by a patient.
Post-operative activity data may also provide further outcome comparisons between similar procedures. For patients with degenerative spondylolisthesis, there are several ways of decompressing the spinal canal and neural foramina and stabilizing the unstable segment. Open laminectomy with posterolateral fusion (OF) is a commonly utilized procedure for degenerative spondylolisthesis, with favorable results. 5 Transforaminal lumbar interbody fusion (TLIF) with laminectomy is another well-established technique for addressing neuroforaminal stenosis, performing canal decompression, and restoring stability. Though TLIF was initially described using an “open” technique, modifications to this procedure have resulted in what is now known as the “minimally invasive” TLIF (MIS-TLIF). 6 Though MIS-TLIF differs in the size of the incision and extent of muscular retraction, it is unclear whether MIS-TLIF offers a benefit for long-term patient outcomes. However, a growing body of evidence has demonstrated advantages for MIS-TLIF over open techniques for short-term outcomes, such as perioperative pain, intraoperative blood loss, pain control, and length of stay (LOS).7–10 Studies have not yet fully described the impact of MIS-TLIF on post-operative functional measures, such as return of baseline ambulatory capacity.
The goal of this study is to compare the post-operative recovery between MIS-TLIF and OF through the use of objective step data collected during the pre-operative and post-operative periods. The main objective of the study is to quantify the time to 90% of baseline steps and distance for patients undergoing MIS-TLIF and OF. Secondary objectives of this study include analysis of relationships between patient-reported outcome measures (PROMs) and demographic factors with pre- and post-operative step counts.
Methods
The study was approved by the local institutional review board prior to prospective enrollment of patients. All patients in the study were indicated for surgical treatment of degenerative spondylolisthesis; 2 groups were enrolled based on planned surgical technique. Patients presenting with primarily neurogenic claudication without significant foraminal stenosis were offered open lumbar laminectomy with posterolateral fusion. Those with central stenosis and/or neuroforaminal stenosis due to degenerative disc disease with notable disc height loss were offered MIS-TLIF. Enrollment was restricted to individuals >18 years and without prior instrumented lumbar or cervical surgery. Study involvement was also limited to patients who used an Apple iPhone. All operations were completed by the senior author.
For patients undergoing MIS-TLIF, bilateral pedicle screws were placed via a percutaneous technique, with decompression and disc space preparation/instrumentation performed through a 20 mm tube. Facetectomy and interbody device insertion were performed on the side concordant with worse radicular pain. For those with equal pre-operative radicular symptoms, the facetectomy and TLIF were performed from the left side. For patients undergoing open posterolateral fusion, a midline approach was performed. Bilateral pedicle screws were placed under direct visualization and wide decompression was performed bilaterally.
Patient activity data were collected on each individual’s personal iPhone, using the Apple HealthKit application. This is a pre-loaded software package on contemporary Apple mobile devices (beginning with iOS 8, released in 2014). 11 The functionality of this application was verified at each study visit. For all subjects, baseline average pre-operative step count was calculated from the day of study enrollment to the day of surgery. The most recent 90 days of step data prior to surgery were used for those patients with surgery >90 days from the pre-operative visit. Subsequent daily step counts were collected at the first post-operative visit. Collection of step counts at follow-up visits was discontinued at the visit after a patient obtained daily step counts greater than 90% of pre-operative baseline steps for 2 consecutive days.
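The recovery endpoint described above can be illustrated with a minimal Python sketch. This is not the authors' analysis code; the function name, argument names, and example values are hypothetical. It also assumes that the recorded day is the second day of the qualifying 2-day run, which the text does not specify.

```python
from statistics import mean

def days_to_recovery(pre_op_steps, post_op_steps, threshold=0.90, run_length=2):
    """Return the post-operative day on which the recovery criterion is met.

    pre_op_steps: daily step counts from enrollment to surgery; only the most
                  recent 90 days are used, mirroring the rule in the text.
    post_op_steps: daily step counts, where index 0 is post-operative day 1.
    Returns None if the criterion is never met.
    """
    baseline = mean(pre_op_steps[-90:])
    target = threshold * baseline
    streak = 0
    for day, steps in enumerate(post_op_steps, start=1):
        # Count consecutive days above the target; any miss resets the run.
        streak = streak + 1 if steps > target else 0
        if streak == run_length:
            # Assumption: report the last day of the qualifying run.
            return day
    return None

# Hypothetical patient: baseline 3500 steps/day, so the target is 3150 steps.
pre = [3000, 4000, 3500]
post = [500, 1200, 3200, 2900, 3300, 3400, 3600]
recovery_day = days_to_recovery(pre, post)
```

In this hypothetical example, day 3 exceeds the target but day 4 does not, so the run restarts and the criterion is first met on day 6.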
In addition, a series of patient-reported outcome measures (PROMs) were collected at each visit, per the clinic’s typical protocol. These included Modified Oswestry Disability Index (mODI), visual analogue score for back pain (VAS-B), Short Form Health Survey 36 (SF-36) pain and physical function subdomains, as well as Patient Health Questionnaire 9 (PHQ-9).
Statistical analysis was performed using R software (version 2024.12.1, R Foundation for Statistical Computing, Vienna, Austria). Propensity score matching was performed with the use of the “MatchIt” package and treatment effects were estimated with the “marginaleffects” package.12,13 This software has been used in a previous observational study to estimate the effect of MIS- vs open-TLIF on local and regional radiographic outcomes. 14 In our study, optimal full matching was used between groups, with propensity scores estimated with logistic regression. Of note, the matched data are weighted by covariates, and between-group differences were assessed with standardized mean differences. Non-matched data are presented for demographics and summary data of pre- and post-operative step counts.
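As an illustration of how between-group balance can be summarized, the following Python sketch computes a standardized mean difference from summary statistics. It uses one common definition (difference in means divided by the square root of the average of the two group variances) and is unweighted; the matched analysis in this study was performed in R with matching weights, so this sketch is not the authors' procedure. The function name is hypothetical, and the example inputs are the unmatched pre-operative baseline step counts reported in this paper.

```python
from math import sqrt

def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """Unweighted standardized mean difference between two groups.

    Assumption: the denominator is the square root of the average of the
    two group variances; other conventions exist (e.g. treated-group SD).
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Unmatched baseline steps: MIS-TLIF 3576 (SD 2185) vs OF 2280 (SD 1295).
smd_baseline_steps = standardized_mean_difference(3576, 2185, 2280, 1295)
```

Under this definition the unmatched baseline step counts differ by roughly 0.72 standard deviations, consistent with the baseline imbalance between cohorts that motivated matching.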
For baseline, unmatched comparisons, Fisher’s exact tests were used for categorical data and Student’s t-test was used for continuous data. Non-parametric comparisons were performed with the Mann-Whitney U test and measures of association for non-parametric data were assessed with Spearman’s rank correlation coefficient. For all statistical comparisons, a pre-determined alpha of 0.05 was used, with P ≤ 0.05 denoting significance.
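For readers less familiar with Spearman’s rank correlation coefficient, the following is a minimal, standard-library Python sketch of the calculation: each variable is converted to ranks (ties receive the average rank), and the Pearson correlation of the ranks is returned. The function names are hypothetical, and the study itself used R rather than this code.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

Because only ranks enter the calculation, rho reflects monotonic association and is insensitive to the skewed distributions typical of daily step counts.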
Results
Demographics
PROM surveys were completed at pre-operative and follow-up visits. Pooled PROM data for mODI and VAS-B for the MIS and open groups are shown as box plots in Figures 1 and 2. There were no statistically significant differences between the MIS and open groups when comparing these measures pre-operatively and at all follow-up visits.
Figure 1. Boxplots of pooled mODI responses for MIS-TLIF and OF; 1st visit at 2 weeks (nominal), 2nd visit at 3 months (nominal), and 3rd visit at 6 months post-operatively (nominal).
Figure 2. Boxplots of pooled VAS-B responses for MIS-TLIF and OF; 1st visit at 2 weeks (nominal), 2nd visit at 3 months (nominal), and 3rd visit at 6 months post-operatively (nominal).

Results
As an illustrative example of variance within the data, a scatter plot of pre-operative step count vs age for the MIS-TLIF and OF groups is presented in Figure 3. Associations between pre-operative step count and pre-operative demographic and outcome measure scores were assessed with the use of Spearman’s rank correlation coefficient (Table 3). There were no significant associations of pre-operative step count with age, BMI, CCI, or pre-operative patient reported outcome measures, with the exception of a moderate correlation noted between SF-36 Physical Functioning Domain and pre-operative step count in the MIS-TLIF group. This correlation was not observed in the OF group.
Figure 3. Pre-operative step count vs age.
Table 3. Correlations. *Rho of 0 suggests no correlation; **Rho of −1 or 1 suggests strong negative or positive correlation.
Pre-operative baseline steps vs time to return to 90% of baseline steps is presented as a scatter plot in Figure 4. No patients in the study failed to meet the 90% threshold. Summary statistics for time to 90% of baseline steps are shown in Table 2. The MIS group obtained 90% of baseline steps in 10.57 days (SD 5.29) compared to 15.32 days (SD 8.74) in the OF group (P = 0.031). Similarly, the MIS group required 11.0 days (SD 5.66) to return to 90% of pre-operative distance, compared to 15.76 days (SD 8.33) in the OF group (P = 0.036). Spearman’s rank correlation coefficients were also calculated to assess relationships among first post-operative visit PROMs and time to achieve 90% of baseline steps; no statistically significant correlations were found (Table 3).
Figure 4. Pre-operative baseline steps vs time to return to 90% of baseline steps.
Matched data
Standardized mean difference approaching 0 indicates well-matched data.
Among all patients, there was one re-operation for post-operative radiculopathy in the MIS-TLIF group. One patient in the OF group experienced post-operative urinary retention and was discharged with a Foley catheter. All but 1 patient were discharged home; the remaining patient (OF cohort) was discharged to a sub-acute rehabilitation facility. Five patients returned to the emergency department during the 3 months following surgery (3 MIS-TLIF, 2 OF); among these patients, 3/5 visits were attributable to surgery (uncontrolled post-operative pain, perioperative hyperglycemia, dizziness related to analgesic use). There were no complications attributed to the monitoring of step data.
Discussion
In this study, objective monitoring of step data for patients undergoing MIS-TLIF and open posterolateral fusion was compared, though findings must be interpreted in the context of different baseline pre-operative ambulatory function between groups. For the MIS-TLIF group, an average baseline of 3576 steps per day was recorded pre-operatively (SD 2185 steps per day); patients obtained 90% of pre-operative baseline steps by 10.57 days (SD 5.29 days). This was significantly different from the OF group, with patients recording approximately 2280 steps pre-operatively (SD 1295) and requiring 15.32 days (SD 8.74) to return to 90% of pre-operative baseline. These differences are likely attributable to baseline demographic characteristics of the study cohorts undergoing these procedures, notably different in age, BMI, and comorbidities. However, baseline step counts did not demonstrate significant associations with these factors in either group. Similarly, there were no associations between pre-operative and first follow-up patient reported outcome measures with time to return to 90% of baseline ambulation. To account for heterogeneity in the study cohorts, propensity score matching was attempted to estimate the treatment effect of MIS-TLIF vs OF on time to return to 90% of baseline ambulation, which did not yield a statistically significant finding. Given these results, it is likely that baseline differences in cohorts are the primary factor for differences in functional outcomes, rather than treatment effects between MIS-TLIF and OF.
This work is among relatively few studies using Apple HealthKit data to track post-operative patient outcomes. 15 Though a previous study has used the iPhone to obtain patient step data in MIS-TLIF, 16 our study helps further generalize HealthKit gait data by making a direct comparison of patients undergoing MIS-TLIF to patients undergoing traditional open decompression and posterolateral fusion. To the authors’ knowledge, our study is the first to directly compare HealthKit step data for 2 different surgical techniques. Similarly, there are limited studies quantifying post-operative activity after minimally invasive spine procedures, despite a large body of literature focused on the perioperative recovery of patients undergoing MIS vs open techniques.5,6,17 One relevant activity study utilized an accelerometer on a single patient to quantify physical activity before and after MIS-TLIF. 18 The authors found that return to baseline activity occurred within 2 months for this patient, with physical activity almost doubling in the fourth month post-operation. Another recent study utilizing Apple Health step data for patients undergoing minimally invasive microdiscectomy demonstrated baseline and postoperative step counts in a relatively younger population with lumbar radiculopathy (mean age 49.60, SD 15.13), showing pre- and post-operative step counts of 3450 (SD 2737) and 7479 (SD 3271), respectively. 19
In the broader context of spine surgery, there has been a growing body of literature surrounding wearable devices, utilizing both commonly-available products such as Fitbit devices and dedicated specialized products.20,21 Largely, these inquiries focus on the limitations of using currently utilized PROMs for quantifying patient recovery. Maharaj et al conducted a study of “recovery kinetics” in patients undergoing lumbar decompression or fusion using the Mi-Band2, a measurement device worn on the wrist. 22 The authors also demonstrated variability with observed pre-operative step counts, with patients recording 4700 +/− 2900 steps prior to surgery. They concluded that using objective step data allows additional insight into patient progress, including earlier identification of patients slow to improve in functional status during the post-operative period. A prospective study in cervical and lumbar spine surgery patients utilizing a consumer-grade pedometer noted similar difficulties with correlating PROMs to step counts, which underscores findings in our study, though noted significant correlations with PHQ-2. 23 In a study of lumbar spine surgery patients in the first post-operative week, Gilmore et al noted the utility of observing lower than expected step counts, which helped to identify patients who required a longer time to achieve independent mobility and those requiring longer hospital admissions. 24 Given the heterogeneity of measurement methods, surgical indications, and duration of follow-up within the current literature, it is difficult to generalize actionable activity thresholds until larger studies or meta-analyses are performed.
In this study, the use of Apple HealthKit software was intended to further improve the availability of post-operative activity data. Given widespread use of Apple smartphones and relative ease-of-access to HealthKit data for both patients and providers, obtaining and interpreting activity data adds marginal time to a routine post-operative encounter. It is the opinion of the authors that future investigators should consider including baseline and post-operative patient step counts and distance from HealthKit to establish a larger sample size for comparison with future studies. The use of the HealthKit application for measuring gait parameters has been investigated for validity when compared to a commercial wearable device (APDM Mobility Lab) for the measurement of step length in adults, with an estimated error of 9.8%. 25 Though the precision of smartphone-based step monitoring may be less than that of research-grade accelerometers, its wide availability is the main draw to current use. Aside from the specific HealthKit application, gait data collection via smartphone inertial measurement units has been investigated for test-retest reliability, with findings suggestive of high reproducibility in both home and laboratory settings.26,27 Despite this favorable outlook, there are some limitations to using HealthKit data (or any smartphone-based collection) for activity monitoring. Step counts are only obtained when the device is physically on the person of an individual, which may potentially underestimate true step counts in environments such as the home. Similarly, smartphone-based measurements may also lose accuracy if a device is contained within a purse or handbag.
Ambulatory capacity may be a relevant factor in patient selection and timing of surgery. 28 Though ambulatory dysfunction related to neurogenic claudication is common among patients presenting with degenerative spondylolisthesis, the relationship between pain intensity and ambulatory capacity has not been consistently established with the use of survey-based outcome measures. 29 This finding possibly underscores the lack of associations found in this study between pre-operative PROMs and ambulatory function, though this may not be generalizable. PROMs which are directed towards walking-specific pain and disability may demonstrate closer associations. In a study of walking capacity and performance in patients with lumbar spinal stenosis, Conway et al monitored ambulatory function of 12 subjects with spinal stenosis during a 7-day period with the use of an Actigraph GT1M activity monitor. 30 The patients in this study recorded an average of 2821.9 steps per day. The authors found activity monitor data was correlated with 2 outcome measures; patients who indicated via survey that walking was limited by pain in the legs demonstrated a correlation with average activity per day and maximum continuous activity per day, though the correlation was modest (regression coefficients of r = 0.623 and 0.754, respectively). Another study in patients with thoracolumbar degenerative disease utilized the Apple Watch to quantify step counts in the 2 weeks prior to planned surgery; female patients ≥65 years recorded 3582 steps (+/− 1851) and male patients ≥65 years recorded 5989 steps (+/− 5686). 31 The authors noted correlations among several PROMs with step count, including ODI, SF-36 physical component, and Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function Subscale. 
Given the differences in baseline step counts between our study and other existing work, wide heterogeneity in the baseline step counts among participants appears to influence correlations of PROM scores with pre- and post-operative ambulatory function. Further work investigating correlations between PROMs and step counts may be improved through the use of walking-relevant PROM questions and normalized measures of step counts. Though the experience of recovery captured by PROMs represents one aspect of a patient’s post-operative course, the objective step counts are likely influenced by many external factors (especially home environment and assistance) not captured by commonly used survey instruments.
This study has several limitations. All participants were under the care of a single surgeon, which limits generalizability to those with different practice settings and local patient populations. A significant selection bias was observed towards younger and less obese patients in the MIS-TLIF group. However, this may be representative of patient selection practices for many spine surgeons employing both MIS and open techniques. Furthermore, the duration of the recording period limits the analysis of ambulation outside of the immediate post-operative period. Including additional data may provide information on the longer-term effects on ambulatory function from these procedures. Finally, the collection of all post-operative step data may be subject to Hawthorne effect bias, given that patients were aware that they were under observation for the study and may be motivated to over-report activity. The pre-operative data is least vulnerable to this effect, as historical baseline data was collected at the time of study enrollment.
Conclusion
This study utilizes Apple HealthKit data to assess the post-operative ambulation of patients undergoing MIS-TLIF vs open posterolateral fusion. Though the observed differences in post-operative step counts found in this study are likely attributable to baseline demographic and functional differences between each cohort rather than surgical technique itself, this study further builds on prior uses of HealthKit data to quantify post-operative ambulatory function in the spine surgery population. The authors encourage the routine monitoring of pre- and post-operative step data in future studies, to help establish further information on the trajectory of post-operative ambulatory function in patients undergoing spine surgery.
Footnotes
Ethical Approval
The Corewell Health William Beaumont University Hospital Institutional Review Board granted approval for this study (2022-242) prior to enrollment of patients.
Consent to Participate
All patients enrolled in the study participated in an informed consent discussion.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The authors disclose no relevant financial relationships, conflicts of interest, or other potential sources of bias related to the content of this work. Drs. Miller, Easthardt, and Michel have no financial relationships with industry or relevant ownership/stock interests. Dr Park serves as a consultant for Stryker, Arthrex, Kuros, Amplify, and Medynus; he owns stock or stock options in Alphatec and Johnson and Johnson.
