Abstract
Study Design
Retrospective cohort study.
Objectives
Leveraging electronic health records (EHRs) for spine surgery research is impeded by concerns regarding patient privacy and data ownership. Synthetic data derivatives may help overcome these limitations. This study’s objective was to validate the use of synthetic data for spine surgery research.
Methods
Data came from the EHRs of 15 hospitals. Patients who underwent anterior cervical or posterior lumbar fusion (2010-2020) were included. Real data were obtained from the EHR. Synthetic data were generated to simulate the properties of the real data, without maintaining a one-to-one correspondence with real patients. Within each cohort, the ability to predict 30-day readmissions and 30-day complications was evaluated using logistic regression and extreme gradient boosting machines (XGBoost).
Results
We identified 9,072 real and 9,088 synthetic cervical fusion patients. Descriptive characteristics were nearly identical between the 2 datasets. When predicting readmission, models built using real and synthetic data both had c-statistics of .69-.71 using logistic regression and XGBoost. Among 12,111 real and 12,126 synthetic lumbar fusion patients, descriptive characteristics were nearly the same for most variables. Using logistic regression and XGBoost to predict readmission, discrimination was similar with models built using real and synthetic data (c-statistics .66-.69). When predicting complications, models derived using real and synthetic data showed similar discrimination in both cohorts. Despite some differences, the most influential predictors were similar in the real and synthetic datasets.
Conclusion
Synthetic data replicate most descriptive and predictive properties of real data, and therefore may expand EHR research in spine surgery.
Keywords
Introduction
Spine surgeons treat a wide variety of complex conditions, making it difficult to identify the most effective treatments for particular patients. Historically, randomized controlled trials have been considered the gold standard for generating evidence regarding the efficacy of treatment interventions.1 However, randomized trials are time-consuming, enroll only a small fraction of eligible patients, use narrow eligibility criteria, and are typically extremely costly.2-4 In spine surgery populations in particular, trials are also hindered by high rates of nonadherence, where patients do not pursue the treatment assigned, substantially limiting study conclusions.5,6 Collectively, these problems have made randomized trials impractical for studying many pressing spine surgery questions.
Responding in part to the challenges of conducting randomized trials, there has been tremendous growth in spine surgery registries.7-11 These registries have undoubtedly facilitated the development of evidence-based practices and new quality improvement initiatives. Nonetheless, maintaining these registries is costly, and data elements must be manually abstracted, limiting both hospital participation and the breadth of data collection. With at least 98% of hospitals now using an electronic health record (EHR),12 automated queries of multidimensional clinical, laboratory, imaging, and diagnostic data are increasingly viable alternatives to traditional registries. Yet obtaining EHR data within a healthcare system is often cumbersome, and sharing of EHR data across health systems remains extremely uncommon, particularly because of questions relating to patient privacy and data ownership.13,14
One option for obtaining and sharing detailed health information across health systems is the use of synthetic data derivatives. Synthetic datasets move beyond simple deidentification (e.g., the removal of names and dates of birth): they are novel cohorts derived from the actual EHR that no longer correspond to real individual patient data.15 While synthetic data no longer maintain one-to-one correspondence with real patient records, when generated properly, they have the same statistical properties as the original EHR data (e.g., the same distribution of comorbidities and laboratory values).15 Consequently, in principle, synthetic data platforms have the potential to facilitate acquisition of EHR data and also sharing of multicenter datasets, while avoiding challenges related to patient privacy.
Our institution recently implemented a synthetic data platform developed by MDClone (Be’er Sheva, Israel), which is effective when studying select populations identified based on clinical diagnoses (e.g., sepsis and heart failure).15,16 However, it is unknown whether this synthetic data platform produces valid results for studying spine surgeries, such as spinal fusion surgery. Consequently, the objective of this study was to validate the use of synthetic data in 2 spine fusion populations to evaluate its potential for accurately characterizing spine surgery outcomes.
Methods
Patient Population and Variable Selection
This study used both real data and synthetic data derivatives created from the Research Data Core (RDC) at the authors' institution from the years 2010 to 2021. The RDC is a centralized data repository that includes information from the EHR of 15 hospitals that are part of a single health system. EHR data are periodically downloaded from the EHR to the RDC, ensuring an updated source of research data.
Table 1. Descriptive statistics of the cervical fusion cohort.
a n (%); Median (IQR).
b Fisher's exact test; Wilcoxon rank sum test; Pearson's Chi-squared test.
For both descriptive and predictive analyses, we evaluated variables related to comorbid diagnoses (e.g., diabetes), relevant laboratory values (e.g., sodium level), body mass index (BMI), medication prescriptions, and demographic characteristics. We also noted the presence of a concomitant posterior/thoracic (cervical cohort) or anterior/thoracic (lumbar cohort) fusion. We chose these variables to be representative of data elements that can typically be captured via structured EHR queries. Comorbid medical diagnoses were evaluated from 1 year before until the time of the index procedure. Laboratory values were evaluated from 30 days before until 1 day after index surgery. Surgical indications (e.g., myelopathy and spine trauma) were evaluated within 30 days of surgery.
To examine predictive performance, our primary outcome was 30-day readmission to our health system, an increasingly important quality metric.17 As a secondary outcome, we evaluated postoperative complications within 30 days of surgery, including both surgical (e.g., postoperative hematoma) and medical (e.g., pneumonia) events.
Synthetic Data Generation
In essence, synthetic data are new populations of patients created based on real datasets that do not maintain any one-to-one correspondence with the real patients they are intended to mimic. Methods for generating synthetic clinical data can be generally classified as statistical simulation or computational derivation.15 Statistical simulation methods use real-world, often publicly available, datasets as the basis for generating artificial datasets. Such synthetic data should mimic disease distribution at the population level and maintain the appearance of patient-level data. This approach may be appropriate for broad descriptive analyses (e.g., evaluating trends in surgery rates over time). However, these methods do not account for covariate relationships at the individual patient level (e.g., the relationship between individual patient comorbidities and clinically relevant endpoints).18 Addressing these deficiencies, the MDClone platform is based on the "derivation" approach of synthetic data generation. This method uses computer algorithms to create new synthetic datasets on demand (i.e., in real time) that are based on real, individual patient EHR data. The new synthetic dataset includes a similar number of patients to the real source dataset and also maintains the distribution and covariance structure of variables in the original data.15,19 In other words, both descriptive characteristics (e.g., demographics and surgical details) and the relationships among the dataset variables (e.g., the relationship between surgical approach and complications) should be maintained. At the same time, patients in the synthetic dataset have no one-to-one correspondence with real patients, thereby preserving anonymity.
While the exact algorithm used by MDClone is proprietary, it involves using statistical and artificial intelligence techniques to learn complex relationships among variables in the real dataset and preserve those relationships in the newly created synthetic dataset.
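Because the platform's actual algorithm is proprietary, the general "derivation" principle can only be illustrated with a deliberately simplified sketch: fit summary statistics (means, standard deviations, and correlation) to the real data, then sample entirely new records from the fitted distribution. The two-variable scope and Gaussian assumption here are illustrative simplifications, not MDClone's method.

```python
import math
import random

def moments(xs, ys):
    """Means, standard deviations, and Pearson correlation of two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    vy = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return mx, my, math.sqrt(vx), math.sqrt(vy), cov / math.sqrt(vx * vy)

def synthesize(xs, ys, n, rng):
    """Sample n synthetic (x, y) records that preserve the means, spreads,
    and correlation of the real data without copying any real record.
    (Illustrative Gaussian approximation only.)"""
    mx, my, sx, sy, r = moments(xs, ys)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Cholesky factor of the 2x2 correlation matrix couples the variables
        y = my + sy * (r * z1 + math.sqrt(max(0.0, 1 - r * r)) * z2)
        out.append((x, y))
    return out
```

In this sketch, no synthetic row maps back to a single real patient, yet the joint distribution (and hence downstream descriptive statistics) is approximately preserved, which is the property the study tests empirically.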
The details of the MDClone platform and workflow have been published previously.15 Briefly, the real data used by the platform come from our RDC. The RDC contains multimodal data that include diagnoses and medical problems, medication prescriptions, surgical procedures, laboratory values, and demographic information across both inpatient and outpatient settings. MDClone includes a "query tool" that searches the data lake to define a reference event for a population of interest (e.g., lumbar fusion in 2019). Other time-related covariates (e.g., comorbid diagnoses prior to surgery) are also defined using the query tool. During this process, MDClone provides an approximation of the size of the target population and the frequency of relevant variables requested in the query (e.g., medication prescriptions). During the last step, MDClone's "synthetic data generator" transforms real data from the RDC into a synthetic dataset that users can download. An image of the query tool is shown in the appendix (Supplemental Appendix E-Figure 1).
MDClone will only create datasets with at least 200 patients, reflecting its definition of a moderately sized population. Additionally, as more detailed information is requested based on rare combinations of categorical variables (e.g., demographic subgroups), the system will “censor” some values to avoid the possibility of identifying individual patients. Because real patient data are not reported, the authors’ institution permits users to generate synthetic datasets based on real EHR data without obtaining institutional review board (IRB) approval.
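The censoring behavior described above resembles k-anonymity-style suppression, which can be sketched as follows. The threshold and exact mechanics are assumptions for illustration only; MDClone's actual rule is not published.

```python
from collections import Counter

def censor(records, keys, k=5):
    """Suppress ("censor") values for any combination of the given
    categorical keys occurring fewer than k times, so rare combinations
    cannot single out an individual patient. Hypothetical k-anonymity-style
    rule for illustration; not the platform's actual algorithm."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    censored = []
    for r in records:
        if counts[tuple(r[key] for key in keys)] < k:
            # replace the rare combination's values rather than drop the row
            r = {**r, **{key: None for key in keys}}
        censored.append(r)
    return censored
```

Under such a rule, requesting finer demographic cross-tabulations increases the number of rare combinations and therefore the amount of censored data, consistent with the small censoring rates reported in the Results.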
For this study, we used a single query applied to the MDClone platform to generate a synthetic dataset and then, separately, retrieve the original dataset containing the records of the population from which it was derived.
Statistical Analysis
Descriptive statistics compared variables of interest between the synthetic and real datasets. To evaluate whether proportions and mean values were significantly different between datasets, Chi-squared tests, t-tests, and Mann–Whitney U tests were performed, depending on the distribution of the data. We did not correct for multiple comparison testing to avoid potentially missing meaningful differences between the real and synthetic datasets. Variables with less than 20% missing data were imputed using the missForest package in R.20 Variables with 30% or more missing data were excluded from the study. Less than 5% of data were censored, and these were excluded from predictive analyses.
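As a concrete example of these descriptive comparisons, a Pearson chi-squared test contrasting one proportion between the real and synthetic cohorts can be computed directly. This is an illustrative sketch, not the authors' analysis code (which used R), and the 1-degree-of-freedom p-value uses the identity between the chi-squared(1) tail and the complementary error function.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for a 2x2 table:
                 outcome+  outcome-
        real        a         b
        synthetic   c         d
    Returns (statistic, p_value) with 1 degree of freedom,
    using P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

For example, comparing two cohorts of roughly 9,000 patients each on a characteristic present in about 9% versus 8% of patients yields a statistically significant difference despite the small absolute gap, mirroring the pattern of "small but significant" categorical differences reported in the Results.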
To determine the performance of the real and synthetic datasets in a predictive modeling context, we compared 2 multivariable models to predict both 30-day readmission and 30-day complications for each cohort. First, we used multivariable logistic regression. Second, we trained an extreme gradient boosting machine (XGBoost) model, a machine learning technique. XGBoost is based on predictions from classification trees, which involve sequential splits in the data based on variables that distinguish patients with versus without the outcome. To improve model accuracy over a single classification tree, XGBoost combines input from many individual trees.21 For the logistic regression model, we compared statistically significant variables and variable odds ratios between real and synthetic datasets. For the XGBoost model, we compared the most influential variables.22 To test predictive performance, we split each dataset into training (70%) and testing (30%) datasets, and developed each predictive model in both the real and synthetic datasets using the training (70%) data. Our goal was to evaluate how a predictive model built based on synthetic data would perform when applied to a real dataset. Therefore, we compared the discrimination of each model in the real testing dataset (30%), which had not been used for model development. Discrimination (the ability to distinguish patients who do versus those who do not experience the outcome) was measured using the c-statistic, which represents the area under the receiver operating characteristic curve.23 This study was reviewed by the authors' IRB and granted exempt status and a waiver of HIPAA authorization. Consequently, patient consent was not required. Statistical analyses were conducted in R version 4.0.1 using base functions along with the xgboost, caret, and iml packages. P < .05 was considered statistically significant.
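The c-statistic has a simple rank interpretation: it is the probability that a randomly selected patient who experienced the outcome was assigned a higher predicted risk than a randomly selected patient who did not (ties counting one half). A minimal illustration of this definition (not the authors' R code):

```python
def c_statistic(y_true, y_prob):
    """C-statistic (area under the ROC curve) computed from its rank
    definition: the fraction of (outcome, non-outcome) patient pairs in
    which the outcome patient received the higher predicted risk."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0       # outcome patient ranked higher
            elif pp == pn:
                wins += 0.5       # ties count half
    return wins / (len(pos) * len(neg))
```

A value of .5 corresponds to chance-level discrimination and 1.0 to perfect discrimination; the c-statistics of roughly .66-.86 reported below therefore indicate moderate to strong discrimination.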
Results
Cervical Fusion
From 2010-2020, we identified a total of 9,072 real and 9,088 synthetic patients who underwent anterior cervical fusion in our healthcare system. Only the variables for gender (.1%), visit type (2.7%), admission source (.4%), and race (4.4%) included censored data. Comparing the real and synthetic datasets, no continuous variables showed significant differences between datasets. Furthermore, as shown in Figure 1A, the distribution of the data also appeared nearly identical between datasets. Most categorical variables were nearly identical between the real and synthetic datasets. However, there were small but statistically significant differences in race in the real versus synthetic data (9.3% vs 8.1% Black). Likewise, when the number of recent hospital and emergency department (ED) visits was categorized, there were very small but significant differences between the datasets (e.g., 2.2% vs 1.7% with ≥4 recent hospital admissions). Finally, we did note a difference in the rate of myocardial infarction, which was the rarest single variable we evaluated (real <.1% vs synthetic .3%). The descriptive results with P-values are summarized in Table 1.

Figure 1. Comparison of cervical (A) and lumbar (B) cohort characteristics in the real and synthetic datasets. Violin plots show the distribution of the data for select continuous variables.
Outcome Prediction
Table 2. Influential logistic regression parameters predicting 30-day readmission for the cervical fusion cohort by dataset. Variables with P < .10 in either dataset are shown.
Note: *P < .1; **P < .05; ***P < .01.

Figure 2. Receiver operating characteristic (ROC) curves showing model discrimination predicting readmission for the cervical (A) and lumbar (B) cohorts.

Figure 3. A comparison of the most influential variables between the models built using real versus synthetic data to predict readmission for the cervical (A) and lumbar (B) cohorts.
In the real dataset, 3.0% of patients experienced a 30-day complication compared to 3.4% in the synthetic dataset (P = .2), with respiratory failure (1.4% in both datasets) being the most common event (Table 1). The c-statistics for predicting a 30-day complication were similar in models developed using real (c = .83 for logistic regression; c = .82 for XGBoost) and synthetic datasets (c = .86 for logistic regression; c = .84 for XGBoost). As with readmission prediction, there was substantial overlap among the most influential predictors in both the logistic regression and XGBoost models (Supplemental Appendix E-Figure 3; Supplemental Appendix E-Figure 2). However, there were also notable differences (e.g., hemoglobin was more influential in the real vs synthetic model).
Lumbar Fusion Cohort
In the lumbar fusion cohort, we identified 12,111 real and 12,126 synthetic patients. There was a small amount of censoring in this population, which was restricted to race (2.0%), visit type (<.1%), and admission source (1.3%). Similar to the cervical cohort, the only variables with significant differences between the real and synthetic groups were race and number of recent hospital and ED admissions. However, the magnitude of the differences was small. Other categorical variables and all continuous variables were nearly identical between the 2 cohorts. These descriptive characteristics are summarized in Table 3 and Figure 1B.
Table 3. Descriptive statistics of the lumbar fusion cohort.
a n (%); Median (IQR).
b Fisher's exact test; Wilcoxon rank sum test; Pearson's Chi-squared test.
Outcome Prediction
Table 4. Influential logistic regression parameters predicting 30-day readmission for the lumbar fusion cohort by dataset. Variables with P < .10 in either dataset are shown.
Note: *P < .1; **P < .05; ***P < .01.
A total of 5.0% of patients in the real dataset and 5.2% in the synthetic dataset experienced a complication within 30 days of surgery (P = .6), with acute kidney injury (1.6%) being the most common event. Again, discrimination was similar in models developed using real (c = .74 for logistic regression; c = .76 for XGBoost) and synthetic (c = .75 for logistic regression; c = .80 for XGBoost) datasets. The most influential predictors were similar in both sets of models using XGBoost, though 12 variables were only significant in 1 dataset in the logistic regression model. The most influential variables from each model are shown in E-Table 5 and E-Figure 3.
Discussion
In this study of a large, multi-hospital system, we found that synthetic data derivatives were effectively able to simulate the descriptive and predictive characteristics of real data from anterior cervical and posterior lumbar fusion patients. In descriptive analyses, real and synthetic data showed nearly identical characteristics, with the exception of slight differences in some categorical variables. Using both regression and machine learning prediction analyses, models built using synthetic data showed similar predictive performance as models built using real data when tested in real testing datasets. While there were some differences noted among the most influential predictors, these overall results suggest that synthetic data derivatives may be appropriate for many spine surgery predictive analyses.
Corresponding to the unabated growth in healthcare data, there is rapidly growing interest in using "big data" in spine surgery.7,24,25 Although administrative billing data are a highly accessible source for multicenter analyses, there is growing recognition that such datasets are limited in both the accuracy and breadth of the variables available.24,26 Clinical registries offer a high-quality alternative,27-29 but their high cost and potentially restrictive data sharing policies limit their growth.30,31 Additionally, clinical registries typically rely on manual data abstraction and do not incorporate comprehensive laboratory and prescription records, limiting potential uses. While the widespread availability of hospital EHRs has created opportunities to expand use of these data assets,12,32 progress advancing EHR-based research has been tepid.
Circumventing many problems related to patient privacy and data ownership, synthetic data derivatives offer a promising opportunity to leverage artificial intelligence analytics to streamline spine surgery research using EHR data. Indeed, such methods are becoming increasingly common in non-surgical populations,15,33 including a recent effort by the National Institutes of Health to use synthetic data for COVID-19 research.34 Our results in 2 spine surgery populations suggest that synthetic data derivatives almost entirely replicate population descriptive characteristics, while also closely simulating predictive performance. These findings suggest a wide array of potential applications, including epidemiological analyses, studies of surgical trends, and profiling quality outcome metrics. As patient-reported outcome measures are integrated as structured data elements in the EHR, opportunities for comparative effectiveness analyses will also expand substantially.
Beyond the availability of structured data elements, the ultimate impact of synthetic data derivatives will depend on the extent of stakeholder buy-in. At the authors’ institution, the use of synthetic data markedly expedites EHR research by removing the need for either IRB approval or the assistance of paid data brokers to facilitate access. Therefore, physicians and clinical investigators are given immediate access to conduct EHR queries across the health system, which has implications for both quality improvement investigations and exploratory research efforts.
However, the greatest potential benefit of synthetic datasets in spine surgery likely relates to the creation and sharing of multicenter datasets supported by multi-institution partnerships. For example, in Israel, a major medical center linked records of COVID-19 patients with a large insurance provider to obtain comprehensive health histories and medical records. Because the resulting synthetic datasets based on these linked data posed no privacy concerns, they were made available to external data scientists to expedite research efforts.35 Such examples illustrate the power of synthetic data to facilitate scientific advances when healthcare organizations work together to create and share such datasets.
Despite these opportunities, our findings also highlight important limitations to using synthetic data derivatives. First, there were slight but statistically significant descriptive differences between real and synthetic data related to rare categorical variables (e.g., some racial minorities; number of recent ED visits). While the size of these differences was generally small, the impact may be magnified in rare subgroups. Consequently, real data should likely be used to verify precise treatment effects related to very rare outcomes or small subgroups (e.g., rare demographic subgroups). Second, although discrimination was comparable in models built using real and synthetic data, there were differences noted among the most influential predictors identified. Therefore, while synthetic data appear capable of predicting clinically relevant endpoints in real populations, they may not be best-suited for analyses focused on identifying the impact of individual predictors in complex multivariable analyses. Third, since the MDClone platform is proprietary, users can validate its performance but cannot investigate the detailed methods of its underlying algorithm. Finally, synthetic data derivatives are dependent on the structured data encoded within the RDC. At our institution, the RDC does not currently include some clinically relevant outcomes, like patient-reported outcomes and pain scores. As such additional data elements are added to the RDC, opportunities for synthetic data research will expand.
Conclusions
Synthetic data derivatives offer a novel approach for leveraging EHR analytics to support multicenter spine surgery research. In this initial validation in 2 spine surgery populations, the descriptive characteristics and predictive performance from synthetic data closely mirrored those obtained using real data. As both the use of structured EHR data and buy-in from large organizations expand, synthetic data derivatives are likely to assume a growing role in spine surgery research.
Supplemental Material
sj-pdf-1-gsj-10.1177_21925682221085535 - Supplemental material for "Leveraging Artificial Intelligence and Synthetic Data Derivatives for Spine Surgery Research" by Jacob K. Greenberg, Joshua M. Landman, Michael P. Kelly, Brenton H. Pennicooke, Camilo A. Molina, Randi E. Foraker, and Wilson Z. Ray, in Global Spine Journal.
Footnotes
Acknowledgment
We thank Dr Noa Zamstein for her valuable input on our search strategy and her feedback on the manuscript content.
Declaration of Conflicting Interests
No authors report any financial conflicts of interest. Drs. Greenberg and Foraker have delivered one or more webinars on the use of MDClone and received a nominal gift card in appreciation. Dr. Ray received research support from the Defense Advanced Research Projects Agency, Department of Defense, Missouri Spinal Cord Injury Foundation, National Institutes of Health/NINDS, Hope Center, and Johnson & Johnson. Dr. Ray reports: stock/equity in Acera surgical; consulting support from Depuy/Synthes, Globus, and Nuvasive; royalties from Depuy/Synthes, Nuvasive, Acera surgical. Dr. Foraker received no funding specifically related to this study. Dr. Foraker reports research support from the Washington University Institute for Public Health, National Institutes of Health, Global Autoimmune Institute, Agency for Healthcare Research and Quality, Siteman Investment Program, Alzheimer's Drug Discovery Foundation, and Children's Discovery Institute. Dr. Kelly reported no funding related to this submission. Dr. Kelly received research support from the Setting Scoliosis Straight Foundation and the International Spine Study Group Foundation. Dr. Kelly received personal fees from The Journal of Bone and Joint Surgery. Dr. Molina reported equity in Augmedics and consulting fees from Depuy/Synthes and Kuros.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Dr Greenberg was supported by grants from the Agency for Healthcare Research and Quality (1F32HS027075-01A1), the Thrasher Research Fund (#15024), and a Young Investigator research grant from AO Spine North America. This study received no dedicated funding.
Ethics Approval
This study was reviewed by the Washington University in St Louis institutional review board (IRB) and granted exempt status (IRB #202012138) and a waiver of HIPAA authorization. Because of the waiver of HIPAA authorization, patient consent was not required or obtained.
References