Abstract
Study Design
Longitudinal Cohort.
Objectives
To determine whether a surgeon’s clinical judgment can predict clinical outcomes after surgery for lumbar disc herniation (LDH).
Methods
Surgeons provided an opinion on outcomes in patients with lumbar disc herniation (LDH) using a series of seven faces denoting the Global Perceived Effect (GPE) as “Very bad” (GPE1), “Bad” (GPE2), “Fairly bad” (GPE3), “No change” (GPE4), “Fairly good” (GPE5), “Good” (GPE6), and “Very Good” (GPE7). Standard demographic, surgical, and outcome data were collected before and 1 year after surgery. Patients were then stratified by the surgeon’s clinical judgment, and changes in 1-year outcome measures were compared.
Results
Of 153 subjects, 110 (72%) had 1-year data: 0 GPE1, 1 GPE2, 4 GPE3, 5 GPE4, 36 GPE5, 48 GPE6, and 16 GPE7. Only patients in GPE3 to GPE7 were included in the analysis. There were no differences in demographic or surgical parameters among the GPE groups. Improvements in ODI, EQ5D, and SF36 PCS were greatest in GPE7, followed by GPE6 and GPE5. GPE5 and GPE4 had similar improvements, while GPE3 had less improvement than GPE4. Improvement in VAS back and leg pain was similar in the GPE7, GPE6, and GPE5 groups, with less improvement in the GPE4 and GPE3 groups.
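The reported counts can be checked with simple arithmetic. The short Python sketch below uses only the numbers given above; the variable names are illustrative. It also makes explicit that excluding the single GPE2 patient leaves 109 patients in the analysis.

```python
# Number of patients with 1-year data in each surgeon-rated GPE category,
# as reported in the Results above.
gpe_counts = {1: 0, 2: 1, 3: 4, 4: 5, 5: 36, 6: 48, 7: 16}
enrolled = 153

with_followup = sum(gpe_counts.values())        # patients with 1-year data
followup_rate = with_followup / enrolled        # proportion followed up
# Only GPE3 to GPE7 were analyzed.
analyzed = sum(n for gpe, n in gpe_counts.items() if gpe >= 3)

print(f"1-year follow-up: {with_followup}/{enrolled} ({followup_rate:.0%})")
print(f"Analyzed (GPE3 to GPE7): {analyzed}")
```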
Conclusions
Although mathematical modeling, artificial intelligence, and machine learning are increasingly important analytical tools for predicting outcomes, the current study shows that the value of clinical intuition in patient counseling and in predicting clinical outcomes after surgery for LDH should not be underestimated.
Keywords
Introduction
Predicting clinical outcomes after surgery is an essential part of clinical decision-making and patient counseling. Accordingly, an increasing number of models and online calculators1-8 have been developed to predict outcomes in patients choosing surgery for lumbar disc herniation (LDH). The patient’s demographics, baseline patient-reported outcomes (PROs), and smoking and employment history are entered into these calculators, and the probability of improvement is calculated using regression or machine learning algorithms.1-8 Studies evaluating how accurately these models predict patient improvement after surgery report widely varying results.9-13
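As a rough illustration of how a regression-based calculator converts patient inputs into a probability, the sketch below applies a logistic model. The predictor names and coefficients are hypothetical placeholders chosen for illustration only; they are not taken from any published LDH calculator and have no clinical validity.

```python
import math

# HYPOTHETICAL intercept and coefficients, for illustration only.
# They do not come from any published calculator and must not be used clinically.
INTERCEPT = -0.5
COEFFICIENTS = {
    "age_per_decade": -0.10,       # age in decades
    "current_smoker": -0.40,       # 1 = smoker, 0 = non-smoker
    "working": 0.30,               # 1 = employed, 0 = not employed
    "baseline_odi_per_10": 0.20,   # baseline ODI divided by 10
}

def predicted_probability(features: dict[str, float]) -> float:
    """Logistic model: P(improvement) = 1 / (1 + exp(-linear predictor))."""
    linear = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))

# A hypothetical 45-year-old working smoker with a baseline ODI of 50.
patient = {"age_per_decade": 4.5, "current_smoker": 1, "working": 1, "baseline_odi_per_10": 5.0}
print(f"Predicted probability of improvement: {predicted_probability(patient):.2f}")
```

Machine learning calculators replace the fixed linear predictor with a learned function, but the output is likewise a probability computed from the same kinds of inputs.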
Studies examining the accuracy of surgeons’ intuition or “gut feel” in predicting outcomes after elective surgery are sparse and mostly address the risk of complications.14-20 The surgeon’s prediction of clinical outcomes is an “educated” subjective assessment based on clinical experience and evaluation of the specific patient’s clinical scenario. Gestalt relies on the physician’s ability to recognize signs and symptoms that distinguish patients with more severe from less severe disease,21-23 or patients who will do well after surgery from those who will not. Few studies have examined the predictive value of a clinician’s gestalt for subsequent clinical outcomes,21 and there has been no formal evaluation of predicting clinical outcomes solely from a surgeon’s clinical judgment based on gestalt.
The purpose of this study is to evaluate the accuracy of a surgeon’s intuition in predicting changes in patient-reported outcome measures (PROMs) in patients with LDH opting for surgical treatment.
Methods
The study was approved by the Lillebaelt Hospital Research Board with a waiver of informed consent. Patients with LDH referred for surgical consultation at a tertiary spine specialty hospital were evaluated by one of ten experienced spine surgeons. All participating surgeons were fellowship-trained spine specialists who had been in practice for at least 5 years; nine were originally trained in orthopedic surgery and one in neurosurgery. Data available to the evaluating surgeon included standard demographic and clinical data (age, sex, smoking status, work status, prior treatments including surgery, and medical comorbidities), as well as PROMs that the patient completed immediately prior to the visit: the Oswestry Disability Index (ODI),24,25 the EuroQol-5D 3 Level (EQ5D),26 and the Visual Analog Scale (0 to 100) for Back Pain (BP) and Leg Pain (LP). Because the choice of surgical treatment for LDH is preference-based, the surgeon discussed the risks and benefits of surgical vs non-surgical treatment with the patient and the patient’s family as part of a shared decision-making process. Once the patient opted for surgery, the surgeon was asked to give an opinion on the outcome of surgery for that specific patient, considering all available data, using a series of seven faces (Figure 1) denoting the Global Perceived Effect (GPE) as “Very bad” (GPE1), “Bad” (GPE2), “Fairly bad” (GPE3), “No change” (GPE4), “Fairly good” (GPE5), “Good” (GPE6), and “Very Good” (GPE7). This opinion was not communicated directly to the patient.
Figure 1. Series of seven faces denoting the Global Perceived Effect (GPE).
Standard demographic and surgical data were collected, as were PROMs including the ODI,24,25 the EQ5D,26 the Physical Component Summary (PCS) and Mental Component Summary (MCS) of the Short Form-36 (SF36),27 and the VAS (0 to 100) for BP and LP, before surgery and 1 year after surgery. Patients were then stratified by the surgeon’s GPE, and changes in 1-year outcome measures were compared among groups.
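The stratification step can be sketched as follows, assuming one record per patient that holds the surgeon’s GPE rating together with baseline and 1-year scores. The records and field names are hypothetical, not study data, and only the ODI is shown.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL records (not study data): surgeon's GPE rating plus ODI
# at baseline and at 1 year. A lower ODI means less disability.
records = [
    {"gpe": 7, "odi_pre": 48, "odi_1y": 8},
    {"gpe": 7, "odi_pre": 52, "odi_1y": 10},
    {"gpe": 6, "odi_pre": 45, "odi_1y": 15},
    {"gpe": 6, "odi_pre": 40, "odi_1y": 14},
    {"gpe": 5, "odi_pre": 44, "odi_1y": 22},
    {"gpe": 4, "odi_pre": 38, "odi_1y": 20},
    {"gpe": 2, "odi_pre": 35, "odi_1y": 30},  # excluded: below GPE3
]

def mean_improvement_by_gpe(rows, min_gpe=3):
    """Group rows by GPE (keeping GPE >= min_gpe) and return the mean
    ODI improvement per group, defined as baseline minus 1-year score."""
    groups = defaultdict(list)
    for row in rows:
        if row["gpe"] >= min_gpe:
            groups[row["gpe"]].append(row["odi_pre"] - row["odi_1y"])
    return {gpe: mean(changes) for gpe, changes in sorted(groups.items())}

print(mean_improvement_by_gpe(records))
```

Because a lower ODI indicates less disability, improvement is computed here as baseline minus 1-year score; for the EQ5D and SF36 PCS, where higher scores are better, the subtraction would be reversed.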
All statistical analyses were performed using SPSS version 28.0 (IBM Corp, Armonk, NY). Comparisons among the five GPE groups included in the analysis (GPE3 to GPE7) were performed using one-way analysis of variance (ANOVA) with Tukey’s post-hoc test for continuous variables and Fisher’s exact test for categorical variables. A P-value < 0.05 was considered statistically significant.
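The core computations behind two of these tests can be sketched in plain Python. This is an illustrative sketch, not the SPSS implementation: the ANOVA function returns only the F statistic and its degrees of freedom (a P-value additionally requires the F distribution), and the Fisher’s exact function handles only a 2 × 2 table, whereas categorical comparisons across five GPE groups require the r × c generalization.

```python
from math import comb
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: group size times squared deviation of the group mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact P for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    # Hypergeometric numerators for every table with the same margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    # Sum all tables no more probable than the observed one (exact integer comparison).
    return sum(w for w in weights.values() if w <= observed) / comb(n, col1)

print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
print(fisher_exact_2x2(1, 9, 11, 3))
```

In practice, library routines such as `scipy.stats.f.sf` (for the ANOVA P-value) and `scipy.stats.tukey_hsd` (for Tukey’s post-hoc comparisons) would typically be used rather than hand-written code.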
Results
Summary of Demographic and Surgical Data.
*P-values are from one-way ANOVA with post-hoc comparisons among the five groups.
Summary of Pre-operative and One-Year Change in Patient-Reported Outcome Scores.
*P-values are from one-way ANOVA with post-hoc comparisons among the five groups.
Discussion
The results of the current study highlight the clinical relevance of a surgeon’s gestalt-based Global Perceived Effect (GPE) rating in predicting clinical improvement after LDH surgery. The GPE appears to be effective in predicting improvements in ODI, EQ5D, and SF-36 scores, but less so in predicting improvements in VAS back and leg pain. This may indicate that a surgeon’s clinical judgment is valuable in certain domains of patient outcomes but may not fully capture the nuances of pain-related improvement. It may also indicate that multidomain PROMs that convey functional limitations better reflect the impact of the disease on the patient’s quality of life, which surgeons may be better able to intuit. In contrast, pain scores have always been difficult to quantify, as they require the patient to interpret the pain experienced over the past week subjectively and then assign it a value on the measurement scale.28-31
The discriminative ability of a surgeon’s intuition was more evident in identifying patients who would have “Very Good” or “Good” outcomes after surgery for LDH, and less so for patients who would have “No change” or “Fairly good” outcomes. One reason may be the small number of patients in the “No change” group (n = 5). Another may be that the faces used for the GPE did not adequately convey the distinction between “No change” and “Fairly good.” A five-point Likert scale, instead of the seven-point scale used in this study, may yield clearer differences.32,33 However, it would be difficult to collapse groups in the current study, because GPE4 and GPE5 were similar for the ODI, GPE3 and GPE4 were similar for the EQ-5D, and GPE3, GPE4, and GPE5 were similar for the SF-36. Interestingly, the mean change in every outcome score among patients in the “Fairly bad” group still exceeded published minimal clinically important differences.34-36
There are limitations to the study. There were few or no patients in the “Very bad,” “Bad,” “Fairly bad,” and “No change” cohorts, which is to be expected, as surgeons would likely advise these patients against surgery. It would have been helpful to record why patients in the “Bad,” “Fairly bad,” and “No change” cohorts still opted for surgery. In addition, the reasoning behind the surgeon’s GPE rating was not specifically sought; that is, it was not recorded which patient characteristics led the surgeon to predict that the patient would or would not do well after surgery.
Treatment decisions for lumbar disc herniation are preference-based: the patient can opt for non-surgical care, which avoids the risks of surgery but may prolong the clinical course, or for surgery, which carries risks but may shorten the clinical course. Clinical decision-making, especially in the setting of preference-based treatment, is difficult. Decision-making can be understood through the dual-process theory of Epstein37 and Hammond,38 which posits two separate cognitive operations: intuitive and analytical. Intuitive thinking is generated without much conscious effort and uses available information subconsciously, through pattern recognition based on past experience.39 Analytical thinking is more deliberate and processes data consciously.40
Although modeling based on artificial intelligence (AI) is an increasingly important analytical approach to predicting outcomes after lumbar surgery, the current study shows that the value of clinical intuition in patient counseling and in predicting clinical outcomes after surgery for LDH should not be underestimated. Combining AI-enabled predictive models with the clinical intuition of experienced clinicians, so-called “hybrid intelligence,” may improve the accuracy of outcome prediction in LDH surgery. Future multicenter studies should aim to include more patients in the GPE1, GPE2, and GPE3 groups, to include patients who did not opt for surgery, and to explore the factors and clinical variables that surgeons use in their predictions, so that these judgments can be better understood and potentially improved or standardized.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
