Abstract
Introduction:
Treatment decisions for patients with a ruptured abdominal aortic aneurysm (rAAA) are made under extreme time pressure, often with limited patient information and without opportunity for multidisciplinary consultation. This study describes the development of a decision-support instrument (DSI) to predict how experts would decide in real-life emergency scenarios, based on patient characteristics including frailty, estimated life expectancy, and treatment preference.
Methods:
The DSI was developed using a discrete choice experiment. A multidisciplinary group of medical specialists, including vascular/oncological surgeons, anesthesiologists, intensivists, and geriatricians, defined the relevant criteria for deciding whether to operate on an rAAA based on patient-specific factors and expert opinion. The group identified relevant levels for each criterion, ordered by increasing risk of worse outcomes. Criteria that would immediately exclude a patient from treatment were designated as knockout criteria. An expert panel of 21 medical specialists evaluated 30 fictional patient scenarios, which resulted in a weight for each criterion expressed as relative importance (RI), a measure of how much each criterion, compared to the sum of all criteria, contributes to the model’s outcome. The finalized DSI predicts the percentage of medical specialists who would choose to operate on a specific patient, based on the input criteria for that patient.
Results:
In 4 criteria sessions with medical experts, consensus was reached on 11 criteria that were judged to influence the decision whether to operate or not: cardiopulmonary resuscitation, patient’s wish for operation, renal function, age, life expectancy, endovascular treatment options, hemoglobin blood concentration, mean arterial pressure, pulmonary burden, cardiac burden, and clinical frailty score. Nine criteria had a significant impact on the decision; the most significant were cardiac burden (RI, 15%), age (RI, 13%), life expectancy (RI, 13%), and Clinical Frailty Scale and pulmonary burden (both RI, 12%).
Conclusion:
A clinical DSI was developed, based on codified multidisciplinary peer expertise, to support real-life medical decision-making during acute treatment planning for rAAA. Three parameters that are underreported in previous scoring systems (patient’s frailty, estimated life expectancy, and patient’s desire to undergo surgery) were considered most important for the treatment decision.
Clinical Impact
Treatment decisions for ruptured abdominal aortic aneurysm patients are time-critical and morally complex, often made with limited clinical information and without multidisciplinary consultation. This study describes the development of a transparent decision-support instrument (DSI) based on multidisciplinary peer expertise. By reflecting decision logic and criteria considered important by experts, the DSI may contribute to more consistent decision-making and help identify patients for whom palliative care might be more appropriate. Further validation is required to determine its clinical utility, although this approach may support clinicians in urgent, morally complex scenarios.
Introduction
A ruptured abdominal aortic aneurysm (rAAA) is a life-threatening condition, with mortality rates ranging from 59% to 83%. 1 Despite therapeutic advances, such as endovascular aneurysm repair (EVAR), peri-operative mortality and morbidity rates are high, with a 30-day mortality rate of 21% in patients treated with EVAR compared with 25% for open surgery.2–4 Although survival is often considered the primary outcome, it is crucial to consider the potential decline in quality of life after the operation when making decisions regarding surgical intervention. 5 To have the best chances of survival and the best functional outcome, the patient must be operated on as soon as possible. Decisions are typically made with limited information available and with limited time for multidisciplinary case discussion. In these urgent situations, it is useful to have a decision-support instrument (DSI) that supports the treating physician(s) and the surgical team in making the decision whether to operate or not.
Several mortality risk stratification scores have been developed to guide the multidisciplinary team in the decision whether to operate on an rAAA patient. 6 These include the Dutch Aneurysm Score, the Vascular Study Group of New England score, and the Harborview Medical Center score.7–9 These scoring systems seem to accurately predict postoperative mortality after rAAA; nevertheless, the European Society for Vascular Surgery (ESVS) does not recommend basing a decision to withhold treatment entirely on these scoring systems. 6 The Journal of Vascular Surgery guidelines recommend informing patients of their peri-operative mortality risk using the Vascular Quality Initiative (VQI) risk score when considering elective open repair or EVAR. This is particularly emphasized for high-risk patients to support informed decision-making. However, it is important to note that the VQI score is not intended for use in patients with ruptured aneurysms.10–14
This report introduces a DSI based on codified multidisciplinary medical peer expertise. The model is based on a discrete choice experiment, a technique that is widely used outside the medical field to determine the preferences of a group of people.15–17 A panel makes repeated choices on scenarios that contain generated criteria values. The aim of this study was to develop a DSI that reflects criteria that are deemed important by a multidisciplinary team of medical specialists experienced in the treatment of rAAA patients. The instrument may support the treatment team in real-life medical decision-making for rAAA treatment.
Methods
No ethical approval was required, as this study did not involve patient data or any identifiable personal information.
Study Outline
For this study, the University Medical Center Groningen initiated a collaboration with Councyl AI (Councyl B.V., Delft, the Netherlands). The study was conducted between May 2021 and August 2022 according to the method compiled by Ten Broeke et al. 15 In this approach, a choice experiment containing hypothetical choice scenarios is used to collect expert decisions (Figure 1). Those decisions are then used to estimate weights for each criterion. The resulting choice instrument can assess real-life decision situations. This methodology is also known as behavioral artificial intelligence technology (BAIT) and has been described previously in detail.15,18,19

Figure 1. Study workflow chart.
The first step in BAIT is to specify the expert decision of the model, in this study: ‘to perform (endo)vascular surgery on a patient with an rAAA with a specific medical profile,’ and to identify factors that play a role in that expert decision (e.g., age, hemoglobin level). To determine these factors and to eventually conduct the choice experiment, experts in this field of practice were selected using a purposive sampling strategy. Clinical experts included vascular surgeons, anesthesiologists, and intensivists from a high-volume vascular center. To incorporate ethical and patient-centered decision-making perspectives, we also invited geriatricians and an oncological surgeon, who often contribute to end-of-life or frailty-related decisions. All experts had at least 5 years of clinical experience and were regularly involved in emergency decision-making for rAAA. This approach ensured that the final model reflected both medical considerations and ethical aspects of real-life decision-making.
The structure of the choice instrument was then defined by deciding on elements such as nonlinear weights, knockout criteria, and interaction effects, which together determine the DSI’s output in a given situation. Once the structure was set, a pilot choice experiment was conducted with four experts.
Then, the final choice experiment was conducted with an invited group of 21 experts (also referred to as “the expert group”). All members of the expert group were asked to make individual choices based on 30 hypothetical patient scenarios, created by Councyl AI to mimic real situations. The observed choices made by the expert group were used to estimate the importance weights of all factors, including their signs and nonlinear curvatures, using maximum likelihood techniques. The instrument’s weights were iteratively adjusted to match the experts’ actual choices, improving prediction accuracy until no further improvements were possible. Logistic regression was used to estimate the weights for each criterion and determine relative importance (RI) to make the DSI available for clinical application. Finally, the results were presented to the expert group, showing visualizations of factor weights and their contributions to decisions.
Relative importance
Understanding RI is key to interpreting this decision model. The model was built using a set of input criteria (as determined in the criteria sessions) to predict the target variable (the decision to operate or not). The RI of each criterion indicates how much it contributes to the model’s prediction; some criteria have a stronger impact than others. The RI improves the model’s interpretability because it shows which criteria the model relies on most and least. Recognizing important criteria can help guide real-life decision-making, reduce unnecessary complexity, and improve the efficiency of a model.
Expert decision analysis/defining relevant criteria
In this study, a panel member’s choice to operate or not in a hypothetical scenario was defined as follows: whether, in their expert opinion and under realistic time pressure, they would proceed with surgery in that specific case (yes, operate) or opt for best supportive care (no, do not operate).
The goal of this study is not to define a primary outcome such as survival, quality of life, or cost, as the driver behind this decision but to model the expert decision-making process. Therefore, relevant decision criteria with their corresponding levels (e.g. for age: <80 years, 80-90 years, >90 years) in the treatment for rAAA were chosen in an expert focus group. We intentionally did not ask experts to identify a single primary outcome guiding their decision. Instead, we applied BAIT to quantify the weight (RI) of each decision factor (criterion) across systematically varied scenarios during the choice experiment. Criteria sessions were facilitated by a trained moderator (M.L.D.) who ensured balanced input and prevented dominance by individual participants. A structured discussion framework was used to guide the process. This framework included clinical factors known from the literature to influence outcomes after rAAA, including established risk scores and prognostic variables.7–9 While no formal systematic review was conducted, a targeted search in Medline, Embase, and Cochrane Central Register of Controlled Trials was performed by M.L.D., and the findings were used to inform and stimulate the expert discussion. Some levels of specific criteria might immediately result in the decision not to operate irrespective of the other criteria; these levels were labeled as “knockout criterion levels.”
In addition, the focus group identified certain combinations of criterion levels as clinically implausible, meaning they would be unlikely to occur in real-life scenarios. For example, a Clinical Frailty Scale score <4 combined with severe or moderate cardiac or pulmonary burden was considered unrealistic, as such comorbidities typically correlate with a higher frailty score. These implausible combinations were therefore excluded from the design of the choice experiment to ensure it would accurately reflect real-world decision situations.
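The exclusion rule described above can be illustrated with a short sketch. The level labels below are hypothetical placeholders (the study's actual levels are those listed in Table 1), and the full enumeration is used only for illustration; the study itself generated its scenarios with an efficient design, not a full factorial.

```python
from itertools import product

# Hypothetical level labels, for illustration only.
FRAILTY = ["CFS <4", "CFS 4-5", "CFS >5"]
CARDIAC = ["none/mild", "moderate", "severe"]
PULMONARY = ["none/mild", "moderate", "severe"]


def is_plausible(frailty, cardiac, pulmonary):
    """Rule from the text: a CFS score <4 combined with moderate or severe
    cardiac or pulmonary burden is treated as clinically implausible."""
    low_frailty = frailty == "CFS <4"
    high_burden = cardiac != "none/mild" or pulmonary != "none/mild"
    return not (low_frailty and high_burden)


# Enumerate every combination, then drop the implausible ones.
all_profiles = list(product(FRAILTY, CARDIAC, PULMONARY))
plausible = [p for p in all_profiles if is_plausible(*p)]

print(len(all_profiles), len(plausible))
```

With these three assumed criteria, 8 of the 27 combinations are excluded, all of them profiles with a CFS score <4.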
Setting up choice experiment and sample size
A discrete choice experiment was designed with the selected criteria to explore the weight of each criterion in the decision whether to operate or not. Thirty hypothetical patient cases were created using efficient design technique in dedicated software (Ngene 1.2.1). The scenarios of the hypothetical patient cases were specified in terms of a different combination of values taken from the pre-specified decision criteria and their levels, avoiding implausible combinations. The cases were designed in such a way that the information that can be retrieved for estimating the weights is maximized.
The (pilot) choice experiment
Four experts conducted a pilot choice experiment. The analysis of the pilot choice experiment was used to generate the final choice experiment by carefully evaluating the adequacy of the division of answers between operate or do not operate. The result was 64 operate and 76 do not operate, which is considered a good balance. Pre-testing allows for the evaluation of various aspects of the design process, such as the selection and definition of criteria and their corresponding levels. 20
The final choice experiment, conducted by 21 experts, repeated the pilot choice experiment with a refined model and 30 new hypothetical patient cases.
Statistical analysis for determining importance weights
The choices from the expert group were assessed in a binary logistic regression model (Apollo package in R software) to estimate the weights of each criterion in the decision whether to operate on a hypothetical patient. Binary logistic regression resulted in a β-coefficient (the effect of a criterion) and its significance. The level of significance is informative of the degree to which an effect has been found in the sample and will generalize to the population from which the sample was drawn. The effect of a criterion (β-coefficient) was considered significant when p≤0.20. This differs from a traditional statistical approach in which the results of a small sample size are used to represent a large population and a p≤0.05 is considered significant. Since the purpose of our report is to understand the perspectives of the consulted experts and whether the sample itself effectively represents the population, a less conservative standard regarding the transferability to a hypothetical population is used.
The p-value was obtained by calculating the t ratio (the estimated weight divided by its standard error) and using a t ratio table.
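As a rough illustration of this step, the sketch below computes a t ratio and a two-sided p-value. It assumes a standard normal approximation in place of a t ratio table, which is our simplification and not necessarily the lookup used in the study; the function names and input values are hypothetical.

```python
import math


def t_ratio(weight, std_error):
    """Estimated weight divided by its standard error."""
    return weight / std_error


def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal approximation (assumption)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def is_significant(weight, std_error, alpha=0.20):
    """Apply the study's threshold of p <= 0.20 by default."""
    return two_sided_p_normal(t_ratio(weight, std_error)) <= alpha


# Hypothetical weight and standard error, for illustration only.
print(is_significant(0.9, 0.5))
```

Under this approximation, the p≤0.20 threshold corresponds to an absolute t ratio of roughly 1.28, compared with 1.96 for p≤0.05.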
McFadden’s ρ², defined as one minus the ratio of the log likelihood of the predictive model to the log likelihood of the null model, was used to estimate the model fit. A ρ² between 0.2 and 0.4 is regarded as a strong model fit, considering the nature of the experiment with complex choice scenarios. 21
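The definition above can be written out as a minimal sketch. The log-likelihood values are invented for illustration, and the null model is assumed here to assign a probability of 0.5 to each binary choice; the study does not state which null model was used.

```python
import math


def mcfadden_rho2(ll_model, ll_null):
    """One minus the ratio of the model log likelihood to the null log likelihood."""
    return 1.0 - ll_model / ll_null


# Illustrative values only: 100 binary choices under an equal-probability null.
n_observations = 100
ll_null = n_observations * math.log(0.5)
ll_model = -52.0  # hypothetical fitted log likelihood

rho2 = mcfadden_rho2(ll_model, ll_null)
print(round(rho2, 3))
```

A model that predicts no better than the null gives a ρ² of 0, and values move toward 1 as the fitted log likelihood approaches 0.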
The RI is defined as the contribution of each criterion relative to the other criteria. To determine the RI of a criterion (e.g., X1), we first need to determine its absolute importance. The absolute importance of a criterion is the range of that criterion (max − min) multiplied by its β-coefficient (effect). The RI of a criterion is then calculated as the ratio of its absolute importance to the total importance (the sum of the absolute importances of all criteria). For example, suppose a regression equation predicts outcome Y from 2 factors (X1, X2) with ranges 60 and 30, respectively: Y=1.2X1+0.7X2. The absolute importances are X1=60×1.2=72 and X2=30×0.7=21, and the total importance is 72+21=93. Thus, the RI for each factor is X1=72/93=77.4% and X2=21/93=22.6%.
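The worked example above can be reproduced with a few lines of code. This is a sketch of the stated formula, not the software used in the study; taking the absolute value of each weight is our assumption, made so that criteria with negative weights would not cancel out positive ones.

```python
def relative_importance(weights, ranges):
    """Relative importance per criterion: |weight| * range, divided by the total."""
    absolute = {name: abs(weights[name]) * ranges[name] for name in weights}
    total = sum(absolute.values())
    return {name: value / total for name, value in absolute.items()}


# Values from the worked example: Y = 1.2*X1 + 0.7*X2, with ranges 60 and 30.
ri = relative_importance({"X1": 1.2, "X2": 0.7}, {"X1": 60, "X2": 30})

for name, share in ri.items():
    print(f"{name}: {share * 100:.1f}%")
```

Because the shares always sum to 100%, adding or removing a criterion changes the RI of every other criterion.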
It is not possible to determine absolute cutoffs for “high” or “low” RI beforehand; thus, its interpretation all depends on the distribution between criteria.
Knockout criterion levels were not included in the regression model and therefore did not contribute to the RI.
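How knockout levels and regression weights might combine into a single output can be sketched as follows. This is a simplified illustration under several assumptions: the weights, intercept, and criterion values are hypothetical, the utility is linear (the actual instrument also allows nonlinear weights), and reporting 0% when a knockout level is present is our reading of "immediately result in the decision not to operate."

```python
import math


def predicted_percent_operate(intercept, weights, values, knockout_hit=False):
    """Predicted percentage of experts who would operate.

    A knockout level bypasses the regression entirely; otherwise the
    weighted sum of criterion values is passed through a logistic function.
    """
    if knockout_hit:
        return 0.0
    utility = intercept + sum(weights[name] * values[name] for name in weights)
    return 100.0 / (1.0 + math.exp(-utility))


# Hypothetical weights and patient values, for illustration only.
weights = {"frailty": -0.8, "life_expectancy": 0.5}
patient = {"frailty": 1.0, "life_expectancy": 2.0}

print(round(predicted_percent_operate(0.0, weights, patient), 1))
print(predicted_percent_operate(0.0, weights, patient, knockout_hit=True))
```

Keeping the knockout check outside the regression mirrors the description above: knockout levels have no weight and therefore no RI.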
Results
Experts reached consensus on 11 criteria of influence on their decision whether to perform surgery on an rAAA patient. The criteria were cardiopulmonary resuscitation, patient’s wish for operation, renal function, age, life expectancy, endovascular treatment options, hemoglobin blood concentration, mean arterial pressure, pulmonary burden, cardiac burden, and Clinical Frailty Scale.
Table 1 presents how the criteria were classified. The Global Initiative for Chronic Obstructive Lung Disease (GOLD) classification was considered for patients’ “Pulmonary burden,” as was the New York Heart Association (NYHA) Functional Classification for patients’ “Cardiac burden.” The Clinical Frailty Scale was determined according to the system of Rockwood et al, 22 in which a score of >5 means “moderately/severely frail”; 4 to 5 means “apparently vulnerable/mildly frail”; and <4 means everyone from “very fit” to “well, with treated comorbid disease.”
Table 1. Relevant Criteria (Attributes) and Their Corresponding Levels That Influence the Decision Whether to Operate or Not.
Abbreviations: COPD, chronic obstructive pulmonary disease; eGFR, estimated glomerular filtration rate; NYHA Class: New York Heart Association Functional Classification; ROSC, return of spontaneous circulation.
Knockout criterion levels.
Or shortness of breath at rest or oxygen use at home.
Or signs of acute myocardial infarction or acute cardiac asthma.
The lowest levels of 3 criteria were labeled as “knockout criterion levels,” meaning that they immediately result in no operation. Knockout criterion levels were cardiopulmonary resuscitation (yes, ongoing), patient’s wish for operation (do not operate), and renal function (dialysis).
Nine criteria had a significant impact on the decision whether to operate or not: cardiac burden, age, life expectancy, clinical frailty, pulmonary burden, endovascular treatment options, patient’s wish for operation, cardiopulmonary resuscitation, and renal function. Of all contributing criteria, mean arterial pressure (MAP) (RI, 2%) and hemoglobin concentration (RI, 1%) had the lowest impact on the model’s outcome (the decision whether to perform [endo]vascular surgery on a patient with rAAA and specific medical profile) (Table 2). McFadden’s ρ² was 0.24, indicating a good model fit.
Table 2. Weights and Relative Importances of Criteria.
Figure 2 illustrates the DSI. Criteria are color-coded, with red denoting a criterion negatively impacting the decision and green indicating a positive impact. Thus, a higher value for a red criterion decreases the probability that the experts would operate in that specific case, while a higher value for a green criterion increases it. The intensity of the green or red color depicts the RI of that criterion. The model also allows users to vary the input and evaluate the influence on the prediction. The model’s output is expressed as the percentage of peers who would recommend surgery for the patient under consideration.

Figure 2. The decision-support instrument filled in with a hypothetical patient case. The color red denotes the negative impact on the decision whether to operate or not, and the color green indicates a positive impact. The intensity of the colors depicts the relative importance of that criterion.
Discussion
This study involved the development of a DSI by using codified interdisciplinary peer expertise to support decision-making for rAAA patients. We used a discrete choice experiment to construct a model that integrates the insights of multidisciplinary peer expertise. This instrument computes the proportion of medical specialists who would opt for surgical intervention for a patient, considering specific clinical variables as input criteria. It is crucial to underscore that although the model offers support, it does not supplant the clinical judgment of the treating physician or the medical team. Its primary aim is to capture the nuanced considerations inherent in decision-making processes within acute clinical settings, potentially assisting in the formulation of an informed, “expert-backed” decision.
We defined this model based on the existing literature and expert opinions. Specifically, we examined which patient factors the expert group and panel considered important in deciding to operate on a patient or not in an acute setting. The expected chance of survival of a patient is of great importance in this decision. However, by focusing exclusively on survival, postoperative functional outcomes and quality of life might be overlooked.
Unlike traditional scoring models,7–9,23,24 this DSI acknowledges the ethical dimension of decision-making. In this context, the explicit inclusion of the patient’s wishes as a decision criterion represents an important and novel addition. This approach reflects current developments in medicine, where respecting autonomy and shared decision-making, even under time pressure, are increasingly emphasized.25,26
Nine of the 11 criteria demonstrated a significant impact on the decision to operate. Of these, the patient’s frailty, estimated life expectancy, and desire regarding the surgery are not included in previous scoring systems. Although in an urgent setting there seems not to be much time for extensive consultation with the patient and his/her relatives, the patient’s wishes should be discussed and taken into account before final decision-making. The model suggests that in these acute settings, vital parameters may carry less weight in the decision to operate than frailty. Frailty assessment is widely acknowledged as relevant for surgical outcomes and is regarded as a significant factor in medical decision-making. 27 In acute settings, it is challenging to assess frailty due to the severity of a patient’s illness and time constraints. Comprehensive geriatric assessments, which are necessary to fully capture the complexity of frailty, are unfeasible in these situations.
Since using this instrument will take no more than a few minutes, it encourages the medical team to actively consider a broad spectrum of factors that are usually not considered in an acute emergency setting. If treatment decisions must be made by a single clinician without time for full multidisciplinary discussion, this model may serve as a reflection of peer reasoning. Hence, it could become a valuable tool for optimizing acute medical decision-making and thereby also patient outcomes in rAAA treatment.
The use of artificial intelligence as a DSI in medicine is rapidly expanding. 28 During the COVID-19 pandemic, BAIT was used to develop a model that explicates the conditions that intensivists use to determine intensive care eligibility and the initiation of mechanical ventilation of COVID-19 patients. 29 Another study, in pediatric surgery, demonstrated that decision-support models can analyze the implicit weight of factors in the complex and critical decision for surgery or comfort care in patients with necrotizing enterocolitis. 30 This supports the concept that expert-based behavioral models can be valuable in complex care settings.
This study successfully developed a new decision-support model; however, it is important to acknowledge its limitations. Models used in emergency care should be easy to use and require only a limited number of parameters. As such, the clinical experts established consensus on 11 criteria for the decision-making process. Other pertinent factors might have been overlooked or excluded from this model. Since no formal systematic review of the existing literature was conducted, some potentially important criteria may have been missed during the expert discussion rounds. Therefore, the model may inadvertently reflect the practice patterns and experiences of the 21 selected medical experts who work at an academic tertiary center. The panel was composed of vascular and oncological surgeons, anesthesiologists, intensivists, and geriatricians.
Although our sample size may seem modest in absolute terms, discrete choice experiments rely on the efficiency of their design rather than absolute numbers. We followed established guidelines for discrete choice experiment design, 31 ensuring that the number of choices per respondent exceeded the number of parameters to be estimated and that attribute levels varied independently. Each expert made 30 choices, yielding over 600 observations in total. This is well above the 300-observation threshold at which parameter estimates typically stabilize. 31 Another limitation of this study is that it has not yet been established whether the judgment of the current DSI is supported by real-life experience. Therefore, a prospective follow-up study is currently enrolling patients to validate the DSI using clinical outcome parameters, such as survival and quality-of-life measurements.
We used dialysis dependency as a proxy for advanced renal disease and high comorbidity (factors that limit physiological reserve and influence treatment decisions in emergency settings), making it a knockout criterion. We do recognize that stable dialysis patients may occasionally be suitable candidates for surgery. Future model versions could incorporate more granular renal-function metrics to better capture individual patient assessment.
The model was trained using responses to 30 fictional patient scenarios rather than real patient data. These computer-generated hypothetical scenarios are designed to intensify the consideration of the decision criteria and maximize the information obtained to optimize the model. However, they may not fully reflect the complexity and variability of real-world clinical situations.
Parameters should be fine-tuned and validated by an international panel after a prospective study shows how the outcome of this model corresponds to the postoperative course.
Further advancement of this model should focus on conducting a prospective multicenter study to assess its alignment with the post-operative course and to objectively assess which attributes are important for the post-operative course. This evaluation will enable us to establish a pertinent threshold for the proportion of peers advocating for surgical intervention.
Conclusion
A clinical decision-support model was developed based on codified multidisciplinary peer expertise to support real-life medical decision-making during acute treatment planning for rAAA. Three parameters that are underreported in previous scoring systems (patient’s frailty, estimated life expectancy, and patient’s desire to undergo surgery) were considered most important for the treatment decision.
Footnotes
Acknowledgements
“MDSI rAAA expert group” members are Ignace F. J. Tielliu, Wim J. W. Drouven, Ben R. Saleem, Clark J. A. M. Zeebregts, Maarten J. van der Laan, Marjolein Leemkuil: Department of Surgery, Division of Vascular Surgery, University Medical Center Groningen, University of Groningen, The Netherlands; Suzanne Festen, Pauline de Graeff: Department of Internal Medicine, Division of Geriatric Medicine, University Medical Center Groningen, University of Groningen, The Netherlands; Götz J. K. G Wietasch, Ernesto R. R. Muskiet, Bart J. Lichtenbelt, Gertrude J. Nieuwenhuijs-Moeke, Jaap J. Vos: Department of Anesthesiology, University Medical Center Groningen, University of Groningen, The Netherlands; and Maarten W. N. Nijsten, Jaap E. Tulleken, Matijs van Meurs: Department of Critical Care, University Medical Center Groningen, University of Groningen, The Netherlands.
Ethical Considerations
Ethical approval was not required for this study.
Consent to Participate
Not applicable.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
