Abstract
Keywords
Introduction
Predicting complications following endovascular abdominal aortic aneurysm repair (EVAR) is an important clinical and research topic in vascular surgery. Endoleak, graft migration, limb thrombosis and rupture following EVAR culminate in a higher rate of reintervention.1–3 The occurrence of such complications is associated with increased morbidity, medical cost, and mortality. Therefore, regular surveillance imaging is necessary to detect these complications after EVAR.4 However, regular surveillance with CT scanning is associated with additional radiation exposure, which may increase cancer-related mortality.1,2 Identifying high-risk patients who should undergo more frequent surveillance, and low-risk patients who require less, would help to improve patient compliance and hopefully clinical outcomes.
Artificial Intelligence (AI) is revolutionizing healthcare research and clinical practice in many areas, including medical image analysis, drug discovery, personalized medicine, medical diagnosis, electronic health record (EHR) analysis, and wearable technology. It has the potential to improve diagnoses, enable personalized treatments, and speed up drug discovery, leading to better patient outcomes.5–8 Several studies have reported the use of AI techniques to predict EVAR complications, but the results were not promising.9–11 The limitations of these studies are related to their retrospective nature, small sample size, lack of external validation, and reliance on pre-defined features. These limitations could have affected the accuracy and generalizability of the models. Additionally, the models were not able to predict individual patient outcomes, but only classified patients into low- or high-risk groups. This appears to reflect the difficulties of applying AI algorithms to vascular surgery databases, which are by nature imbalanced because complication rates are low.12,13 In this study, we wished to leverage more recent AI techniques to overcome these difficulties and build a novel post-EVAR complication prediction model with high sensitivity and negative predictive value by using a neural network model that can automatically extract features from imaging data with minimal human annotation. To address the difficulties with imbalanced data, we used data down-sampling and augmentation instead of increasing the dataset size, which is not clinically feasible. This approach generated a more representative dataset of post-operative complications following EVAR.
The primary aim of this study was to develop an artificial intelligence model to predict the complication probability of individual patients and better identify those needing more intensive post-EVAR surveillance.
Methods
Patients
Electronic medical records (EMR) were used to identify patients who underwent elective EVAR at Covenant Medical Center in Saginaw, MI, from January 1, 2010 to December 31, 2020. Patients who underwent EVAR for ruptured AAA, secondary EVAR after prior endovascular repair, or attempted EVAR and conversion to open repair were excluded (Figure 1). Pre-operative CT angiography scans of the abdomen/pelvis with 3D reconstruction imaging were downloaded from the radiology department servers. The CT axial images were collected and used by the PACS software to generate the 3D reconstruction images. These multiple view 3D images from the PACS server, without any additional post-processing or sizing measurements, were exported in the jpeg format. After de-identification they were uploaded to Google Colab (https://colab.research.google.com/) and treated as the input data for the AI model (Figure 2). 3D CT reconstruction images were chosen for unstructured analysis as they focus only on the morphology of the aorta and exclude other internal organs and musculoskeletal structures present on the raw axial/sagittal CT slices.
Figure 1. Flowchart of patient selection.
Figure 2. Imaging process. A. Raw CT images in the axial planes are used to develop the 3-dimensional reconstruction images (B) by the PACS software. C. Multiple views of the 3D images were exported in JPEG format for AI model development.

The medical records were reviewed for patient demographics, pre-operative and postoperative imaging reports, and clinical care. Clinical data variables included initial AAA size, age, gender, comorbidities, smoking status, complications, and reinterventions. Comorbidities included hypertension (HTN), diabetes mellitus (DM), coronary artery disease (CAD) and renal failure. Renal failure was defined as a glomerular filtration rate (GFR) of less than 15 mL/min. Complications included any type of endoleak, graft migration, AAA rupture, graft limb occlusion, renal artery occlusion, neck dilation, graft infection, pelvic ischemia, and stent strut/barb fracture. All clinical data were de-identified. Descriptive statistics of the clinical data were calculated using Microsoft Excel.
The study was reviewed and approved by the hospital Institutional Review Board (IRB). As the data was retrospective and anonymized, informed consent was not required.
Model Building
The patients were divided into two groups depending on the development of post-operative complications. The cases were assigned to a training or testing dataset. We chose a training:testing ratio of approximately 5:1 so that the testing dataset would have sufficient cases to test model performance. The positive:negative ratio in each dataset was set to approximately 1:4.7 to maintain the ratio of the whole database. Therefore, a total of 40 positive and 189 negative cases were randomly selected to form the training dataset to train the AI model, while the remaining 8 positive and 36 negative cases were later used to test the performance of the model. The training and testing datasets were completely independent of each other, and only images that had not been seen in the training process were used in the testing phase.
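The case allocation described above can be illustrated with a minimal sketch. It assumes the split is made at the patient level with a fixed number of positive and negative cases held out for testing; the function name `stratified_split`, the integer case IDs, and the random seed are hypothetical and are not taken from the study code.

```python
import random

def stratified_split(case_ids, labels, n_test_pos, n_test_neg, seed=0):
    """Hold out a fixed number of positive and negative cases for testing.

    Splitting by case (patient) rather than by image keeps all images of
    one patient on the same side of the split.
    """
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    pos = [c for c, y in zip(case_ids, labels) if y == 1]
    neg = [c for c, y in zip(case_ids, labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    test = pos[:n_test_pos] + neg[:n_test_neg]
    train = pos[n_test_pos:] + neg[n_test_neg:]
    return train, test

# Hypothetical IDs: 48 cases with complications and 225 without (273 total).
case_ids = list(range(273))
labels = [1] * 48 + [0] * 225
train_ids, test_ids = stratified_split(case_ids, labels, n_test_pos=8, n_test_neg=36)
```

With these counts the sketch yields 229 training cases (40 positive, 189 negative) and 44 testing cases (8 positive, 36 negative), matching the numbers reported above.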
State-of-the-art deep learning techniques were used to establish a prediction model, VascAI©, using the training dataset.14 Specifically, we employed a multilayer convolutional neural network as the model backbone.5 Convolutional neural networks are often applied to image processing and enable detection, segmentation, and recognition of objects and regions within images. A convolutional neural network is a multilayered mathematical model in which each layer processes its input and produces a filtered representation of the original image; this process is repeated in each layer. The model employed a stack of three convolution and max pooling layers, followed by a nonlinear layer and then a linear fully connected layer. Overfitting was alleviated with data augmentation (see below). The inputs of the model were the pre-operative 3D reconstruction images of each patient, while the output was the probability that the patient would have complications after EVAR. Because each patient had multiple images, all images were entered into the model to obtain a probability output for each image, and a majority vote (see below) was then performed to decide the final prediction result for each patient. The model (Figure 3) was built with TensorFlow (www.tensorflow.org) software and run on the Google Cloud platform (Google LLC; Mountain View, Calif).
Figure 3. Flowchart of model building. During the training period, data down-sampling and augmentation were applied to the patient images, which, along with information on the later development of complications, were entered into a multi-layer neural network. New images of different patients were then used to test the trained model and provide a prediction of later complications.
Data Down-Sampling and Augmentation
As complications are infrequent compared to uneventful cases, the collected data are imbalanced, with negative examples dominating the sample population. If the model were trained with just the original data, the skewed data distribution would force the model to predict most of the cases to be negative in order to attain a high overall accuracy rate. However, this would be both undesirable and not clinically useful, as in clinical care it is more desirable not to miss any positive case, even at the cost of an increased false positive rate. Therefore, down-sampling6,8 was applied to the negative class, in which only a part of the negative cases was randomly selected from the whole set, such that the positive and negative classes had similar numbers.
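A minimal sketch of the down-sampling step follows. It assumes the negative class is reduced to exactly the size of the positive class (the text states only that the two classes had "similar numbers"); the function name `downsample_negatives`, the case labels, and the seed are hypothetical.

```python
import random

def downsample_negatives(pos_cases, neg_cases, seed=0):
    """Randomly keep only as many negative cases as there are positive cases.

    Assumption: a 1:1 class ratio after down-sampling; the original study
    may have used a slightly different ratio.
    """
    rng = random.Random(seed)  # hypothetical seed
    n_keep = min(len(pos_cases), len(neg_cases))
    kept_neg = rng.sample(neg_cases, n_keep)  # sampling without replacement
    return list(pos_cases), kept_neg

# Training set sizes reported in the Methods: 40 positive, 189 negative.
pos_train = [f"pos_{i}" for i in range(40)]
neg_train = [f"neg_{i}" for i in range(189)]
balanced_pos, balanced_neg = downsample_negatives(pos_train, neg_train)
```

Under this assumption, 149 of the 189 negative training cases are discarded, which is the loss of data that the augmentation step is intended to offset.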
While down-sampling can alleviate the imbalance issue, it might also negatively impact the model sensitivity and negative predictive value due to the loss of data. To compensate for this, a data augmentation technique was employed to re-enlarge the dataset size (Figure 4). The augmentation applied the following operations to the images: rotating, cropping, color jittering, blurring and resizing. While the processed images looked visually similar to the original ones, they were quite different in terms of representation and were considered new data by the model. This enlarged the dataset four-fold. Since data augmentation was applied simultaneously to both positive and negative cases, it did not reintroduce an imbalance issue.
Receiver Operating Characteristics (ROC) curves and Area Under the Curve (AUC) calculations were used for performance measurement. A majority vote strategy was employed to make performance more robust (Figure 5). Specifically, each unstructured image was entered into the model to obtain an individual outcome prediction. The prediction for an image was considered positive for a complication if the output probability was larger than .45, the threshold that had been found to have the best performance in the testing phase. The final prediction for each patient was the majority count, positive or negative, across all of that patient's images. If the positive count was equal to the negative count, the final prediction was considered positive.
Figure 4. An example of data augmentation. The original image, in the center, was used as a template from which new images were developed through software by rotation, cropping, discoloration, angulation and blurring. These additional images were then uploaded into the model.
Figure 5. Illustration of the majority vote strategy used to define the final prediction for an individual patient from multiple images.
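The majority vote rule can be sketched as follows, using the .45 threshold and the tie-breaking rule stated above. The function name `predict_patient` and the example probabilities are hypothetical; the per-image probabilities would come from the trained network.

```python
def predict_patient(image_probs, threshold=0.45):
    """Combine per-image complication probabilities into one patient-level call.

    An image votes positive when its probability is strictly larger than the
    threshold. The patient is predicted positive when positive votes are at
    least as numerous as negative votes, so ties resolve to positive.
    """
    positive_votes = sum(1 for p in image_probs if p > threshold)
    negative_votes = len(image_probs) - positive_votes
    return positive_votes >= negative_votes

# Hypothetical probabilities for three patients.
clear_positive = predict_patient([0.80, 0.60, 0.30])  # 2 positive, 1 negative
tie_case = predict_patient([0.50, 0.40])              # 1 positive, 1 negative
clear_negative = predict_patient([0.45, 0.20, 0.90])  # 0.45 is not above the threshold
```

Resolving ties as positive is consistent with the stated priority of not missing a complication, at the cost of additional false positives.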

Results
Patient Population and Demographic Data
Patient Demographic Data.
EVAR and Follow-up Data
Follow-up Data.
Complications.
Model Performance
Within the testing dataset of 44 cases, the model correctly identified all 8 patients who subsequently had complications. Of the 36 negative cases, 16 were correctly predicted as negative while 20 were incorrectly predicted to be positive and have complications. Therefore, for this model the sensitivity was 100%, specificity was 44%, negative predictive value was 100%, positive predictive value was 29% and overall accuracy was 55%.
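The reported metrics follow directly from the testing confusion matrix (8 true positives, 0 false negatives, 16 true negatives, 20 false positives). The short sketch below reproduces the arithmetic; the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic test metrics from confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # 8 / 8
        "specificity": tn / (tn + fp),            # 16 / 36
        "npv": tn / (tn + fn),                    # 16 / 16
        "ppv": tp / (tp + fp),                    # 8 / 28
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # 24 / 44
    }

metrics = diagnostic_metrics(tp=8, fn=0, tn=16, fp=20)
percentages = {name: round(100 * value) for name, value in metrics.items()}
```

Rounded to whole percentages, these counts give 100% sensitivity, 44% specificity, 100% negative predictive value, 29% positive predictive value and 55% accuracy.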
The ROC curves for the training data, with and without data augmentation, were the same, with an AUC of 1.0. This indicates not that this is a perfect model but rather that there is overfitting during training. The model memorizes the details of each training case and consequently produces a falsely perfect training ROC curve. Additionally, without data down-sampling, the model will always predict all the cases to be positive, indicating that it cannot learn anything meaningful due to the imbalance of data. Therefore, the ROC curve and AUC score in the testing, rather than the training, dataset are a better reflection of how the model performs. The ROC curve for the testing dataset, without data augmentation and using individual image prediction, had an AUC of .53 (Figure 6A). Employing majority vote increased the AUC to .57, while data augmentation with individual image prediction improved the AUC to .58 (Figure 6B). Combining both data augmentation and majority vote improved model performance and increased the AUC to .62 (Figure 6C).15
Figure 6. Receiver Operating Characteristics (ROC) curves and Area Under the Curve (AUC) calculations used for performance measurement. A. The testing dataset ROC curve without data augmentation and using individual image prediction had an AUC of .53. B. The testing dataset ROC curve with data augmentation and using individual image prediction had an AUC of .58. C. The final model testing dataset ROC curve employing both data augmentation and majority vote had an improved AUC of .62.
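For readers less familiar with the AUC, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is a generic illustration and not the study's evaluation code; the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical scores: one positive case is ranked below one negative case.
example_auc = roc_auc([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2])
```

An AUC of .5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a training AUC of 1.0 alongside a testing AUC near .5 to .6 points to overfitting.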
Discussion
Prior AI Models and Limitations
Previous studies using AI for EVAR.
End-To-End Training to Replace Hand-Crafted Features
Prior models relied on expert hand-crafted features, which have the potential for subjective bias and variability. Hand-crafted features are parameters (such as size and angles in the case of AAA) that are either measured on images by human experts or defined by human experts and then calculated by software within an AI model.9 Charalambous11 used radiomics analysis, extracting quantitative features from medical images using data characterization algorithms, to predict aneurysm sac expansion in patients who developed type II endoleak after EVAR. These methods took as input hand-crafted features annotated by domain experts,9,11 which have variability and subjective components that can influence and jeopardize model performance. Our VascAI© model differs from others because it used downloaded patient imaging studies directly, without human annotation of aneurysm morphological features. Our model features end-to-end training, a methodology that uses images as the only input, without any human- or software-directed measurements or calculations, and in which features are automatically learned and selected from the downloaded data. Such a training pipeline has been widely adopted in other areas and successfully implemented in other applications of AI.14 The positive results attained with our model in this study suggest that it can be useful in other areas of vascular surgery as well.
Sensitivity has Been Neglected
Previous AI studies have also tended to focus on improving the overall accuracy of the proposed models, neglecting the fact that positive cases are of much higher clinical priority. For example, Kordzadeh et al10 reported the accuracy of their EVAR complication model to be greater than 86%. However, in the analysis of type I and type III endoleaks, the model predicted all the cases to be negative while for type III endoleaks 30 out of 32 were predicted negative (both predicted positive cases were incorrect). Their model had a tendency to predict most cases as negative, consistent with the low overall incidence of endoleaks. However, in clinical practice, we desire to identify the high-risk patients to avoid missing life-threatening complications instead of pursuing a high overall accuracy. In other words, enhancing the sensitivity of the model to find all positive cases should be of higher priority, which has been largely neglected in prior studies.
Data imbalance is the likely contributing factor responsible for the low sensitivity of prior models. Our complication rate of 17.6% was similar to that reported by others.1–3 With such a low complication rate, data-driven AI models are induced to predict most cases as negative in order to maximize accuracy. We addressed this issue by down-sampling the negative class, such that the positive and negative case numbers were even.
While down-sampling can alleviate the imbalance issue, it would also negatively impact the overall model performance due to the loss of data. To compensate for this, we applied data augmentation techniques to the down-sampled data and thus enlarged the dataset size. After applying the data down-sampling and augmentation techniques, the sensitivity increased to 100% with a specificity of 44%. This resulted in successfully identifying all the complications, although at the cost of 20 of the 28 positive predictions (71%) being false positives. Ideally, we want to achieve both high sensitivity and specificity and the associated overall accuracy. However, as the sensitivity increases the specificity decreases. We chose to maximize the sensitivity of the model as the clinical priority is not to miss any life-threatening complications, even though this increases the rate of false positives and decreases overall accuracy. As a practical clinical application, such results would not lead to unnecessary enhanced surveillance but rather to more robust efforts to enforce the presently recommended surveillance, adherence to which is disturbingly low even in the first year. Moreover, the negative predictive value of 100% is excellent and means that all the predicted negative cases were correct. In the testing dataset, the model therefore successfully identified a low-risk group of patients who did not develop complications and who could potentially undergo less frequent surveillance. Under the current surveillance protocol, all 44 patients in the testing dataset would require regular surveillance, either through ultrasound or CT scans. However, if our model were to be applied in clinical practice, 16 of these patients (36%) could be recommended for long-interval surveillance due to the model predicting a low risk of complications (true negatives). This could potentially save 36% of patients from unnecessary post-operative surveillance, leading to improved patient compliance and reduced radiation exposure.
Continuing Challenges for AI in Vascular Disease
Data Rarity
There were 273 clinical cases in our dataset with 8,859 CT images and thus an average of 33 images for each patient. Compared to AI prediction models in other fields, which regularly contain millions of images,7 the rarity of data hinders the model from achieving a higher level of performance. A multicenter study incorporating many more patients and images would make the modeling more robust and address this issue. Additional patients would also allow the inclusion of other clinical comorbidities and risk factors, such as atrial fibrillation and prior deep venous thrombosis or pulmonary embolus, that might require the use of anticoagulation, which may influence the risk of type II endoleaks.
The majority of complications in our study, as is well known, were type II endoleaks which are mostly benign. More relevant, especially pre-operatively, is the prediction of type I endoleaks which are more life-threatening but also rarer. The sample size for a specific type I endoleak prediction study is estimated to require a 10 times larger number of patients than the current study. A larger database of cases may allow the possibility for specific complication predictability in the future. After inputting a much larger set of images, a deep dive into the neural network layers can be performed to identify which features are most highly related to the complication rate and specific complications. Such information can be very useful in clinical decisions.
Heterogeneity of Data
AI algorithms require a large amount of data from different clinical environments to train effective and robust models that can provide generalizable predictions. However, imaging data which are generated from different sources tend to be very heterogeneous. Image data collected from hospital systems using different machines and stored in different formats will have a large amount of heterogeneity in terms of quality, resolution and accessibility. Clinical data collected from electronic medical record (EMR) systems are similarly heterogeneous with different follow-up time periods and re-intervention criteria. Data collection and standardization to produce a widely-applicable AI prediction model will be challenging.
Data Storage and Handling
To be clinically useful, the prediction model and associated data have to be readily accessible and stored on a computer hard drive or online platform. As more patients are entered and more images are collected, storing and handling increasingly large data sets can become challenging.38,39 In addition, protecting privacy and security during large data processing needs to be considered.
Knowledge Gap Between Vascular Surgeons and AI Scientists
The ideal team for AI research in vascular surgery should include vascular surgeons and AI scientists.16 AI scientists generally do not have the domain knowledge in vascular surgery to identify the important problems in this field, while vascular surgeons rarely have the training to use available AI tools. Therefore, bridging this professional gap will be an important key in the future development of AI in vascular disease management. Vascular surgeons should play a leading role in the team by identifying the important clinical research questions and interpreting and optimizing AI models from a clinical perspective.
Future Directions
Convolutional neural networks and AI algorithms can be rather abstract because the AI evaluates all the images with no external specifications on what anatomic features to look for, and does not subsequently specify which variables it has used to conclude the likelihood of a later complication. For clinicians, this “black box” processing may be challenging to accept. This project is just the beginning for the successful use of AI in vascular surgery. In the future, we should be able to explore more options in aortic disease-related areas. First, model performance can be further improved by incorporating patient-related factors, such as the magnitude of systemic atherosclerotic disease, hypertension, and anti-platelet and anti-coagulant usage, along with utilizing intraoperative and postoperative images. Second, additional related predictors can be combined and added to the model, such as automated image measurements or expert-measured image features along with other clinical characteristics and comorbidities. These additional data will provide more information to the model and thus likely improve model performance. Third, a deep dive into the neural network layers can subsequently be performed to identify which features are most highly related to the complication rate. The data can also incorporate specific endografts and their implanted characteristics and conformability to assist in pre-operative planning.40 Lastly, these prediction models can be used more broadly in other areas of peripheral arterial and carotid artery disease.
Conclusion
Modern end-to-end AI models offer the potential to effectively predict complications after EVAR, with data down-sampling and augmentation used to address data imbalance and rarity limitations. The VascAI© model developed in this study achieved 100% sensitivity for postoperative complications in the testing dataset. It has the potential to assess patients based on their pre-operative anatomy and categorize them by their risk for later post-operative complications. Although the model is not yet prospectively validated, our results suggest that adherence to postoperative surveillance follow-up can be guided and optimized based on such predictions.
Footnotes
Acknowledgments
We wish to recognize Ronald A. Bays MD and Ryan J. Kim MD for their clinical contributions represented in this report.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: BL and JB have a patent on the VascAI© software program.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
