Abstract

Previous studies used billing code algorithms to identify peripheral artery disease (PAD) within electronic health record (EHR) data with variable performance.1,2 Owing to the unique pathophysiology of PAD in chronic kidney disease (CKD),3,4 algorithms may perform differently in this population. We sought to develop an algorithm that can identify PAD in individuals with and without CKD within an EHR using easily ascertained discrete data elements.
This project was approved by a Stanford University Institutional Review Board, and the requirement for informed consent was waived. We used data from the Stanford Medicine Research Repository mapped to the Observational Medical Outcomes Partnership Common Data Model. We identified patients aged ≥ 18 years with one or more of 375 PAD-related International Classification of Diseases, 10th revision (ICD-10) billing codes from January 2020 through August 2022. 1 We required ≥ 1 year of observation prior to the first PAD code during this time frame to ensure that we could ascertain comorbidities. A pool of 17,625 patients met these criteria. Detecting 75% sensitivity and 60% specificity, assuming 80% power and a 5% type I error rate, required 107 patients with CKD and 214 patients without CKD. Because we planned to use manual chart review (see below) as the gold standard, we randomly selected a sample of 1200 patients to ensure ample statistical power and an adequate prevalence of PAD. We split the sample into 1000 patients for the training cohort and 200 patients for the test cohort. We excluded PAD codes used in fewer than three patients, leaving 88 codes. Eight patients with none of these codes were excluded from the training cohort.
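The code-frequency filter described above (dropping codes used in fewer than three patients, then excluding patients left with no retained code) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `filter_rare_codes`, the dictionary layout, and the toy patients and ICD-10 codes are all assumptions made for the example.

```python
from collections import defaultdict


def filter_rare_codes(patient_codes, min_patients=3):
    """Keep codes seen in at least `min_patients` distinct patients.

    `patient_codes` maps a patient ID to that patient's list of codes
    (a hypothetical layout chosen for this sketch). Patients with no
    retained code are dropped.
    """
    patients_per_code = defaultdict(set)
    for patient_id, codes in patient_codes.items():
        for code in codes:
            patients_per_code[code].add(patient_id)

    kept_codes = {
        code
        for code, patients in patients_per_code.items()
        if len(patients) >= min_patients
    }
    kept_patients = {
        patient_id: sorted(set(codes) & kept_codes)
        for patient_id, codes in patient_codes.items()
        if set(codes) & kept_codes
    }
    return kept_codes, kept_patients


# Toy data for illustration only; these are not study records.
toy = {
    "p1": ["I73.9", "I70.209"],
    "p2": ["I73.9"],
    "p3": ["I73.9", "L97.509"],
    "p4": ["L97.509"],
}
kept_codes, kept_patients = filter_rare_codes(toy)
```

In the toy data only one code appears in three patients, so the fourth patient, who carries none of the retained codes, is excluded, mirroring the exclusion step described in the text.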
True PAD status was determined by a standardized seven-step chart review process that searched all available information in the EHR through August 2022 for results reported from: (1) ankle–brachial index (ABI) testing (a noncompressible ABI alone was not considered diagnostic of PAD); (2) vascular ultrasound; (3) computed tomography (CT) angiography; (4) invasive angiography; (5) lower-extremity revascularization; (6) amputation due to PAD; and (7) if none of these tests was available, the EHR was searched for any potential PAD terms. Two physicians (TIC, GRP) reviewed the charts and disagreements were resolved by a vascular specialist (ER). Inter-rater reliability was excellent (weighted kappa 0.78, 95% CI 0.48–1.0).
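For readers unfamiliar with the agreement statistic, a weighted kappa can be computed as in the sketch below. The text does not state the weighting scheme or the rating categories, so linear disagreement weights and the three ordinal categories in the toy data are assumptions; the function name `weighted_kappa` is hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's weighted kappa with linear disagreement weights (assumed).

    Ratings are integers in range(n_categories), ordered from least to
    most severe.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must supply the same non-zero number of ratings")
    if n_categories < 2:
        raise ValueError("at least two categories are required")

    k = n_categories
    n = len(rater_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_disagreement / expected_disagreement


# Toy ratings for illustration only (0, 1, 2 are hypothetical ordinal categories).
reviewer_1 = [0, 0, 0, 1, 1, 2, 2, 2]
reviewer_2 = [0, 0, 1, 1, 1, 2, 2, 1]
kappa = weighted_kappa(reviewer_1, reviewer_2, n_categories=3)
```

Linear weights penalize a two-category disagreement twice as much as a one-category disagreement; quadratic weights are a common alternative and would give a different value.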
We ascertained age at the time of the first PAD code, and demographics and comorbidities prior to the first PAD code, including CKD defined as two CKD ICD-10 codes > 1 day apart, two kidney failure codes > 30 days apart, or two outpatient laboratory results showing an estimated glomerular filtration rate < 60 mL/min/1.73 m² or albuminuria. 5 We included four PAD-related indicator variables: (1) lower-extremity revascularization identified using ICD and Current Procedural Terminology (CPT) codes, 1 which had an accuracy of 90% when compared with chart review in the training cohort; (2) diagnostic testing (e.g., ABI, vascular ultrasound, angiography); (3) specialist encounters (e.g., vascular surgery, wound care); and (4) two or more PAD-related encounters. 1
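The CKD definition above is a three-way rule that can be expressed compactly. The sketch below is one reading of it, not the authors' code: the function names and inputs are hypothetical, all dates are assumed to fall before the first PAD code, and the laboratory criterion is implemented as any two qualifying outpatient results because the text does not state a minimum interval between them. The albuminuria threshold is not given in the text, so the caller is expected to pass dates of results already flagged as abnormal.

```python
from datetime import date


def has_pair_apart(dates, min_gap_days):
    """True if some two dates are more than `min_gap_days` apart."""
    return len(dates) >= 2 and (max(dates) - min(dates)).days > min_gap_days


def has_ckd(ckd_code_dates, kidney_failure_code_dates, abnormal_lab_dates):
    """Apply the three-part CKD rule to pre-filtered event dates.

    Assumptions for this sketch: every date precedes the first PAD code,
    and `abnormal_lab_dates` holds outpatient results already flagged as
    eGFR < 60 mL/min/1.73 m² or albuminuria.
    """
    return (
        has_pair_apart(ckd_code_dates, 1)
        or has_pair_apart(kidney_failure_code_dates, 30)
        or len(abnormal_lab_dates) >= 2
    )


# Toy example: two CKD codes two days apart satisfy the first criterion.
example = has_ckd([date(2021, 1, 1), date(2021, 1, 3)], [], [])
```

Comparing the earliest and latest dates is sufficient here, because if any pair exceeds the gap then the two most distant dates do as well.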
Owing to multicollinearity (variance inflation factor > 5), we removed codes L97.423, L97.424, and L97.412. We applied random forest methods 6 and selected the top 10th percentile of variables with the highest permutation-based importance, forcing the inclusion of age and diabetes given their strong associations with PAD. 7 We built a logistic regression model with the selected variables and evaluated sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy. We selected a 23% threshold probability to maximize Youden’s index, ensuring specificity > sensitivity. 8 We performed two validations: (1) using 1000 bootstrapped samples from the training cohort (internal validation), 9 and (2) using the test cohort (split-sample). Analyses were conducted in Python and RStudio, version 2023.12.1.402.
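The performance measures and the threshold choice can be illustrated with a short sketch. It interprets the selection rule as a constrained search (maximize Youden's index, sensitivity + specificity - 1, among thresholds where specificity exceeds sensitivity), which is one plausible reading of the text rather than a confirmed description of the authors' procedure. The function names, the convention that a probability at or above the threshold counts as positive, and the toy data are assumptions.

```python
def ratio(numerator, denominator):
    """Safe division; returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(y_true, y_prob, threshold):
    """Sensitivity, specificity, PPV, NPV, and accuracy at one threshold."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= threshold  # assumed convention
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


def pick_threshold(y_true, y_prob, candidates):
    """Return (threshold, Youden's J, metrics) maximizing J subject to
    specificity > sensitivity, or None if no candidate qualifies."""
    best = None
    for threshold in candidates:
        metrics = confusion_metrics(y_true, y_prob, threshold)
        if not metrics["specificity"] > metrics["sensitivity"]:
            continue
        youden_j = metrics["sensitivity"] + metrics["specificity"] - 1
        if best is None or youden_j > best[1]:
            best = (threshold, youden_j, metrics)
    return best


# Toy data for illustration only: 1 = chart-confirmed PAD, 0 = no PAD.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.1, 0.5, 0.3, 0.2, 0.2, 0.1, 0.05]
best_threshold, best_j, best_metrics = pick_threshold(
    y_true, y_prob, candidates=[0.15, 0.25, 0.35, 0.45, 0.55]
)
```

In practice `y_prob` would be the fitted logistic regression probabilities, and the same `confusion_metrics` function could be re-run on each bootstrapped sample or on the test cohort at the chosen threshold.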
Of 992 patients in the training cohort, 497 (50.1%) had CKD. Patients with CKD were older (mean age 76 years) and had a higher prevalence of comorbidities, including diabetes (54.8% vs 34.7%) and coronary artery disease (50.3% vs 30.1%), than patients without CKD (mean age 71 years). Overall, 222 patients (22.4%) had true PAD: 130 (58.5%) with CKD and 92 (41.5%) without CKD. The Supplemental Table shows the variables included in the algorithm. The sensitivity, specificity, NPV, and accuracy were high overall (> 78%) and the PPV was modest regardless of CKD status (Figure 1). The prevalence of PAD in the test cohort was 39%, and the algorithm performed similarly: sensitivity 75%, specificity 72%, PPV 66%, NPV 88%, and accuracy 77%.

Figure 1. Performance measures for the ascertainment of the peripheral artery disease diagnostic algorithm in the training cohort, overall and stratified by CKD status.
Our algorithm is among the few EHR algorithms developed to identify PAD whose performance has been evaluated in CKD, where accurate identification may be more challenging due to factors such as arterial stiffness and calcification. 10 The variables included in our algorithm can be readily obtained from the EHR without the need for complex computing. As with all predictive models, we cannot explain why certain variables, such as the distinction between right versus left foot ulcers, are included in the algorithm. We can confirm that these variables, when considered together, provided the highest accuracy in predicting PAD. Therefore, our algorithm should be applied similarly in future studies aimed at identifying patients with PAD using an EHR. Our chart review incorporated information from a wide variety of source documents to offer a more accurate, true PAD diagnosis than prior studies. 2 We used bootstrapping and split-sample testing to validate our findings, but external validation would be useful in the future. The performance of our model aligns closely with that of a prior study, which used a combination of ICD-9 and ICD-10 codes in a Duke University cohort. 1 The consistency in performance between studies provides some evidence for external validity. Notably, our study exclusively used ICD-10 codes to reflect the current coding system, extending the relevance of our findings to contemporary practice.
In summary, we developed an algorithm to identify PAD from the EHR that is easily adaptable to most EHR environments. Our algorithm may help with the implementation of quality improvement initiatives or population health-based care models that seek to improve outcomes for patients with PAD on a large scale, as they require the ability to easily and accurately identify affected patients. The development of improved methods using large language models for identifying PAD, particularly in understudied populations such as patients with CKD, holds great promise.
Supplemental Material
Supplemental material, sj-pdf-1-vmj-10.1177_1358863X251322182 for Identifying peripheral artery disease in persons with and without chronic kidney disease from electronic health records by Georgia R. Parsons, Gomathy Parvathinathan, Ali Etemadi, Sai Liu, Elsie Ross, W Schuyler Jones, Margaret R Stedman and Tara I Chang in Vascular Medicine
Footnotes
Acknowledgements
All data were recorded in the Stanford REDCap platform, which is developed and operated by the Stanford Medicine Research IT team. The REDCap platform services at Stanford are subsidized by (a) Stanford School of Medicine Research Office, and (b) the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through grant UL1 TR001085.
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article. W Schuyler Jones is supported by research grants (to institution) from Bayer, Boehringer Ingelheim, Bristol Myers Squibb, Merck, NIH, Novartis, and the Patient-Centered Outcomes Research Institute (PCORI). Tara I Chang serves as a consultant for AstraZeneca, Bayer, Novo Nordisk, ProKidney, Alexion, Tricida, and the George Clinical Institute; and she receives salary support from CSL Behring through funds paid directly to Stanford University. The remaining authors have no conflicting interests.
Funding
Research reported in this publication was supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health (NIH) under award number R01HL151351 to Tara I Chang. The content is solely the responsibility of the authors and does not necessarily represent the official view of the NIH.
Supplemental material
Supplemental material for this article is available online.
