Abstract
Billing data including International Classification of Diseases (ICD) codes are increasingly used to identify cohorts of patients with peripheral artery disease (PAD) in electronic health records (EHRs) and administrative claims databases (ACDs). However, the validity of common PAD phenotyping approaches is a central challenge to the utilization of EHR and ACD data. We present a scoping review of contemporary PAD observational studies to describe the electronic phenotyping strategies employed in PAD identification and propose recommendations for improvement. We searched two databases, MEDLINE and Web of Science, identifying a total of 748 articles that underwent title and abstract review. Of these articles, 163 met the criteria for full-text review, with 84 articles ultimately included in the study. We demonstrate that 19.0% of eligible studies utilized ICD, Ninth Revision (ICD-9) codes, 11.9% utilized ICD, Tenth Revision (ICD-10) codes, and 69.0% of studies utilized a combination of ICD-9 and ICD-10 codes in their electronic phenotyping methodology. Of the included studies, 76.2% utilized a single-code query approach for electronic phenotyping despite low diagnostic yield, and 21.4% utilized rule-based methods. Only five studies utilized logistic regression modeling, despite the demonstrated effectiveness of this method. The current study demonstrates high utilization of unreliable electronic phenotyping methods such as single-code-based queries, which severely limits research quality. Improvements in electronic phenotyping methods are necessary to leverage data from EHRs and ACDs for high-quality research.
Introduction
Peripheral artery disease (PAD) is a common disease affecting over 230 million people globally with significant morbidity, mortality, and decreased quality of life associated with disease progression. 1 Despite its widespread prevalence and risk for adverse outcomes, it remains underappreciated compared with other atherosclerotic disease processes such as coronary artery disease and cerebrovascular disease.2,3 To address the systematic understudying of PAD, the American Heart Association has published PAD-related gaps in research, clinical practice, and implementation to encourage cross-collaboration between researchers, clinicians, and government agencies to increase the awareness and understanding of this disease. 3
One way to enhance research in PAD is to leverage the investigative potential of administrative claims databases (ACDs) and electronic health records (EHRs). These data sources provide low-cost and readily available clinical information on small or large populations, which has resulted in their widespread application in studies of epidemiology, quality improvement, pharmacovigilance, clinical effectiveness, and clinical trial recruitment. 4 However, because data are coded into EHRs and ACDs in heterogeneous, incomplete, and complex ways, it can be exceptionally challenging to accurately identify cohorts of patients with PAD via a process called electronic phenotyping.
Traditionally, the identification of phenotypes of interest in EHRs has relied on rule-based approaches where clinical experts use structured data such as laboratory values, imaging reports, and medication data to create inclusion and exclusion criteria often based on consensus guidelines. Though a multimodal rule-based approach is achievable within an EHR, most ACDs do not routinely collect granular clinical information, and phenotyping relies solely on billing data such as International Classification of Diseases (ICD) and Current Procedural Terminology (CPT) codes.
PAD-associated procedural codes have demonstrated high diagnostic accuracy, yet PAD-associated diagnosis codes have repeatedly demonstrated inadequate sensitivity to detect PAD phenotypes in both ACDs and EHRs.5–9 Additionally, there are hundreds of PAD-associated ICD diagnosis codes (9th and 10th Revisions, ICD-9 and ICD-10, respectively) in use in the United States, with little consensus on which codes should be utilized for reliable electronic phenotyping. This uncertainty is compounded by the observation that many contemporary studies utilize single-code queries, which assign a PAD diagnosis based on the presence of a single PAD-related code. However, the literature demonstrates that rule-based electronic phenotyping approaches that combine multiple data elements show superior performance to single-code queries. 10
Overall, this presents a concerning quality problem for PAD research that utilizes ICD codes to identify PAD cohorts, as PAD cohorts must reflect patients with genuine disease diagnoses for meaningful conclusions and population generalizations to be made. To date, there has been no attempt to map the landscape of PAD electronic phenotyping methods employed in observational studies. To better characterize the extent of this informatics problem, the current review aims to describe the electronic phenotyping strategies most commonly used to identify PAD cohorts in observational research studies that utilize ICD codes and offer possible solutions.
Methods
Study selection
Study selection followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines. 11 For this review, studies were collected from MEDLINE and Web of Science. A search strategy was constructed and refined in collaboration with an institutional librarian. The search terms included combinations of key words and MeSH terms related to PAD, ICD, and observational cohort studies combined using Boolean operators. The full search strategy is detailed in Supplemental Table S1. Articles were eligible for inclusion if they were observational studies that identified a cohort of patients with PAD utilizing ICD codes. Only studies that extracted a cohort specifically for PAD without including additional comorbid conditions were included. To focus on cohorts built with ICD codes, articles utilizing predominantly patient-level clinical information or only non-ICD billing codes (e.g., studies that utilized only CPT codes) were excluded. Articles were included if they were published between January 1, 2010 and April 30, 2024, to overlap the US transition from the ICD-9 to the ICD-10 coding system that occurred in 2015. Only studies conducted in the US were included given the country-specific differences in coding systems. Articles that were not written in English were excluded.
Titles and abstracts of articles from MEDLINE were reviewed by two independent authors (AAS and AB). If disagreements arose, the two authors discussed the article to reach consensus. An additional search of Web of Science was conducted by AAS, and no studies were added from this search. Each article that satisfied the inclusion criteria underwent independent full-text review by two authors drawn from a group of six (AB, JC, DM, NK, MM, BBa). If disagreements arose, a third independent reviewer (AAS) resolved them. For articles that underwent full-text review, data were extracted in the domains of title, authors, data source, ICD coding system, and electronic phenotyping method.
The electronic phenotyping methods were categorized as ‘single-code’ or ‘rule-based.’ The ‘single-code’ method was defined as utilization of designated PAD ICD codes as diagnostic criteria and ascertainment of PAD status based on the presence of only one of these codes. The ‘rule-based’ method was defined as utilization of a set of diagnostic rules, including the presence of designated PAD ICD codes in addition to other diagnostic criteria defined by the study authors, including non-ICD billing codes, patient-level clinical information, and expert evaluation. A subset of ‘rule-based’ methodologies that operationalized regression modeling was further identified, and modified Standards for Reporting Diagnostic Accuracy (STARD) criteria in the domains of study design, eligibility criteria, test methods, analysis, and results were employed to evaluate the construction of these regression models based on diagnostic accuracy. 12
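The distinction between the two categories can be illustrated with a minimal Python sketch. The `Claim` record layout, the function names, and the placeholder procedure code `PROC_REVASC` are hypothetical and chosen only for illustration; the diagnosis prefixes are the atherosclerosis code families named in this review (ICD-9 440.x, ICD-10 I70.x), not a complete or validated code list, and the example rule is one of many possible rules rather than the method of any reviewed study.

```python
from dataclasses import dataclass
from datetime import date

# Atherosclerosis code families named in this review (ICD-9 440.x, ICD-10 I70.x).
# Assumption: a real study would specify a complete, validated code list.
PAD_DX_PREFIXES = ("440", "I70")

# Hypothetical placeholder standing in for PAD-related procedure codes (e.g., CPT).
PAD_PROCEDURE_CODES = {"PROC_REVASC"}


@dataclass(frozen=True)
class Claim:
    patient_id: str
    service_date: date
    code: str           # diagnosis or procedure code as billed
    is_diagnosis: bool  # True for ICD diagnosis codes, False for procedure codes


def is_pad_dx(claim: Claim) -> bool:
    """True when the claim carries a diagnosis code in a PAD code family."""
    return claim.is_diagnosis and claim.code.startswith(PAD_DX_PREFIXES)


def single_code_phenotype(claims: list[Claim]) -> bool:
    """Single-code method: one PAD diagnosis code is sufficient."""
    return any(is_pad_dx(c) for c in claims)


def rule_based_phenotype(claims: list[Claim]) -> bool:
    """Example rule: a PAD diagnosis code AND a PAD-related procedure code."""
    has_dx = any(is_pad_dx(c) for c in claims)
    has_procedure = any(
        (not c.is_diagnosis) and c.code in PAD_PROCEDURE_CODES for c in claims
    )
    return has_dx and has_procedure
```

Under these assumptions, a patient with a single I70.x diagnosis claim is labeled PAD-positive by the single-code function but not by the example rule until a qualifying procedure claim is also present.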
Results
The initial search strategy yielded 748 studies for review. After title and abstract screening, 163 studies were selected for full manuscript review. A total of 68 studies were excluded because of incorrect study methodology, including no use of ICD codes, not an observational study, or a non-PAD cohort. An additional 11 studies were excluded because the full text could not be retrieved. Ultimately, 84 studies were included in the comprehensive review (Figure 1). Overall, 19.0% (n = 16) of eligible studies used ICD-9 codes; 11.9% (n = 10) used ICD-10 codes; and 69.0% (n = 58) used a combination of both ICD-9 and ICD-10 codes to build their PAD cohorts. The most common codes were those related to atherosclerosis (ICD-9 440.x, ICD-10 I70.x); however, there was wide variability in the specific codes chosen to identify PAD phenotypes. The most frequently used data sources were claims data from the Centers for Medicare and Medicaid Services (n = 28) and the Healthcare Cost and Utilization Project’s National Inpatient Sample (n = 19) (Figures 2–4).

Figure 1. Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) flow diagram for article inclusion.

Figure 2. Top 30 most common ICD-9 codes by proportion of studies using ICD-9 codes.

Figure 3. Top 30 most common ICD-10 codes by proportion of studies using ICD-10 codes.

Figure 4. Frequency of database use across studies.
Next, we evaluated electronic phenotyping methods utilized in each study – a summary of electronic phenotyping methods is provided in Figure 5. Overall, 76.2% of studies (n = 64) employed a single-code query, where having at least one PAD-related ICD-9 or ICD-10 code equated to a positive PAD phenotype. The next most common diagnostic methodology comprising 21.4% of studies (n = 18) was a rule-based methodology that combined multiple elements of structured and/or unstructured data to identify PAD phenotypes. Of these rule-based studies, 27.8% (n = 5) utilized a logistic regression model as their electronic phenotyping method. Two studies did not specify an electronic phenotyping methodology. Study details are summarized in Table S2.

Figure 5. Overview of electronic phenotyping methods.
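The reported proportions follow directly from the study counts. The short Python check below reproduces them; the variable names are illustrative, and the counts are those reported in this section.

```python
TOTAL_STUDIES = 84

# Counts of included studies by electronic phenotyping method, as reported above.
method_counts = {"single-code": 64, "rule-based": 18, "not specified": 2}

# The three categories should account for every included study.
assert sum(method_counts.values()) == TOTAL_STUDIES

# Percentage of all included studies using each method, to one decimal place.
percentages = {
    method: round(100 * count / TOTAL_STUDIES, 1)
    for method, count in method_counts.items()
}

# Share of rule-based studies that used a logistic regression model (5 of 18).
logistic_share = round(100 * 5 / method_counts["rule-based"], 1)

print(percentages)
print(logistic_share)
```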
For all studies utilizing a rule-based methodology, rules were constructed based on the presence of PAD-related ICD diagnosis codes in addition to other structured data found in the individual study’s chosen database. Every study operationalized a different set of rules, but common elements included the presence of more than one PAD-related ICD diagnosis code, specification of ICD code position, additions of PAD-related procedure code data such as CPT codes, the presence of ankle–brachial index (ABI) data and/or imaging data demonstrating PAD, and visitation with a vascular expert (Table 1).
Table 1. Rules operationalized in studies utilizing rule-based methodology.
ABI, ankle–brachial index; CC, complication or comorbidity; CLTI, chronic limb-threatening ischemia; CM, Clinical Modification; CMS, Centers for Medicare and Medicaid Services; CPT, Current Procedural Terminology; DRG, diagnosis-related group; ICD-9, International Classification of Diseases, Ninth Revision; ICD-10, International Classification of Diseases, Tenth Revision; MCC, major complications/comorbidities; NOS, not otherwise specified; PAD, peripheral artery disease; PRV, peripheral revascularization; PVI, peripheral vascular intervention.
Efforts to validate the selected electronic phenotyping methods varied considerably, with most studies not undertaking validation of their chosen approach. Fanaroff et al. conducted a study using a single-code approach to identify patients who had undergone PAD-related major or minor amputations. 13 They then performed a sensitivity analysis, applying a rule-based methodology to define a ‘stricter’ cohort of PAD-related major amputations, and found no significant difference between this ‘stricter’ cohort and their primary cohort. Sussman et al. operationalized a rule-based phenotyping method to identify patients with PAD, then validated this cohort with a manual chart adjudication of EHRs. 14 Eighteen of the studies referenced another source as the basis for their approach to selecting ICD codes in a single-code method or establishing rules in a rule-based method; however, these studies did not explicitly state whether the referenced method had been validated. Two studies utilizing a rule-based method explicitly noted that their chosen rule-based algorithm had been adapted from a previously published and validated algorithm. The five studies utilizing a logistic regression utilized models that had been previously published and validated (Table 2).
Table 2. Studies addressing construct validity of patient identification methods.
ACD, administrative claims databases; CMS, Centers for Medicare and Medicaid Services; DEDUCE, Duke Enterprise Data Unified Content Explorer; DUHS, Duke University Health System; EHR, electronic health records; HCUP, Healthcare Cost and Utilization Project; ICD-9, International Classification of Diseases, Ninth Revision; ICD-10, International Classification of Diseases, Tenth Revision; NASS, National Ambulatory Surgery Sample; NIS, Nationwide Inpatient Sample; NRD, Nationwide Readmission Database; PAD, peripheral artery disease; REP, Rochester Epidemiology Project; SPARCS, Statewide Planning and Research Cooperative System.
Among the studies utilizing a logistic regression model as their rule-based method, two models served as the basis for all five studies: a model published by Fan et al. 6 and a model published by Weissler et al. 15 Modified STARD criteria were utilized to evaluate the original model construction studies for these two models, with the results detailed in Supplemental Table S3. Both models utilized retrospective patient data extracted from large institutional databases with clearly defined inclusion and exclusion criteria for patients with PAD included in model-building cohorts. The Fan model utilized ABI measurements as the test reference standard, and the Weissler model utilized ABI measurements, a history of prior revascularization, or evidence of lower-extremity amputation for an indication of PAD as the test reference standard. Both models clearly reported standard measures of diagnostic accuracy, including area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value, and negative predictive value of the algorithm. In summary, both models closely adhered to the STARD criteria.
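For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, and the predictive values are derived from a comparison of algorithm output against a reference standard such as ABI testing. The `ConfusionCounts` type and function name are hypothetical, any counts supplied to it are illustrative rather than data from either model, and the function assumes every denominator is nonzero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # algorithm positive, reference standard positive
    fp: int  # algorithm positive, reference standard negative
    fn: int  # algorithm negative, reference standard positive
    tn: int  # algorithm negative, reference standard negative


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard diagnostic accuracy measures (assumes nonzero denominators)."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true PAD cases that are detected
        "specificity": c.tn / (c.tn + c.fp),  # non-PAD patients correctly excluded
        "ppv": c.tp / (c.tp + c.fp),          # flagged patients who truly have PAD
        "npv": c.tn / (c.tn + c.fn),          # unflagged patients truly without PAD
    }
```

Because the predictive values depend on disease prevalence in the evaluated cohort, a model validated in patients referred for vascular testing may show different predictive values in a general claims population.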
Discussion
This study provides an overview of electronic phenotyping methods used to identify patients with PAD in contemporary observational research. Most studies (69%) utilized both ICD-9 and ICD-10 codes in their electronic phenotyping methods, with fewer studies utilizing ICD-9 (19%) or ICD-10 (11.9%) coding systems in isolation. This distribution is expected, as our study population purposefully spanned the ICD-9 to ICD-10 transition, and datasets utilized in observational studies often predate the time of publication, contributing to a data lag.
Our study confirms that most contemporary PAD observational studies utilize single-code queries to identify cohorts of patients with PAD. Several quality issues arise with this observation. First, cohorts identified using single-code queries have been shown to have a low probability of true occurrence of the disease of interest. Additionally, PAD codes specifically have been shown to have poor diagnostic accuracy, with the sensitivities of individual codes as low as 0.85% and with the highest performing codes reaching sensitivities of only 30.5%. 5 Lastly, our study demonstrates there was little consensus between studies regarding the specific PAD codes utilized to identify PAD cohorts.
The second most common electronic phenotyping strategy employed in our review was rule-based methods that combined ICD codes with other structured health data, with 18 studies utilizing this method. Evidence suggests rule-based methodologies offer improved diagnostic accuracy over single-code queries for PAD phenotyping. 10 We demonstrate that each study operationalized a different set of rules for PAD-phenotyping; however, there were similarities between the elements of inclusion. Several studies augmented ICD diagnosis codes with procedural codes, as well as clinical data such as ABIs, imaging, and/or consultations with vascular specialists. For example, Arya et al. 16 established a rule requiring the presence of ICD diagnosis codes for PAD along with any one of three criteria: two ABI measurements within 14 months, two visits to a vascular surgeon or clinic within 14 months, or any PAD procedure code. Kwong et al. 17 required at least two ICD diagnosis codes for PAD, whereas Itoga et al. 18 added a temporal element, requiring two ICD diagnosis codes spaced at least 2 months apart.
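The rule elements described above can be expressed compactly in code. The following Python sketch is an illustration only: the function names and inputs are hypothetical, "2 months" is approximated as 60 days and "14 months" as 426 days, and the logic paraphrases the rules as summarized in this review rather than reproducing any study's exact algorithm.

```python
from datetime import date, timedelta


def at_least_two_codes(dx_dates: list[date]) -> bool:
    """Count-based rule: two or more PAD diagnosis codes on record."""
    return len(dx_dates) >= 2


def two_codes_spaced(dx_dates: list[date],
                     min_gap: timedelta = timedelta(days=60)) -> bool:
    """Temporal rule: some pair of PAD diagnosis codes at least min_gap apart.

    A qualifying pair exists exactly when the earliest and latest dates qualify.
    """
    if len(dx_dates) < 2:
        return False
    return max(dx_dates) - min(dx_dates) >= min_gap


def two_within(dates: list[date], window: timedelta) -> bool:
    """True when any two events fall within the window of each other."""
    ordered = sorted(dates)
    return any(later - earlier <= window
               for earlier, later in zip(ordered, ordered[1:]))


def composite_rule(dx_dates: list[date],
                   abi_dates: list[date],
                   vascular_visit_dates: list[date],
                   has_pad_procedure: bool,
                   window: timedelta = timedelta(days=426)) -> bool:
    """Composite rule: a PAD diagnosis code plus any one supporting criterion."""
    if not dx_dates:
        return False
    return (two_within(abi_dates, window)
            or two_within(vascular_visit_dates, window)
            or has_pad_procedure)
```

Each function tightens the single-code definition in a different way, which is why cohorts built with different rules are not directly interchangeable even when they draw on the same ICD codes.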
Several considerations arise regarding utilizing a rule-based methodology for PAD phenotyping. First, with rule-based approaches, similar issues arise as with single code-based queries, as there appears to be little consensus among studies regarding the specific ICD codes used to identify PAD. Though researchers have begun work to validate ICD diagnosis codes for other disease pathologies such as pulmonary embolism, this work is less robust for PAD. 19 Additionally, it is important to note that requisite health data for the construction of robust rules differs between EHRs and ACDs. Though granular clinical data, such as ABI testing, radiology reports, and clinic notes from vascular specialists, can be extracted from an EHR, ACDs primarily contain structured data and rely on corresponding billing claims to indicate abnormal ABIs, imaging findings, or visits to a vascular specialist. To this end, robust rule-based phenotyping methodologies are more feasible in EHRs compared to ACDs.
On the other hand, five studies in our review utilized validated regression models as a rule-based phenotyping method, with evidence that regression modeling as an alternative to code-based queries has potential in PAD phenotyping. All studies were based on two models: a model published by Fan et al. and a model published by Weissler et al. The Fan model uses a total of 13 ICD-9 and CPT codes as model covariates to identify patients with PAD in administrative databases. The model performs with high accuracy in identifying PAD in patients who had been referred for vascular laboratory evaluation (sensitivity 85.5%, specificity 82.6%) compared to standard ABI testing. The Weissler group constructed a model-based algorithm utilizing ICD-9 codes, ICD-10 codes, and various administrative flags identifying PAD-related encounters, revascularization procedures, and relevant imaging. 15 At a classification threshold of 45% probability of PAD phenotype, the regression model performed with a sensitivity of 75.3% and specificity of 81.7%. However, both models utilize patient cohorts from before or during the 2015 to 2016 ICD-9 to ICD-10 transition, and this timing may limit applicability to more contemporary cohorts given natural challenges in implementation and adoption of the new system during this transition period. 20
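To make the model-based approach concrete, the sketch below shows how a fitted logistic regression model converts coded covariates into a predicted probability and then applies a classification threshold. The 0.45 default mirrors the 45% threshold reported for the Weissler model; the function names, feature names, coefficients, and intercept are hypothetical and are not the published parameters of either model.

```python
import math


def logistic(z: float) -> float:
    """Map a linear predictor to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def pad_probability(features: dict[str, float],
                    coefficients: dict[str, float],
                    intercept: float) -> float:
    """Predicted probability of the PAD phenotype from coded covariates.

    Covariates absent from `features` are treated as 0 (code not present).
    """
    linear_predictor = intercept + sum(
        weight * features.get(name, 0.0) for name, weight in coefficients.items()
    )
    return logistic(linear_predictor)


def classify(probability: float, threshold: float = 0.45) -> bool:
    """Label a patient PAD-positive when the probability meets the threshold."""
    return probability >= threshold


# Hypothetical coefficients for illustration only.
example_coefficients = {"pad_dx_code": 3.0, "revascularization_code": 1.5}
example_intercept = -2.0
```

Raising the threshold trades sensitivity for specificity, so the appropriate cut-off depends on whether a study prioritizes capturing all patients with PAD or minimizing false-positive cohort members.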
As demonstrated in our results, few studies attempted to validate their chosen electronic phenotyping method beyond citing that the method had been previously operationalized in the literature. One clear benefit of logistic regression models is that model validation is often an integral component of model construction, which confers increased reliability in utilizing these methods. Additionally, machine learning methods can integrate multiple data sources and identify patterns, which may improve the reliability of detecting PAD within ACDs where EHR data are not readily available. On the other hand, regression models can be time intensive and complicated to construct. Despite internal validity, these models may not hold up to external validation when applied to outside data sources.
Limitations
The study has limitations that should be acknowledged. It is possible that our search did not capture all qualifying articles in the literature. However, our search was conducted in close consultation with librarians with expertise in conducting review searches, and we believe our search is a broad representation of available studies. Additionally, the study search was likely narrowed by focusing on isolated PAD cohorts instead of including studies with multiple comorbidities. However, the goal of the current study was to provide a more specific understanding of electronic phenotyping methods used to identify PAD, so focusing on PAD created a more homogeneous cohort aimed at minimizing confounding with other comorbidities. Additionally, though our review includes a quality assessment of the studies of diagnostic accuracy using the STARD criteria, we did not evaluate study quality for all the reviewed articles beyond identification of electronic phenotyping methods and construct validation efforts, as this was not the primary study goal.
Recommendations
To improve the quality of PAD observational research, the authors recommend explicit discussion of the validity of the electronic phenotyping method used to identify patients within studies and citation of prior studies that have validated the same chosen method. Given the poor diagnostic accuracy detailed in the literature and the availability of viable alternatives, the authors recommend against utilization of single-code-based queries to identify patients with PAD. Rule-based methods are better supported in the literature; however, these methods may be best suited for researchers utilizing EHR data as opposed to administrative claims data, as the most robust rules seem to utilize patient-level clinical data. When utilizing administrative claims data, it is important to recognize the limitations in being able to validate a PAD cohort. To temper this limitation, researchers can consider techniques to improve specificity, such as utilizing multiple diagnosis codes instead of a single-code query, adding a temporal element (e.g., two codes at least 60 days apart), or combining diagnosis codes with procedural codes.
Furthermore, regression models may be one solution to the outlined informatics challenge, though more contemporary models accounting for the predominance of current ICD-10 coding practices may be warranted. To this end, the authors propose the construction of a contemporary algorithm to identify patients with PAD from administrative databases. Given the challenges of identifying PAD in administrative databases that lack the comprehensive clinical data found in EHRs, the authors propose utilizing claims data linked to individual-level EHR data for model construction and validation. This technique has not been utilized before in algorithm construction and would allow direct comparison of model performance utilizing administrative claims data against diagnostic standards such as noninvasive vascular studies (ABIs, toe–brachial indices, pulse volume recordings) and cross-sectional imaging.
Conclusions
Robust PAD research requires the utilization of a wide variety of data sources. However, the current study demonstrates high utilization of unreliable electronic phenotyping methods such as single code-based queries, which may undermine the validity of research studies that rely on these approaches. For epidemiological or observational comparative effectiveness studies to have a meaningful clinical impact, their electronic phenotyping approaches should be validated and reported transparently.
Supplemental Material
Supplemental material, sj-docx-1-vmj-10.1177_1358863X251328671 for A scoping review of electronic phenotyping methodologies used to identify peripheral artery disease in observational studies by Abena Appah-Sampong, Ascharya Balaji, Jack H Casey, Navya Kotturu, Danielle Montano, Mohit Manchella, Bassil Bacare, James J Fitzgibbon, Patrick Heindel, Tanujit Dey, Behnood Bikdeli and Mohamad A Hussain in Vascular Medicine
Footnotes
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Dr Bikdeli is supported by a Career Development Award from the American Heart Association and VIVA Physicians (#938814). Dr Bikdeli was supported by the Scott Schoen and Nancy Adams IGNITE Award and is supported by the Mary Ann Tynan Research Scientist award from the Mary Horrigan Connors Center for Women’s Health and Gender Biology at Brigham and Women’s Hospital, and the Heart and Vascular Center Junior Faculty Award from Brigham and Women’s Hospital. Dr Bikdeli reports that he is a member of the Medical Advisory Board for the North American Thrombosis Forum, and serves on the Data Safety and Monitoring Board of the NAIL-IT trial funded by the National Heart, Lung, and Blood Institute, and Translational Sciences. Dr Hussain is supported by a Brigham and Women’s Hospital Heart and Vascular Center Faculty Award and Brigham and Women’s Osteen Award. Dr Hussain is a consultant for Humacyte, Inc. Dr Hussain reports research funding from Vascular Therapies (site principal investigator [PI] of ACCESS-2 Trial), Humacyte, Inc. (site PI of V012 Trial), and VenoStent (site PI of SAVE-FistulaS Trial). The remaining authors have no conflicting interests.
Funding
This work was supported by the American Heart Association Research Supplement to Promote Diversity ‘Validation of the MAGNIFY-PAD Tool Identify Peripheral Artery Disease in Electronic Health Databases’ (grant no. 23DIVSUP1069428).
Supplemental material
Supplemental material for this article is available online.