Abstract
Despite high morbidity and mortality associated with peripheral artery disease (PAD), it remains under-diagnosed and under-treated. The objective of this study was to develop a screening metric to identify undiagnosed patients at high risk of developing PAD using administrative data. Commercial claims data from 2010 to 2012 were utilized to develop and internally validate a PAD screening metric. Medicare data were used for external validation. The study population included adults, aged 30 years or older, with new cases of PAD identified using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis/procedure codes or the Healthcare Common Procedure Coding System (HCPCS) codes. Multivariate logistic regression was conducted to determine PAD risk factors used in the development of the screening metric for the identification of at-risk PAD patients. The cumulative incidence of PAD was 6.6%. Sex, age, congestive heart failure, hypertension, chronic renal insufficiency, stroke, diabetes, acute myocardial infarction, transient ischemic attack, hyperlipidemia, and angina were significant risk factors for PAD. A cut-off score of ⩾20 yielded sensitivity, specificity, positive predictive value, negative predictive value, and c-statistic of 83.5%, 60.0%, 12.8%, 98.1%, and 0.78, respectively. By identifying patients at high risk for developing PAD using only administrative data, the current pre-screening metric could reduce the number of diagnostic tests while still capturing those patients with undiagnosed PAD.
Introduction
Peripheral artery disease (PAD) is a common disease associated with high morbidity and cost. Its prevalence increases with age and ranges from 10% among patients aged 65–69 years to 16% in patients aged 80–84 years. 1 PAD is associated with an increased risk of cardiovascular events, including myocardial infarction (MI) and stroke, as well as death.2,3 Approximately one in three patients suffering from PAD are hospitalized within 2 years of diagnosis, and nearly 50% die from cardiovascular complications. In fact, one year post-diagnosis, the cardiovascular mortality rate is 3.7-fold higher in patients with PAD than in those without PAD.4,5 This morbidity leads to a high economic burden with PAD-attributed costs of $5,955 per patient per year in a managed care population. 4
Evidence-based guidelines lack consensus regarding the appropriateness of screening for undiagnosed PAD. The Trans-Atlantic Inter-Society Consensus Document on Management of PAD (TASC II) report recommends targeted screening for PAD using cardiovascular risk factors and age. 6 The American College of Cardiology and American Heart Association similarly endorse targeted screening for patients under the age of 65, and universal screening in patients over that age. 7 Still, other groups, such as the American College of Physicians and the United States Preventive Services Task Force, have determined that the evidence is insufficient to determine the benefits and harms of screening for PAD, and they recommend against screening asymptomatic patients. 8
Because nearly half of PAD patients are asymptomatic, and only a fraction of them experience the primary symptom, intermittent claudication, the disease is largely under-diagnosed and under-treated.5,9,10 Clinical examination findings, such as the absence of pedal pulses, are associated with poor sensitivity for the detection of PAD in undiagnosed patients. 11 Although the Doppler ankle–brachial index (ABI) has high sensitivity, it is seen by many as infeasible in primary care practice due to the substantial increase in workload for the primary care physician. 12 In a survey of primary care physicians, nearly 70% reported never using the ABI in their practice setting. 13 Often, testing is delayed until patients present with classic leg claudication symptoms – the primary indication that PAD is already present, precluding the implementation of strategies to prevent the disease. 14
Half of PAD patients have known coronary artery disease (CAD), and low ABI is an independent predictor of cardiovascular risk.15,16 Identification of patients at risk for PAD could enable early initiation of PAD and CAD prevention strategies. A screening metric that utilizes administrative data allows for large populations of undiagnosed patients to be evaluated for their risk of PAD. Identified high-risk patients can be notified directly to request a diagnostic test, or their physician can be alerted to their increased risk of PAD. Various studies, including those investigating diabetic retinopathy, breast cancer screening, and cervical cancer screening, have demonstrated success with this notification method based on information from administrative claims data.17–21
The objectives of this study were to develop a risk prediction algorithm for patients at risk of developing PAD in a select commercial population, internally validate the algorithm within the same population, and externally validate the algorithm using Medicare claims data.
Methods
Data sources
The present study used proprietary administrative claims data that included 1.44 million individuals who received health coverage through self-insured employer groups and commercial insurers, and members from third party administrators, within both health maintenance organization and preferred provider organization models. The combined claims database included demographics, enrollment information, and medical claims (including inpatient stays and outpatient physician visits).
Medicare data were used for external validation. The dataset included a 5% random sample of Part B beneficiaries, matched to their Medicare Part A inpatient claims. PAD diagnoses and baseline comorbidities in the validation sample were ascertained using Medicare Part A and Part B (physician claims) files. These files contain information on diagnoses, procedures, dates of service, and providers.
The administrative data used for both datasets ranged from 1 January 2010 through 31 December 2012. This study was conducted using fully de-identified Medicare and commercial claims data obtained through Health Advocate, Inc., received and managed in compliance with the Health Insurance Portability and Accountability Act of 1996. Thus, approval from the Institutional Review Board was not required.
Study design and study cohort
The present study used a retrospective cohort design and a medical claims-based algorithm to identify patients with PAD. The study population included individuals 30 years and older. Patients were classified as new cases of PAD if they had a diagnosis code for PAD and no evidence of PAD in at least the 12 months preceding that first diagnosis. The date of the first diagnosis of PAD was defined as the index date. The control group included patients 30 years or older who did not have any claims for PAD during the study period. The first claim between 1 January 2011 and 31 December 2012 formed the index date for the non-PAD group. Patients were excluded from the study cohort if they were younger than 30 years, did not have continuous coverage (with an allowable gap of 30 days) in the 1-year baseline period, or had any missing demographic information (sex and race).
Independent and dependent variables
Diagnosis of PAD was the primary dependent variable. It was classified as a dichotomous variable (yes/no). A claims-based algorithm using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis codes, ICD-9-CM procedure codes, and Current Procedural Terminology (CPT) codes was used to identify patients with PAD diagnoses. This algorithm was based on a thorough review of the literature.4,10,22–24 These codes have shown reasonable diagnostic accuracy in identifying PAD cases using administrative claims data. 22 The diagnosis and procedure codes used to identify PAD and non-PAD cases are listed in Supplementary Appendix A. Patients were identified as PAD cases if they had at least one inpatient, outpatient, or professional claim with a primary or secondary diagnosis or procedure code for PAD from that list.
Past literature indicates that demographic factors, behavioral factors, comorbidities, and various biomarkers are important risk factors for PAD.6,25 Independent variables used in the present study included demographic information (age and sex), chronic conditions (diabetes, hypertension, dyslipidemia, chronic renal insufficiency, congestive heart failure (CHF), transient ischemic attack, and angina), and acute conditions (acute myocardial infarction (AMI) and stroke). Angina and unstable angina were used as a measure for CAD. 10 These chronic conditions were captured during the 1-year baseline period using ICD-9-CM diagnosis codes. The ICD-9-CM diagnosis codes used to capture chronic conditions were obtained from the 2015 HEDIS value set directory, past literature, and Health Advocate proprietary metrics.
Statistical analyses
Model development
Variable selection for developing the PAD screening metric was achieved using a combination of past literature and statistical methods. Collinearity among independent variables was assessed using a correlation table (correlation value >0.70) and other collinearity indices, namely variance inflation factor (VIF) >10, tolerance value <0.1, eigenvalue close to 0, or condition index >30. 26 Bivariate associations between PAD and risk factors were examined using a χ2 test for categorical variables and a t-test for continuous variables. The final multivariable regression model for the PAD screening metric was obtained using stepwise forward and backward multiple logistic regression analysis, where p < 0.15 was used for entry of a variable and p < 0.2 was used for retaining the variable in the model. The mean β-coefficient of each risk factor in the logistic regression model was derived from 200-cycle bootstrapped simulation samples. As per past literature, an integer-based, weighted PAD risk score was developed by multiplying the mean β-coefficient by 10 and rounding to the nearest integer. These risk scores were added, along with the model intercept, to obtain the total risk score for each patient.27,28
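The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's SAS implementation: the coefficient values, the variable names, and the `intercept_points` argument are hypothetical, and the text does not specify how the model intercept was placed on the integer scale, so it is passed in as an already-scaled number.

```python
import math


def mean_betas(bootstrap_fits):
    """Average each coefficient across bootstrap resamples.

    bootstrap_fits: list of dicts mapping risk-factor name -> fitted beta
    (one dict per resample; fitting the models is not shown here).
    """
    names = bootstrap_fits[0].keys()
    n = len(bootstrap_fits)
    return {name: sum(fit[name] for fit in bootstrap_fits) / n for name in names}


def integer_weights(betas):
    """Multiply each mean beta by 10 and round half-up to the nearest integer."""
    return {name: math.floor(beta * 10 + 0.5) for name, beta in betas.items()}


def total_risk_score(patient_flags, weights, intercept_points):
    """Sum the points for every risk factor the patient has, plus the intercept points."""
    return intercept_points + sum(
        points for name, points in weights.items() if patient_flags.get(name)
    )


# Hypothetical coefficients from two resamples (the study used 200).
fits = [
    {"diabetes": 0.60, "hypertension": 0.46, "chf": 0.36},
    {"diabetes": 0.64, "hypertension": 0.50, "chf": 0.38},
]
weights = integer_weights(mean_betas(fits))
score = total_risk_score(
    {"diabetes": True, "chf": True}, weights, intercept_points=14  # assumed value
)
```

Rounding is done with `floor(x + 0.5)` rather than Python's built-in `round`, which rounds exact halves to the nearest even integer.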
Model validation
Both internal and external validations were performed. Internal validity was examined by splitting the study cohort and applying the risk score and intercept from the developed PAD model in 50% and 25% of the study cohort selected randomly. External validation was performed separately in the Medicare dataset without any modifications or reconstruction of the original model.
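As a rough sketch of the internal validation idea, the snippet below draws a random fraction of the cohort and re-applies a fixed cut-off without refitting anything. The record layout, the function names, and the seed are assumptions made for illustration; the study's actual sampling procedure is not described beyond the 50% and 25% fractions.

```python
import random


def sensitivity_specificity(records, cutoff):
    """records: sequence of (risk_score, has_pad) pairs.

    A patient screens positive when risk_score >= cutoff.
    Assumes the sample contains at least one case and one non-case.
    """
    tp = fn = tn = fp = 0
    for score, has_pad in records:
        flagged = score >= cutoff
        if has_pad:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def validate_on_random_subset(records, fraction, cutoff, seed=0):
    """Apply the unchanged cut-off to a random subset of the cohort."""
    rng = random.Random(seed)
    subset = rng.sample(records, int(len(records) * fraction))
    return sensitivity_specificity(subset, cutoff)


# Toy cohort (hypothetical scores and outcomes).
cohort = [(25, True), (15, True), (30, False), (10, False)]
full = sensitivity_specificity(cohort, cutoff=20)
resampled = validate_on_random_subset(cohort, fraction=1.0, cutoff=20)
```

With real data, `fraction` would be set to 0.5 and 0.25 and the results compared against the development-cohort values.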
Model calibration, discrimination, and reclassification
The calibration of the model was assessed graphically by plotting the predicted probability of PAD and the percentage of participants who developed PAD by their risk scores. Discrimination was assessed using the area under the receiver operating characteristic (ROC) curve, or the c-statistic. Reclassification improvement was quantified using the net reclassification improvement (NRI) statistic, which measures the added value of the improved model (final model) when compared to the basic model (age and sex only). 29 The NRI is sensitive to the choice and number of categories. Different thresholds may result in very different NRIs for the same added test. To overcome this issue, integrated discrimination improvement (IDI) was calculated. IDI uses probability differences, instead of categories, to classify patients into diseased or non-diseased categories. 29 All analyses were conducted using SAS Proprietary Software Release 9.4. 30
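To make the discrimination measures concrete, here is a small Python sketch of the c-statistic (as pairwise concordance) and the IDI (as the change in mean predicted probability among cases minus the change among non-cases). The function names and toy probabilities are hypothetical, and the pairwise loop is quadratic, so it suits illustration rather than a cohort of this size.

```python
def c_statistic(probs, outcomes):
    """Share of (case, non-case) pairs in which the case has the higher
    predicted probability; tied pairs count as half."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    controls = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def idi(p_new, p_old, outcomes):
    """Integrated discrimination improvement of the new model over the old one."""
    def mean(values):
        return sum(values) / len(values)

    ev_new = mean([p for p, y in zip(p_new, outcomes) if y == 1])
    ev_old = mean([p for p, y in zip(p_old, outcomes) if y == 1])
    ne_new = mean([p for p, y in zip(p_new, outcomes) if y == 0])
    ne_old = mean([p for p, y in zip(p_old, outcomes) if y == 0])
    return (ev_new - ev_old) - (ne_new - ne_old)


# Toy example: the "basic" model cannot separate cases from non-cases,
# while the "final" model separates them perfectly.
outcomes = [1, 1, 0, 0]
p_basic = [0.5, 0.5, 0.5, 0.5]
p_final = [0.75, 0.75, 0.25, 0.25]
```

In this toy example the basic model has a c-statistic of 0.5 (no discrimination) and the final model 1.0, with an IDI of 0.5.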
Results
Model development
There were 139,610 patients in the cohort after applying inclusion and exclusion criteria. Of these, 9,192 (6.6%) developed PAD during the study period. Figure 1 depicts study sample selection and cohort development. Baseline characteristics of the study cohort and results of the bivariate analysis are presented in Table 1. Most newly diagnosed patients with PAD were aged 30–64 years (72.0%), male (57.9%), and commonly diagnosed with hyperlipidemia (65.1%), hypertension (60.5%), and diabetes (36.1%). All patient characteristics measured in the 1-year baseline period were significantly associated with a PAD diagnosis. Table 2 presents results from the stepwise logistic regression model indicating the predictors significantly associated with risk of PAD diagnosis, which included demographic variables (age, sex) and cardiovascular and metabolic conditions (CHF, hypertension, chronic renal insufficiency, stroke, diabetes, AMI, transient ischemic attack, hyperlipidemia, angina). Table 3 presents risk score calculations using mean β-coefficients obtained from the 200-cycle bootstrapped simulation samples. The mean, median, and range for the risk scores were 20.5, 18, and 14–68, respectively; a higher score indicates a greater risk of being diagnosed with PAD in the following year.

Figure 1. Flowchart of study sample selection and cohort development. (PAD, peripheral artery disease.)
Table 1. Baseline characteristics of the development cohort. All comparisons are significant at p<0.0001. PAD, peripheral artery disease.
Table 2. Final stepwise multivariable logistic regression model for the risk of developing PAD in the study population. All comparisons are significant at p<0.0001. PAD, peripheral artery disease.
Table 3. Risk score calculation using mean β-coefficient obtained from the 200-cycle bootstrapped simulation samples.
Cut-off score calculation
Sensitivity, specificity, Youden index, Matthews correlation coefficient, total accuracy, and total misclassification error were used to select the cut-off score to classify patients at risk for PAD. 31 The Matthews correlation coefficient is a measure of test accuracy. A coefficient of +1 represents a perfect prediction, 0 is no better than random prediction, and −1 indicates total disagreement between prediction and observation. 32 A cut-off score of 20 was chosen, as this score yielded maximum accuracy (Youden index, Matthews correlation coefficient, total accuracy) and minimum total misclassification error (Supplementary Appendix B). This cut-off score yielded sensitivity, specificity, positive predictive value, and negative predictive value of 83.5%, 60.0%, 12.8%, and 98.1%, respectively.
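The cut-off criteria named above all follow from a 2×2 confusion matrix, as the Python sketch below illustrates. The cell counts in the example are not reported by the study: they are approximations back-calculated here from the cohort size (139,610), the number of PAD cases (9,192), and the reported sensitivity and specificity, and are used only to show that the reported percentages are mutually consistent.

```python
import math


def screening_metrics(tp, fp, tn, fn):
    """Standard accuracy measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "youden": sensitivity + specificity - 1,
        "mcc": mcc,
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,
    }


# Back-calculated, approximate counts at a cut-off of 20 (assumption, see lead-in).
approx = screening_metrics(tp=7675, fp=52167, tn=78251, fn=1517)
```

To select a cut-off, the same function would be evaluated at each candidate score and the values compared.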
With a cumulative incidence of 6.6% in our commercial population, screening all patients in the population would require 15 patients to be screened in order to identify a single case of PAD. Using the screening metric with a cut-off score of 20, which was associated with a positive predictive value of 12.8%, only eight patients would need to be screened to find a single case of PAD. This assumes a perfect diagnostic test for PAD. There is no bias introduced, however, since the positive predictive value of the chosen test would affect both the PAD and non-PAD group.
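The arithmetic behind these figures is the reciprocal of the expected yield per person screened. A short Python check, using a hypothetical helper name and, like the text, assuming a perfect confirmatory test:

```python
def number_needed_to_screen(yield_fraction):
    """Expected number of people tested per PAD case found,
    assuming the confirmatory test is perfect."""
    return 1.0 / yield_fraction


universal = number_needed_to_screen(0.066)  # everyone tested; yield = cumulative incidence
targeted = number_needed_to_screen(0.128)   # only score >= 20 tested; yield = PPV
```

These evaluate to roughly 15.2 and 7.8, which the text reports as 15 and 8.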
Model validation, calibration, discrimination, and reclassification
The internal validation in 50% of the study cohort yielded sensitivity, specificity, positive predictive value, and negative predictive value of 83.2%, 59.9%, 12.8%, and 98.1%, respectively. The internal validation in 25% of the study cohort yielded sensitivity, specificity, positive predictive value, and negative predictive value of 83.6%, 59.7%, 12.8%, and 98.0%, respectively.
External validation in a 5% sample of Medicare beneficiaries (n=887,536) yielded sensitivity, specificity, positive predictive value, negative predictive value, and c-statistic of 98.0%, 8.6%, 18.8%, 95.3%, and 0.68, respectively.
The ROC curve analysis in the study cohort (n=139,610) yielded a c-statistic of 0.78, indicating good discrimination (Figure 2). The calibration curve for the PAD risk score model indicated that the model is well calibrated (Supplementary Appendix C). The NRI and IDI for the PAD risk score model were 0.62 (95% CI: 0.60, 0.64) and 0.066 (95% CI: 0.063, 0.069), respectively, when compared to the model using age and sex alone. Both NRI and IDI indicated significant improvement in risk prediction using the PAD risk score model when compared to the age and sex only model. The IDI statistic indicated that the discrimination ability of the PAD risk score model was 7 percentage points higher than the basic model.

Figure 2. ROC (receiver operating characteristic) curve for the PAD risk score model. (PAD, peripheral artery disease.)
Discussion
To our knowledge, this is the first predictive risk model developed using administrative claims data for screening PAD patients. Administrative claims databases present a unique opportunity to examine risk factors and develop a screening metric for PAD in real world settings by providing rich information on patient demographics and comorbidities for large cohorts of patients. The 12.8% positive predictive value obtained in the present study improved upon the 9.5% positive predictive value obtained using the current Inter-Society Consensus (ISC) screening criteria. 33 Our screening algorithm also required fewer patients to be screened (7.8 vs 9.9) to obtain one diagnosis of PAD. 33 When compared to universal screening, where 15 patients would need to be screened to identify a single patient with PAD, the present screening algorithm decreases this number by nearly half.
Our risk model was developed using factors that are both clinically relevant and have a strong association with PAD.6,25 This approach provided strong face validity to the PAD risk model. Angina and unstable angina were used as a measure for CAD. 10 Based on correlation and multicollinearity analysis, there was no strong correlation between angina and unstable angina. Thus, angina and unstable angina were used as separate risk factors in our model. The developed model showed good internal and external validity, sensitivity, and accuracy. Additionally, NRI and IDI indicated significant improvement in PAD risk prediction by using the PAD risk model when compared to the simpler model comprising age and sex alone.
The calibration curve showing the proportion of actual versus predicted PAD patients suggested good calibration. Although we used a cut-off score of 20 to define at-risk PAD patients to yield maximum accuracy and minimum error, different cut-off values can lead to different false positive and false negative rates of PAD. Also, the predictive values of the present screening model are dependent upon the prevalence of PAD in a population.
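The dependence of predictive values on prevalence follows from Bayes' rule and can be sketched as below. The sensitivity, specificity, and 6.6% figure come from the text (with cumulative incidence standing in for prevalence); the 15% comparison value is a hypothetical chosen only to show the direction of the effect.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Development-cohort operating point at a cut-off of 20.
ppv_study, npv_study = predictive_values(0.835, 0.600, 0.066)

# Same sensitivity and specificity in a hypothetical population with 15% prevalence.
ppv_high, npv_high = predictive_values(0.835, 0.600, 0.15)
```

Holding sensitivity and specificity fixed, a higher prevalence raises the positive predictive value and lowers the negative predictive value.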
The discriminatory ability of the present model (c-statistic = 0.78) is comparable to the PAD model developed by Makdisse et al. (2007) (c-statistic = 0.85), 34 which had access to additional data such as laboratory tests and lipid profiles. Although these additional data are available to some large employers and payors, their sporadic availability limits the applicability of the Makdisse model. In a French study, authors gathered demographic information, clinical symptoms, and medical history from a hospital setting to develop a PAD risk model. Their PAD risk model had lower discrimination (c-statistic = 0.66) when compared to the present study (c-statistic = 0.78). 5 The discrimination ability of the present metric was also superior to that of the pre-screening test developed in a Spanish population using demographic, clinical, and biomarker information (c-statistic = 0.76). 33 The PAD detection tool developed by Duval et al. (2012) using demographic, clinical, and laboratory information had lower discrimination (c-statistic = 0.61) compared to our model (c-statistic = 0.78). 35 Our proposed model, developed solely from administrative claims data, incorporates the majority of key risk factors for PAD and has a strong discriminatory ability compared to previously developed screening models.
PAD screening is important to prevent the progression of PAD through early interventions such as smoking cessation and progressive walking programs, which have been shown to slow PAD progression.36,37 These interventions are relatively low cost and can be accessible to most patients with early PAD. Furthermore, low ABI is an independent predictor of cardiovascular risk, even after adjusting for common risk factors. Therefore, PAD screening and early diagnosis of low ABI can help identify and address increased cardiovascular risk and ensure appropriate diagnosis and management of CAD as well. The present pre-screening metric can be readily applied by health plans to identify high-risk patients (defined by positive predictive value), initiate early interventions, and, in turn, reduce the large economic burden of PAD.
Evidence-based guidelines differ on routine screening for PAD in undiagnosed patients using ABI. In addition, ABI has not been found to be cost effective in detecting PAD cases,38,39 but may prove to be if patients are pre-screened using the algorithm described here. Furthermore, the cost of ABI testing is greater in the hospital or vascular lab setting than in the physician office setting. 40 Training primary care physicians to perform ABI and notifying them of their high-risk PAD patients could increase ABI screening rates in the primary care setting and subsequently reduce cost.
Strengths and limitations
Strengths of this study include the large, heterogeneous study population and the use of data readily available in administrative claims databases. ABI is commonly used to confirm PAD diagnosis, but is under-utilized for routine screening of undiagnosed patients due to time constraints and costs. 5 In contrast, the present PAD screening metric uses demographic and clinical information routinely captured in administrative claims data, making it possible for health plans to stratify patients according to their PAD risk and direct those at highest risk for confirmatory testing. We envision this metric to aid a clinician’s decision of whether or not to pursue a diagnostic test. This approach could reduce low-value ABI testing and encourage stewardship of healthcare resources.
One limitation of this study is that the screening metric was not validated against a gold standard confirmatory test such as ABI. However, a thorough review of the literature led to the development of our claims-based algorithm to identify PAD,4,10,22,23 which has been shown to have reasonable diagnostic accuracy. 22 Further studies to confirm the predictive value of this screening tool using an ABI would be useful. Our analyses were limited to the information available in the administrative claims data, which can be incomplete. For example, smoking, obesity, and alcoholism are important risk factors for PAD, but are poorly captured in claims data and were not included in the study. Similarly, information on race, biomarkers, and severity of PAD was not available. Variables used to develop the pre-screening metric have variable and imperfect accuracy. Additionally, the use of claims data to identify incident cases of PAD may result in both under- and over-detection of cases. Lastly, there may be concerns that a screening tool can provide information to patients that is worrisome or leads to unnecessary procedures. However, the significant morbidity and mortality of PAD, and the relatively low cost and non-invasive nature of ABI as the confirmatory test, give merit to considering this tool to pre-screen for high-risk patients.
Conclusions
PAD often remains unrecognized because its initial stages are asymptomatic. Identifying patients at earlier stages of disease allows for treatment strategies that can slow the progression of disease. We describe a predictive algorithm that roughly halves the number of patients who need to be screened to find a single PAD case, which could reduce the number of ABI exams as well as the harms that accompany any type of diagnostic testing. Because it is based on administrative claims data, the model is easily reproducible, and its discriminatory ability was superior to that of most of the PAD screening tools it was compared with. It could be adopted by commercial and public payors and large employers to pre-screen patients, employ early treatment strategies, and engage high-value screening strategies to improve population health.
Footnotes
Acknowledgements
We would like to thank Dr Alain Koyama and Dr Kira Ryskina for their editing of the manuscript.
Declaration of conflicting interest
All the authors are affiliated with Health Advocate, Inc.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
References
