Background
Chronic venous insufficiency (CVI) is a prevalent vascular disorder associated with substantial healthcare costs and reduced quality of life. 1 Conventional clinical evaluation, although valuable for initial assessment, primarily yields qualitative data that may not reliably confirm the diagnosis or characterize disease severity. Quantitative assessment of venous reflux is essential for accurate diagnosis and effective management.
Ambulatory venous pressure (AVP) reflects the combined influence of hemodynamic factors in CVI, such as valvular reflux and calf muscle pump function. Although previously considered the gold standard for quantitative evaluation, its clinical use is limited by its invasive nature. 2 Duplex ultrasonography (DUS) and air plethysmography (PG) are noninvasive methods for evaluating CVI. DUS, introduced in the early 1980s for deep vein thrombosis diagnosis, is now also employed to assess reflux and its anatomical location and severity. 3–6 PG, introduced in the early 1960s, was initially used to measure relative lower-limb volume changes during postural shifts and muscular activity. 7 It provides a quantitative assessment of reflux and is valuable for monitoring surgical outcomes. 8,9 PG has been widely applied in evaluating CVI, particularly calf muscle pump function and venous reflux. 10–12
There is limited evidence directly comparing the diagnostic accuracy of venous insufficiency ultrasound (US) and PG in CVI. This study aimed to evaluate the accuracy of US and PG as initial screening tools at the two extremes of venous function defined by Clinical, Etiologic, Anatomic, and Pathophysiologic (CEAP) classification criteria.
Methods
Study design and patients
The Mayo Clinic Gonda Vascular Laboratory database was queried for patients aged 18 years or older with complete venous physiological testing by PG from March 1, 2015 through July 31, 2024. For patients with more than one study, the initial study was used for this analysis. Next, insufficiency US studies were identified if performed within 90 days before or after a PG study. Data from the insufficiency US were extracted, and right and left leg findings were merged with the appropriate limb findings from the PG study. For patients with unilateral insufficiency US, the findings from PG on the contralateral limb could not be used and were discarded. The Mayo Clinic institutional review board reviewed and approved this study.
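The pairing rules above (first PG study per patient, an insufficiency US within 90 days, and only limbs examined by both tests) can be sketched as follows. This is an illustrative sketch, not the study's actual extraction code: the `Study` record, the `pair_limbs` helper, and the rule of taking the closest US study when several fall inside the window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

MAX_GAP_DAYS = 90  # pairing window stated in the text


@dataclass(frozen=True)
class Study:
    """Hypothetical record for one PG or US study."""
    patient_id: str
    study_date: date
    limbs: frozenset  # limbs examined, e.g. frozenset({"left", "right"})


def pair_limbs(pg_studies, us_studies, max_gap_days=MAX_GAP_DAYS):
    """Return (patient_id, limb, pg_date, us_date) for limbs with both tests."""
    # Keep only the earliest PG study for each patient.
    first_pg = {}
    for pg in sorted(pg_studies, key=lambda s: s.study_date):
        first_pg.setdefault(pg.patient_id, pg)

    rows = []
    for pid, pg in first_pg.items():
        # US studies for this patient performed within the window (before or after PG).
        candidates = [
            us for us in us_studies
            if us.patient_id == pid
            and abs((us.study_date - pg.study_date).days) <= max_gap_days
        ]
        if not candidates:
            continue
        # Assumption: if several US studies qualify, use the one closest in time.
        us = min(candidates, key=lambda s: abs((s.study_date - pg.study_date).days))
        # Limbs without a matching US are discarded, as described above.
        for limb in sorted(pg.limbs & us.limbs):
            rows.append((pid, limb, pg.study_date, us.study_date))
    return rows


# Made-up example: patient A has a unilateral US; patient B's US is outside the window.
both = frozenset({"left", "right"})
pg = [
    Study("A", date(2020, 1, 10), both),
    Study("A", date(2021, 5, 1), both),   # later PG study, ignored
    Study("B", date(2019, 3, 1), both),
]
us = [
    Study("A", date(2020, 2, 1), frozenset({"left"})),
    Study("B", date(2019, 9, 1), both),
]
rows = pair_limbs(pg, us)
```

In this made-up example only the left limb of patient A is retained: the right limb has no US, and the US for patient B falls outside the 90-day window.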
Insufficiency ultrasound
Patients with suspected venous insufficiency were first imaged for deep venous thrombus from the common femoral to the popliteal vein level. Evaluation for insufficiency was performed with the patient in the upright position at a 50–60° angle or greater with the leg externally rotated. The common femoral vein (CFV), femoral vein (FV), and popliteal vein (PV) were evaluated with spectral Doppler and distal augmentation for reflux. Deep veins are considered competent if the reflux duration is < 1.0 second. Mild to moderate deep incompetence in our practice is considered to be 1.0–3.0 seconds, with mild incompetence having a lower amplitude than moderate incompetence. Severe deep incompetence is defined as reflux > 3.0 seconds.
The superficial venous system was evaluated, including the saphenofemoral junction (SFJ), anterior accessory saphenous vein (AASV), posterior accessory vein if associated with a varix, great saphenous vein (GSV; upper thigh, at the knee and in the calf), saphenopopliteal junction (SPJ; if present), and small saphenous vein (SSV; at the midcalf). When varices are noted, insufficiency is evaluated from the originating vein, and additional sites of testing for insufficiency may be added. Superficial veins are considered competent when reflux duration is < 0.5 seconds. The degree of incompetence is determined based on the duration of reflux, and when borderline, it can be adjusted into a category based on amplitude. Mild incompetence is defined as 0.5–1.0 second of reflux, moderate incompetence as 1.0–3.0 seconds of reflux, and severe incompetence as > 3.0 seconds of reflux.
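The duration thresholds for deep and superficial veins can be summarized in a short sketch. The function names are hypothetical, and two simplifications are assumptions rather than laboratory practice: amplitude is not modeled (so deep reflux of 1.0–3.0 seconds is returned as a single "mild to moderate" grade), and a superficial reflux duration of exactly 1.0 second is assigned to the moderate category.

```python
def grade_deep_reflux(seconds: float) -> str:
    """Grade deep venous reflux from its duration in seconds."""
    if seconds < 1.0:
        return "competent"
    if seconds <= 3.0:
        # Mild vs moderate is separated by amplitude, which is not modeled here.
        return "mild to moderate"
    return "severe"


def grade_superficial_reflux(seconds: float) -> str:
    """Grade superficial venous reflux from its duration in seconds."""
    if seconds < 0.5:
        return "competent"
    if seconds < 1.0:
        return "mild"
    if seconds <= 3.0:
        # Assumption: exactly 1.0 s is placed in the moderate category.
        return "moderate"
    return "severe"
```

For example, `grade_superficial_reflux(0.7)` returns `"mild"`, whereas the same duration in a deep vein is graded `"competent"` by `grade_deep_reflux(0.7)`.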
Plethysmography
Venous hemodynamics (obstruction, incompetence, calf pump function [CPF], and postexercise venous refilling time [P-EVRT]) were assessed in ambulatory outpatients by standard PG methods (Supplemental material) using the VenView chair (AdvanzeCardio, Fountain Hills, AZ, USA), as previously described. 13–15 Briefly, testing included three phases: venous outflow to assess venous obstruction, passive drainage and refilling (PDR) to assess and grade valvular incompetence, and ankle flexes followed by passive refilling to calculate CPF and P-EVRT. CPF was measured as an ejection fraction (EF) and was analyzed as a continuous variable. Venous outflow was classified as obstructed or patent, and venous incompetence per extremity was categorized as normal, mild, moderate, or severe, based on flow and volume, using established laboratory criteria. 12 P-EVRT represents the time taken for venous refilling following exercise-induced emptying; it is measured in seconds and treated as a continuous variable for analysis. 16
CEAP classification
This system evaluates CVI based on clinical presentation, etiology, anatomy, and pathophysiology. 17 The clinical classification was performed and/or supervised by a certified vascular technician at the time of the PG study for each limb and classified into seven classes: C0 (no signs of venous disease), C1 (telangiectasias or reticular veins), C2 (visible varicose veins), C3 (edema), C4 (A: hyperpigmentation, B: lipodermatosclerosis, or atrophie blanche), C5 (healed venous leg ulcer), and C6 (active venous leg ulcer). For this study, the maximum clinical (C) class from each limb was categorized into three groups: C0–C2 was categorized as mild disease, C3 as moderate disease, and C4–6 as severe disease.
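The grouping of clinical classes into the three severity categories used in this study can be written as a small mapping. The function names are hypothetical, and the sketch assumes the C class is recorded as an integer from 0 to 6, ignoring the C4 A/B subclasses.

```python
def severity_group(c_class: int) -> str:
    """Map a CEAP clinical class (0-6) to the study's severity group."""
    if not 0 <= c_class <= 6:
        raise ValueError(f"CEAP clinical class must be 0-6, got {c_class}")
    if c_class <= 2:
        return "mild"      # C0-C2
    if c_class == 3:
        return "moderate"  # C3
    return "severe"        # C4-C6


def limb_severity(c_classes) -> str:
    """Group a limb by the maximum clinical class recorded for it."""
    return severity_group(max(c_classes))
```

A limb with recorded classes C1, C2, and C3 is therefore grouped as moderate: `limb_severity([1, 3, 2])` returns `"moderate"`.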
Statistical analysis
As each limb contained independent PG, venous insufficiency US, and CEAP categories, the analysis was performed per limb. The primary outcome of interest was the ability of venous insufficiency US parameters compared to PG parameters to predict severe CVI (C4–6). Descriptive characteristics were presented as means and SDs for continuous variables, and numbers and percentages for categorical variables. Pearson’s chi-squared test was used to evaluate the relationship between categorical variables. The Kolmogorov–Smirnov test was used to test the distribution of continuous variables. For comparisons involving more than two groups, one-way ANOVA was used for normally distributed variables, and the Kruskal–Wallis test was applied otherwise. All statistical tests were two-sided, with significance defined as p < 0.05. Statistical analyses were performed using Python with the SciPy 16 and statsmodels libraries. 18
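The test-selection logic described above can be sketched with SciPy as follows. This is a minimal illustration, not the analysis code used in the study. The helper names and the example data are made up, and the normality check compares each group against a normal distribution with that group's own mean and SD, which is an assumption (the text does not state how the reference distribution was specified, and estimating its parameters from the sample makes the Kolmogorov–Smirnov test conservative).

```python
from statistics import mean, stdev

from scipy import stats


def compare_groups(groups, alpha=0.05):
    """Compare a continuous variable across more than two groups.

    Uses one-way ANOVA when every group passes the normality check,
    otherwise the Kruskal-Wallis test. Returns (test_name, p_value).
    """
    all_normal = True
    for values in groups:
        # Assumption: reference normal uses the sample's own mean and SD.
        _, p_norm = stats.kstest(values, "norm", args=(mean(values), stdev(values)))
        if p_norm < alpha:
            all_normal = False
    if all_normal:
        _, p_value = stats.f_oneway(*groups)
        return "anova", float(p_value)
    _, p_value = stats.kruskal(*groups)
    return "kruskal-wallis", float(p_value)


def categorical_association(table):
    """Pearson's chi-squared test on a contingency table (rows x columns)."""
    chi2, p_value, dof, _expected = stats.chi2_contingency(table)
    return float(chi2), float(p_value), int(dof)


# Made-up refilling times (seconds) for three severity groups.
mild = [float(v) for v in range(40, 60)]
moderate = [float(v) for v in range(25, 45)]
severe = [float(v) for v in range(5, 25)]
test_name, p_continuous = compare_groups([mild, moderate, severe])

# Made-up 3 x 2 table: severity group (rows) by finding present/absent (columns).
chi2, p_categorical, dof = categorical_association([[30, 70], [50, 50], [70, 30]])
```

Both tests are two-sided by construction, so the returned p-values can be compared directly against the 0.05 threshold.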
Machine learning analyses
Parameters from venous insufficiency US and PG were analyzed using multiple machine learning models to predict the presence of severe CVI (C4–6) as compared to mild CVI (C0–C2). For the PG data, the inputs were calf ejection fraction (continuous), P-EVRT (continuous), obstruction (present or absent), and category of incompetence (none, mild, moderate, severe). For the venous insufficiency US data, the inputs were the classification of incompetence (none, mild, moderate, severe) for each deep and superficial vein segment. Multiple machine learning models were fitted to account for potential differences in the underlying structure of venous insufficiency US versus PG data, with the two best-performing models reported alongside traditional logistic regression models. To address class imbalance, the majority class was randomly down-sampled without replacement to match the number of observations in the minority class. The dataset was then partitioned into training and testing subsets using an 80:20 split. Multiple classification algorithms were implemented using Python (version 3.10) and the scikit-learn library (version 1.6.1), 18 including random forest, support vector classifier (SVC), AdaBoost, k-nearest neighbors (KNN), logistic regression, gradient boosting, decision tree, and multilayer perceptron (MLP). XGBoost (version 3.0.2) 19 was also included as a high-performance gradient boosting framework. Each model (for US or PG parameters) was trained on the training set and evaluated on the test set using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUROC). Confusion matrices were computed to derive true/false positive and negative rates. For models lacking native probability outputs, decision function scores were normalized to estimate class probabilities for AUROC analysis.
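The workflow described above (down-sampling, an 80:20 split, model fitting, and score normalization) can be sketched with scikit-learn. This is an illustrative sketch under stated assumptions, not the study pipeline: the data are synthetic, only two of the listed classifiers are shown, and the random seeds, default hyperparameters, helper names, and the use of min-max scaling to normalize decision-function scores are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def downsample_majority(X, y, seed=0):
    """Randomly down-sample every class, without replacement, to the minority size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()
    return X[keep], y[keep]


def min_max(scores):
    """Rescale decision-function scores to [0, 1] (assumed normalization)."""
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        return np.full(scores.shape, 0.5)
    return (scores - lo) / (hi - lo)


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit a classifier and report the metrics named in the text."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    if hasattr(model, "predict_proba"):
        score = model.predict_proba(X_test)[:, 1]
    else:
        # No native probabilities: normalize the decision function instead.
        score = min_max(model.decision_function(X_test))
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "auroc": roc_auc_score(y_test, score),
    }


# Synthetic, imbalanced stand-in for the per-limb feature table (1 = severe CVI).
X, y = make_classification(n_samples=600, n_features=6, weights=[0.7, 0.3],
                           random_state=0)
X_bal, y_bal = downsample_majority(X, y)
X_tr, X_te, y_tr, y_te = train_test_split(X_bal, y_bal, test_size=0.2,
                                          random_state=0)
models = {"logistic_regression": LogisticRegression(max_iter=1000), "svc": SVC()}
results = {name: evaluate(m, X_tr, X_te, y_tr, y_te) for name, m in models.items()}
```

Because min-max scaling is monotonic, it does not change the ranking of the scores, so the AUROC is the same as it would be on the raw decision-function values. The rescaling only matters if the scores are later read as probabilities.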
Results
A total of 1478 limbs in 839 patients had both venous insufficiency US and PG measurements and were included in the analysis. Most patients were women (62%) and White (91%), and the mean age was 61 years (SD 14.9). A total of 753 left and 725 right limbs were analyzed. In most patients (80%), venous insufficiency US and PG were performed within 10 days of each other. The clinical severity of CVI by the CEAP classification was severe in 477 limbs (32.3%), moderate in 496 limbs (33.6%), and mild in 505 limbs (34.2%) (Table 1). There was no difference in the severity of CVI by limb side (p = 0.89). Limbs with severe CVI had a higher mean leg circumference than those with mild CVI (259 ± 35 vs 232 ± 21 mm; p < 0.001).
Table 1. Demographics, plethysmography, and venous insufficiency ultrasound findings by severity of chronic venous insufficiency.
CVI, chronic venous insufficiency; PG, plethysmography; US, ultrasound.
Plethysmography data
An abnormal PG result in at least one parameter (obstruction, incompetence, CPF, or P-EVRT) was observed in 1078 limbs (72.9% of the total cohort), with the highest prevalence in the severe CVI group (88.5%), decreasing to 78.2% in the moderate and 53.1% in the mild CVI groups (Table 1; p < 0.001). No venous incompetence was observed in 652 limbs (44.1%), with the highest proportion in the mild CVI group (62.8%), compared to 44.6% in the moderate and 23.9% in the severe CVI groups (Table 2). Severe incompetence was noted in 171 limbs (11.6%), predominantly in the severe CVI group (22.4%), compared to 8.7% in the moderate and 4.2% in the mild CVI groups. Hemodynamic obstruction was observed in 47 limbs (3.2%) and was most prevalent with severe CVI (4.4%, 21 limbs), followed by moderate CVI (3.8%, 19 limbs) and mild CVI (1.4%, seven limbs; p = 0.002). The mean EF was lowest in the severe CVI group at 40.2% (SD 22.2), increased to 44.9% (22.0) in the moderate group, and was highest in the mild group at 56.5% (21.7; p < 0.001). The mean P-EVRT was shortest in the severe CVI group at 13.0 seconds (SD 11.0), increasing to 18.8 seconds (14.5) in the moderate group and 28.5 seconds (18.3) in the mild group (p < 0.001).
Table 2. Venous air plethysmography results across chronic venous insufficiency categories.
CVI, chronic venous insufficiency; P-EVRT, postexercise venous refilling time.
Venous ultrasound data
Among the 1478 limbs, 997 (67.4%) exhibited valvular incompetence on venous insufficiency US in at least one location (deep or superficial vein segment). Incompetence was most frequent in the severe CVI group (75.3%), followed by 64.7% in the moderate and 62.8% in the mild CVI groups (p < 0.001; Table 1). Deep venous incompetence at one or more vein segments was associated with CVI severity (p < 0.001): it was present on US in 33.3% of limbs in the severe CVI group, 19.8% in the moderate CVI group, and 13.7% in the mild CVI group. Superficial venous incompetence on venous US was also significantly associated with disease severity (p = 0.02). The severity of venous incompetence, graded as none, mild, moderate, or severe for each deep and superficial vein segment tested, can be found in Table 3. Ultrasound evidence of prior phlebitis was found in 18.0% of limbs with severe CVI, 7.9% of limbs with moderate CVI, and 4.8% of limbs with mild CVI (p < 0.001).
Table 3. Ultrasound findings across chronic venous insufficiency categories.
Phlebitis in deep or superficial veins was defined as postthrombotic vein thickening or partial or complete thrombotic occlusion of a vein segment.
AASV, anterior accessory saphenous vein; CFV, common femoral vein; CVI, chronic venous insufficiency; FV, femoral vein; GSV, great saphenous vein; PV, popliteal vein; SFJ, saphenofemoral junction; SPJ, saphenopopliteal junction; SSV, small saphenous vein; US, ultrasound.
Prediction of chronic venous disease severity
Table 4 summarizes the performance of machine learning models trained to predict severe CVI as defined by CEAP classification scores of 4, 5, or 6. Models developed using insufficiency US data (parameters from Table 3) demonstrated the lowest predictive performance across all classifiers. The gradient boosting model, despite being the best-performing US-based approach, reached an AUROC of 0.65. The SVC followed closely with an AUROC of 0.63. Traditional logistic regression had the lowest predictive capacity, with an AUROC of 0.61. For PG-based data models using the complete set of parameters (from Table 2), MLP yielded the highest overall performance with an AUROC of 0.82. Similarly, AdaBoost and logistic regression classifiers using the full PG dataset also demonstrated strong predictive capabilities, with AUROC values of 0.81–0.82. These results indicate that models utilizing comprehensive PG data were highly effective in predicting the presence of severe CVI. As PG contains measurements not performed by US (CPF, P-EVRT), we then limited the PG input to only the incompetence and obstruction data; the model performance declined moderately but remained higher than insufficiency US results. For instance, logistic regression on this dataset achieved an AUROC of 0.72 (as compared to 0.81 before). Both MLP and KNN models showed similar outcomes (AUROC of 0.72).
Table 4. Top-performing machine learning models for the prediction of severe chronic venous insufficiency (CEAP classes 4–6) by venous study type and components.
All machine learning models were analyzed on the same sampling; the top two performing models and logistic regression were included for each subset.
AUROC, area under the receiver operating characteristic curve; CEAP, Clinical, Etiologic, Anatomic, and Pathophysiologic; KNN, k-nearest neighbors; MLP, multilayer perceptron; PG, air plethysmography; PPV, positive predictive value; SVC, support vector classifier; US, ultrasound.
Discussion
This study evaluated PG and insufficiency US in assessing the severity of CVI across 1478 limbs in a large cohort of patients. Using machine learning models, our analysis demonstrates the superior predictive performance of PG results compared to US-derived results for the prediction of CVI severity. Even when the PG input was limited to only the incompetence and obstruction data, results with PG remained better than those with US (Figure 1). The superior performance of PG models indicates that quantitative hemodynamic measures of the overall limb perform better than assessments of individual deep and superficial vein segments, even when those segments are analyzed together with sophisticated analysis techniques. We found that approximately 25% of limbs with clinically severe CVI had a completely normal venous insufficiency US (compared to only 11% with a normal PG). This finding supports the superiority of PG for determining the severity of CVI and indicates that PG has a higher sensitivity in more advanced CVI, likely in part because of the inclusion of CPF and P-EVRT measurements.

Figure 1. Receiver operating characteristic curves for machine learning models predicting severe chronic venous insufficiency. Plethysmography all, using all parameters (AdaBoost), achieved the highest performance (AUC = 0.82), followed by Plethysmography limited, restricted to incompetence and obstruction parameters (logistic regression, AUC = 0.72), and ultrasound (GradBoost, AUC = 0.65).
We have previously examined how venous hemodynamic parameters that can only be assessed via PG correlate with the diagnosis and classification of CVI. Notably, CPF was independently correlated with CVI severity based on the CEAP classification. 20 A stepwise relationship was observed between CPF and the occurrence of active or past ulcers. This relationship was most pronounced when CPF was severely impaired, with ejection fractions between 0% and 9%. Conversely, limbs with CPF in the 40–49% range did not show a statistically significant association with active or prior ulceration when compared with those displaying higher EF values. Reduced CPF has also been shown to be a risk factor for venous thromboembolism 14,21 and has been independently associated with higher mortality. 13,22 Furthermore, reduced CPF cannot be explained by generalized muscle weakness as measured by handgrip strength. 23 P-EVRT has also been shown to be independently associated with the severity of CVI classified by CEAP. 15 Rapid P-EVRT (less than 20 seconds) correlated significantly with increased clinical severity, as indicated by CEAP classes. For every 10-second reduction in P-EVRT below 40 seconds, there was a notable rise in CEAP class. Refilling times under 10 seconds were strongly associated with a higher prevalence of skin changes and healed or active venous leg ulcers. In this analysis, PG outperformed US in predicting CVI severity, in part because of these quantitative hemodynamic measurements.
Ambulatory venous pressure was historically regarded as the gold standard for assessing the hemodynamic burden of CVI. 2 However, due to its invasive nature, it is not suitable for routine screening or repeated measurements. DUS and PG are the two most frequently used noninvasive modalities for evaluating venous function. DUS has become the first-line imaging technique for detecting the presence, extent, and anatomical distribution of venous reflux, in part because the discontinuation of specific billing codes for PG in the United States has led to a decline in PG use. DUS does not, however, provide an overall assessment of the limb, but rather a detailed anatomic map that can be helpful for surgical decision-making. Notably, no studies have demonstrated a strong correlation between DUS findings and ambulatory venous pressure. In contrast, hemodynamic parameters obtained from PG have shown a consistent association with invasively measured ambulatory venous pressures. 10
Direct comparisons of US with PG in CVI exist but are limited. Bays et al., 24 in a study of 20 patients, demonstrated that PG accurately distinguishes limbs with and without venous reflux when compared with DUS, which aligns with our findings. They observed that PG is as valuable as DUS for evaluating patients undergoing ligation and stripping of varicose veins. Moreover, PG allows for the quantitative assessment of superficial venous incompetence, detects any outflow obstruction, and helps determine whether varicosities significantly contribute to elevated venous pressure or serve as collateral outflow pathways from obstructed deep veins. Current guidelines offer only limited recommendations for PG in clinical practice. 25
Our work is the largest study comparing US to PG in the assessment of CVI, but it has several limitations. The dataset was restricted to patients who ultimately underwent both PG and insufficiency US testing, which likely enriched the cohort with patients who had more advanced venous disease than would be expected in the general population. Although this does not bias the direct comparison of machine learning models for PG and US in our cohort, it does prevent a complete understanding of how the two tests compare in a more representative, population-based sample. CEAP classification was performed at the time of PG testing (rather than at the time of the US study), which may have influenced the strength of its correlation with PG. CEAP classification was performed and recorded before the PG results were known; however, the US studies could have been performed before or after the PG study (and CEAP recording).
The CEAP classification was used as the reference standard in this study because it is a clinically relevant and well-validated system; however, the clinical (C) component, as a clinical assessment, lacks specificity for venous pathology. This may explain why some limbs with C4, C5, or C6 disease (which we classified as severe CVI) showed no evidence of venous pathology on either US or PG. Such cases may in fact represent patients with lymphedema without venous pathology. Unfortunately, additional quantitative scoring tools, such as the Venous Clinical Severity Score, were not uniformly available in the electronic health record. To limit any time-based changes in CVI, we ensured that studies were performed in close proximity (within 90 days), and most (80%) were done within 10 days. Because the study population consisted of patients with signs or symptoms potentially consistent with CVI who were referred for diagnostic testing, the severity of CVI should not be generalized to the broader population or be used to estimate the true prevalence and characteristics of CVI in the community.
Conclusions
This analysis compares US and PG directly in a large sample of patients being evaluated for CVI and demonstrates the superior performance of PG for predicting severe CVI. Unfortunately, PG is unavailable to many patients and clinicians, in part due to poor reimbursement and the decommissioning of specific billing codes. This analysis, alongside recent publications firmly establishing CPF and P-EVRT as important parameters in CVI assessment, again demonstrates the clinical value of PG in venous testing. Streamlined PG technology, updated billing codes, and modernized venous testing guidelines are essential to re-establish the importance of PG in venous disease. Although PG outperformed insufficiency US in this analysis, we believe both tests offer important insights into venous disease and are likely complementary in the assessment of, and ultimate treatment decisions for, patients with CVI. 26
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
