Abstract
The rates of recurrent venous thromboembolism (RVTE) vary widely, and its causes still need to be elucidated. Statistical multivariate methods can be used to determine disease predictors and improve current methods for risk calculation. The objective of this study was to apply principal component analysis to a data set containing clinical records of patients with previous venous thromboembolism and extract the main factors that predict recurrent thrombosis. Records of 39 factors including blood and lipid parameters, hereditary thrombophilia, antiphospholipid syndrome, clinical data regarding previous thrombosis and treatment, and Doppler ultrasound results were collected from 235 patients. The results showed that 13 principal components were associated with RVTE and that 18 of 39 factors are important for the analysis. These factors include red blood cell, white blood cell, hematocrit, red cell distribution width, glucose, lipids, natural anticoagulant, creatinine, age, as well as first deep vein thrombosis data (distal/proximal,
Keywords
Introduction
Recurrent venous thromboembolism (RVTE) is an important concern in any medical center. Once a patient is diagnosed with a first venous thromboembolism (VTE), anticoagulation therapy is the method commonly used to prevent recurrence. However, deciding the duration of anticoagulation therapy is difficult, as it depends on multiple interacting factors that need to be evaluated individually. 1 –5 DASH, 6 Vienna, 7 and Men and HERDOO-2 8 are 3 scores used to decide on anticoagulant prophylaxis according to the risk of recurrence, and each one takes different factors into consideration.
Recent reviews showed that these score models have a strong limitation when predicting patients with low risk of RVTE and still lack complete validation. 9,10 Patients with provoked VTE are also not included in these models. Some studies indicate that some patients with provoked VTE could benefit from prolonged prophylaxis 11 –13 and that they have the same risk of recurrence as those with a previous unprovoked event. 14 The scores also do not include patients with antiphospholipid antibody syndrome, cancer, natural anticoagulant activity deficiencies, and so on. 9,10
Thus, it is important to develop prediction scores that include those individuals. To overcome these issues, sophisticated multivariate statistical techniques could be used to identify the main risk factors of RVTE for a wide range of patients.
Principal component analysis (PCA) is a multivariate statistical method that identifies patterns and classifies the factors that influence a given phenomenon. It is widely used to identify patterns in the medical field. 15 –19 Principal component analysis decomposes a set of correlated variables into a set of uncorrelated variables, named principal components (PCs), which are organized in descending order of variance. Principal component analysis can also be used to reduce dimensionality by discarding the PCs that are less important (those with less variance than the original variables). The remaining PCs are useful to develop new models, and their loadings can be used to calculate the contribution of each factor in the collected data.
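The decomposition described above can be illustrated with a minimal Python sketch. It assumes NumPy and uses synthetic data (not data from this study); all variable names are illustrative. It shows that the PC scores are mutually uncorrelated and that their variances are the eigenvalues in descending order.

```python
import numpy as np

# Synthetic data (hypothetical, not from the study): 200 samples, 3 variables,
# the first two of which are strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
data = np.hstack([
    base + 0.3 * rng.normal(size=(200, 1)),
    base + 0.3 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# Standardize each column to zero mean and unit (sample) variance.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Eigen-decomposition of the correlation matrix (symmetric, so eigh applies).
corr = np.corrcoef(z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# eigh returns eigenvalues in ascending order; reorder to descending variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Project the standardized data onto the eigenvectors to obtain the PC scores.
scores = z @ eigenvectors

# The covariance matrix of the scores is diagonal (uncorrelated PCs), and its
# diagonal holds the eigenvalues.
score_cov = np.cov(scores, rowvar=False)
```

Dimensionality reduction then amounts to keeping only the first columns of `scores`, that is, the PCs with the largest variance.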
The aim of this study was to collect clinical data of several patients diagnosed with a first thrombotic episode and use PCA to identify the predictor factors for RVTE. To obtain a more comprehensive view of the risk factors, we included patients with provoked and unprovoked VTE, antiphospholipid antibody syndrome, cancer, and natural anticoagulant activity deficiencies.
Methods
Population
In this study, 261 consecutive patients with a first acquired or provoked VTE, seen at the outpatient clinic of Hemocentro Unicamp between January 2009 and August 2016, were followed. Mathematically, the occurrence (or not) of RVTE can be considered a function of several risk factors. Acquired risk factors for VTE were pregnancy and postpartum, hormone therapy for contraception or hormonal replacement, surgery, trauma, air travel, hereditary thrombophilia, cancer, and antiphospholipid antibodies. Exclusion criteria were missing data, thrombosis at sites other than the lungs (pulmonary embolism [PE]), lower limbs, or central nervous system, and age under 18 years. After the first VTE, all patients were treated with anticoagulant therapy for the standard 6-month period. Thereafter, each patient returned to the clinical center every 6 months for check-up examinations.
Recurrent VTE was defined as the detection of a new thrombotic episode at a different site from the primary event or a new episode at the same location as the previous thrombosis. Thrombosis of the limbs was confirmed by duplex ultrasonography, thrombosis of the cerebral veins by magnetic resonance imaging, and PE by ventilation-perfusion scanning or magnetic resonance imaging.
Table 1 shows all the evaluated factors, which included clinical and thrombotic data, acquired and inherited risk factors for thrombosis, laboratory parameters, and residual vein thrombosis detected by Doppler ultrasound. All patient data, blood samples, and examinations were collected between the first and the second thrombotic events, on average 1 month after the end of anticoagulation therapy. The 39 factors presented in Table 1 were used to perform the PCA. Factors that are not numerical were converted to a numerical form as shown in Table 2. All numerical factors were used as is.
Table 1. Factors Considered in the Present Study.
Abbreviations: FV, factor V; US, ultrasound; AT, Antithrombin; PS, Protein S.
Table 2. Codification Values for Nonnumerical Variables.
Abbreviations: FV, factor V; US, ultrasound.
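The conversion of nonnumerical factors to numerical codes can be sketched as follows. This is only an illustration in Python: the field names and code values below are hypothetical placeholders, and the codes actually used in the study are those listed in Table 2.

```python
# Hypothetical codification table (placeholder fields and values);
# the codes actually used in the study are those given in Table 2.
CODES = {
    "sex": {"male": 0, "female": 1},
    "first_vte_provoked": {"no": 0, "yes": 1},
    "dvt_site": {"distal": 0, "proximal": 1},
}


def encode_record(record, codes=CODES):
    """Return a copy of record with nonnumerical fields replaced by numeric codes."""
    encoded = {}
    for field, value in record.items():
        if field in codes:
            # An unknown category raises KeyError instead of being silently coded.
            encoded[field] = codes[field][value]
        else:
            # Numerical factors are used as is.
            encoded[field] = value
    return encoded


# Made-up patient record for illustration only.
patient = {
    "age": 43,
    "sex": "female",
    "first_vte_provoked": "yes",
    "dvt_site": "proximal",
    "hct": 41.2,
}
encoded_patient = encode_record(patient)
```

Failing loudly on an unknown category is a design choice of this sketch, so that a typo in a record cannot enter the PCA as an arbitrary number.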
Principal Component Analysis
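The PCs are a new set of variables organized in order of their capacity to describe a given phenomenon. To calculate the PCs from an original data matrix $X$ with $n$ patients (rows) and $p$ factors (columns), each factor is first standardized:
$$Z_{ij} = \frac{X_{ij} - \bar{X}_j}{s_j} \tag{1}$$
where $X_{ij}$ is the value of factor $j$ for patient $i$, and $\bar{X}_j$ and $s_j$ are the mean and sample standard deviation of factor $j$.
Once the matrix $Z$ is obtained, the eigenvalues $\lambda_k$ and eigenvectors $e_k$ of its correlation matrix $R = Z^{T}Z/(n-1)$ are calculated from
$$R\,e_k = \lambda_k e_k \tag{2}$$
The importance of each PC can be determined by ordering the eigenvalues in descending order, corresponding to the descending order of variance.
In this study, PCA was used to determine the main factors for RVTE. We considered only the PCs whose eigenvalue is greater than 1, which ensures that each retained PC accounts for more variance than a single original (standardized) variable. To compute the importance of each factor $Z_j$ in the $k$th PC, the correlation between them was calculated:
$$r_{jk} = e_{jk}\sqrt{\lambda_k} \tag{3}$$
where $e_{jk}$ is the $j$th element of the eigenvector $e_k$.
Only those variables with $|r_{jk}| > 0.5$ were considered important.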
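The selection procedure described in this section can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes NumPy and a PCA on the correlation matrix (that is, on standardized factors, which is what the eigenvalue > 1 criterion implies), and it uses synthetic data. The function name `select_main_factors` and its parameters are illustrative.

```python
import numpy as np


def select_main_factors(x, names, eig_min=1.0, corr_min=0.5):
    """Correlation-matrix PCA with an eigenvalue cut-off and a |correlation| cut-off.

    Returns the sorted eigenvalues, the factor-by-PC correlation matrix for the
    retained PCs, and a dict mapping each retained PC to its main factors.
    """
    x = np.asarray(x, dtype=float)
    # The correlation matrix equals the covariance matrix of the standardized factors.
    corr = np.corrcoef(x, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]  # descending order of variance
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    kept = eigenvalues > eig_min  # retain PCs with eigenvalue > 1
    # Correlation between standardized factor j and PC k: e_jk * sqrt(lambda_k).
    factor_pc_corr = eigenvectors[:, kept] * np.sqrt(eigenvalues[kept])

    main = {}
    for k in range(factor_pc_corr.shape[1]):
        idx = np.where(np.abs(factor_pc_corr[:, k]) > corr_min)[0]
        main[f"PC-{k + 1}"] = [names[j] for j in idx]
    return eigenvalues, factor_pc_corr, main


# Synthetic demonstration data (not patient data): "a" and "b" share a common
# source, whereas "c" and "d" are independent noise.
rng = np.random.default_rng(42)
shared = rng.normal(size=(300, 1))
x_demo = np.hstack([
    shared + 0.2 * rng.normal(size=(300, 1)),
    shared + 0.2 * rng.normal(size=(300, 1)),
    rng.normal(size=(300, 1)),
    rng.normal(size=(300, 1)),
])
eigenvalues, factor_pc_corr, main = select_main_factors(x_demo, ["a", "b", "c", "d"])
```

In this toy example the first PC is dominated by the two correlated variables. The absolute value is used because the sign of an eigenvector is arbitrary.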
Results
Of the 261 patients, 26 were excluded because of missing data, a first VTE at sites other than the lungs, lower limbs, or central nervous system, or age under 18 years. Therefore, 235 patients were included, of whom 74 (31.4%) were men and 161 were women, with ages >18 and <82 years.
Table 3 summarizes the principal characteristics of the thrombotic episodes. In all, 49 (20.8%) patients presented recurrent VTE. The first episode was most frequent in the left leg (45.7%); 36% of the patients presented pulmonary embolism (PE), and only 2.1% presented thrombosis in the central nervous system. When analyzing RVTE events, the percentages were almost the same: 46.0%, 30.0%, and 2.0% for patients who presented RVTE in the left, right, and both legs, respectively. In all, 22% presented PE as RVTE, and 2.0% presented RVTE in the central nervous system. Of all patients, 39% presented a provoked first thrombosis, and 40% of these patients had RVTE, corresponding to 16% of all recurrences. This is an interesting result, since recurrence in patients with a provoked VTE is expected to be low. Cancer was present in 13 patients. Three of them had provoked VTE, and only 1 presented RVTE.
Table 3. Main Characteristics of the Thrombotic Events.
Abbreviations: RVTE, recurrent venous thromboembolism; VTE, venous thromboembolism.
The RVTE events were not in the same location for all patients. Table 4 shows the relation between the VTE and RVTE locations for all patients who presented RVTE. Thrombotic events recurred more frequently in the same location for patients whose first VTE was in the left leg: of 24 patients who presented VTE in the left leg, 16 presented RVTE in the same location. This pattern was not observed for the other locations.
Table 4. Relation Between VTE and RVTE Events.
Abbreviations: PE, pulmonary embolism; RVTE, recurrent venous thromboembolism; VTE, venous thromboembolism.
In this study, we used all gathered data to calculate the eigenvalues and eigenvectors required to obtain the PCs and then used Equation 3 to determine the main factors for recurrent VTE. Since we considered 39 factors, 39 PCs were generated. The eigenvalues obtained and the cumulated variance for each PC are shown in Figure 1.

Figure 1. Principal component analysis (PCA) results. A, Eigenvalues. B, Cumulated variance.
The results indicated that 26 of the 39 PCs correspond to 90.37% of the overall variance. In Figure 1A, the line y = 1 corresponds to the marginal value below which a PC accounts for less variance than an original variable. Therefore, only the PCs with eigenvalues >1 were considered. Thirteen PCs satisfy this condition, so we believe that they are sufficient to describe RVTE, corresponding to 67.34% of the total variance (horizontal line in Figure 1B).
The correlations calculated by Equation 3 between the factors and the 13 PCs considered are presented in the Supplementary Information (Table S1). A factor can be considered highly correlated with a PC, and therefore an important factor, when the modulus of its correlation is higher than 0.5. The correlations for the other PCs (14-39) are not presented, since they were discarded. The main factors for RVTE found in this work are presented in Table 5 in order of importance.
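The quantities reported above (number of PCs with eigenvalue >1, their share of the total variance, and the number of PCs needed to reach a target cumulated variance) follow directly from the ordered eigenvalues. A minimal Python sketch is given below. The eigenvalues in the example are hypothetical and are not those of this study, and the function name is illustrative.

```python
from itertools import accumulate


def summarize_eigenvalues(eigenvalues, eig_min=1.0, target=0.90):
    """Return (number of PCs with eigenvalue > eig_min, their cumulated variance
    share, number of PCs needed to reach the target cumulated variance)."""
    ordered = sorted(eigenvalues, reverse=True)
    # For a correlation-matrix PCA the total equals the number of factors.
    total = sum(ordered)
    cumulative = [c / total for c in accumulate(ordered)]
    n_kept = sum(1 for ev in ordered if ev > eig_min)
    kept_share = cumulative[n_kept - 1] if n_kept else 0.0
    n_target = next(i + 1 for i, c in enumerate(cumulative) if c >= target)
    return n_kept, kept_share, n_target


# Hypothetical eigenvalues for a 5-factor problem (they sum to 5).
example = [2.0, 1.5, 0.8, 0.5, 0.2]
n_kept, kept_share, n_target = summarize_eigenvalues(example)
```

For these made-up values, 2 PCs exceed the threshold and together account for 70% of the variance, while 4 PCs are needed to reach 90%.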
Table 5. Main Factors for RVTE Determined by PCA.
Abbreviations: HB, hemoglobin; HCT, hematocrit; HDL, high-density lipoprotein; LDL, low-density lipoprotein; PC, principal component; PCA, principal component analysis; RBC, red blood cell; RDW, red blood cell distribution width; RVTE, recurrent venous thromboembolism; VTE, venous thromboembolism; WBC, white blood cell; AT, Antithrombin; PS, Protein S.
Principal component 1 accounted for 13% of the overall variance and included red blood cell (RBC) count, hematocrit (HCT), triglycerides, glucose, creatinine, cholesterol, high-density lipoprotein (HDL), and low-density lipoprotein (LDL). The last 3 factors presented much higher correlations with PC-1, indicating that they have a strong influence on this PC.
Principal component 2 accounted for 9% of the overall variance, including age, protein C, protein S, and antithrombin levels. The third component accounted for 8% of the overall variance, including RBC, hemoglobin (HB), and HCT.
The other highly correlated variables were white blood cell (WBC) and red cell distribution width (RDW; PC-5: 5.7% of overall variance),
Discussion
Based on the results, one can relate each PC to a specific cluster of variables. As shown, PC-1 is characterized by RBC, total cholesterol, LDL and HDL cholesterol, triglycerides, glucose level, and creatinine level. Principal component 2 relates to age and natural anticoagulant levels. The main factors in PC-3 are RBC parameters. The size distribution of RBC and the WBC count are the main factors in PC-5. The other PCs presented only 1 important factor each, so each is mainly characterized by that factor.
It is well known that the factors for RVTE identified by PCA in this work are related to thrombotic events 3,13,20 as transient or persistent risk factors such as recent surgery, major trauma, pregnancy, puerperium, hormone replacement therapy, oral contraceptive use, and cancer. 21 However, since RVTE is a multifactorial disease, the exact mechanism and the interactions between these factors still need better elucidation.
Principal Component 1
The main factors given by PC-1 corroborate recent publications regarding the association of RBC, lipids, glucose, and creatinine levels with recurrent VTE. From Table S1, one can see that cholesterol, HDL, and LDL had higher correlation modulus than the others. This result means that they highly influence PC-1.
The role of lipids in VTE is still uncertain, and contradictory results have been reported. 22,23 It was shown that HDL influences the extrinsic coagulation pathway, the protein C cascade, and fibrinolysis, and also reduces blood viscosity. Therefore, it has antithrombotic properties. On the other hand, triglycerides have a procoagulant effect, suppressing tissue plasminogen activator activity and increasing plasminogen activator inhibitor-1. 24 A recent prospective study 25 showed no association between these lipid levels and recurrent VTE. Eichinger et al 26 reported that lower levels of HDL were associated with recurrent VTE. In both studies, blood samples were collected 3 months after discontinuation of anticoagulation. Our results showed that patients with recurrent VTE presented higher levels of total cholesterol (193.29 ± 44.25 mg/dL vs 184.90 ± 49.19 mg/dL), HDL (49.29 ± 14.91 mg/dL vs 46.41 ± 14.45 mg/dL), and LDL (114.38 ± 35.49 mg/dL vs 105.56 ± 35.04 mg/dL) than those without recurrence. On the other hand, the mean level of triglycerides was lower (148.52 ± 60.00 mg/dL vs 156.04 ± 152.66 mg/dL). It is important to mention that the median values for HDL are 45 mg/dL and 46 mg/dL for patients with and without RVTE, respectively. Moreover, for triglycerides, the median is 137 mg/dL and 122 mg/dL for patients with and without RVTE, respectively. Both results indicate that the statistical distribution of these factor values is in agreement with those reported in the literature.
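The contrast between means and medians noted above for triglycerides (the group with the higher mean has the lower median and a much larger standard deviation) is typical of right-skewed data. The following Python sketch reproduces that pattern with made-up values. It is an illustration only and does not use patient data.

```python
from statistics import mean, median, stdev

# Hypothetical triglyceride values in mg/dL (illustrative only, not patient data).
group_a = [110, 125, 137, 150, 160]  # roughly symmetric distribution
group_b = [95, 110, 122, 130, 400]   # one extreme value skews the mean upward


def summarize(values):
    """Return the mean, sample standard deviation, and median of a list of values."""
    return mean(values), stdev(values), median(values)


mean_a, sd_a, median_a = summarize(group_a)
mean_b, sd_b, median_b = summarize(group_b)

# group_b has the higher mean and the larger standard deviation but the lower
# median, because a single extreme value dominates its mean.
```

When a factor is skewed in this way, reporting the median alongside the mean ± standard deviation gives a more faithful picture of the typical patient.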
The RBC gained importance in thrombotic events only in recent decades, when its effects on blood rheology and endothelium interactions were taken into account as a venous thrombotic risk factor. 27 –29 A recent review 30 showed that the influence of RBC on recurrent thrombosis can be difficult to interpret but that antithrombotic RBC targets can be developed. Yu et al 31 and Marchioli et al 32 stated that high levels of RBC are a potential risk factor for thrombotic events. Another proposed mechanism suggests that RBC can interact with the vessel walls, enhance platelet aggregation and activation, contribute to thrombin generation, and bind fibrinogen, contributing to thrombus size. 30 Our results showed that patients with recurrent VTE had a higher RBC count (4.89 × 10^6 ± 0.62 × 10^6/µL) than those without recurrence (4.61 × 10^6 ± 0.62 × 10^6/µL), which is in agreement with the cited studies.
Another factor affecting blood viscosity is the HCT. It was reported that an increase in HCT in the venous system can decrease blood flow by increasing viscosity, thus favoring clot formation, 33,34 platelet adhesion, and accumulation at the subendothelial cells. 30 Indeed, higher HCT levels were previously correlated with thrombotic events. 20 As far as we know, only 1 study has reported HCT level as a predictor of recurrent VTE. The study of Eischer et al 35 showed that patients with RVTE presented higher HCT levels (3 months after discontinuation of anticoagulation) than those without recurrence. However, higher levels of HCT correlated with recurrent VTE in women and not in men. Our results showed that HCT levels in patients with RVTE were higher (42.16% ± 4.39%) than in those without (40.70% ± 4.95%).
The role of glucose level in RVTE is still uncertain, and as far as we know, no studies have analyzed its effects. High glucose levels can trigger the coagulation system in healthy men and in patients with diabetes mellitus. 36,37 For this reason, we chose to include this factor in the study. Our results demonstrated that patients with RVTE presented lower glucose levels than those without recurrent episodes (93.12 ± 21.27 mg/dL vs 105.23 ± 51.17 mg/dL, respectively). Although this is not an expected result, recurrence is multifactorial, and the interaction with other factors could influence thrombus formation.
The role of creatinine level in RVTE is still unknown. Shlipak et al 38 reported that higher creatinine levels are accompanied by an increase in factor VII, factor VIII, and
Principal Component 2
Age and coagulation proteins are the main identifiers of PC-2, and, as shown in Table S1, their importance in the PC is similar. Age is a well-known important factor for thrombosis. Several studies indicated that thrombosis affects 0.01% of the population before the age of 40 and 0.7% of the population between the ages of 45 and 55. In addition, morbidity is higher as age increases. 4,39,40 Hansson et al 13 and Farzamnia et al 3 showed no direct relation between age and a higher risk of recurrence. White et al 41 reported that recurrence was more common in younger patients, while Beyth et al 42 found that this was observed in patients aged 65 and younger. On the other hand, Heit et al, 2 Eichinger et al, 26 and Galanaud et al 43 reported that increasing age was related to a higher risk of recurrence. Our results showed that patients with RVTE had a similar average age (43.40 ± 14.82 years) to those without recurrence (44.05 ± 16.07 years).
Deficiency of natural anticoagulants can trigger hypercoagulability and a thrombotic event. 36,44 –47 Rosendaal and Reitsma 47 reported that these deficiencies are a strong risk factor for VTE. However, the authors state that this is not necessarily true for RVTE. De Stefano et al 48 indicated that patients with these deficiencies have an increased risk of recurrent VTE. Our results showed that antithrombin, protein C, and protein S activity were important factors, as patients with RVTE presented lower levels of these proteins than those without recurrence (Protein C: 106.26 ± 28.39 mg/dL; Protein S: 90.69 ± 23.83 mg/dL; antithrombin: 105.69 ± 16.11 mg/dL vs Protein C: 117.87 ± 23.60 mg/dL; Protein S: 92.41 ± 23.44 mg/dL; antithrombin: 106.22 ± 15.57 mg/dL, respectively). These results are very interesting because, even without a classical deficiency, lower levels appear to be important for RVTE. Our findings are in agreement with Xu et al, 49 who proposed a mathematical model for the coagulation cascade and showed that lower levels of protein C are associated with thrombus formation.
Principal Component 3
Principal component 3 is characterized by the RBC parameters: RBC, HB, and HCT. Both RBC and HCT were also important factors in PC-1. Since each PC is calculated as a function of all original factors, one factor can be important in more than one PC due to the multidimensionality of the problem. However, their importance in PC-3 is higher. As previously discussed, RBC and HCT can be associated with VTE and RVTE. Brækkan et al 20 found a direct relation between HB at admission and VTE: the higher the HB levels, the higher the risk of VTE. Our results showed that patients with recurrence presented similar levels of HB (13.99 ± 1.44 g/dL) to those without (13.54 ± 1.77 g/dL). As far as we know, no study has evaluated the influence of HB levels on RVTE. However, it is known that HB molecules released from damaged RBCs enhance platelet activation and aggregation. 30
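The observation that one factor can be important in more than one PC can be checked numerically. For a correlation-matrix PCA, the squared correlations of a standardized factor with all PCs sum to 1, so its unit variance is shared among the PCs. The following Python sketch assumes NumPy and uses synthetic data (not patient data); all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 150 samples of 4 variables built from 2 latent sources.
latent = rng.normal(size=(150, 2))
mixing = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [0.5, -0.5]])
data = latent @ mixing.T + 0.5 * rng.normal(size=(150, 4))

corr = np.corrcoef(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]  # descending order of variance
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Correlation of each standardized variable (rows) with each PC (columns).
factor_pc_corr = eigenvectors * np.sqrt(eigenvalues)

# Each row of squared correlations sums to 1: the unit variance of a
# standardized variable is distributed over all PCs, so a variable can
# correlate noticeably with more than one of them.
row_totals = (factor_pc_corr ** 2).sum(axis=1)

# Each column of squared correlations sums to the eigenvalue of that PC.
column_totals = (factor_pc_corr ** 2).sum(axis=0)
```

A factor such as RBC or HCT can therefore exceed a correlation threshold in two different PCs, provided that the squares of those correlations together do not exceed 1.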
Principal Component 5
Both RDW and WBC are the major descriptors of PC-5. The leukocyte number and the RDW have been related to thrombotic events. 50 –52 However, their role in RVTE is still not well established. Some studies showed that high WBC count is related to RVTE. 53 –55 On the other hand, the study of Rezende et al 56 reported that those with higher risk of VTE presented lower levels of WBC and higher levels of RDW after VTE. The authors also suggested that other studies should be performed to investigate the relationship between these variables and RVTE. Our results showed that patients with VTE recurrence presented lower levels of WBC (6.89 × 10^3 ± 2.69 × 10^3/µL) and higher levels of RDW (14.17 ± 1.96%) than those without recurrence (WBC: 7.94 × 10^3 ± 3.86 × 10^3/µL; RDW: 13.59 ± 1.49%).
Remaining PCs
According to Galanaud et al, 43 the location of VTE in the leg is important for RVTE. The authors concluded that those with proximal VTE presented more recurrent thrombotic events than those with distal VTE. Boutitie et al 64 and Hansson et al 13 reported similar results. Our results showed that 83% of patients with distal VTE presented recurrence against 16% of those with proximal VTE.
Time of anticoagulation therapy is one of the major factors that can lead to or prevent RVTE. 65 This is a subject of debate, and recommendations can be found in the American College of Chest Physicians Guidelines. 66 Boutitie et al 64 reported that patients treated for 6 months or more presented fewer RVTE events than those treated for 3 months, and that patients treated for 1 or 1.5 months presented a higher risk of RVTE than those treated for 3 months. In a recent study, Agnelli et al 11 evaluated extended anticoagulation therapy with apixaban and reported that it reduced the risk of recurrence without increasing the risk of bleeding. Similar results were reported in another study. 12 Studies that analyze the duration of anticoagulation after a first recurrence can also be found in the literature. 67,68 Our results showed that patients with RVTE were treated for a longer time than those without recurrence (12.72 ± 18.51 months vs 9.88 ± 11.50 months). This result was somewhat unexpected, since patients with a shorter anticoagulation therapy are expected to have a higher probability of RVTE. However, when comparing only patients with RVTE (data not shown), it could be observed that RVTE occurred mainly in patients treated for shorter periods.
The results obtained in this work showed that RVTE, a priori, can be predicted using clinical parameters and data from the previous thrombosis. It is important to mention that some variables were expected to be among the main factors but were not indicated by PCA, since only variables with a correlation above 0.500 are usually considered important.
The first one is the provoked/unprovoked VTE factor. Our results showed that 16% of all patients with RVTE had a first provoked VTE. In routine practice, patients with provoked VTE are usually treated for 3 to 6 months, and then the anticoagulation therapy is ceased. However, a different decision could avoid such a high recurrence rate. This factor is reported in Supplementary Table S1 with a correlation of 0.493 in PC-4, which is very close to 0.500. Due to its importance to RVTE, this factor could be included as input in a prediction model. Cancer is another variable expected to be in the main factors list. However, only 13 of the 235 patients presented this disease. Therefore, its influence is too small when compared to the other 38 variables. This can be confirmed by its correlation coefficients in Supplementary Table S1, which are smaller than 0.360. A future study should address only these patients.
Main Limitations
The limitations of this study are mainly due to the sample size, which is small considering the eligible population. However, it was considered sufficient for determining the main factors using PCA. The results showed that the important predictors of RVTE are in agreement with previous studies and are simple to obtain in clinical routine. Future work should consider another population for validation purposes.
Conclusion
In this work, PCA was applied to identify the main factors that predict recurrent deep vein thrombosis. Our results showed that 13 PCs are sufficient to describe the phenomenon and that 18 of 39 factors are important for RVTE analysis. The important factors include blood cells, lipids, coagulation protein levels, renal disease, age, as well as previous VTE parameters. This work showed that simple parameters, obtained directly from blood measurements, could be used to develop a model to predict the rates of RVTE.
Supplemental Material
Supplemental Material, Supporting_Information for Principal Component Analysis on Recurrent Venous Thromboembolism by Tiago D. Martins, Joyce M. Annichino-Bizzacchi, Anna V. C. Romano and Rubens Maciel Filho in Clinical and Applied Thrombosis/Hemostasis
Footnotes
Authors’ Note
Each author contributed to the development of the manuscript, reviewed, and commented on each draft, and approved the final draft. R. Maciel-Filho and J. M. Annichino-Bizzacchi contributed to the study protocol. J. M. Annichino-Bizzacchi and A.V.C. Romano contributed to data acquisition. T. D. Martins and R. Maciel-Filho contributed to study design. T. D. Martins, J. M. Annichino-Bizzacchi, A.V.C. Romano, and R. Maciel-Filho contributed to data interpretation and discussion. Ethical approval to report this case was obtained from Comite de Ética da Universidade Estadual de Campinas (Approval number: 88970218.0.0000.5404). Informed consent for patient information to be published in this article was not obtained because this was a retrospective study. No new blood samples were taken and requesting the presence of all patients would make the study unfeasible.
Acknowledgments
The authors thank Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), which funded this study. Process number: 2016/14172-6.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funded by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP). Process number: 2016/14172-6.
Supplemental Material
Supplemental material for this article is available online.
