Abstract
There are no biomarkers that differentiate cardioembolic from large-vessel atherosclerotic stroke, although the treatments differ for each and ~30% of strokes and transient ischemic attacks have undetermined etiologies using current clinical criteria. We aimed to define gene expression profiles in blood that differentiate cardioembolic from large-vessel atherosclerotic stroke. Peripheral blood samples were obtained from healthy controls and acute ischemic stroke patients (< 3, 5, and 24 h). RNA was purified, labeled, and applied to Affymetrix Human U133 Plus 2.0 Arrays. Expression profiles in the blood of cardioembolic stroke patients are distinct from those of large-vessel atherosclerotic stroke patients. Seventy-seven genes differ at least 1.5-fold between them, and a minimum of 23 genes differentiates the two types of stroke with at least 95.2% specificity and 95.2% sensitivity for each. Genes regulated in large-vessel atherosclerotic stroke are expressed in platelets and monocytes and modulate hemostasis. Genes regulated in cardioembolic stroke are expressed in neutrophils and modulate immune responses to infectious stimuli. This new method can be used to predict whether a stroke of unknown etiology was caused by cardioembolism or large-vessel atherosclerosis, a distinction that would lead to different therapy. These results have wide-ranging implications for similar disorders.
Introduction
Ischemic stroke can be classified as cardioembolic, large-vessel atherosclerotic, lacunar, other, or undetermined cause (Adams et al, 1993). The differentiation of these types of stroke currently depends on clinical judgment inferred from patient history, symptoms, and laboratory evidence of potential sources of thromboembolism. The clinical diagnosis of etiology can be highly specific, but the sensitivity is modest (Ferro, 2003). As a result, even with a thorough evaluation, the etiology remains undetermined in 30% or more of stroke and TIA (transient ischemic attack) patients.
Etiologic diagnosis is critical to develop an appropriate secondary prevention plan (Caplan and Manning, 2006). For example, cardioembolic stroke owing to atrial fibrillation (AF) typically requires oral anticoagulants such as warfarin to decrease the recurrent stroke risk (McCabe and Rakhit, 2007). Cardioembolic stroke owing to bacterial endocarditis requires antibiotic treatment. For large-vessel atherosclerotic stroke, carotid endarterectomy is performed for significant carotid stenosis, and aspirin and newer antiplatelet agents are used to decrease the risk of recurrent stroke (Ocava et al, 2006).
Ischemic stroke produces inflammation both locally in the brain and systemically in leukocytes (Danton and Dietrich, 2003). Recently, we showed that gene expression changed rapidly in blood of experimental animals after stroke (Tang et al, 2001, 2003). Moore et al (2005) and our group have confirmed that a large number of genes change expression in the peripheral blood of humans as early as 3 h after ischemic stroke (Tang et al, 2006b). However, it is not known if gene expression profiles in blood differ with different etiologies of ischemic stroke.
We reasoned that the pathogenesis of a specific subtype of ischemic stroke might be associated with a unique gene expression pattern in blood leukocytes and that this pattern could be used as a surrogate marker for the etiologic diagnosis of ischemic stroke. Therefore, we examined gene expression in the blood of acute ischemic stroke patients and show that different gene expression profiles exist for patients with cardioembolic compared with large-vessel atherosclerotic stroke.
Materials and methods
Human Subjects
Acute ischemic stroke patients were enrolled in the CLEAR trial (registered at ClinicalTrials.gov under identifier NCT00250991). The Institutional Review Boards at the participating institutions approved the study protocols and consent forms. Stroke patients were diagnosed clinically, and computed tomography brain scans were performed to exclude hemorrhage. After informed consent, patients were randomized to receive either standard-dose r-tPA (recombinant tissue plasminogen activator) or a combination of eptifibatide and low-dose r-tPA in a 1:3 ratio in a double-blinded protocol within 3 h of the onset of stroke. Blood (15 mL) was drawn from each patient into PAXgene tubes (PreAnalytiX, Hilden, Germany) before the treatment (< 3 h samples), approximately 2 h after the thrombolysis treatment (5 h samples), and 24 h after the stroke onset (24 h samples). A total of 45 blood samples were collected from the 15 stroke patients.
The etiologies of the strokes were assessed using the TOAST (The Trial of ORG 10172 in Acute Stroke Treatment) criteria (Adams et al, 1993) and classified as large-vessel atherosclerotic stroke, cardioembolic stroke, and stroke with undetermined etiology. Patients with small artery lacunar stroke or other special etiologies were excluded from this study. The TOAST criteria have been the most widely used standards for ischemic stroke subtype classification according to underlying pathophysiologic mechanisms based on comprehensive information, including clinical features, imaging findings, and ancillary diagnostic test results. Previous research has shown that this etiologic classification system can achieve a very good agreement among individual clinicians, especially for the two main categories of ischemic stroke: large-vessel atherosclerotic stroke and cardioembolic stroke (Adams et al, 1993; Gordon et al, 1993). Briefly, large-vessel atherosclerotic stroke should have evidence of greater than 50% stenosis, or occlusion of an ipsilateral extracranial or intracranial artery from carotid imaging findings and the clinician should have excluded any obvious source of cardiac embolus. Cardioembolic stroke should have at least one obvious source of cardiac embolus and have eliminated potential large artery atherosclerotic thrombosis or embolism. Both categories only included those cortical or cerebellar lesions and brain stem or subcortical hemispheric infarcts larger than 1.5 cm in diameter based on brain imaging. All patients should have all other causes of stroke excluded, such as nonatherosclerotic vasculopathies, hypercoagulable states, or hematologic disorders. This was accomplished using blood tests, Doppler or magnetic resonance angiography or arteriography, past medical history, and other tests as indicated. The diagnosis of undetermined etiology arose either because of multiple etiologies identified or no etiology identified.
Sixteen control peripheral blood samples were drawn from 8 healthy volunteers. Each volunteer contributed two independent blood samples 1 day apart. Healthy controls had no history of cardiovascular or cerebral vascular disease, recent infection, or hematologic disease.
Sample Processing
Whole blood (15 mL) was collected into six PAXgene tubes via antecubital fossa venipuncture from each subject. PAXgene tubes were frozen at −80°C after 2 h at room temperature. Total RNA was isolated according to the manufacturer's protocol (PAXgene blood RNA kit; PreAnalytiX). The RNA is from polymorphonuclear cells (neutrophils, basophils, and eosinophils), mononuclear cells (lymphocytes and macrophages/monocytes), platelets, and red blood cell precursors.
Microarray Hybridization
RNA samples were labeled, hybridized, and scanned according to standard Affymetrix Protocols (Affymetrix Expression Analysis Technical Manual; Affymetrix, Santa Clara, CA, USA). Total RNA (10 μg) was labeled using the One-Cycle Target Labeling protocol. Affymetrix Human U133 Plus 2.0 Arrays, which contain more than 54,000 probe sets, were used for each RNA sample. Because the CLEAR trial is ongoing, reverse transcription-PCR will be performed at the conclusion of the trial.
Microarray Data Analysis
Probe-level data were saved in Affymetrix .cel files and summarized with Robust Multi-array Average (RMA) software (http://www.bioconductor.org/). Expression values in each sample were then normalized to the mean of the expression values from healthy controls. One array from one healthy control was found to be an outlier, as its average intensity differed significantly from that of the other arrays, and was eliminated. Statistical analyses including a one-way analysis of variance (ANOVA) and Student's t-test were performed using Genespring software (Silicon Genetics, Redwood City, CA, USA). The Benjamini-Hochberg FDR (false discovery rate) was used to control for multiple comparisons, with a 5% FDR being considered significant. Different fold change filters were applied to minimize type 2 error. Demographic data were analyzed with Student's t-test or Fisher's exact test. The methods have been described in detail in our previous studies (Tang et al, 2006b).
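The multiple-comparison step can be illustrated with a minimal sketch of the Benjamini-Hochberg procedure. This is not the Genespring implementation used in the study; the function names `benjamini_hochberg` and `significant_at_fdr` are hypothetical, and the sketch assumes only a plain list of per-gene P-values.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at_fdr(p_values: List[float], fdr: float = 0.05) -> List[bool]:
    """Flag genes whose adjusted P-value is at or below the FDR threshold."""
    return [q <= fdr for q in benjamini_hochberg(p_values)]
```

For example, the P-values 0.005, 0.01, 0.03, and 0.04 adjust to approximately 0.02, 0.02, 0.04, and 0.04, so all four would pass a 5% FDR.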
Results
Demographic Data for Cardioembolic and Large-Vessel Atherosclerotic Stroke Patients
There were no significant age, gender, or race differences and no differences in history of hypercholesterolemia or hypertension between the groups (Table 1). As expected, many of the cardioembolic stroke patients had a history of heart disease, whereas none of the large-vessel atherosclerotic stroke patients did. Some of the other potential confounding factors are examined below.
Table 1. Demographic comparison between cardioembolic and atherosclerotic stroke patients
Differential Expression Profiles of Cardioembolic Stroke and Large-Vessel Atherosclerotic Stroke
Twenty-one samples from cardioembolic stroke were compared with twelve samples from large-vessel atherosclerotic stroke to find the genes that show consistent expression differences within 24 h between the two subtypes of ischemic stroke. The one-way ANOVA (FDR < 0.05, Student-Newman-Keuls post hoc test, equal variance) yielded 666 genes that were differentially regulated between cardioembolic and large-vessel atherosclerotic stroke. Student's t-test (P < 0.001) yielded 135 genes. Of these 135 genes identified using the t-test, 95 were also identified using the one-way ANOVA.
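As a rough illustration of the per-gene comparison, the following sketch computes a two-sample Student's t statistic with a pooled variance. It assumes the classical equal-variance form of the test; the function name `pooled_t_statistic` is hypothetical, and the P-value lookup (from a t distribution with the returned degrees of freedom) is omitted because the Python standard library does not provide it.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) assuming equal variances."""
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / dof
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, dof
```

With 21 cardioembolic and 12 atherosclerotic samples, such a test would have 31 degrees of freedom for each gene.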
A total of 77 genes have at least 1.5-fold expression differences between cardioembolic stroke and large-vessel atherosclerotic stroke (significant using ANOVA or t-test; fold change > 1.5) (Supplementary Table 1). A Pearson cluster analysis using these 77 genes showed segregation of control samples and cardioembolic stroke samples compared with large-vessel atherosclerotic stroke samples at each of the times after stroke (Figure 1). This also held true whether cluster analysis was performed using the 666 genes from the ANOVA analysis or the 135 genes from the t-test (data not shown).
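The fold-change filter and the correlation-based clustering can be sketched as follows. This is an illustration under stated assumptions, not the Genespring procedure: it assumes group means are on the log2 scale (as RMA output is), and the names `fold_change`, `passes_fold_filter`, and `pearson_distance` are hypothetical.

```python
import math
from typing import Sequence


def fold_change(log2_mean_a: float, log2_mean_b: float) -> float:
    """Unsigned fold change between two group means given on the log2 scale."""
    return 2 ** abs(log2_mean_a - log2_mean_b)


def passes_fold_filter(log2_mean_a: float, log2_mean_b: float,
                       threshold: float = 1.5) -> bool:
    """True when the two group means differ by at least `threshold`-fold."""
    return fold_change(log2_mean_a, log2_mean_b) >= threshold


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - Pearson correlation: 0 for identical trends, 2 for opposite trends."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1 - cov / (norm_x * norm_y)
```

A hierarchical clustering routine would then group samples (or genes) whose pairwise `pearson_distance` is small, which is one common way to obtain a heat map ordering like the one described.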

Figure 1. Expression heat map of genes differentially regulated between cardioembolic and large-vessel atherosclerotic stroke. X axis shows each condition and Y axis shows individual genes. The color coding indicates gene expression intensity, with red being high and green being low. The 77 genes were significant using ANOVA or t-test, and each gene changed at least 1.5-fold.
We next examined the effect of possible confounding factors on the identified etiology-related genes. Of the 54 genes identified as race-related (white/black, t-test, FDR < 0.05, fold change > 1.5), none are among the 77 etiology-related genes (Tang et al, 2006b). Of the 424 genes identified as being regulated by aspirin in patients on aspirin before the stroke, only 5 were among the 77 etiology-related genes (6.5%) (Tang et al, 2006b). Regarding the potential effect of stroke severity, the NIHSS score is not significantly different between cardioembolic stroke and atherosclerotic stroke. Of the 82 genes weakly correlated with NIHSS score (permutation P-value larger than 0.05) (Tang et al, 2006b), only 1 gene is also in the list of 77 etiology-related genes. Another concern is the randomized assignment of patients enrolled in the CLEAR trial to different thrombolysis treatment arms (tPA versus tPA plus eptifibatide). However, this is probably not a confounding factor, because the etiology-related genes identified here show their expression differences between the two etiologies early on, even before any thrombolysis treatment (< 3 h).
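The confounder check amounts to intersecting gene lists. The sketch below reproduces the aspirin arithmetic (5 of 77 genes, 6.5%) using placeholder gene identifiers; the identifiers and the function name `overlap_summary` are hypothetical and do not correspond to the actual gene lists.

```python
from typing import Iterable, Tuple


def overlap_summary(confounder_genes: Iterable[str],
                    etiology_genes: Iterable[str]) -> Tuple[int, float]:
    """Return (shared gene count, percent of etiology genes that are shared)."""
    etiology = set(etiology_genes)
    shared = set(confounder_genes) & etiology
    return len(shared), 100 * len(shared) / len(etiology)


# Placeholder identifiers only: 77 etiology genes and 424 aspirin-related
# genes, constructed so that exactly 5 are shared.
etiology_list = [f"gene_{i}" for i in range(77)]
aspirin_list = [f"gene_{i}" for i in range(5)] + [f"other_{i}" for i in range(419)]

shared_count, shared_percent = overlap_summary(aspirin_list, etiology_list)
print(shared_count, round(shared_percent, 1))
```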
Etiology Prediction of Strokes of Undetermined Etiology
Prediction analysis of microarrays was used to determine the minimum number of genes that differentiated cardioembolic from atherosclerotic stroke. Prediction analysis of microarrays employs the shrunken nearest centroid algorithm to find the most reliable genes that differentiate two or more classes (Tang et al, 2006b; Tibshirani et al, 2002). Prediction analysis of microarrays identified a minimum of 23 genes that best differentiated cardioembolic from atherosclerotic stroke. A 10-fold crossvalidation with a leave-one-out approach showed that these 23 genes correctly classified 32 of 33 samples from subjects with known causes of stroke (Table 2; Supplementary Figures 1 and 2). Prediction analysis of microarrays and the minimum set of 23 genes were then used to predict the etiology of the subjects whose cause of stroke could not be determined based on clinical TOAST criteria. Prediction analysis of microarrays classified all 12 of the unknown samples as being cardioembolic stroke with probabilities more than 90% (92.9% to 99.9%; Figure 2). This prediction agrees with the results from a Pearson cluster analysis using the genes identified both by ANOVA and t-test (Supplementary Figure 3). All of the samples with an unknown etiology have an expression pattern similar to cardioembolic stroke, but not atherosclerotic stroke or healthy controls (Supplementary Figure 3).
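The idea behind the shrunken nearest centroid classifier can be sketched in a simplified form. This is not the prediction analysis of microarrays software: it omits the class-size scaling, the variance offset, and the class prior terms of the published algorithm, and all function names are hypothetical. Each class centroid is pulled toward the overall centroid by a threshold `delta` (assumed to be chosen by crossvalidation); genes whose standardized difference shrinks to zero in every class drop out of the classifier, which is how a minimal gene set arises.

```python
import math
from typing import Dict, List, Sequence, Tuple


def soft_threshold(value: float, delta: float) -> float:
    """Move value toward zero by delta; values within delta of zero become 0."""
    return math.copysign(max(abs(value) - delta, 0.0), value)


def fit_shrunken_centroids(
    samples: Sequence[Sequence[float]], labels: Sequence[str], delta: float
) -> Tuple[Dict[str, List[float]], List[float], List[int]]:
    """Return (shrunken centroids per class, pooled SD per gene, retained genes)."""
    n, n_genes = len(samples), len(samples[0])
    classes = sorted(set(labels))
    overall = [sum(s[g] for s in samples) / n for g in range(n_genes)]
    centroids = {}
    for k in classes:
        members = [s for s, lab in zip(samples, labels) if lab == k]
        centroids[k] = [sum(s[g] for s in members) / len(members)
                        for g in range(n_genes)]
    # Pooled within-class standard deviation for each gene (must be nonzero).
    pooled_sd = [
        math.sqrt(sum((s[g] - centroids[lab][g]) ** 2
                      for s, lab in zip(samples, labels)) / (n - len(classes)))
        for g in range(n_genes)
    ]
    shrunken: Dict[str, List[float]] = {k: [] for k in classes}
    active = []
    for g in range(n_genes):
        kept = False
        for k in classes:
            d = soft_threshold((centroids[k][g] - overall[g]) / pooled_sd[g], delta)
            shrunken[k].append(overall[g] + pooled_sd[g] * d)
            kept = kept or d != 0.0
        if kept:
            active.append(g)  # gene still separates at least one class
    return shrunken, pooled_sd, active


def predict(sample: Sequence[float], shrunken: Dict[str, List[float]],
            pooled_sd: Sequence[float]) -> str:
    """Assign the class whose shrunken centroid is nearest in standardized distance."""
    def score(k: str) -> float:
        return sum(((sample[g] - shrunken[k][g]) / pooled_sd[g]) ** 2
                   for g in range(len(sample)))
    return min(shrunken, key=score)
```

Raising `delta` removes more genes; the smallest gene set that keeps the crossvalidated error low would play the role of the 23-gene set described above.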
Table 2. Prediction of stroke samples of known etiologies with 23 genes by crossvalidation
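The reported accuracy figures can be checked with a short worked calculation. The text gives 32 of 33 samples correctly classified, with 21 cardioembolic and 12 atherosclerotic samples, but does not say here which sample was misclassified; the two scenarios below are therefore assumptions. Only the scenario in which the misclassified sample is cardioembolic reproduces a minimum of 95.2% (20/21).

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def as_percent(fraction: float) -> float:
    """Convert a fraction to a percentage rounded to one decimal place."""
    return round(100 * fraction, 1)


# Cardioembolic is treated as the positive class in both scenarios.
# Scenario A (assumed): one cardioembolic sample is called atherosclerotic.
sens_a, spec_a = sensitivity_specificity(tp=20, fn=1, tn=12, fp=0)
# Scenario B (assumed): one atherosclerotic sample is called cardioembolic.
sens_b, spec_b = sensitivity_specificity(tp=21, fn=0, tn=11, fp=1)

print(as_percent(sens_a), as_percent(spec_a))  # scenario A
print(as_percent(sens_b), as_percent(spec_b))  # scenario B
```

Swapping which class counts as positive simply exchanges sensitivity and specificity, so in scenario A both classes have values of at least 95.2%.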

Figure 2. Etiologic prediction of stroke samples with undetermined cause using prediction analysis of microarrays with 23 genes. X axis shows individual samples from the three time points for each patient. Y axis is the predicted probability for each sample to be either cardioembolic stroke (red squares) or large-vessel atherosclerotic stroke (blue squares).

Figure 3. Functional comparison of cardioembolic stroke-specific genes versus atherosclerotic stroke-specific genes. The Function Rank score (Y axis) is a log-transformed P-value: the higher the score, the more significant the functional pathway (shown on X axis).
Function Analysis of Cardioembolism and Large-Vessel Atherosclerosis-Regulated Genes
The genes differentially regulated between cardioembolic and atherosclerotic strokes were then classified by cluster analysis. Two subgroups of genes were identified. One is mainly regulated in cardioembolic stroke relative to healthy controls (281 genes with a 1.2-fold filter or 148 genes with a 1.3-fold filter) (Supplementary Figure 4A). The other is mainly regulated in atherosclerotic stroke relative to healthy controls (84 genes with a 1.2-fold filter or 63 genes with a 1.3-fold filter) (Supplementary Figure 4B). The functions of the two lists of etiology-specific genes were then explored in the NextBio System (Cupertino, CA, USA), a web-based data search and analysis engine. The genes were ranked according to fold change and then queried against the GO, KEGG pathways, and REACTOME databases in NextBio with a P-value of 0.05 as the cutoff. This analysis found that the genes regulated by cardioembolic stroke were mainly involved in response to pathogens, including immune cell activation, defense response, proliferation, and apoptosis (Figure 3). In contrast, the genes mainly regulated by atherosclerotic stroke were related to hemostasis, cytokines, and chemokines (Figure 3). For many of the functional pathways represented, there is little overlap between the pathways related to atherosclerotic and cardioembolic stroke.

Figure 4. Expression activities of etiology-specific genes across subtypes of blood cells.
A comparison of our study with all the studies deposited in the NextBio System (currently 6,000 studies) showed that atherosclerotic stroke-specific expression profiles shared the most genes with inflammatory bowel disease (28 genes in common with Crohn's disease, 14 genes in common with ulcerative colitis) (NCBI GEO Series GSE3365) and rheumatoid arthritis (26 genes in common) (NCBI GEO Series GSE4588). Conversely, cardioembolism-specific expression profiles were most similar to those from patients with sepsis and septic shock (105 genes and 109 genes in common at day 1, respectively) (NCBI GEO Series GSE4607).
Potential Cell Type Sources of the Stroke Etiology-Regulated Genes
The genes regulated in cardioembolic stroke or large-vessel atherosclerotic stroke versus control might be expressed in only certain cell types in blood. Using our reference gene list derived from healthy controls (Du et al, 2006), most of the genes regulated in atherosclerotic stroke appear to be expressed in platelets and monocytes, with only a few expressed in lymphocytes (CD8 T cells and natural killer cells) or neutrophils (Figure 4A). In contrast, most of the genes regulated in cardioembolic stroke appear to be expressed in neutrophils, with a few expressed in monocytes or lymphocytes (Figure 4B).
Discussion
This is the first report of differences of gene expression in the blood of patients with cardioembolic versus large-vessel atherosclerotic stroke. The data suggest that it might be possible to determine the etiology of stroke based on gene expression in blood. This would be most useful for the 30% or more of patients with stroke and TIAs in whom the etiology cannot currently be determined. In our case, the four ischemic stroke patients with undetermined etiology were predicted to be cardioembolic stroke. This could prompt a more thorough cardiac evaluation and a consideration of anticoagulant therapy in these subjects, and/or could serve as the basis for future trials to evaluate such an approach.
Insights into Molecular Mechanisms of Large-Vessel Atherosclerotic Stroke
The gene changes not only provide biomarkers for strokes of different etiology, but also show etiology-specific molecular events. Genes regulated in large-vessel atherosclerotic stroke are expressed mainly in platelets and monocytes, supporting current literature showing that both play important roles in the pathophysiology of atherosclerosis. During the initiation of atherosclerosis, monocytes infiltrate the activated endothelium through chemotaxis and become foam cells, a classic feature of atherosclerosis (Libby, 2005). Platelets can interact with dysfunctional, intact endothelium to promote monocyte chemotaxis by release of chemokines (e.g., PF4 and RANTES) and expression of adhesion molecules (e.g., P-selectin). Platelets also promote migration and proliferation of smooth muscle cells around plaque via platelet-derived growth factor (Huo and Ley, 2004). Our data show regulation of related genes in atherosclerotic stroke, including the platelet-derived chemokines PPBP and PF4 and the growth factor PDGFA. The receptor for platelet-derived chemokine RANTES, CCR5 (chemokine (C-C motif) receptor 5), was also upregulated.
Although rupture of atherosclerotic plaque and thrombus formation is the main mechanism of acute coronary ischemia, its role in carotid atherosclerosis and cerebral ischemia has been controversial because the degree of carotid stenosis correlates best with stroke risk (Caplan and Manning, 2006). Recent studies, however, suggest that thrombus formation on unstable plaque in the carotid arteries is important in carotid stenosis-mediated cerebral ischemia (Rothwell, 2007; Spagnoli et al, 2004). This is consistent with our results that show platelet-derived hemostasis genes being significantly regulated in atherosclerotic but not cardioembolic stroke. These genes include platelet glycoprotein IIb of IIb/IIIa complex (ITGA2B), platelet glycoprotein IIIa of IIb/IIIa complex (ITGB3), β-thromboglobulin (PPBP), platelet factor-4 (PF4), thrombospondin-1 (THBS1), osteonectin (SPARC), platelet-derived growth factor-α (PDGFA), and coagulation factor XIIIa (F13A1). Contrary to previous views, platelets and particularly newly formed platelets express many genes/mRNAs, and corresponding proteins are found for 69% of the genes (McRedmond et al, 2004).
The finding that platelet-derived hemostasis genes do not change much in cardioembolic stroke compared with large-vessel atherosclerotic stroke and controls was unexpected. This may relate to the difference between white, platelet-rich thrombi, where platelets are crosslinked by fibrinogen or fibrin, and red, fibrin-rich thrombi, where red blood cells are entrapped in fibrin networks. Pathology shows white platelet-rich thrombi on atherosclerotic plaques and red fibrin-rich thrombi in diseased atria and ventricles (Caplan and Manning, 2006; Mohr and Sacco, 1992). The white thrombi are formed by activation and crosslinking of platelets that adhere to damaged endothelium. The red thrombi result from activation of the coagulation cascade by tissue factors, followed by cleavage of soluble fibrinogen to form fibrin.
Subjects with large-vessel atherosclerotic stroke have higher protein expression of platelet CD62, CD63, and thrombospondin (Zeller et al, 1999), increased fibrinogen-bound platelets (Yamazaki et al, 2001), and increased platelet-secreted PPBP and PF4 in plasma compared with cardioembolic stroke (Tombul et al, 2005). Activation of glycoprotein IIb/IIIa and other platelet receptors is observed in atherosclerotic stroke patients (Shimizu et al, 2006). Platelet aggregation stimulated by ADP, collagen, or shear was enhanced and more leukocyte-platelet complexes formed in atherosclerotic stroke (Cha et al, 2004).
Equally convincing evidence for more significant platelet involvement in large-vessel atherosclerotic stroke comes from clinical trials of antiplatelet versus anticoagulant therapy. Anticoagulants, like warfarin, are superior to aspirin for the secondary prevention of recurrent cardioembolic stroke (McCabe and Rakhit, 2007). However, antiplatelet drugs including aspirin are as effective as anticoagulants for treatment and secondary prevention of recurrent atherosclerotic stroke, with the advantage of less hemorrhage than anticoagulants (Ocava et al, 2006). This suggests that, although the coagulation cascade may participate in both large-vessel atherosclerotic stroke and cardioembolic stroke, platelet activation plays a major role in atherosclerotic stroke.
Insights into Molecular Mechanisms of Cardioembolic Ischemic Stroke
Perhaps the most unexpected finding in this study was a gene expression profile in cardioembolic stroke that was similar to the immune responses to pathogens, predominantly associated with neutrophils. Although the finding was unexpected, infection has long been suspected of triggering ischemic stroke (Elkind, 2007). Stroke is associated with infection in the preceding week (Grau et al, 1998). As many as 25% to 35% of ischemic stroke patients have a history of infection in the preceding month (Macko et al, 1996). The risk of stroke in the 3 days after upper respiratory infection or urinary infection is three times higher than in noninfected subjects (Smeeth et al, 2004). Specifically, recent infection is reported to significantly increase the risk for cardioembolic stroke (Grau et al, 1995, 1998). Indeed, vaccination against viral influenza decreases stroke incidence by more than 50%, and decreases cardiovascular risk and mortality (Gurfinkel et al, 2002).
Infection can participate in the pathologic mechanism of cardiac disease. Most of the cardioembolic stroke patients enrolled in our study had AF. Recent studies suggest that infection can be associated with AF (Pan et al, 2006; Tang et al, 2006a). There is inflammation in atrial tissue from AF patients and animal models (Frustaci et al, 1997). Increased plasma inflammatory markers, such as CRP (C-reactive protein) and IL6 (interleukin-6) in AF patients, negatively correlate with cardioversion rate or future occurrence of AF. There is persistence of CRP, IL6, and TNF-α (tumor-necrosis factor-α) in paroxysmal AF long after successful cardioversion, perhaps indicating a causative role (Aviles et al, 2003). Increases of IL6, CRP, and the white blood cell count correlate with onset of AF after cardiac surgery (Bruins et al, 1997). Urgently admitted AF patients often have pneumonia (Lip et al, 1995). Antiinflammatory agents reduce the recurrence of AF, including statins (HMG-CoA reductase inhibitors), angiotensin-converting enzyme inhibitors, angiotensin-2 receptor blockers, and glucocorticoids (Boos et al, 2006). It is notable that although all of the samples of the subjects with unknown cause of stroke are identified as having cardioembolic stroke in this study, none of the subjects with unknown causes of stroke had AF or a history of AF.
Infection also promotes cardiac thrombus formation (Boos et al, 2006; Conway et al, 2004). Thromboembolic complications in sepsis patients are common. Infection activates the inflammatory response and coagulation factors, and antagonizes natural anticoagulation mechanisms (Aird, 2005). Stroke patients with recent infection also have increased C4b-binding protein, which binds protein S and decreases anticoagulant protein C activity (Macko et al, 1996). Bacteria can act as a base for thrombus growth and the thrombi may form shelters for bacterial adhesion and proliferation (Baumgartner and Cooper, 1996). Even subclinical infection can produce measurable coagulation activation (Elkind, 2007).
The results suggest that specific platelet functions differ in large-vessel atherosclerotic stroke compared with cardioembolic stroke and that there may be an infectious etiology or possibly an underlying susceptibility to infections in subjects with cardioembolic stroke. The regulated genes also can serve as biomarkers for the etiology of stroke and help guide appropriate secondary prevention measures in subjects with undetermined causes of stroke.
One unanswered question is whether the observed gene expression differences between cardioembolic and atherosclerotic strokes arise from the baseline disease differences before the acute event, from the molecular differences after the stroke event, or both. Because only healthy controls were included, our pilot study was not designed to make this distinction. However, there are some clues that may help answer this question. The expression of some etiology-related genes remains fairly constant over the 3- to 24-h period for both large-vessel atherosclerotic and cardioembolic stroke patients, possibly suggesting their closer relation to baseline disease rather than the acute stroke event. There are other etiology-related genes that change with the evolving stroke over the period of 3 to 24 h and that differ between large-vessel atherosclerotic and cardioembolic disease. These genes are much more likely to be related to both the ischemic stroke and the underlying etiology of the stroke. Combined with the functional annotation of these etiology-regulated genes and the cell-specific nature of the genes discussed above, it seems that the unique expression profiles for each stroke subtype may reflect both the characteristics of the underlying diseases and the acute thromboembolic events. Future studies with much larger numbers of subjects will be needed to sort out genes that differentiate patients with heart disease or large-vessel atherosclerosis with and without acute strokes.
Footnotes
Acknowledgements
We thank the Cincinnati Children's Hospital Medical Center for running the arrays for this study and the Cincinnati Stroke Team for obtaining the blood samples.
HX and FRS have filed for patents for the list of genes reported in this paper, which can differentiate cardioembolic stroke from atherosclerotic stroke. No other authors reported financial disclosure.
