Abstract
Pulmonary hypertension (PH), when it complicates sarcoidosis, carries a poor prognosis, in part because it is difficult to detect early in patients with worsening respiratory symptoms. PH in sarcoidosis arises through incompletely characterized mechanisms that are distinct from the pulmonary vascular remodeling well known to occur in other chronic lung diseases. To address the need for a biomarker to aid in early detection, as well as the gap in knowledge regarding the mechanisms of PH in sarcoidosis, we used genome-wide peripheral blood gene expression analysis and identified an 18-gene signature capable of distinguishing sarcoidosis patients with PH (n = 8), sarcoidosis patients without PH (n = 17), and healthy controls (n = 45). The discriminative accuracy of this 18-gene signature was 100% in separating sarcoidosis patients with PH from those without it. If validated in a large replicate cohort, this signature could potentially be used as a diagnostic molecular biomarker for sarcoidosis-associated PH.
Pulmonary hypertension (PH) is a known complication of sarcoidosis, but its prevalence in this patient population has been variably reported and remains largely unclear. 1 Historically, it has been detected more commonly in patients with advanced lung disease, with one report documenting its presence in 74% of patients with sarcoidosis awaiting lung transplantation. 2 Compared with sarcoidosis without PH, sarcoidosis-associated pulmonary hypertension (SAPH) dramatically decreases functional status and increases oxygen requirements. As reported by Shorr et al., 3 approximately 70% of SAPH patients require assistance with daily functional activities, compared with a small fraction of non-PH sarcoidosis patients. Furthermore, PH is an independent predictor of mortality in patients with sarcoidosis, increasing mortality 10-fold among patients listed for lung transplantation. 4 The overall prognosis of SAPH is dismal, with 59% survival at 5 years from diagnosis. 5
Contributing significantly to both the uncharacterized prevalence of SAPH and its poor prognosis is the difficulty clinicians face in detecting the condition in its early stages. 6 The presenting symptoms of PH, such as dyspnea, reduced exercise capacity, and fatigue, are clinically indistinguishable from those of advancing pulmonary parenchymal sarcoidosis. 7 Physical findings of PH become readily detectable only when right heart failure occurs in the late stages of disease. 8 The diagnosis of SAPH requires invasive right heart catheterization, and there is no reliable noninvasive screening test for SAPH. 9 Continuous-wave Doppler echocardiography, which estimates systolic pulmonary arterial pressure from the maximal velocity of the tricuspid regurgitation jet, is a noninvasive method routinely used to assess PH in patients without sarcoidosis. However, it is known to be less accurate in patients with chronic lung disease, so its value as a screening test for SAPH is questionable. 10
There is currently no specific therapy for SAPH. 6 Many of the usual secondary mechanisms of PH in chronic lung disease have been found to be of minimal significance as contributing factors in the development of SAPH. These include fibrotic destruction of the pulmonary vascular bed,2,7 extrinsic compression of the pulmonary vessels by mediastinal lymphadenopathy, 11 hypoxic vasoconstriction leading to pulmonary vascular remodeling, 2 and sarcoid cardiomyopathy. 2 Increasing evidence points to a sarcoidosis-related mechanism of pulmonary vasculopathy that is distinct from many of the other known causes of PH but is mechanistically uncharacterized.12,13
To address the challenges in early clinical detection of SAPH as well as the current deficit in knowledge regarding its mechanism of pathogenesis, we used gene expression profiling of peripheral blood mononuclear cells (PBMCs) to study cohorts of sarcoidosis patients with and without PH. Whole-genome expression studies have been utilized to identify novel candidate genes for PH by applying an unbiased, genome-based approach. 14 Since gene expression, an intermediate phenotype situated between DNA sequence variation and cellular, whole-body phenotypes, has a substantial genetic basis,15–17 profiling gene expression in PBMCs could provide an opportunity to explore gene expression signatures or biomarkers that may be associated with PH in sarcoidosis from both a diagnostic and a pathobiologic standpoint.
METHODS
Inclusion and exclusion criteria
The study was approved by the institutional review board of each collaborating institution, with written informed consent obtained from all subjects, and was performed in accordance with the principles of the Declaration of Helsinki. The inclusion criterion was a diagnosis of sarcoidosis based on the criteria established by the joint international statement on sarcoidosis. 18 Healthy controls were subjects without any known chronic medical condition. Among the patients with sarcoidosis, PH was considered present in those who had undergone right heart catheterization for clinical suspicion of PH and who met hemodynamic criteria for PH, defined as a mean pulmonary arterial pressure of ≥25 mmHg and a pulmonary artery occlusion pressure of ≤15 mmHg. Subjects with cardiac sarcoidosis, neurosarcoidosis, or other concurrent systemic inflammatory diseases were excluded.
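The hemodynamic definition of PH used for inclusion can be expressed as a simple check on right heart catheterization values. This is a minimal sketch: the function name and the example readings are hypothetical, and only the two thresholds come from the criteria stated here.

```python
def meets_ph_criteria(mean_pap_mmhg: float, paop_mmhg: float) -> bool:
    """Apply the study's hemodynamic PH definition to catheterization values.

    mean_pap_mmhg: mean pulmonary arterial pressure, in mmHg
    paop_mmhg: pulmonary artery occlusion pressure, in mmHg
    """
    # PH requires mPAP >= 25 mmHg together with PAOP <= 15 mmHg
    return mean_pap_mmhg >= 25.0 and paop_mmhg <= 15.0


# Hypothetical readings, for illustration only
print(meets_ph_criteria(32.0, 10.0))  # True
print(meets_ph_criteria(24.0, 10.0))  # False: mPAP below 25 mmHg
print(meets_ph_criteria(30.0, 18.0))  # False: PAOP above 15 mmHg
```

The PAOP condition excludes cases in which elevated left-sided filling pressure accounts for the raised pulmonary arterial pressure.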
Functional parameters in test subjects
Subjects with sarcoidosis underwent spirometry in accordance with the joint American Thoracic Society/European Respiratory Society guidelines for standardization of lung function testing. 19 Patients who were clinically suspected to have PH underwent right heart catheterization, as detailed in the American College of Cardiology Foundation/American Heart Association 2009 expert consensus document on PH. 20
PBMC preparation
Blood samples were collected from 17 sarcoidosis patients without PH, 8 sarcoidosis patients with PH, and 45 healthy controls (Table 1). PBMCs were isolated from these samples by Ficoll-Hypaque density gradient centrifugation, as described by Grigoryev et al. 21
Table 1. Clinical characteristics of the sarcoidosis patient population
Note: Unless otherwise noted, data are reported as mean ± SEM. There were no significant differences in age or sex (P = 0.3 by t test for age and P = 0.4 by χ2 test for sex), but pulmonary function differed significantly (P = 0.002 by t test). No PH: patients without pulmonary hypertension; PH: patients with pulmonary hypertension; FVC: forced vital capacity; ETRA: endothelin receptor antagonist; PDE-5: phosphodiesterase 5; PAP: pulmonary arterial pressure; BNP: brain natriuretic peptide.
RNA isolation and microarray hybridization
Total RNA was isolated from PBMCs with Trizol LS (Life Technologies) and further purified with the RNeasy purification kit (Qiagen). All RNA samples (n = 70) were of high quality, with no signs of DNA contamination or RNA degradation. Subsequent processing (complementary DNA generation, fragmentation, end labeling, and hybridization to Affymetrix GeneChip Human Exon 1.0 ST arrays) was performed by the Functional Genomics Facility at the University of Chicago and the Core Genomics Facility at the University of Illinois at Chicago, per the manufacturers' instructions.
Microarray data preprocessing
Expression arrays were analyzed with Affymetrix Power Tools, version 1.12.0 (http://www.affymetrix.com/estore/partners_programs/programs/developer/tools/powertools.affx), as in Zhou et al. 22 We used the experimental probe-masking workflow provided by the Affymetrix Power Tools to filter the probe set (exon-level) intensity files by removing probes that contain known single-nucleotide polymorphisms (SNPs) in the dbSNP database (ver. 129). 23 Overall, of the ∼1.4 million probe sets on the exon array, ∼350,000 were found to contain at least one probe with an SNP (∼600,000 probes). 24 The resulting probe signal intensities were quantile normalized across all 70 samples. Probe set expression signals were then summarized with the robust multiarray average (RMA) algorithm, 25 in which the log2-transformed intensities are summarized by median polish. We then generated the expression signals of the ∼22,000 transcript clusters (gene level) from the core set of exons (i.e., those with RefSeq-supported annotations) 26 by averaging all annotated probe sets for each transcript cluster. Possible batch effects were adjusted with COMBAT (http://jlab.bu.edu//ComBat/). 27 A transcript cluster was considered reliably expressed if its Affymetrix-implemented DABG (detection above background) 28 P value was less than 0.01 in at least 83% of the samples in either patient group (sarcoidosis patients with or without PH). We removed genes on chromosomes X and Y to avoid potential confounding by sex. We further limited the analysis set to genes with unique annotations (i.e., transcripts corresponding to unique genes) from the Affymetrix NetAffx website (https://www.affymetrix.com/analysis/index.affx#1_2, accessed March 1, 2012). In total, 12,999 transcript clusters met these criteria and were further analyzed.
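The normalization, summarization, and detection-filter steps can be illustrated on a small scale. This is a minimal NumPy sketch, not the Affymetrix Power Tools implementation: it omits RMA background correction and SNP probe masking, applies quantile normalization to a single toy probe set rather than to all probes on the arrays, and uses hypothetical function names. The detection filter mirrors the rule of DABG P < 0.01 in at least 83% of the samples of either group.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize a probes x samples intensity matrix.

    Each sample's k-th smallest value is replaced by the mean of the
    k-th smallest values across all samples (ties broken by position).
    """
    order = np.argsort(x, axis=0)
    mean_sorted = np.sort(x, axis=0).mean(axis=1)
    out = np.empty(x.shape, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = mean_sorted
    return out


def median_polish(y, max_iter=10, tol=1e-8):
    """Tukey median polish of a probes x samples matrix of log2 intensities.

    Fits y[i, j] ~ overall + probe[i] + sample[j] and returns the per-sample
    summary overall + sample[j], which is how RMA summarizes a probe set.
    """
    resid = np.array(y, dtype=float)
    overall = 0.0
    probe_eff = np.zeros(resid.shape[0])
    sample_eff = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        r = np.median(resid, axis=1)          # sweep out row (probe) medians
        resid -= r[:, None]
        probe_eff += r
        m = np.median(sample_eff)
        sample_eff -= m
        overall += m
        c = np.median(resid, axis=0)          # sweep out column (sample) medians
        resid -= c[None, :]
        sample_eff += c
        m = np.median(probe_eff)
        probe_eff -= m
        overall += m
        if np.abs(r).sum() + np.abs(c).sum() < tol:
            break                             # nothing left to sweep
    return overall + sample_eff


def reliably_expressed(dabg_p, groups, p_cut=0.01, min_frac=0.83):
    """Mask of transcript clusters with DABG P < p_cut in >= min_frac of samples of any group.

    dabg_p: transcripts x samples matrix of DABG P values
    groups: list of column-index arrays, one per patient group
    """
    detected = dabg_p < p_cut
    keep = np.zeros(dabg_p.shape[0], dtype=bool)
    for cols in groups:
        keep |= detected[:, cols].mean(axis=1) >= min_frac
    return keep


# Toy data: one probe set with 4 probes measured on 3 arrays
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=7.0, sigma=1.0, size=(4, 3))
print(median_polish(np.log2(quantile_normalize(raw))))  # one log2 summary per array
```

Median polish is used because it is robust to a few aberrant probes, which is why RMA prefers it to a simple mean of probe intensities.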
Identification of differentially expressed genes
Significance analysis of microarrays (SAM), 29 implemented in the samr library of the R statistical package, 30 was used to compare log2-transformed gene expression levels between sarcoidosis patients with and those without PH. The false-discovery rate (FDR) was controlled via the q-value method. 31 Transcripts with a fold change greater than 1.25 and an FDR less than 10% were deemed differentially expressed. We searched for enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) 32 pathways among the differentially expressed genes, using the final analysis set as background, with the National Institutes of Health Database for Annotation, Visualization and Integrated Discovery (DAVID).33,34 A P value of <0.05 was used as the cutoff for enrichment.
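SAM scores each gene with a relative difference d = (mean PH − mean non-PH)/(s + s0) and estimates the FDR by permuting sample labels. The sketch below is a simplified NumPy illustration, not the samr implementation: it fixes s0 at the median gene-wise standard error, uses one symmetric cutoff on |d| instead of SAM's asymmetric delta thresholds, returns the genes at the least stringent cutoff whose estimated FDR is at most 10% rather than per-gene q-values, and applies the 1.25 fold-change filter on the log2 scale. Function names and toy data are hypothetical.

```python
import numpy as np


def sam_d(x, labels, s0=None):
    """SAM-style relative difference for each gene.

    x: genes x samples matrix of log2 expression; labels: boolean array (True = PH).
    s0: fudge factor added to the gene-wise standard error; if None, the median
    gene-wise standard error is used (a simplification of samr's choice).
    Returns (d, log2 mean difference, s0).
    """
    a, b = x[:, labels], x[:, ~labels]
    n1, n2 = a.shape[1], b.shape[1]
    diff = a.mean(axis=1) - b.mean(axis=1)
    ss = ((a - a.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) + \
        ((b - b.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    s = np.sqrt((1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2) * ss)
    if s0 is None:
        s0 = np.median(s)
    return diff / (s + s0), diff, s0


def call_genes(x, labels, fdr_max=0.10, min_fc=1.25, n_perm=200, seed=0):
    """Indices of genes passing a permutation-estimated FDR cutoff and a fold-change filter."""
    d, diff, s0 = sam_d(x, labels)
    rng = np.random.default_rng(seed)
    # |d| under random relabelings estimates how many genes pass a cutoff by chance
    null = np.abs(np.array([sam_d(x, rng.permutation(labels), s0)[0]
                            for _ in range(n_perm)]))
    for cutoff in np.sort(np.abs(d)):            # least to most stringent cutoff
        called = np.abs(d) >= cutoff
        expected_false = np.median((null >= cutoff).sum(axis=1))
        if expected_false / called.sum() <= fdr_max:
            return np.flatnonzero(called & (np.abs(diff) > np.log2(min_fc)))
    return np.array([], dtype=int)


# Toy data: 200 genes x 10 samples; genes 0-9 are shifted up in the "PH" group
rng = np.random.default_rng(1)
labels = np.array([True] * 5 + [False] * 5)
x = rng.normal(8.0, 0.3, size=(200, 10))
x[:10, labels] += 3.0
print(call_genes(x, labels))
```

The fudge factor s0 keeps genes with tiny variance from dominating the ranking purely because their denominator is near zero.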
Identification of gene signatures for distinguishing sarcoidosis with and without PH
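To identify gene signatures useful in the diagnosis of PH in sarcoidosis, a machine-learning algorithm based on a support vector machine (SVM) 35 with a linear kernel was applied, in combination with recursive feature elimination (RFE), to generate a predictive model. The decision function of a linear SVM is
$$f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b = \sum_{i=1}^{p} w_i x_i + b,$$
where $\mathbf{x} = (x_1, \ldots, x_p)$ is the vector of expression values of the $p$ genes for a sample, $\mathbf{w}$ is the vector of gene weights learned from the training data, and $b$ is the bias term; the sign of $f(\mathbf{x})$ determines the predicted class (PH or non-PH).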
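The linear decision function and the recursive elimination loop can be sketched in NumPy. The solver and settings used in the study are not specified here, so this sketch substitutes a small full-batch subgradient solver with a Pegasos-style step size 1/(λt) and an arbitrary λ = 1.0, ranks genes by squared weight w_i^2 as in the usual SVM-RFE formulation, and keeps ceil(n/2) genes per step, a schedule inferred from the signature sizes (275, 138, 69, 35, 18) mentioned in the Discussion. Function names are hypothetical, cross-validation is omitted, and in practice a library implementation (e.g., scikit-learn's RFE wrapped around a linear SVC) would normally be used.

```python
import numpy as np


def fit_linear_svm(x, y, lam=1.0, n_iter=2000):
    """Soft-margin linear SVM fit by full-batch subgradient descent.

    x: samples x genes; y: labels in {-1, +1}.
    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (x @ w + b))).
    """
    n, p = x.shape
    w, b = np.zeros(p), 0.0
    for t in range(1, n_iter + 1):
        eta = 1.0 / (lam * t)                   # decreasing (Pegasos-style) step size
        active = y * (x @ w + b) < 1            # samples inside the margin or misclassified
        w = (1.0 - eta * lam) * w + eta * (y[active][:, None] * x[active]).sum(axis=0) / n
        b = b + eta * y[active].sum() / n       # bias is not regularized
    return w, b


def decision_function(x, w, b):
    """Linear SVM decision value f(x) = w.x + b; its sign gives the predicted class."""
    return x @ w + b


def svm_rfe(x, y, n_keep):
    """Recursive feature elimination: refit, then keep the higher-weight half of the genes."""
    remaining = np.arange(x.shape[1])
    while remaining.size > n_keep:
        w, _ = fit_linear_svm(x[:, remaining], y)
        n_next = max(n_keep, int(np.ceil(remaining.size / 2)))
        ranked = np.argsort(w ** 2)[::-1]       # largest squared weight first
        remaining = np.sort(remaining[ranked[:n_next]])
    return remaining


# Toy data: 40 samples x 20 genes; only genes 0 and 1 carry class information
rng = np.random.default_rng(0)
y = np.where(np.arange(40) < 20, 1.0, -1.0)
x = rng.normal(0.0, 1.0, size=(40, 20))
x[:, :2] += 3.0 * y[:, None]
print(svm_rfe(x, y, n_keep=2))
```

Refitting after each elimination matters because gene weights change once correlated or redundant genes are removed.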
RESULTS
Patient characteristics
PBMC samples were collected from sarcoidosis patients with PH (n = 8), sarcoidosis patients without PH (n = 17), and healthy controls (n = 45). The clinical characteristics of study patients are displayed in Table 1. No significant differences in age or sex were observed between patients with and those without PH (P = 0.3 by t test for age and P = 0.4 by χ2 test for sex). Not surprisingly, sarcoidosis patients with PH had worse pulmonary function (P = 0.002 by t test).
Identification of differentially expressed genes in SAPH
At the specified significance level (fold change > 1.25, FDR < 10%), 275 genes were differentially expressed between sarcoidosis patients with and those without PH (Fig. S1; Figs. S1–S3 available online): 269 were upregulated and 6 were downregulated in PH patients. Ten KEGG pathways were significantly enriched among these SAPH-associated genes (P < 0.05), including Fc gamma R–mediated phagocytosis, cytokine-cytokine receptor interaction, and the MAPK signaling pathway (Fig. 1). We also compared gene expression between sarcoidosis patients without PH and healthy controls and found 263 differentially expressed genes. The top-ranked KEGG pathway among these genes, T cell receptor signaling (Fig. S2), was also reported in a previous study. 22
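DAVID's pathway P values are based on a modified one-sided Fisher exact test, the EASE score, which makes enrichment calls more conservative by reducing the number of list genes in a pathway by one. The sketch below computes the ordinary hypergeometric upper tail and that conservative variant in pure Python; DAVID's exact 2 × 2 table construction may differ from this simplification, and the function names and pathway counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_list, n_cat, n_bg):
    """P(X >= k) when n_list genes are drawn from n_bg genes, n_cat of which are in the pathway."""
    k = max(k, 0)
    total = comb(n_bg, n_list)
    hits = sum(comb(n_cat, i) * comb(n_bg - n_cat, n_list - i)
               for i in range(k, min(n_list, n_cat) + 1))
    return hits / total


def ease_score(k, n_list, n_cat, n_bg):
    """Conservative EASE-style score: the one-sided test with the list count reduced by one."""
    return hypergeom_upper_tail(k - 1, n_list, n_cat, n_bg)


# Hypothetical pathway: 12 of the 275 differential genes fall in a pathway
# that has 150 members among the 12,999 analyzed transcript clusters
print(ease_score(12, 275, 150, 12999))
```

Subtracting one penalizes pathways supported by only one or two list genes, which otherwise can reach small P values by chance.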

Figure 1. Enriched pathways among the differentially expressed genes between sarcoidosis patients with and those without pulmonary hypertension. The 20 top-ranked Kyoto Encyclopedia of Genes and Genomes pathways are listed. The red line indicates the significance cutoff (P = 0.05). MAPK: mitogen-activated protein kinase; VEGF: vascular endothelial growth factor.
Identifying a gene signature for PH in sarcoidosis
The 275 genes differentially expressed between sarcoidosis patients with and those without PH were used as the initial analysis set for the SVM-based algorithm. Figure S3 shows the prediction accuracy of gene signatures of different sizes during recursive feature elimination (see “Identification of gene signatures for distinguishing sarcoidosis with and without PH” for details). We chose an 18-gene signature (Table 2) as the most parsimonious gene list with the peak prediction accuracy. Among the 18 genes, HIST1H4C (encoding histone cluster 1, H4c), CACHD1 (encoding cache domain containing 1), STOX1 (encoding storkhead box 1), and NRCAM (encoding neuronal cell adhesion molecule) were strongly downregulated in PH patients, whereas the remaining 14 genes were upregulated (Fig. 2A). This 18-gene signature distinguished sarcoidosis patients with PH from those without PH with 100% discriminative accuracy (Fig. 2B). We also tested the performance of the 18-gene signature in distinguishing PH patients from all non-PH subjects (sarcoidosis patients without PH and healthy controls). Its discriminative power was also strong, with a classification accuracy of 97.8% (sensitivity: 85.1%, specificity: 99.5%; Fig. 2C).
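Accuracy, sensitivity, and specificity follow the usual confusion-matrix definitions, and choosing "the most parsimonious gene list with the peak prediction accuracy" amounts to taking the smallest signature size that attains the maximum accuracy. A minimal pure-Python sketch; the function names, class labels, and accuracy values below are hypothetical, and the sketch does not model any resampling scheme that may underlie the reported percentages.

```python
def classification_metrics(y_true, y_pred, positive="PH"):
    """Accuracy, sensitivity, and specificity for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # fraction of PH patients called PH
        "specificity": tn / (tn + fp),   # fraction of non-PH subjects called non-PH
    }


def most_parsimonious_size(accuracy_by_size):
    """Smallest signature size that attains the peak accuracy."""
    peak = max(accuracy_by_size.values())
    return min(size for size, acc in accuracy_by_size.items() if acc == peak)


print(classification_metrics(["PH", "PH", "no", "no", "no"], ["PH", "no", "no", "no", "PH"]))
# {'accuracy': 0.6, 'sensitivity': 0.5, 'specificity': 0.6666666666666666}
print(most_parsimonious_size({275: 1.0, 138: 1.0, 69: 1.0, 35: 1.0, 18: 1.0, 9: 0.92}))  # 18
```

With few PH cases, sensitivity is estimated from a small denominator, so it moves in larger steps than specificity.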

Figure 2. A, Heat map of the 18-gene signature. Red represents increased gene expression, and blue represents decreased gene expression. ++: sarcoidosis patients with PH; +: sarcoidosis patients without PH. B, C, Principal-component analysis (PCA); x-axis: first principal component (PC1); y-axis: second principal component (PC2); left, PCA on the expression values of all genes detected by microarray; right, PCA on the expression values of the 18-gene signature. PH: sarcoidosis patients with pulmonary hypertension; Non-PH: sarcoidosis patients without PH; CTRL: healthy controls. B, PCA of patients with and without PH. C, PCA of patients with and without PH and healthy controls.
18-gene signature
We mapped the 18-gene signature to a lung gene expression data set published by Zhao et al. 39 that contains 11 controls and 12 PH patients. Of the 14 genes upregulated in PBMCs of our SAPH patients, only 3 were upregulated in PH lungs (t test: P < 0.05), and none of the 4 genes downregulated in PBMCs of SAPH patients were downregulated in lungs of pulmonary arterial hypertension patients. These results suggest that the 18-gene signature may be specific to SAPH and to PBMCs.
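The mapping amounts to a concordance check: a signature gene counts as replicated in lung tissue only if its change there is significant and points in the same direction as in PBMCs. A minimal sketch that takes precomputed per-gene statistics from the lung data set as input; the gene names and numbers in the usage are hypothetical.

```python
def concordant_genes(signature, lung_stats, alpha=0.05):
    """Signature genes whose change in a lung data set is significant and in the same direction.

    signature: gene -> +1 if up in SAPH PBMCs, -1 if down
    lung_stats: gene -> (mean PH minus mean control, t-test P value) in the lung data set
    """
    hits = {"up": [], "down": []}
    for gene, direction in signature.items():
        if gene not in lung_stats:
            continue                      # gene not measured in the lung data set
        diff, p = lung_stats[gene]
        if p < alpha and diff * direction > 0:
            hits["up" if direction > 0 else "down"].append(gene)
    return hits


# Hypothetical gene names and statistics, for illustration only
signature = {"GENE_A": +1, "GENE_B": +1, "GENE_C": -1}
lung_stats = {"GENE_A": (0.8, 0.01), "GENE_B": (0.5, 0.30), "GENE_C": (0.4, 0.02)}
print(concordant_genes(signature, lung_stats))  # {'up': ['GENE_A'], 'down': []}
```

GENE_C illustrates why direction matters: it is significant in the lung data set but changes in the opposite direction, so it is not counted.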
DISCUSSION
The prevalence of PH in patients with sarcoidosis is unknown, which obscures the predictive value of initial noninvasive diagnostic tests such as echocardiography. 1 The only described pretest indicator of possible PH in this population is dyspnea or functional limitation out of proportion to abnormalities on pulmonary function testing; this criterion has been used to identify a subpopulation in which 47% of subjects had PH, 40 but it may be difficult to reproduce, given its subjective nature. Our study sought to determine whether peripheral blood gene expression profiling, an easily reproducible and objective measurement, can further enrich a sarcoidosis subpopulation for PH, thereby potentially enhancing the predictive value of subsequent diagnostic testing.
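The link between enrichment and predictive value follows from Bayes' rule. For a test with sensitivity Se and specificity Sp applied to a population in which the prevalence of PH is π, the positive predictive value is given below; the worked values use hypothetical Se = Sp = 0.9 together with the 47% prevalence reported for the symptom-enriched subpopulation.

```latex
\mathrm{PPV} = \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1 - \mathrm{Sp})(1 - \pi)}

\mathrm{PPV}(\pi = 0.10) = \frac{0.09}{0.09 + 0.09} = 0.50,
\qquad
\mathrm{PPV}(\pi = 0.47) = \frac{0.423}{0.423 + 0.053} \approx 0.89
```

For fixed Se and Sp, PPV rises with π, so any step that raises the pretest prevalence of PH improves the yield of the tests that follow.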
Gene signatures in blood have been shown to be potential diagnostic tools in sarcoidosis. 41 Leveraging whole-genome expression profiles in a cohort of sarcoidosis patients, we identified a signature of 18 autosomal genes that differentiates patients with PH from those without PH. The RFE approach recursively reduces the number of genes used to classify PH and non-PH patients by removing the genes with the lowest weights and refitting the SVM on the remaining genes. The 18-gene signature showed prediction accuracy equivalent to that of larger signatures (e.g., 35, 69, 138, or 275 genes). It not only correctly distinguished sarcoidosis patients with PH from those without PH but also differentiated patients with PH from healthy controls, indicating that it captures differences between PH patients and healthy controls as well as between sarcoidosis patients with and without PH.
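The signature sizes listed in this paragraph (275, 138, 69, 35, and finally 18) are consistent with an RFE schedule that keeps half of the genes, rounded up, at each step. The schedule is an inference from those numbers rather than a stated setting, and the function name below is hypothetical.

```python
import math


def rfe_sizes(n_start, n_min=1):
    """Signature sizes visited if each RFE step keeps ceil(n / 2) genes (assumed schedule)."""
    sizes = [n_start]
    while sizes[-1] > n_min:
        sizes.append(max(n_min, math.ceil(sizes[-1] / 2)))
    return sizes


print(rfe_sizes(275))  # [275, 138, 69, 35, 18, 9, 5, 3, 2, 1]
```

Halving keeps the number of SVM refits small (about log2 of the starting gene count), at the cost of evaluating only a coarse grid of signature sizes.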
We compared our 18-gene signature with other known PBMC gene signatures in PH. None of its genes overlap with those of a 106-gene PBMC signature separating PH patients from healthy controls described by Bull et al. 42 Risbano et al. 43 described a 9-gene signature that distinguished scleroderma patients with PH from those without PH, and Desai and colleagues 14 recently proposed a 10-gene signature to identify patients with PH in a cohort of sickle cell disease patients. We found no overlap between our 18-gene signature and either of these etiology-specific PH signatures. Together, these comparisons suggest that PH in sarcoidosis involves pathways different from those underlying idiopathic PH or other secondary etiologies.
While it remains unclear how well PBMC gene expression changes represent cellular events in the lung, it is notable that 9 of the 18 genes identified in our study are linked to cellular proliferation,44–52 one of the central events in pulmonary endothelial cells and/or vascular smooth muscle cells during PH. 53 Several of these genes are linked to specific pathways or cell types relevant to PH. ETHE1 and RASSF3 are mediators of p53-induced apoptosis,46,51 suppression of which in pulmonary vascular smooth muscle cells is sufficient to induce PH in rats. 54 CORO1B plays a role in platelet-derived growth factor–mediated smooth muscle cell migration and proliferation, 49 a pathway that has been studied as a potential therapeutic target in PH. 55 HIST1H4C has been linked to proliferative pathways involving bone morphogenetic protein (BMP) and activin receptor-like kinase 1 (ALK-1). 56 Mutations in genes of these pathways, including the BMP type 2 receptor and ALK-1, are associated with heritable PH syndromes. 57
Major limitations of our work are the lack of reverse transcription polymerase chain reaction (RT-PCR) validation of the microarray findings and the lack of prospective confirmation in an independent validation cohort. Unfortunately, the amount of sample remaining from the original cohort was insufficient for RT-PCR validation, and the number of SAPH patients is currently too small to support an independent validation cohort study. We also recognize that PBMC gene expression could be affected by many conditions and factors, not least the presence of other diseases, such as infections. In this small study, we did not collect data on indolent coexisting conditions that could play such a confounding role.
In summary, despite several significant limitations, including the small size of our study cohort and the lack of an independent validation cohort, we have identified a gene signature that is a potential novel molecular biomarker for the diagnosis of PH in sarcoidosis. The signature showed high classification accuracy both between sarcoidosis patients with and without PH and between patients with PH and healthy controls, and, if validated in a large replicate cohort, it could be explored as a diagnostic molecular biomarker for the disease.
