Abstract
Breast cancer (BC) is the second most common cancer among women. Research shows many women with BC experience anxiety, depression, and stress (ADS). Epigenetics has recently emerged as a potential mechanism for the development of depression. 1 Although there are growing numbers of research studies indicating that epigenetic changes are associated with ADS, there is currently no evidence that this association is present in women with BC. The goal of this study was to identify high-throughput methylation sites (CpG sites) that are associated with three psychoneurological symptoms (ADS) in women with BC. Traditionally, univariate models have been used to examine the relationship between methylation sites and each psychoneurological symptom; nevertheless, ADS can be treated as a cluster of related symptoms and included together in a multivariate linear model. Hence, an overarching goal of this study is to compare and contrast univariate and multivariate models when identifying methylation sites associated with ADS in women with BC. When fitting separate linear regression models for each ADS scale, 3 among 285,173 CpG sites tested were significantly associated with depression. Two significant CpG sites are located on their respective genes FAM101A and FOXJ1, and the third site cannot be mapped to any known gene at this time. In contrast, the multivariate models identified 8,535 ADS-related CpG sites. In conclusion, when analyzing correlated psychoneurological symptom outcomes, multivariate models are more powerful and thus are recommended.
Introduction
Breast Cancer (BC) is the second most common cancer among women. Research shows that many women with BC experience major depression and generalized anxiety disorder. 2 Significantly higher incidences of post-traumatic stress disorder and higher scores on emotional distress were also observed in women with breast cancer. 3 One well-studied biological mechanism leading to anxiety, depression, and stress (ADS) is cytokine-induced behavioral changes in sick individuals. 4 There are both clinical and experimental evidence indicating that activation of the brain cytokine system is related to depression. 4 Cytokine activation not only relates to depression but also to cancer development. 4 Lyon et al have shown that many groups of cytokines were elevated in women with BC. 5
Another biological change (other than cytokine activation) that has recently emerged as a potential mechanism for the development of depression is epigenetics. 1 DNA methylation can prevent enzymes from binding to the promoter region of a gene and thereby cause up- or down-regulation of DNA transcription, 1 with these effects potentially being present for sites throughout the genome. An alternative means for altering the expression of genes is through position effects for genes localized near the telomere (the ends of the chromosomes).6,7 As telomeres shorten, the reduction of heterochromatin could result in alterations in chromatin conformation, which leads to changes in the expression of nearby genes.8,9 Furthermore, shortening of telomeres is associated with increased levels of chromosomal instability, which, in turn, could lead to alterations in gene expression. It has been shown that accelerated telomere shortening is associated with psychosocial stress, 10 and that hypermethylation is associated with panic disorder. 11 Although there are growing numbers of research studies indicating that epigenetic changes are associated with ADS, there is currently no evidence that this association is present in women with BC.
The goal of this study is to identify methylation patterns that are associated with three psychoneurological symptoms, ADS, in women with BC. A commonly used and well-validated questionnaire to measure anxiety and depression is the Hospital Anxiety and Depression Scale (HADS), 12 and a widely used and well-validated instrument for measuring perceived stress is the Perceived Stress Scale (PSS). 13 Genome-wide methylation was assessed using the high-throughput Illumina Human Methylation 450K assay.
Traditionally, univariate models have been used to examine the relationship between molecular features and each outcome variable (ADS); nevertheless, analyzing multiple outcome variables globally has become more frequent in many fields of biomedical research. 14 In fact, this approach is especially helpful for evaluating traits that appear related. For example, evidence that stress is related to depression and anxiety in women with BC comes from the results of a study by Varni et al, who found that higher levels of perceived stress were associated with increased depression and anxiety among survivors of childhood cancer. 15 Therefore, instead of fitting separate univariate models to each psychoneurological symptom, ADS can be treated as a cluster of related symptoms and included together in a multivariate linear model. Hence, an overarching goal of this study is to compare and contrast univariate and multivariate models when identifying molecular features associated with ADS in women with BC.
Methods
Study Design
The study included 73 women with BC in stage I (n = 21), II (n = 44), or III (n = 8). This research was approved by the Institutional Review Board at Virginia Commonwealth University (VCU IRB #HM 13194) and complied with the principles of the Declaration of Helsinki. The molecular features and ADS were measured at study enrollment, prior to their receiving chemotherapy. The severity of depression and anxiety was measured using the HADS, which is a 14-item questionnaire. 12 Among the 14 items, 7 assess anxiety and 7 assess depression. Each item is on a four-level ordinal scale, the scale response is calculated such that the ordinal levels contribute 0–3 points, and the 14 items within each scale are summed. Using these sums, subjects may be classified as normal (score <8), borderline (8–10), or having clinical anxiety or depression (score >10). 16 Although the scale has an ordered structure, we used the continuous version of HADS in order to maintain all the information in the scale and protect power.
Stress was evaluated using a 10-item PSS. 17 PSS was designed to measure the degree to which situations in one's life are appraised as stressful. 13 Items were designed to monitor how unpredictable, uncontrollable, and overloaded a respondent's life is by asking about the respondent's feelings and thoughts during the last month. 17 Of the 10 total responses on the PSS, 6 were negative responses and were given the following numerical value: never = 0; almost never = 1; sometimes = 2; fairly often = 3; and very often = 4. The remaining four positive items in the PSS were scored in a reversed format according to standard approaches (ie, never = 4; very often = 0).18,19 The scores were summed, with the higher total scores of PSS indicating higher overall stress. 17
The Illumina Human Methylation 450K assay was used to profile human DNA methylation using whole blood from women diagnosed with breast cancer. The β-values were defined as the proportion methylated, and were reported for each CpG site using the minfi Bioconductor package 20 in the R programing environment. 21 β-values were plotted against the percentage of GC content across all samples. Based on the results, CpG sites with GC content greater than 40% were filtered out. CpG sites with all β-values over 0.9 or below 0.1 were considered to be constitutively methylated or unmethylated and therefore were excluded from the study. 22 Because single nucleotide polymorphisms (SNPs) may affect the performance of this assay, 23 the CpG sites containing SNPs were also excluded. After filtering, 285,173 CpG sites remained. The Illumina Human Methylation 450K Assay uses two types of probe designs: Infinium I and Infinium II. Peak correction was used to align the M-values produced by the two types of probe designs (M-value = log(β/(1 − β)). 24 In this study, five subjects had multiple methylation profiles: because these profiles were highly correlated, the mean methylation levels were used. After taking the average, the M-values were back-transformed into β-values, which were used in the analysis.
Statistical Analysis
The HADS and PSS scales were first examined using descriptive statistics. To identify the molecular features significantly associated with each of the scales, a linear regression model was fit to each of the 285,173 CpG sites. ADS were treated as three separate continuous responses. The P-values obtained from the linear regression models were adjusted using the Benjamini–Hochberg (BH) method to control the false discovery rate (FDR). CpG sites with an FDR less than 0.02 were considered to be significant.
The ADS values were plotted against each other to produce a scatterplot matrix. Pearson's correlation among the three scales was also calculated. Based on the results, it seemed reasonable to fit a multivariate linear mixed model by combining all scales. The mixed model included a methylation component, two dummy coded indicators representing the type of psychoneurological symptom with anxiety as referent level, and the interactions between each indicator and the methylation component. In addition, the model included a subject-specific term to account for the correlation among scales. To illustrate the model with a formula, let Yij represent the HADS-Anxiety, HADS-Depression, and PSS outcome for ith subject, where j = 1, 2, or 3, respectively. The model can then be expressed as
All statistical analyses were performed using the R programming environment (version 3.0.2) and the lme4 package. 21
Results
The means and standard deviations for the three scales are provided in Table 1. The distributions of the three scales are shown in Figure 1. Although the distribution of HADS-Depression is skewed somewhat to the right, based on the Central Limit Theorem, the mean of HADS-depression will follow a Gaussian distribution due to the large sample size (n = 73). 25 Pearson's correlation between anxiety and depression, anxiety and stress, and depression and stress were 0.657, 0.668, and 0.715, respectively, indicating that the three outcome scales were highly correlated; the scatterplot matrix supported this conclusion (Fig. 2).

Density plot of HADS-Anxiety, HADS-Depression, and PSS.

Scatterplot matrix of HADS-Anxiety, HADS-Depression, and PSS.
Means and standard deviations of the three response variables.
When fitting separate models for each ADS scale, among 285,173 CpG sites, the sites with Illuminas CpG loci ID (IlmnID) cg14474520, cg17527195, and cg23431989 were significantly associated with depression (Table 2). Further investigation suggested that the methylation level at sites cg14474520 and cg23431989 were positively associated with depression (slope coefficients: 104.6 and 70.52). However, at site cg17527195, the methylation level was negatively associated with depression (slope coefficient: −39.42). The scatter plots supported this conclusion (Fig. 3). The CpG site cg14474520 is located on gene FAM101A, and the CpG site cg23431989 is located on gene FOXJ1. Both FAM101A and FOXJ1 are cancer-related genes. It has been shown that down-regulation of FAM101A significantly reduces early apoptosis in RKO colorectal cancer cells. 26 Recent studies have suggested that FOXJ1 may be involved in the development of more than one cancer. FOXJ1 can function as a tumor suppressor in breast cancer. 27 It is also involved in adverse clinical outcome in ovarian cancers. 28 Chen et al found that high expression of FOXJ1 is associated with prognosis and tumor cell proliferation in hepatocellular carcinoma. 27 The CpG site cg17527195 cannot be mapped to any known genes at this time.

Scatterplots of HADS-Depression versus β-values for the three significant CpG sites.
Significant CpG sites from the separate models.
After fitting multivariate models, the P-value histograms support the notion that the multivariate model is more powerful in identifying significant CpG sites (Fig. 4). In Figure 4, the observed P-values for the CpG sites from the multivariate models are skewed more to the right compared to the three separate models (Fig. 4). The multivariate models identified 8,535 ADS-related CpG sites (FDR <0.02). These 8,535 significant CpG sites are located in different regions including 4,652 open sea (54.5%), 797 CpG islands (9.3%), 622 north shelf (7.3%), 1,099 north shore (12.9%), 518 south shelf (6%), and 847 south shore (9.9%) sites. The CpG islands were defined as DNA sequences with greater than 50% GC bases and more than 60% CpG observed/expected ratio. The 2-kb sequences upstream and downstream of the CpG islands were defined as north and south shores, respectively, and the 2-kb sequences directly upstream and downstream of the shores were referred to as north or south shelves, respectively. DNA methylation sites in regions other than these are annotated as “open sea”. 23

Histograms of observed P-values from the CpG site methylation models.
Smaller numbers of significant sites can be identified by decreasing the FDR threshold (FDR = 0.01, number of significant sites = 6847; FDR = 0.005, number of significant sites = 4579; FDR = 0.001, number of significant sites = 8). The information for the eight significant CpG sites when FDR = 0.001 are shown in Table 3. Of particular interest among these eight significant sites is cg22749855, which is located within the SOCS3 gene and is associated with a gene promoter. SOCS3 is a well-studied oncogene, and hypermethylation of the SOCS3 promoter is an important regulatory mechanism for ulcerative colitis-related colorectal cancer. 29 Interestingly, ulcerative colitis-related colorectal cancer is a condition that has also been associated with telomere shortening. 30 Using the mouse as a model, researchers have found that the Socs3 gene is involved in depression in the mouse brain. 31 SOCS3 has also been shown to play a role in limiting axon regeneration. 32 Another interesting CpG site is cg11334709, which is located within the PIK3CD gene. The PIK3CD gene is associated with schizophrenia and increased expression of PIK3CD has been observed with a variation of ErB4, a critical neurodevelopmental gene. 33 Figure 5 demonstrates how methylation is differently associated with the three psychoneurological symptoms at these eight significant CpG sites.

Scatterplots of combined HADS and PSS versus β-values for the eight significant CpG sites.
Annotation for eight significant CpG sites from the multivariate models using an FDR threshold of 0.001.
Additionally, we examined the robustness of our results by analyzing the modified version of HADS (HADS with items 7, 11, and 12 removed). 16 The modified HADS was able to identify a larger number of significant CpG sites when fitting mixed models using an FDR of 0.02 (10,856 sites). However, the univariate analyses using the modified HADS did not yield any significant CpG sites when the FDR was 0.02. These results are consistent with the unmodified HADS results, where we found 8,535 CpG sites significant when fitting mixed models but none that were significant univariately for anxiety or PSS, and only three that were significant univariately for depression.
Discussion
Although HADS is commonly used to measure anxiety and depression, many researchers have argued that HADS suffers from issues such as lack of consensus on an optimal cutpoint for detecting anxiety and depression, problematic fit of some items, and contradictory results regarding optimal dimensional structures. 16 Lambert suggested that, instead of using the original 14 items, removing items 7, 11, and 12 can improve the model fit. 16 Although in this current study the full HADS was analyzed to allow for comparisons of results with other published work, we also analyzed the modified HADS so that the results could be compared with the analysis of the original scale. For our methylation data, both the unmodified and modified HADS were able to identify a larger number of significant CpG sites when fitting mixed models using an FDR of 0.02 (8,535 and 10,856 CpG sites, respectively) in comparison with the univariate models. In fact, the univariate analyses using the modified HADS did not yield any significant CpG sites for anxiety, depression, or perceived stress, which is similar to the unmodified HADS that only detected three CpG sites, all of which were significantly associated with depression.
The sensitivity study conducted by Illumina stated that the Human Methylation 450 Assay is reliable when detecting a Δβ-value of at least 0.2. 23 When comparing two groups, Δβ is the difference between the mean β-values in the two groups. Because our outcome was continuous, we defined Δβ as the range of β-values. Among three significant CpG sites found by separate models, only cg17527195 had a Δβ greater than 0.2 (0.22); therefore, caution is needed when investigating the other two sites. Among 8,535 significant CpG sites found by the multivariate models using the unmodified HADS, 6,847 had a Δβ greater than 0.2 (80%).
The results of our study demonstrate that the multivariate mixed model is more powerful than fitting separate models. Using multivariate models, a larger number of CpG sites were identified as significant. Figure 4 further demonstrated that the P-value distribution from the multivariate models was more skewed to the right compared to those from the univariate models, giving rise to more significant findings at a fixed FDR. Therefore, when analyzing correlated psychoneurological symptom outcomes, multivariate models are recommended.
In the future, we will additionally have each patient's cytokine levels assayed using the BioPlex system, telomere lengths using a fluorescent in situ hybridization (FISH)-based assay, and Illumina methylation levels at baseline and at several follow-up time points. The multivariate mixed model approach will likewise be useful for analyzing longitudinally collected data.
Author Contributions
Conceived of this study: CJC, DL, KJA. Planned the data collection and subject recruitment: CJC, DL. Analyzed the data: QZ, KJA. Wrote the first draft of the manuscript: QZ. Contributed to the writing of the manuscript: CJC, DL, RP, KJA. Agree with manuscript results and conclusions: QZ, CJC, DL, RP, KJA. Jointly developed the structure and arguments for the paper: QZ, KJA. Made critical revisions and approved final version: CJC, DL, RP, KJA. All authors reviewed and approved of the final manuscript.
