Abstract
ALS is a fatal motor neuron disease that displays a broad variety of phenotypes ranging from early fatal courses to slowly progressing and rather benign courses. Such divergence can also be seen in genetic ALS cases with varying phenotypes bearing specific mutations, suggesting epigenetic mechanisms like DNA methylation act as disease modifiers. However, the epigenotype dictated by, in addition to other mechanisms, DNA methylation is also strongly influenced by the individual’s genotype. Hence, we performed a DNA methylation study using EPIC arrays on 7 monozygotic (MZ) twin pairs discordant for ALS in whole blood, which serves as an ideal model for eliminating the effects of the genetic-epigenetic interplay to a large extent. We found one CpG site showing intra-pair hypermethylation in the affected co-twins, which maps to the Glutamate Ionotropic Receptor Kainate Type Subunit 1 gene (GRIK1). Additionally, we found 4 DMPs which were subsequently confirmed using 2 different statistical approaches. Differentially methylated regions or blocks could not be detected within the scope of this work. In conclusion, we revealed that despite a low sample size, monozygotic twin studies discordant for the disease can bring new insights into epigenetic processes in ALS, pointing to new target loci for further investigations.
Introduction
Studies investigating DNA methylation in blood cells in ALS have been accumulating in the last few years, indicating the importance of epigenetic studies in the field of motor neuron diseases (MND).1-3 Despite the presence of numerous genomic sites that show static patterns of DNA methylation, there are also methylation sites in the genome that are regulated dynamically and influenced by lifestyle and environmental factors. Furthermore, comorbidities that are not associated with the neurodegenerative disease can have an impact on the DNA methylome, which makes it rather difficult to detect true changes in methylation associated with neurodegenerative diseases. Additionally, a complex interplay between genetic and epigenetic factors makes it even more difficult to detect the subtle changes in DNA methylation in blood cells associated with neurodegeneration. Our design, given the genetic and clinical nature of the study subjects, provides an ideal setup for matching both genotype and rearing environment within each pair of MZ twins, with the unaffected co-twins being the control group. This study design accounts for not only known potential confounding factors, such as genetics, age, and sex, but latent sources of unwanted variability, as well. 4 Overall, the approach we used in this work, also called the case co-twin design, is superior to the ordinary case-control design in terms of the power of a significance test, 5 providing a unique opportunity to detect subtle DNA-methylation changes associated with the motor neuron disease.
Methods
Study design and sampling of monozygotic twins
Sixteen participants (eg, 8 twin pairs) were initially enrolled in the study. One twin pair was excluded because they were dizygotic. So, the present study was composed of 7 monozygotic (MZ) twin pairs discordant for ALS, all genotyped at the University Clinic of Ulm. All individuals were of European descent without known diabetes or cancer history. The affected co-twins were diagnosed according to the revised El-Escorial criteria 6 for definite ALS. No unaffected co-twins had signs of MND. Zygosity information was further verified using SNP information using targeted sequencing of the known ALS genes and the SNP information provided on the Illumina EPIC DNA methylation arrays.
Ethical approval
All procedures were performed in accordance with the relevant guidelines and regulations stated in the Helsinki II declaration and approved by the local medical ethics committees (ethics committees of Ulm University, approval no 19/12) medical ethics committees. Informed written consent from all study subjects was obtained before enrollment in the study.
Blood collection
Collection of human peripheral venous blood was performed in a non-fasting state using a Monovette™ blood drawing system (Sarstedt, Germany) and EDTA tubes (Sarstedt, Germany) by a standardized procedure.
DNA isolation
5ml EDTA blood was used for DNA extraction. Erythrocytes were lysed in 20 ml lysis buffer (155 mM NH4Cl, 10 mM KHCO3, 0.1 mM Na2EDTA, pH = 7.4) for 15 minutes on ice. The leukocyte pellet was resuspended in 2.5 ml SE buffer (75 mM NaCl, 25 mM EDTA). To release the DNA, Proteinase K digestion was performed using 100 μl 20% SDS+10 mg/ml Proteinase K (Roche, Switzerland) overnight at 37°C. 900 μl of saturated sodium chloride solution (6 M NaCl) was added. The DNA was precipitated in 7 ml of 100% ethanol. Then, the DNA was washed in 70% ethanol, dried at room temperature (RT), and resuspended in 100 μl H2O.
DNA quality control and quantification
Nanodrop (Thermo Fisher Scientific, MA, USA) was used to determine the 260/280 ratio and the 260/230 ratio. All samples had a 260/280 ratio between 1.7 and 2 and a 260/230 ratio greater than 1.5. Nanodrop was used to get a rough estimate of the DNA concentration, then DNA quantification was carried out using a DNA Qubit fluorometer (Thermo Fisher Scientific, MA, USA).
Genotyping
C9orf72 genotyping in all samples was carried out by fragment analysis and repeat-primed PCR (RP-PCR). All normal homozygotes and expanded alleles were confirmed with Southern Blot. For the targeted gene sequencing, a custom panel from Illumina with 36 ALS genes (ie, ALS2, ANG, ARHGEF28, CCNF, CFAP410, CHCHD10, CHMP2B, DCTN1, ERBB4, FIG4, FUS, GLE1, GRN, HNRNPA1, HNRNPA2B1, MAPT, MATR3, NEFH, NEK1, OPTN, PFN1, PRPH, SETX, SIGMAR1, SOD1, SPG11, SQSTM1, TAF15, TARDBP, TBK1, TUBA4A, UBQLN2, VAPB, VCP, VEGFA, VPS54) was used. The subsequent sequencing was performed for 150 cycles by generating 2× 74 bp paired-end reads on the MiSeq (Illumina, CA, USA). The coverage of all genes was at least 20×. Enrichment for targeted gene sequencing was performed with the Nextera Rapid Capture_Custom Kit (Illumina, CA, USA).
Sodium bisulfite conversion
500 ng of DNA from each sample underwent sodium bisulfite conversion using the EZ-96 DNA Methylation kit (Zymo Research, CA, USA).
Illumina EPIC DNA methylation arrays
Samples were measured at the single nucleotide resolution across the human genome for CpG methylation levels using Illumina EPIC DNA methylation arrays (Illumina, CA, USA) at the University Clinic of Ulm. This platform offers simultaneous interrogation of CpG methylation levels of over 850 000 different sites spanning CpG islands, genes, enhancers, and beyond. Sample processing for the assay was done according to the manufacturer’s protocol and finally, fluorescence imaging of the arrays using an Illumina HiScan SQ scanner yielded raw Intensity Data files (.idat) for all samples, following the standard Illumina scanning protocol. To reduce likely batch-to-batch variation in intra-pair CpG methylation differences, samples from co-twins in a pair were loaded together on the same array.
Data preprocessing
Preprocessing of raw data was adapted from the workflow commonly used for the analysis of array-based methylation data. 7 An initial round of quality control (QC) checks, filtering, within- and between-library normalizations, and data exploration were performed using multiple R v4.1.2 8 packages before moving on to any statistical tests for differentially methylated cell types (DMCTs), -probes (DMPs), -regions (DMRs), and -blocks (DMBs). Briefly, detection P-values calculated per CpG in all samples seperately were used to check global signal reliabilities, with a cutoff of 0.05 for filtering out samples and a cutoff of 0.01 for filtering out probes that performed poorly. Also, many other QC plots were generated using the “qcReports()” function in the R package minfi v1.40.0. 9 As we did not expect global differences between affected and unaffected co-twins, data normalization was done using the “preprocessQuantile()” function. 10 To confirm for both sample sexes and twin IDs reported in the sample metadata sheet using the methylation data, we used the “getSex()” together with the “plotSex()” functions and the “getSnpBeta()” function in the R package minfi, respectively. Cross-reactive probes and probes associated with SNPs were removed with the “dropCrossReactiveProbes()” function. 11 Multiple MDS plots were generated to cluster the samples by kinship, batch, disease status, sex, chronological age, and genotype (ie, C9orf72 hexanucleotide expansion (HRE) status) using the “plotMDS()” function in the R package limma v3.50.3, 12 which helped us figure out which covariates to include in the regression analysis. Twin correlation on global CpG methylation levels was plotted as a means of QC at the sample level using the “chart.Correlation()” function in the R package PerformanceAnalytics v2.0.4 13 considering the notion that a high degree of intra-pair correlation on global CpG methylation can be assumed for genetically-matched twins and that any unusual intra-pair dissimilarity might indicate a confounding factor (e. g. altered blood cell distribution). 4 M- and β-values were calculated from preprocessed signal intensities for downstream analyses, ranging from 0 (unmethylated) to 1 (fully methylated), using the “getM()” and the “getBeta()” functions, respectively. Finally, a multi-set Venn diagram was plotted to identify shared probes among all twin pairs with an intra-pair absolute difference in β-value of ⩾ 0.1 using the “ggVennDiagram()” function in the R package ggVennDiagram v1.2.0. 14
Differentially methylated cell type (DMCT) analyses
Reference-based estimations of blood cell type compositions from ALS and non-ALS whole blood samples were made using the “epidish()” function in the R package EpiDISH v2.10.0, 15 with the “centDHSbloodDMC.m” matrix being the whole blood reference of 333 cell-type specific DNAse Hypersensitive Site (tsDHS)-DMCs and 7 blood cell subtypes, and in “constraint projection” mode with “inequality” normalization constraint as implemented in the Houseman et al method 16 for epigenetic dissection of intra-pair cell type heterogeneity. First, a boxplot of estimates grouped by blood cell subtypes with 2 levels (ie, affected and unaffected) was made using the R package ggplot2 v3.3.6 17 for a visual assessment of the estimates as a function of individual cell types and disease status. Then, the “CellDMC()” function from EpiDISH was used to 1. identify differentially methylated cell types, 2. find their directions of change, 3. check if and which specific cell type(s) drives methylation changes observed between ALS and non-ALS samples. By including the statistical interaction terms between disease status and the proportions of underlying cell types within PBMCs in the linear regression (ie, the Robust Partial Correlations (RPCs)) framework normally used to identify DMCs, this method allows for the detection of differentially methylated cytosines in a cell-type specific manner. 15 Eosinophils were excluded from the analysis because their fraction in the estimates was extremely low. Preceding the analysis, all cell type estimates were incremented by 0.001 to deal with zero observations before log transformation of the counts. Lastly, after confirming the non-normality of the data through “shapiro.test()” and “qqPlot()” functions in the R package stats v4.1.2, 8 pairwise comparisons of each cell type between ALS and non-ALS samples were made using the “wilcox.test()” function in the same R package for a non-parametric alternative to the t-test.
Differentially methylated probe (DMP) analyses
For the differentially methylated probe analysis, we used the case co-twin design to capture intra-pair differences in CpG methylation. β-values were logit-transformed and converted into intra-pair differences (ie, values of unaffected co-twin in a pair be subtracted from those of the affected in the same pair for each probe across the twin set). This adjustment assigns a critical task to the intercept in the mixed-effects model (LMM) we used for probe-level differential methylation analysis: the mean level of intra-pair methylation differences per probe, when all fixed effect variables are set to zero, is represented by the intercept. In R 8 we used the lme4 v1.1.29 18 and the lmerTest v3.1.3 19 packages to conduct a linear mixed-effects analysis for the detection of DMPs through the “lmer()” function. As the fixed effects, sex, age, C9orf72 status, CD4+ T cells, and neutrophils were used as covariates while as the random effects, the batch was the covariate incorporated into our random-intercept-fixed-slopes regression model, leading to the best outcome regarding singularity issues. In an attempt to prevent overfitting, underrepresented cell types in the data set and other low-priority predictors identified based on the MDS plots, including BMI, have been excluded from the regression analysis. The predictors such as different cell types in the model that vary between co-twins have also been converted into intra-pair differences as before, and subsequently mean-centered to improve the numerical stability of the regression analysis. The intercepts, together with various statistics including the P-values calculated per probe, have then been collected as model fitting which was accomplished iteratively for the entire data set. Nominal P-values were adjusted using a cutoff of FDR ⩽ 0.05 for multiple testing corrections through the “p.adjust()” function. 20 The “getAnnotation()” function has been used together with the R package IlluminaHumanMethylationEPICanno.ilm10b2.hg19 v0.6.0 9 for genomic annotation. To assess the distribution of all DMPs across the genome at the chromosome resolution, a Manhattan plot was generated using the “mhtplot()” function in the R package ENmix v1.30.3. 21 Finally, a selected subset of probes was used to assess indices of model performances (the “compare_performance()” function) and the underlying assumptions, including normality of residuals and random effects (the “check_distribution()” and the “shapiro.test ()” functions), homogeneity of variance (the “check_heterogeneity_bias()” and the “leveneTest()” functions), linear relationship and multicollinearity (the “compare_model()” function), and autocorrelation (the “compare_autocorrelation()” and the “durbinWatsonTest()” functions), using the R packages performance v0.9.1, 22 stats v4.1.2, lmtest v0.9.40, 23 and car v3.1.0. 19 Also, it has been assumed that the data have been generated exogenously and that the twin sampling, as well as the entire set of probe-level observations on the EPIC arrays, were made independently. The “pheatmap()” function in the R package pheatmap v1.0.12, together with the color palettes in the package RColorBrewer v1.1.3, was used to generate heatmaps. The annotation track option of the UCSC Genome Browser 24 was enabled to illustrate a selected DMP along the CpG islands and gene bodies.
Functional enrichment analysis
After the DMP-annotated genes were extracted, a STRING v11.5 25 protein association network analysis was done to investigate functional interactions in this final list of genes with a minimal interaction score of 0.400. A gene list enrichment platform, EnrichR v15.06.2022 26 at https://amp.pharm.mssm.edu/Enrichr/ has produced the pathway- and the ontology-level functional annotation results using the following libraries: KEGG Human PA (2021) and GO Biological Process (2021). The enrichment P-values were calculated using a Fisher’s Exact Test or a hypergeometric test, which were then corrected for multiple testing using the Benjamini-Hochberg (BH) FDR-controlling method. 20
Differentially methylated region (DMR) and -block (DMB) analyses
The data have been grouped by disease status, instead of the twin structure used for the DMP analysis, to estimate “regions” of the genome binned into 1.5 kb windows for which a CpG methylation profile deviates from its baseline value between 2 (ie, affected and unaffected) populations. Detection of DMRs was done on the β-values through the “bumphunter()” function in the R package minfi v1.40.0 with default settings, except a genomic profile cutoff of .2 to identify candidate regions and a resampling number of 1000 to assess uncertainty. The “DMR.plot()” function in the R package DMRcate v2.8.5 27 was used to visualize the potential regions within the scope of this analysis. On the other hand, to explore the methylation status of genomic “blocks,” which is based on a metric devised to segregate hundreds of neighboring intergenic CpG sites into large-scale genomic regions with similar methylation profiles, 28 the probes associated only with OpenSea clusters (ie, the CpG-sparse regions outside the CpG islands) were used per the developer’s guidelines. 29 In this context, a differentially methylated block (DMB) analysis was performed using the “champ.Block()” function in the R package ChAMP v2.24.0 29 between affected and unaffected samples with maximum block and between-cluster gap sizes of 250 kb, a threshold of 10 as a minimum number of probes per block, and a resampling number of 500.
Copy number variations (CNV) analysis
To detect copy number variations between affected and unaffected co-twins using genome-wide CpG methylation profiles, 30 we used the “champ.CNA()” function in the same R package ChAMP with default parameters. Two sets of plots were returned, including the aberrations of each sample group and the aberrations of each affected sample with respect to the mean of the unaffected sample group.
DNA methylation (DNAm) age analysis
DNA methylation age estimates were generated using the online calculator 31 at https://dnamage.genetics.ucla.edu/, as described by Horvath (2013). Briefly, any probes missing in our data set that was originally used in the 27 K arrays to develop the underlying algorithm were appended row-wise to obtain the input data set for the analysis, with any missing value replaced by “NA.” A matching sample annotation file was generated for an in-depth analysis of DNAm age. After the analysis, the samples that fail to correlate well (ie, r ⩽ .80) with the standard samples were excluded from further consideration. Plots for data visualization were generated using base R graphics or the R package ggplot2 v3.3.6.
Power of a sample analysis
Power analysis was done using the “pwrEWAS_shiny()” function in the R package pwrEWAS v1.8.0 32 based on 500 simulations and a target maximal difference in DNA methylation between 20% and 50% at 5% FDR, with limma 12 being the method for DM analysis.
Receiver operating characteristic curve (ROC) analysis
With discriminatory power expressed as the area under the curve (AUC), ROC analysis with sensitivity and specificity calculations were all done using the “plot.roc()” function in the R package pROC v1.18.0 33 to assess how well each DMP identified in this study is capable of discriminating between affecting and unaffecting samples. Efficacy evaluation: AUC = 0.5 means non-efficiency, 0.5 ⩽ AUC ⩽ 0.7 means a modest level of efficiency, AUC ⩾ 0.7 means high efficiency.
Results
Demographics
The current study is composed of 7 pairs of MZ twins (n = 14). Though the initial sample size at the time of study design was 16 (ie, 8 twin pairs), one of the pairs turned out to be dizygotic (DZ) during the verification stage of the zygosity information, and was thereby excluded from the downstream analyses. Of these 7 pairs (Table 1), 3 were female and 4 were male with chronological ages ranging from 38 to 74 years (
Demographics for the MZ twin set.
Abbreviations: HRE, hexanucleotide repeat expansion; wt, wild type for known ALS genes; BMI, body mass index.
Intra-pair methylation profiles of the mutated ALS genes in the corresponding twin pairs
To determine whether there is any discrepancy in the methylation status of an upstream CpG island spanning the promoter region and exon 1 of each of the mutated ALS genes between affected and unaffected co-twins in the corresponding twin pair, we used the associated probes in the EPIC data and observed a highly similar pattern in the methylation profile of each of these regions per intra-pair comparison (Figure 1). As expected, there is a lack of DNA methylation at CpG islands encompassing the first exons of the TBK1 and NEK1 genes (Figure 1b and c). Importantly, as the probes mapping to CpG islands associated with the C9orf72 gene were filtered out during preprocessing (Supplemental Figure 1), we confined our analysis to this subgroup of ALS genes mutated in our MZ twin set (Table 2). Overall, there appears to be a clear consistency in the methylation profiles of the mutated ALS genes between co-twins in the mutation-positive twin pair, in line with the previous literature. 34

The promoter proximal sites of neither NEFH nor NEK1 or TBK1 are differentially methylated between mutation-positive ALS-discordant twins. The methylation β-values of both the relative location of the promoter proximal regions and exon 1 for NEFH ((a), upper left), NEK1 ((b), upper right), and TBK1 ((c), bottom) plotted as a function of corresponding probe IDs. Methylation status of each gene was identified using EPIC arrays. Methylation profile of the promoter regions and exon 1 of NEFH, NEK1, and TBK1 does not indicate any pattern of differential methylation between affected and unaffected co-twins in the corresponding twin pairs, concordant for NEFH p.R352S, NEK1 p.M545T, and TBK1 p.I73L, respectively.
Pairwise correlations between individual samples and the gold standard.
The gold standard is defined by averaging the beta values across the samples from the largest blood data set. The accepted threshold of R2 = 0.800.
Global changes in CpG methylation across the twin pairs
After confirming the twin correlation on global CpG methylation levels using the methylation data, as well as sample sex and twin IDs (Supplemental Figure 2), we explored any relationship between genome-wide DNA methylation and chronological age within- or between MZ twin pairs (Figure 2a). With the mean levels of global DNA methylation being almost the same across all 14 samples, there appeared to be no particular trend associating genomic methylation on a broad scale with age, which also seems to be independent of disease status. Then, we sorted the data by sex and repeated the experiment, only to see the same outcome (Figure 2b): there is likely no relationship between sex and CpG methylation at the genome level, with or without disease status involved. Taken together, our observations on global alterations in CpG methylation across the twin pairs are mostly in agreement with the current literature.

Global changes in CpG methylation across the twin pairs. (a) An attempt to explore any relationship between genome-wide DNA methylation and chronological age within- or between MZ twin pairs revealed that there is no particular trend associating genomic methylation on a broad scale with aging regardless of disease status. All twin pairs are ranked by chronological age. N(un)affected = 7. (b) It is also clear that there is no relationship between gender and CpG methylation at the genome level, with or without disease status involved. N(un)affected, female = 3, N(un)affected, male = 4. (c) Intra-pair variability analysis performed to identify probes with a sufficient (ie, Δβ-value ⩾ 0.10) intra-pair variability shared by all twin pairs in our data set revealed altered DNA methylation associated with the GRIK1 gene. Also, the twin pairs 1, 6, and 7 accommodate the highest number of pair-specific probes with sufficient intra-pair variability, with 6 being the most extreme. (d) Comparisons of blood cell type composition estimates using neither basic univariate (ie, wilcoxon test) nor more advanced multivariate statistics (ie, the Robust Partial Correlations (RPCs)) approaches reported differentially methylated cell types by disease status. Yet, an increase in B cell and granulocyte levels is apparent in ALS-affected co-twins. ns: not significant. N(un)affected = 7.
Intra-pair variability analysis revealed altered DNA methylation associated with the GRIK1 gene
A consistent body of previous literature revealed that intra-pair methylation differences in disease-discordant MZ twins are generally low (⩽0.20) at the probe level due to the matched genetic make-up and shared rearing environment between co-twins.1,4,5,34 Based on this observation, we tested for probes with a sufficient (ie, Δβ-value ⩾ 0.10) intra-pair variability shared by all twin pairs in our data set (Figure 2c). We identified cg26023019 with a Δβ-value above the threshold, which maps to the GRIK1 gene, which was previously associated with ALS in a group of independent ALS-CNV studies.35,36 Lastly, after estimating blood cell type compositions from ALS and non-ALS whole blood samples, we performed a pair of DMCT analyses (Figure 2d). Using neither a particular linear regression framework described in the Methods section nor individual pairwise comparisons of each cell type between ALS and non-ALS samples we could detect DMCT by disease status. Nevertheless, elevated levels of B cells and granulocytes were observed in ALS-affected co-twins, as published before. 3 As a final point, we checked to see whether the intra-pair correlations computed on CpG methylation decline with biological age, as previously reported in the literature4,37 for MZ twin pairs. As expected, a downtrend line suggesting a decrease in the correlation with biological aging was observed (Supplemental Figure 2c), implying the importance of adjusting for age in estimating the relationships between CpG methylation patterns and disease status.
Probe-level differential methylation analysis
To analyze the methylation data from ALS-discordant MZ twins at the probe level using the case co-twin design, we used a mixed-effects model with fixed effects of age, sex, C9orf72 status, and cell type estimates, with random effects of batch (Supplemental Figure 3). As indicated above for biological age, though discordant MZ twins are matched for biological factors specific to individual pairs, the effects of these factors on the within-pair difference in CpG methylation patterns should be accounted for,4,5 which is addressed in the Discussion section. Considering that the intercept is the key term of the analysis as the mean level of intra-pair methylation differences per probe when all fixed effect variables are set to zero, we plotted the frequency distribution of the intercept estimates obtained from the regression analysis. In line with the literature, 38 they followed a clear pattern of normal distribution (Figure 3a), just as the M-values. Ultimately, a total of 35 probes were found significant at an FDR ⩽ 0.05 with varying levels of significance (Supplemental Figure 4). In an attempt to visually assess their degree of variation in methylation levels by disease status, we observed a heterogeneous profile of CpG methylation for the DMPs (Figure 3b). Subsequently, each of the underlying assumptions of our linear model was evaluated using a selected subset of probes for the quality of the model fit, performance characteristics, and diagnostic accuracy. From a technical aspect, a remarkable robustness of LMMs has been confirmed in the literature,39,40 except for small biases in estimated parameters stemming from missing predictors or random effect components, even if the majority of the assumptions are violated. Our results revealed, though using a subset of probes, that the majority of the assumptions were met (Supplemental Figures 5 and 6). Next, we checked to see whether these DMPs are capable of distinguishing between ALS and non-ALS samples. The results of different methodologies converged on a low power of discrimination between sample groups given just these significant probes (Figure 3c and d). When the DMP-annotated genes were clustered based on the direction of the change in methylation, 2 groups emerged: (a) ATG9A, BCAT1, BARX2, C1orf103, CYGB, FES, MYH9, PARVB, and RNF4 as hypomethylated genes, (b) ATP6V0B, BOC, BUD31, C3orf75, FEZ2, IL17D, KILLIN, MEGF6, OSBPL10, PDAP1, PTEN, RSPO4, SLX4, UBLCP1, and USP20 as hypermethylated. Though not significant (FET P-value: .0699), protein association network analysis of these 24 genes performed using the STRING platform revealed the enrichment of a part of the ubiquitin (Ub)-proteasome pathway (UPP) of protein degradation, which is known to be dysfunctional in ALS.41,42 On the other hand, “autophagy” and “axon guidance” were identified as the top deregulated signaling pathways (q-value = 0.012 and 0.020, respectively), and “actin cytoskeleton reorganization” was identified as the top enriched biological process (q-value = 0.003) in these DMP-annotated genes which has been strongly associated with ALS pathogenesis.43-45 To detect differences up to 20% and 50% in CpG methylation between affected and unaffected co-twins with a minimum number of 7 subjects in a sample group based on the power of sample calculations, the calculated power of making a statistically significant comparison at an FDR ⩽ 0.05 turned out to be low, as expected. Yet, it was also observed for target differences of 20% and 50% that the probability of detecting at least 1 true positive significant probe out of 35 DMPs identified between affected and unaffected co-twins is about 10% and 61%, respectively given the current sample size (Supplemental Figure 7). In other words, approximately 4 of the DMPs identified within the scope of this study are expected to be true positive results with ⩾20% methylation difference between sample groups. Interestingly, our ROC analysis revealed using a classical AUC threshold of 0.70 that there are exactly 4 probes (Figure 4a) out of 35 DMPs (Supplemental Figure 8) that appear to classify the sample groups well. While 3 of these candidate probes are located in the “OpenSea clusters” of the genome, one of them with an AUC value of 0.755 is located near the TSS of the IL17D gene, which is implicated in positive regulation of cytokine production during inflammatory response 46 (Figure 4b). Overall, our probe-level differential methylation analysis appears to provide a small but biologically relevant group of genes that are differentially methylated within the context of ALS.

Probe-level differential methylation analysis identified 35 DMPs with varying levels of methylation changes between conditions. (a) As a QC step for the statistical approach (ie, a linear mixed-effects model) used in the differentially methylated probe analysis, the frequency distribution of the intercept estimates obtained from the regression analysis follows a clear pattern of normal distribution. (b) The degree of variation by disease status in methylation levels of the 35 DMPs identified within the scope of this work is highly variable: while there are DMPs with clear differences in methylation levels between ALS and non-ALS samples, there are also DMP that have roughly similar levels of methylation by disease status when raw βvalues are used for visual assessment, implying the importance of adjusting for key covariates such as age and gender in estimating the relationships between CpG methylation patterns and disease status. N(un)affected = 7. (c and d) When we tested whether these DMPs are capable of distinguishing between ALS and non-ALS samples, the results of different methodologies (ie, a heatmap (c) or a PCA (d) of raw methylation levels) indicated a low power of discrimination between sample groups given just the significant probes. In other words, methylation was roughly similar across samples for most DMPs, and samples did not seem to cluster by disease status. PC1 in (d) explains 34% of the variation in the input data of DMPs.

In silico validation of DMP analysis results indicate differential methylation of IL17D in ALS blood. (a) The ROC analysis results of 35 DMPs using a classical cutoff value at AUC ⩾ 0.70 identified 4 probes that appear to classify the sample groups well, just as expected after the results of the power of sample calculation. The corresponding probe IDs are shown in red at the top of each plot. N(un)affected = 7. (b) Graphical representation of the DMP with the ID number “cg13993853” overlapping the IL17D promoter region, as indicated by the adjacent activating epigenetic marks (eg, CpG islands, H3K27Ac, and DNaseI hypersensitivity), as well as the neighboring TF binding sites. Unlike the other 3 candidate probes that are located in the “OpenSea clusters” of the genome, this probe with an AUC value of 0.755 is the only candidate probe that is located near the TSS of a gene.
Region- and block-level differential methylation analyses
After the data have been regrouped according to their disease status, ignoring the twin structure used for the DMP analysis, we also conducted differential methylation analyses at 2 different levels, namely genomic region (ie, a 1.5 kb genomic window) and genomic block (ie, a genomic bin up to 250 kb). In this context, each block contains hundreds of neighboring intergenic CpG sites with similar methylation profiles. As can be seen in Supplemental Figure 9, the lack of significant deviations from the baseline value between affected and unaffected populations is apparent even when the smoothed group means for the top-ranking region are compared. The same holds for the DMB analysis: no genomic blocks were identified differentially methylated between the sample groups. Overall, ALS-discordant twin studies with relatively small sample sizes seem to work better for probe-level than region-level methylation analysis.
Copy number variation analysis
Finally, we performed a copy number variation (CNV) analysis using the genome-wide CpG methylation profiles at the probe resolution between ALS and non-ALS samples. The copy number variations (gains or losses) both between sample groups and of each affected sample with respect to the mean of the unaffected sample group failed to yield a consistent pattern for disease status (Supplemental Figure 10).
Discussion
In this study, our goal was to provide further biological insight into ALS pathogenesis at the DNA methylation level using whole blood samples of ALS-discordant MZ twins. We have shown that a few CpG locations in the genome carry differential methylation marks associated with ALS. In addition, we observed differential methylation of neither genomic bins of different sizes nor blood cell types associated with ALS in our data set. Nevertheless, global patterns of genomic methylation across the twin pairs, including an increasing intra-pair discrepancy in methylation age and a decreasing intra-pair correlation on CpG methylation with chronological aging, a low degree of variation in within-pair CpG methylation, and elevated levels of B cells and neutrophils in ALS-affected co-twins, were all in agreement with the current literature. Our results imply that the methylation landscape of whole blood in ALS is a dynamic and highly heterogeneous network at the genome level, with only some commonly found perturbations and affected pathways contributing to the disease.
As rarely as these study subjects can be found, our twin set presumably harbors invaluable information on the molecular etiology of ALS, independent of the genetic mutations in the ALS gene each twin pair carries. More specifically, the ultimate reason why we chose a study design as such, that is, with identical twins that have remarkably different phenotypes, was because co-twins in a pair are matched for familial factors, including but not limited to baseline genetic sequence, age, sex, and rearing environment, thereby reducing the variability between pairs of subjects, but are still discordant for the disease. This leaves room for epigenetic components that might be involved in the disease pathogenesis. We observed rather consistent methylation profiles of the mutated ALS genes between co-twins in the mutation-positive twin pair, implying causative mutations detected in each twin pair are independent of differential CpG methylations identified within the scope of this study.
Another limitation of the current work relates to the LMM that was found to produce heavily inflated P-values (likely a high number of false positives 47 ) within the context of the probe-level differential methylation analysis. As statistically precarious as this situation may be, such inconsistencies can be explained in the context of our experimental design, as well as our analysis workflow, independent of modeling errors. However, it should also be noted that as we preferred the case co-twin design to the classical twin design for DMP analysis chiefly due to our initial experimental design, the whole procedure of data preparation and processing mentioned in the “Differentially methylated probe (DMP) analyses” part of the Methods section, composed mainly of transformations, value adjustments, selection of predictor variables (both fixed and random effect covariates) in the LMM, model fitting, and extraction of key metrics for downstream analysis, has been planned and performed based on the developers’ guidelines. 5 This is also evident by the total number of DMPs identified in this work, which is in line with many other discordant twin studies in the literature.34,48,49 Back to the potential reasons for inflated P-values, outlier probes in a data set are widely known to cause instability in statistical tests, which can result in exceptionally high standard errors, and thereby, inflated P-values. The preprocessQuantile() function used for data normalizations during preprocessing stage of the analysis workflow also fixes outliers, but only those that have low (ie, close to zero) signal intensities, not high. In this regard, it is likely that the significant results are, to some extent, driven by technical artifacts or outliers. In fact, only 4 of the 35 DMPs identified in this study could be validated using ROC analysis, just as predicted using the power of sample calculations as the total number of true positive findings from DMP analysis with ⩾20% methylation difference between sample groups. This is why we decided to prioritize the genomic locations associated with these 4 DMPs over those associated with all DMPs while making biological interpretations of our results. Another point is that we already performed a comprehensive search trying most of the combinations of predictor variables to find the subset with the best evaluation criterion using not only simple linear model but also mixed-effects model, with the MDS/PCA results in mind, before constructing the final version of our multivariate regression model. In this respect, singularity issues and multicollinearity were also taken into account as selection criteria. Lastly, our power calculations demonstrated that the small sample size we had for the EWAS presented here made it rather difficult to obtain any significant results (Supplemental Figure 7). So, there is also a chance, though not the most likely scenario, that small sample sizes lead to such statistical biases as an inflated type-1 error, as reported in many different clinical trial studies before. 50 Taken together, the statistical approach used in this work, though not minimizing the risk of false positive findings being reported together with true positive results, allowed us to identify a small subset of DMPs that were later validated bioinformatically.
An overly underestimated concept in discordant MZ twin studies is the intra-pair dynamics of epigenetic drift. Epigenetic drift can be defined as the divergence of the epigenome with biological age due to stochastic alterations in methylation. 51 This paradigm has often been overlooked especially when the methylation changes are investigated, with chronological age being part of the equation, in identical twins who do not share the disease phenotype between co-twins. It is critical to note that the effect of the age-associated intra-pair difference in DNA methylation status must not be disregarded in such studies though the biological factors specific to individual pairs are matched out in MZ twins. 52 Empirical evidence confirming this effect came from the observation that twin correlation on genome-wide DNA methylation declines with age (Supplemental Figure 2). If this increasing divergence of the epigenome with age actually roots from stochastic alterations, then co-twins in identical twin pairs should be affected differently along the natural course of epigenetic drift, leading to diminishing within-pair twin correlations on global DNA methylation. From a technical perspective, this approach might as well serve as a QC check in MZ twin studies and be used to monitor sample compliance.
Considering our analysis approach for probe-level differential methylation, one could easily claim that age and sex are matched out in the case co-twin design and that including them as predictor variables in the final model seems excessive. Yet, such an argument does not apply to the EWAS as: (a) intra-pair differences in the epigenome are likely to be higher in older twin pairs, just as mentioned above, (b) it has been known that intra-pair epigenetic differences in particular regions of the genome vary by sex. 5 As a result, fixed effect covariates as such can improve the regression analysis considerably when included in the model. From a practical standpoint, the regression models we tested in the absence of such predictors did not seem to yield any better results, either. Further examples can be found in other studies with similar experimental designs in the literature.4,34,48
The only probe with sufficient (ie, Δβ-value ⩾ 0.10) intra-pair variability shared by all twin pairs in our data set, cg26023019, was found to be hypermethylated in affected co-twins and map to GRIK1 which is involved in the excitatory neurotransmitter receptors in the mammalian brain. The glutamate signaling pathway is known to be intimately related to both acute and chronic neurodegenerative disorders, including ALS. 53 In this regard, several lines of evidence also suggest that excessive activation of the glutamate receptors results in excitotoxicity.53-55 With the GRIK1 gene being hypermethylated (or presumably transcriptionally suppressed) in ALS samples, our findings implicate the activation of a compensatory mechanism responsible for restoring the overly active glutamate signaling back to normal levels in ALS. Further research is needed to confirm this hypothesis and to understand the potential role of GRIK1 in ALS biology.
Finally, 1 of 4 DMPs identified within the scope of this study and later verified by ROC was found to be hypermethylated in affected co-twins and located near the TSS of the IL17D gene, which is implicated in positive regulation of cytokine production during an inflammatory response. Interestingly, recent evidence has shown that IL17 directly affects human motor neuron survival and directly contributes to motor neuron degeneration in ALS.56,57 Considering that IL17D is vastly expressed in various brain regions 58 and that blood is a reasonable surrogate for brain tissue in DNA methylation studies, 59 IL17D has the potential to serve as a blood-based biomarker.
Conclusion
Despite the small sample size of the current study, we identified a group of DMPs that might serve as potential epigenetic loci for future ALS studies. Especially for the studies investigating different phenotypes (eg, fast vs slow ALS progressors) irrespective of the genetic cause underlying the disease, testing for these DMPs might provide meaningful biological outcomes and could further validate the findings of the current study.
Supplemental Material
sj-docx-1-gae-10.1177_25168657231172159 – Supplemental material for DNA Methylation Analysis in Monozygotic Twins Discordant for ALS in Blood Cells
Supplemental material, sj-docx-1-gae-10.1177_25168657231172159 for DNA Methylation Analysis in Monozygotic Twins Discordant for ALS in Blood Cells by Volkan Yazar, Wolfgang P Ruf, Antje Knehr, Kornelia Günther, Ole Ammerpohl, Karin M Danzer and Albert C Ludolph in Epigenetics Insights
Footnotes
Acknowledgements
We thank Ms. Julia Kühlwein (M.Sc.) for the assistance with preparation of patient samples and data generation.
Declaration of Conflicting Interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Deutsche Forschungsgemeinschaft (DFG) Emmy Noether Research Group DA 1657/2-1.
Authors’ Contributions
VY—methodology, data analysis in all aspects, visualization, data interpretation, writing original draft (major), writing review and editing. WPR—medical diagnosis of patients, study design, data generation, data interpretation, writing original draft (minor), writing review and editing. AK—acquisition and preparation of patient samples. KG—acquisition and preparation of patient samples, study design. OA—genotyping of patients, conceptualization, intellectual input. KMD—intellectual input, funding acquisition, study design. ACL—conceptualization, funding acquisition, intellectual input, study design. All authors reviewed and approved the manuscript.
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
