Abstract
Background:
Blood plasma proteins have been associated with Alzheimer’s disease (AD), but understanding which proteins are on the causal pathway remains challenging.
Objective:
Investigate the genetic overlap between candidate proteins and AD using polygenic risk scores (PRS) and interrogate their causal relationship using bi-directional Mendelian randomization (MR).
Methods:
Following a literature review, 31 proteins were selected for PRS analysis. PRS were constructed for prioritized proteins with and without the apolipoprotein E region (APOE+/–PRS) and tested for association with AD status across three cohorts (n = 6,244). An AD PRS was also tested for association with protein levels in one cohort (n = 410). Proteins showing association with AD were taken forward for MR.
Results:
For APOE ɛ3, apolipoprotein B-100, and C-reactive protein (CRP), protein APOE+ PRS were associated with AD below Bonferroni significance (pBonf, p < 0.00017). No protein APOE- PRS or AD PRS (APOE+/–) passed pBonf. However, vitamin D-binding protein (protein PRS APOE-, p = 0.009) and insulin-like growth factor-binding protein 2 (AD APOE- PRS p = 0.025, protein APOE- PRS p = 0.045) displayed suggestive signals and were selected for MR. In bi-directional MR, none of the five proteins demonstrated a causal association (p < 0.05) in either direction.
Conclusion:
Apolipoproteins and CRP PRS are associated with AD and provide a genetic signal linked to a specific, accessible risk factor. While evidence of causality was limited, this study was conducted in a moderate sample size and provides a framework for larger samples with greater statistical power.
INTRODUCTION
Over 50 million people currently live with dementia worldwide, a figure forecast to rise to 152 million by 2050 as global populations live longer [1]. The most common form of dementia is late-onset Alzheimer’s disease (AD) [2], where individuals suffer severe, progressive cognitive decline and a range of neuropsychiatric symptoms [3] from their mid 60s until death. AD is a highly heritable [4], polygenic trait [5, 6] with a wide range of known genetic and environmental risk factors [7, 8]. However, the precise etiology of AD remains unexplained [9] and no disease altering treatments exist [10, 11].
Endophenotypes representing traits closer to a hypothesized biological risk factor can help unpack AD etiology and provide accessible targets for intervention. For example, a wide range of blood plasma proteins have been associated with AD [12] and provide a potential avenue for disease diagnosis and treatment. Promisingly, prediction of AD diagnosis using plasma levels of amyloid-β (Aβ) [13] and tau [14] is improving toward clinical level. However, measuring known AD protein neuropathological end-products (Aβ and tau) provides limited explanation of how other plasma proteins may mediate AD risk. For example, Kiddle et al.’s systematic review identified that four proteins (apolipoprotein E (APOE), alpha-2-macroglobulin, complement C3, and alpha-1-antitrypsin) were associated with AD in at least five cohorts [12]. APOE and the complement pathway have been consistently implicated in functional and genetic studies of AD risk [15, 16] and recent Mendelian randomization (MR) studies suggest lower levels of APOE and complement C3 in plasma may be causal for AD [17, 18].
One way to explore the role of plasma proteins further is to assess their genetic overlap with AD. Recent improvements in protein assay technology have enabled the creation of a genetic atlas for plasma proteins [19]. Over 3,000 proteins now have publicly available genetic summary statistics [19], allowing polygenic risk scores (PRS) to be constructed for individual proteins. PRS represent aggregate genetic propensity for a trait, so a PRS for one trait that is associated with a second trait implies a degree of shared genetic influence on both traits. For example, a higher AD PRS has been associated with lower cognitive ability in individuals without dementia [20] and with increased levels of the promising AD biomarker p-tau181 [21]. If a protein PRS is associated with AD, this provides a genetic signal linked to a specific, accessible biological risk factor, something which remains a challenge for genome-wide association studies (GWAS) [22]. PRS can also be calculated for individuals, meaning protein PRS associated with AD could inform prediction of AD diagnosis.
However, testing the association of PRS with a trait does not demonstrate causality. For example, a plasma protein PRS associated with AD may simply indicate shared genetic variants which affect traits or pathways unrelated to disease pathogenesis [23]. MR provides a method to test whether an exposure causally affects an outcome by using genetic variants as instrumental variables in a construct similar to a randomized controlled trial. This works because an individual’s genes are effectively randomized at birth, enabling the creation of a quasi-intervention group who have a genetic disposition for an exposure [24]. In AD, MR has demonstrated the protective effect of higher cognitive ability and educational attainment in two large scale studies [7, 25] and has indicated several blood metabolites are on the causal disease pathway [26]. For plasma proteins specifically, MR has primarily been deployed as part of large scale non-targeted, phenome-wide MR analysis [27, 28]. For example, Zheng et al. identified 111 causal associations between 65 proteins and 52 disease-related phenotypes (p < 3.5×10–7) including sialic acid binding Ig-like lectin 3 (CD33) with AD, supporting previous GWAS results [29, 30]. While phenome-wide MR designs excel at providing a multi-trait matrix of potential causal signals, they lack the flexibility to unpack disease specific relationships. For example, 61 out of 62 proteins associated with AD at a p-value less than 0.05 in Zheng et al. were tested using the Wald Ratio with only 1 SNP as an instrumental variable [27]. Disease specific approaches have more flexibility to relax assumptions, which can help increase statistical power and enable more robust statistical sensitivity analyses [23].
The primary objective of this study was to explore the genetic overlap between AD and plasma proteins using PRS and to assess whether individual plasma proteins play a causal role in AD etiology using MR (see Fig. 1 for illustrative overview of study design). This study identified a shortlist of plasma proteins from existing literature that have been robustly associated with AD or AD endophenotypes and have publicly available genetic summary data. PRS models were then created for each shortlisted protein and tested for association with AD across three consortium cohorts, Genetic and Environmental Risk in Alzheimer’s Disease (GERAD1), Alzheimer’s Disease Neuroimaging Initiative (ADNI), and AddNeuroMed (ANM). An AD PRS was also constructed with publicly available genetic summary data from the largest case ascertained AD GWAS to date [8] and used to test each protein for bi-directional association in ANM where individual level plasma protein data was available. Lastly, for plasma proteins with PRS that demonstrated significant associations with AD (see Materials and Methods) in one or both PRS analyses, two sample bi-directional MR was conducted to test for causality (Fig. 1).

Fig. 1. Illustrative overview of study design outlining the four key steps in the study workflow: literature review, plasma protein PRS, AD PRS, and bi-directional MR.
MATERIALS AND METHODS
Plasma protein candidates
A shortlist of plasma proteins was compiled for analysis from previous AD discovery studies to represent known protein candidates for genetic exploration. A panel of 163 proteins previously associated with AD or AD related phenotypes from a systematic review of 21 non-targeted human AD blood protein discovery studies up to 2014 was used as the baseline [12]. An additional literature review was then conducted to cover the period from 2014–2019 and applied the same screening criteria (non-targeted discovery studies, except for panel based studies with > 100 candidates) [12] to generate the candidate list (further details in the Supplementary Material). Proteins were then selected for the shortlist if they passed two criteria. Firstly, the association replicated in > 1 study and, secondly, genetic summary data was available for the protein from Sun et al.’s human plasma protein GWAS [19]. At the time of analysis, Sun et al. was the largest plasma protein GWAS using SomaLogic with publicly available data (downloaded and cross referenced from http://www.phpc.cam.ac.uk/ceu/proteins/).
Plasma protein data preparation
Plasma protein genetic summary data was acquired from Sun et al.’s GWAS of 3,622 proteins in 3,301 healthy participants from the INTERVAL study (further details in the Supplementary Material), a randomized trial of 45,000 blood donors across 25 centers in England [31]. Summary association results were made available on http://www.phpc.cam.ac.uk/ceu/proteins/ and were downloaded for proteins on the shortlist. Files were then uploaded to Rosalind, King’s College London’s high-performance computing facility, for further quality control (QC) and analysis. Further QC was conducted using R 3.6.0 and was based primarily on aligning to the protocol recommended for PRS analysis [32]. Specifically, remaining duplicates, non bi-allelic variants, and variants not overlapping with the target data were removed, and SNP rsIDs were added and aligned with the target dataset (see https://github.com/AlexHandy1/ad-genetic-overlap-analysis for analysis scripts). Observed SNP heritability (h2SNP) and cross protein genetic correlation (rg) were estimated for each protein using linkage disequilibrium score regression (LDSR) with the Python LDSC package (Version 1.0.0) sourced from https://github.com/bulik/ldsc.
AD data preparation
Individual level genotype data for AD cases and controls was acquired from three consortium studies: GERAD1, ADNI, and ANM.
GERAD1 is a European consortium (https://gtr.ukri.org/projects?ref=G0902227) [33]. The GERAD1 sample comprised up to 3,292 AD cases and 1,223 controls. Genetic data from a subset of 4,515 participants was made available for this analysis prior to QC (further details in the Supplementary Material).
ADNI is a longitudinal, multi-center North American study initiated in 2004 [34] and now in its fourth wave (ADNI1, ADNI-GO, ADNI2, and ADNI3) collecting clinical, imaging, genetic, and biomedical biomarkers for AD. Genetic data from 1,674 participants (not including ADNI3) was made available for this analysis prior to QC (further details in the Supplementary Material).
The ANM is a European consortium, initiated in 2008 with the aim to establish biomarkers for AD [35]. The Dementia Case Register (DCR) is a follow-up of ANM, with UK subjects recruited from the Maudsley and King’s Healthcare Partners Dementia Case Register [36]. Genetic data from 1,063 participants from ANM (including DCR) was made available for this analysis prior to QC (further details in the Supplementary Material). In addition to full clinical and demographic data, 410 participants had plasma protein data. Plasma protein data was collected using SomaLogic’s multiplexed, aptamer-based assay (SOMAscan) with SOMAmers for 1,016 proteins. Normalized, log2 transformed residuals were used as the phenotype after regression with age, sex, batch, and 10 principal components.
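As a minimal sketch of the residual phenotype described above, the Python snippet below log2-transforms a protein measurement and regresses out covariates by ordinary least squares. The function name `residualize`, the numeric encoding of sex and batch, and the omission of any separate normalization step are assumptions made for illustration; this is not the study's own pipeline.

```python
import numpy as np


def residualize(protein_levels, covariates):
    """Return OLS residuals of log2(protein_levels) regressed on covariates.

    protein_levels: positive raw measurements, shape (n,)
    covariates:     numeric matrix, shape (n, k), e.g. age, sex, batch, PCs
                    (categorical covariates are assumed to be numerically
                    encoded already; this is a hypothetical input layout)
    """
    y = np.log2(np.asarray(protein_levels, dtype=float))
    x = np.asarray(covariates, dtype=float)
    # Add an intercept column so residuals are centered on zero.
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef
```

The residuals are uncorrelated with every covariate in the design matrix by construction, which is why no covariates need to be re-entered when such residuals are later used as a phenotype.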
For all three consortium datasets, standard genetic QC procedures were applied (removal of non-autosomal chromosomes, alleles with minor allele frequency < 1%, genotypes with call rate < 98%, and Hardy Weinberg deviations at 1×10–5) based on the protocol by Coleman [37] (further details in Supplementary Figure 1). The sample size remaining for analysis after these QC procedures were applied is reported in the results (see Tables 2 and 3).
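The QC filters listed above can be sketched in Python as a simple per-variant predicate. The `VariantStats` record and its field names are hypothetical, and the snippet assumes the summary statistics (minor allele frequency, call rate, Hardy Weinberg p-value) have already been computed by a genetics toolkit; it only illustrates how the stated cut-offs combine.

```python
from dataclasses import dataclass

# Chromosomes 1-22; sex and mitochondrial chromosomes are excluded.
AUTOSOMES = {str(c) for c in range(1, 23)}


@dataclass
class VariantStats:
    """Hypothetical per-variant summary used only for this illustration."""
    rsid: str
    chrom: str
    maf: float        # minor allele frequency
    call_rate: float  # fraction of samples with a called genotype
    hwe_p: float      # Hardy Weinberg equilibrium test p-value


def passes_qc(v, min_maf=0.01, min_call_rate=0.98, min_hwe_p=1e-5):
    """True if the variant survives all four filters described in the text."""
    return (
        v.chrom in AUTOSOMES
        and v.maf >= min_maf
        and v.call_rate >= min_call_rate
        and v.hwe_p >= min_hwe_p
    )


def filter_variants(variants):
    """Keep only the variants that pass QC, preserving input order."""
    return [v for v in variants if passes_qc(v)]
```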
Table 2. Summary characteristics of GERAD1, ADNI, and ANM participants post QC
Table 3. Summary characteristics of ANM participants with plasma protein data post QC
Plasma protein PRS
A PRS was constructed with the post QC Sun et al. GWAS data (“base data”) for each shortlist protein using PRSice-2 software (version 2.3.1e) [38]. PRSice automatically removes strand ambiguous SNPs and SNPs that are not present in both base and target data. SNPs were further “clumped” if LD (measured in r2) was > 0.1 within a window of 250 kilobases, with the SNP with the lowest GWAS p-value within each region retained. After clumping, PRS were calculated with SNPs under 10 pre-defined p-value thresholds (5e-08, 5e-05, 5e-04, 0.0001, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1). PRS at all 10 thresholds for each protein were then independently tested for association with each individual AD dataset (GERAD1, ADNI and ANM) using logistic regression. Age, sex, and 7 principal components to control for population stratification were included as covariates to create a null model (covariates only), whose variance explained was subtracted from that of the full model (covariates and PRS) to provide a Lee adjusted r2 [39] assuming an AD prevalence of 7% [40].
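To make the thresholding step concrete, the Python sketch below computes one score per individual at each p-value threshold as a weighted sum of effect-allele dosages over already clumped SNPs. The function name, the input layout, and the use of "p at or below the threshold" as the inclusion rule are assumptions; PRSice-2 has its own scoring options and this is not a reproduction of them.

```python
import numpy as np


def polygenic_scores(dosages, betas, pvalues, thresholds):
    """Score individuals at several GWAS p-value thresholds.

    dosages:    (n_individuals, n_snps) effect-allele counts (0, 1 or 2)
    betas:      (n_snps,) per-SNP effect sizes from the base GWAS
    pvalues:    (n_snps,) per-SNP GWAS p-values
    thresholds: iterable of p-value cut-offs
    Returns a dict mapping each threshold to an (n_individuals,) array.
    SNPs are assumed to be clumped (approximately independent) already.
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    scores = {}
    for t in thresholds:
        keep = pvalues <= t  # SNPs included at this threshold
        scores[t] = dosages[:, keep] @ betas[keep]
    return scores
```

Each resulting score vector would then enter the regression as the PRS term, alongside the covariates, to compare the full model against the covariates-only null model.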
Protein PRS, for each selected protein, were tested for association with AD with the APOE region (992 SNPs within 750 kilobases of rs429358 on chromosome 19) included (APOE+) and excluded (APOE-) for all participants. APOE SNPs were removed to test whether protein PRS associations held without the known, strong effects of APOE alleles on AD risk [41]. As a secondary analysis to explore sex and age specific associations, PRS were constructed and tested for subgroups stratified by sex (male, female) and age (70 years and over, ≥70). Lastly, random-effects meta-analysis was conducted on the PRS results at each p-value threshold across the three AD datasets using R and a restricted maximum likelihood (REML) estimator from the metafor R package [42]. Meta-analyzed results were ranked by lowest p-value to identify the most significant PRS model threshold for each protein-AD association. A Bonferroni corrected p-value of 0.00017 (0.05 / (number of proteins × number of PRS p-value thresholds tested)) was estimated and used as the primary threshold for significance. Given this is a conservative threshold, if no proteins passed it in all participants (APOE+ and APOE–), nominally significant proteins (p < 0.05) were considered for MR analysis based on an assessment of their association strength (e.g., how close to the Bonferroni threshold) and consistency (e.g., association across subgroups).
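The pooling step can be illustrated with a short Python sketch of random-effects meta-analysis. Note that it uses the closed-form DerSimonian-Laird estimator of between-study variance rather than the iterative REML estimator used in the study, purely to keep the example self-contained; the function name and inputs (per-cohort regression coefficients and standard errors) are assumptions.

```python
import math


def random_effects_meta(estimates, std_errors):
    """Pool per-cohort estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled_estimate, pooled_se, tau2, two_sided_p).
    """
    k = len(estimates)
    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance to each cohort.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, pooled_se, tau2, p
```

When the cohorts agree closely, the between-study variance is estimated as zero and the result reduces to a fixed-effect inverse variance weighted average.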
AD PRS
An AD PRS was constructed using the meta-analyzed stage 1 discovery results from Kunkle et al.’s AD GWAS [8] of 21,982 AD cases and 41,944 cognitively normal controls (“base data”) downloaded from https://www.niagads.org/datasets/ng00075. Kunkle et al. was selected to provide statistical power and a clinical phenotype as the largest case ascertained AD GWAS to date. QC was applied to the base data of 11,480,632 variants to remove NAs, variants without an rsID and non bi-allelic variants leaving 10,528,610 variants for PRS analysis. As rsIDs were already provided in the AD base data and aligned to GRCh37, non-target data overlapping variants were removed with PRSice during the PRS analysis. p-value thresholds and clumping configuration settings in PRSice were kept constant with the protein PRS analysis. No covariates were included as age, sex, batch, and population stratification were already controlled for in the production of the protein residuals phenotype which was used as the target data. This decision was taken to align with other proteomic studies integrating genetic and proteomic data [19] and to enable comparability. AD PRS were tested for association with each individual shortlist protein using linear regression and the PRS model with the best fit (measured by lowest p-value) was presented for each protein. As with the protein PRS, AD PRS were tested for association with and without APOE SNPs and secondary analysis was conducted on subsets stratified by gender (male, female) and age (≥70). Bonferroni correction (and a more liberal assessment of association strength) was estimated and applied to control for multiple testing as described in the protein PRS analysis (with number of tests adjusted for the number of proteins with individual level blood data available for analysis).
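As a worked example of the multiple-testing correction, the Python sketch below applies the formula 0.05 / (number of proteins × number of PRS p-value thresholds) to the AD PRS analysis, using the 26 proteins with individual level data and the 10 p-value thresholds reported in this study. The function names and the three-way labeling are illustrative assumptions.

```python
def bonferroni_threshold(alpha, n_proteins, n_thresholds):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / (n_proteins * n_thresholds)


def classify(p_value, bonferroni, nominal=0.05):
    """Label a p-value as Bonferroni significant, nominal, or neither."""
    if p_value < bonferroni:
        return "bonferroni"
    if p_value < nominal:
        return "nominal"
    return "not significant"


threshold = bonferroni_threshold(0.05, n_proteins=26, n_thresholds=10)
print(f"{threshold:.5f}")  # 0.00019
```

Under this threshold, a p-value such as 0.00026 is labeled nominal rather than Bonferroni significant, which is the situation that motivates the more liberal assessment of association strength described above.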
Bi-directional MR
MR analysis was performed using the MR Base R package [43] on a subset of proteins from the PRS analyses. Proteins were selected for MR if they had a p-value below Bonferroni significance in either PRS analysis (protein PRS to AD APOE+/– or AD PRS to protein APOE+/–) or if they were nominally significant (p-value < 0.05) in both directions (protein PRS to AD APOE+/– and AD PRS to protein APOE+/–). Given Bonferroni is a conservative threshold, if no protein passed Bonferroni significance, the strongest protein association with a p-value below 0.05 was also considered. Univariate MR was performed with each protein as the exposure and AD as the outcome. Genetic instrument SNPs for each protein were selected from Sun et al. at two p-value thresholds for analysis (5×10–8 and 5×10–6). The less stringent 5×10–6 threshold was applied to ensure SNPs were available for all proteins, with the noted limitation of introducing potential weak instrument bias. Selected SNPs were then clumped within a 250 kb window at LD r2 < 0.001. F statistics were generated for each SNP (F = (SNP-exposure effect size)^2 / (SNP-exposure standard error)^2) to test for weak instrument bias, and SNPs were excluded if F < 10 [44]. The remaining SNPs were further pruned if they were associated with any of the other proteins or with AD directly (p < 5×10–8 in Kunkle et al. GWAS). This was implemented to exclude SNPs that may affect AD through a pathway other than the exposure protein (horizontal pleiotropy) [45]. SNPs were also removed in the APOE region (chromosome 19, base-pairs 4500000–4580000) as potential confounders that violate MR’s core assumptions, given their known association with AD [46]. Protein exposure SNPs and AD outcome SNPs were harmonized and tested with alleles assumed to be on the forward strand (no palindromic SNPs removed). As a secondary test, the MR analysis was also run with palindromic SNPs flipped and removed if non-inferable.
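The weak instrument check described above can be sketched in a few lines of Python. The per-SNP F statistic is the squared ratio of the SNP-exposure effect size to its standard error, and instruments with F below 10 are dropped. The tuple layout `(rsid, beta, se)` and the function names are assumptions for illustration.

```python
def f_statistic(beta, se):
    """Per-SNP F statistic: (effect size)^2 / (standard error)^2."""
    return (beta / se) ** 2


def strong_instruments(snps, min_f=10.0):
    """Keep SNPs whose F statistic is at least min_f.

    snps: iterable of (rsid, beta, se) tuples for the SNP-exposure
          association (a hypothetical input layout).
    """
    return [(rsid, beta, se) for rsid, beta, se in snps
            if f_statistic(beta, se) >= min_f]
```

For example, a SNP with effect size 0.5 and standard error 0.1 has F of about 25 and is retained, whereas one with effect size 0.2 and the same standard error has F of about 4 and is excluded.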
Causal effects were estimated using inverse variance weighted (IVW) two sample MR, and sensitivity analyses were conducted with MR-Egger, weighted median and leave one out analysis (further details in the Supplementary Material). To test for causality in the opposite direction, this analysis pipeline was repeated with AD as the exposure (using AD SNPs with p-value < 5×10–8 from Kunkle et al. stages 1, 2, and 3) and each protein as the outcome.
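For readers unfamiliar with the IVW estimator, the Python sketch below shows its fixed-effect form: each SNP contributes a Wald ratio (SNP-outcome effect divided by SNP-exposure effect), and the ratios are combined with weights proportional to the squared SNP-exposure effect over the SNP-outcome variance. This is a simplified illustration under the assumption of harmonized, independent instruments; the R package used in the study may apply a different standard error calculation, and the function name is hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted MR estimate.

    beta_exposure: per-SNP effects on the exposure (e.g. protein level)
    beta_outcome:  per-SNP effects on the outcome (e.g. AD log odds)
    se_outcome:    per-SNP standard errors of the outcome effects
    Returns (causal_estimate, standard_error, two_sided_p).
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        numerator += bx * by / sy ** 2
        denominator += bx ** 2 / sy ** 2
    estimate = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return estimate, se, p
```

With a single instrument the estimate reduces to the Wald ratio, which is why analyses restricted to one SNP per protein cannot support sensitivity methods such as MR-Egger or weighted median that need several instruments.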
An interactive web dashboard was built with R Shiny to present the full PRS and MR results (available at https://alexhandy1.shinyapps.io/ad-genetic-overlap-web-results/) with the key results reported herein.
RESULTS
Data preparation
The literature review provided 4 new studies [36, 47–49] adding 14 new proteins and bringing the total candidate protein list to 175. From the 175 candidate proteins, 31 passed the shortlist inclusion criteria (> 1 study replication, GWAS data available, see Table 1).
Table 1. Protein shortlist for analysis ordered by number of studies replicated in from literature review
Sun et al.’s GWAS provided summary statistics for 3,301 participants (see characteristics in Supplementary Table 1) covering 10,572,788 variants for each protein, with 5,210,103 variants included for PRS analysis after additional QC for this study (see Supplementary Table 2). Average h2 SNP across the proteins was 0.10; however, results were treated as indicative given the average standard error was 0.16 (including 8 proteins with h2 SNP less than 0) (see the Supplementary Material for further details).
For the AD cohort, 6,244 participants were available for analysis from GERAD1, ADNI, and ANM (see Table 2) with 5,218,413 overlapping variants included for PRS analysis after QC (see the Supplementary Material for further details).
In the AD PRS analysis where only ANM had plasma protein data, 410 participants with plasma protein data remained for analysis after QC (see Table 3).
PRS results
Plasma protein PRS
In all participants, APOE+ PRS for APOE ɛ3 (p = 6.5×10–21), apolipoprotein B-100 (APOB-100, p = 6.7×10–20), and C-reactive protein (CRP, p = 1.5×10–8) were associated with AD at Bonferroni significance (p < 0.00017) (see Table 4 and full results at https://alexhandy1.shinyapps.io/ad-genetic-overlap-web-results/). No other protein APOE+ PRS passed Bonferroni significance in all participants or subgroups. For APOE- PRS (992 SNPs within 750 kilobases of rs429358 on chromosome 19 removed), no proteins, including APOE, APOB, and CRP, passed Bonferroni significance in all participants or subgroups. In all participants, APOE- PRS for vitamin D-binding protein (VDBP) presented the strongest signal (p = 0.009) with 7 other proteins passing nominal significance (p < 0.05). The 3 proteins passing Bonferroni significance in APOE+ PRS (APOE ɛ3, APOB-100, and CRP) and the strongest APOE- PRS signal in all participants (VDBP) were deemed signals warranting further exploration in MR analysis.
Table 4. Summary of proteins with APOE+ PRS associations from meta-analyzed logistic regression with AD in all participants below Bonferroni significance
*p-value from protein GWAS below which SNPs were included in PRS. PRS p-value threshold for most significant PRS model (based on meta-analyzed p-value) presented. **Estimated based on normalized, per standard deviation of PRS. ***Estimated from meta-analysis outputs using the formula
AD PRS
AD PRS models were tested for association with 26 of the 31 shortlist proteins in one cohort (ANM, due to data availability) and in the same 3 subgroups (males, females and ≥70). In all participants and subgroups, no proteins were associated with AD APOE+ or APOE- PRS (992 SNPs within 750 kilobases of rs429358 on chromosome 19 removed) at Bonferroni significance (p < 0.00019, view full results at https://alexhandy1.shinyapps.io/ad-genetic-overlap-web-results/). In all participants, haptoglobin presented the strongest association with AD APOE+ PRS (p = 0.0107), with CRP (p = 0.0108) the only other protein to pass nominal significance. For AD APOE- PRS, complement factor H was the strongest signal (p = 0.021) with 6 other proteins passing nominal significance. Insulin-like growth factor-binding protein 2 (IGFBP2) was one of these proteins (p = 0.025) and came close to Bonferroni significance in the ≥70 subgroup (p = 0.00026). Given IGFBP2 was also the only protein to display a nominally significant association for APOE- PRS in both directions (AD APOE- PRS p = 0.025, protein APOE- PRS p = 0.045), it was selected for MR analysis.
Bi-directional MR results
MR analysis was conducted with five proteins, APOE ɛ3, APOB-100, CRP, IGFBP2, and VDBP, to test for causal associations. Only two proteins (CRP and VDBP) had valid SNP instruments at 5×10–8 but all proteins had available SNP instruments at 5×10–6. No proteins passed nominal significance (p < 0.05) using IVW with SNPs selected at 5×10–8 or 5×10–6 (see Fig. 2 and view full results at https://alexhandy1.shinyapps.io/ad-genetic-overlap-web-results/). Overall, exclusion of non-inferable palindromic SNPs produced similar results across all methods. All instruments selected had an F statistic > 10 and there was limited evidence of horizontal pleiotropy (no Egger intercept p < 0.05) or heterogeneity between SNPs (no Cochran’s Q p < 0.05 for protein across multiple methods). There was also no evidence of reverse causality when AD was tested as an exposure.

Fig. 2. MR results with protein as exposure and AD as outcome for IVW, MR-Egger and weighted median with alleles assumed on forward strand for harmonization with SNP instruments selected at 5×10–6.
DISCUSSION
Main findings
This study set out to identify if a shortlist of plasma proteins, previously associated with AD in observational studies, genetically overlap with AD by testing if PRS for these proteins were associated with AD. For proteins with evidence of genetic overlap, MR was conducted to test whether exposure to the protein was causal. The findings provide strong evidence that APOE ɛ3, APOB-100, and CRP genetically overlap with AD and, therefore, identify a genetic signal linked to a set of specific, accessible risk factors. Moreover, this overlap appears to be driven by the APOE genotype, providing evidence that APOE’s strong genetic effect on AD [41] may be partially linked to plasma proteins. However, our MR analysis found no evidence of causality for APOE ɛ3, APOB-100, or CRP.
Apolipoproteins in plasma have been linked to AD risk through their role in regulating cholesterol [50], with increased total cholesterol (TC) associated with higher risk [51, 52] while increased high-density lipoprotein cholesterol (HDL-C) appears protective [53]. CRP has also been suggested as a causal factor in AD through its pro-inflammatory role in the immune response [54, 55]. Observational evidence is nuanced, with increased levels of CRP in midlife associated with multiple dementias in later life [56] contrasting with lower levels of CRP found in individuals who actively have AD [57].
The key advantage of MR is its ability to help disentangle causal associations from associations which could be triggered by numerous confounders [24]. Our null MR results for APOE ɛ3, APOB-100, and CRP align with larger MR studies that found no causal associations between AD and CRP [25], TC or HDL-C [58, 59], the hypothesized mechanisms of action for apolipoproteins. However, these results are not yet conclusive. A more recent MR meta-analysis found that both TC and HDL-C showed causal associations with AD [60], with Rasmussen et al.’s larger MR study also suggesting APOE in plasma is causal for AD [18]. Unfortunately, due to data availability, our study was only able to measure APOE isoform ɛ3 rather than the higher effect isoforms ɛ2 and ɛ4 [61], suggesting our null MR result may be due to isoform rather than a lack of causality. All these findings support the need for a larger scale MR study measuring a consistent and historically comparable set of plasma protein exposures.
Independently of APOE, VDBP and IGFBP2 presented suggestive signals of genetic overlap with AD in our study. Research investigating the effect of VDBP and IGFBP2 on AD is less well developed. VDBP has been shown to attenuate Aβ aggregation in vitro and in mice [62] and two recent MR studies have identified a causal relationship between increased levels of VDBP and reduced AD risk [63, 64], with Larsson et al. demonstrating particularly strong evidence from two large samples. In our MR analysis, VDBP did not demonstrate evidence of causality; however, our sample is smaller than that of Larsson et al. and may have lacked statistical power. IGFBP2 has been shown to restrict the neuroprotective effects of insulin-like growth factors [65] and increased levels have been associated with higher AD risk and lower cognitive performance [66]. In our MR analysis, IGFBP2 did not demonstrate evidence of causality. To the best of our knowledge, this is the first study to conduct a targeted MR on IGFBP2 and AD, suggesting further research is required before making a strong inference on causality.
Limitations and future directions
Our study has three key limitations. Firstly, the initial proteins considered for analysis had to demonstrate prior association with AD and, therefore, represent only a subset of proteins in plasma. Whilst a targeted design was deliberate to lower the risk of false positives and support interpretability, it does increase the probability that causal proteins for AD were not included in the analysis. This exclusion risk was amplified by data availability constraints and a reliance on the SomaLogic platform, with only 53% of candidate proteins from the literature review possessing publicly available genetic summary data at the time of analysis. Most notably, alpha-2-macroglobulin and alpha-1-antitrypsin were each replicated in 6 studies in the literature review (the same number of replications as APOE) but were excluded due to a lack of genetic data. Moving forward, a planned outreach effort to aggregate summary data from other research groups and assay platforms (as demonstrated by [27]) supported by integrating new data [67] could improve protein coverage and increase statistical power.
Secondly, the statistical power to correctly identify true positives was limited across multiple steps of the analysis. Studies are now routinely conducted with samples of 100,000+ individuals for common human traits [68–70] in order to capture the small effect sizes of individual SNPs [71]. Sample size is particularly important for traits with low to moderate heritability where the effect size of individual SNPs is likely to be lower [72]. In this study, estimates of observed SNP heritability (h2 SNP) for proteins were themselves hampered by sample size (see https://github.com/bulik/ldsc/wiki/FAQ), but a median h2 SNP of 0.09 (interquartile range 0.006–0.22) suggests analyzed proteins were, at most, moderately heritable. Moderate heritability weakens PRS and MR [73], given PRS and MR instrumental variables are a composite of genetic liability which is dependent on the heritability of a trait. This may partly explain the limited genetic overlap and low variance explained (R2) of AD by individual protein PRS (see https://alexhandy1.shinyapps.io/ad-genetic-overlap-web-results/). Low R2 may also reflect that proteins often exert their effects as part of large molecular networks rather than as individual entities [74]. Therefore, larger sample sizes and multivariate statistical tests are required to robustly corroborate the lack of causal protein-AD associations in this study.
Lastly, the nature of the phenotype data collected possesses several inherent limitations. Both protein and AD case target data were collected at a single point in time. For AD this means a proportion of controls may have developed into cases and for proteins it means observed variability over time [75, 76] is not captured. For AD there is the added limitation of using a clinical diagnosis as a phenotype. Clinical manifestation is heterogeneous [77, 78] and in-life diagnosis can be inaccurate [79, 80], implying that adding endophenotype measures closer to causal biology (e.g., hippocampal volume) and symptomatology (e.g., cognitive performance) may allow more precise association analysis.
Future analyses should, therefore, aim to expand protein coverage, increase sample sizes, and expand PRS and MR analyses to AD endophenotypes.
CONCLUSION
This study provides evidence that apolipoproteins and CRP PRS are associated with AD and identifies a genetic signal linked to a specific, accessible risk factor. However, none of the proteins tested in MR demonstrated evidence of causality. This study was conducted in a moderate sample size and may have lacked the statistical power to identify true causal associations. Therefore, this study provides a framework for future research to robustly interrogate protein causality in larger samples.
ACKNOWLEDGMENTS
This work was made possible through generous funding from key funding bodies. Petroula Proitsi is funded by Alzheimer’s Research UK. Jodie Lord is funded by the van Geest endowment fund. Richard Dobson is supported by the following: (1) NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London, London, UK; (2) Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome Trust; (3) The BigData@Heart Consortium, funded by the Innovative Medicines Initiative-2 Joint Undertaking under grant agreement No. 116074. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA; it is chaired by DE Grobbee and SD Anker, partnering with 20 academic and industry partners and ESC; (4) the National Institute for Health Research University College London Hospitals Biomedical Research Centre; (5) the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London; (6) the UK Research and Innovation London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare; (7) the National Institute for Health Research (NIHR) Applied Research Collaboration South London (NIHR ARC South London) at King’s College Hospital NHS Foundation Trust.
This study represents independent research additionally funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
The authors acknowledge use of the research computing facility at King’s College London, Rosalind, which is delivered in partnership with the National Institute for Health Research (NIHR) Biomedical Research Centres at South London & Maudsley and Guy’s & St. Thomas’ NHS Foundation Trusts, and part-funded by capital equipment grants from the Maudsley Charity (award 980) and Guy’s & St. Thomas’ Charity (TR130505). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, King’s College London, or the Department of Health and Social Care.
GERAD1 Acknowledgments: Cardiff University was supported by the Wellcome Trust, Medical Research Council (MRC), Alzheimer’s Research UK (ARUK) and the Welsh Assembly Government. Cambridge University and King’s College London acknowledge support from the MRC. ARUK supported sample collections at the South West Dementia Bank and the Universities of Nottingham, Manchester, and Belfast. The Belfast group acknowledges support from the Alzheimer’s Society, Ulster Garden Villages, N. Ireland R&D Office, and the Royal College of Physicians/Dunhill Medical Trust. The MRC and Mercer’s Institute for Research on Ageing supported the Trinity College group. The South West Dementia Brain Bank acknowledges support from Bristol Research into Alzheimer’s and Care of the Elderly. The Charles Wolfson Charitable Trust supported the OPTIMA group. Washington University was funded by NIH grants, Barnes Jewish Foundation, and the Charles and Joanne Knight Alzheimer’s Research Initiative. Patient recruitment for the MRC Prion Unit/UCL Department of Neurodegenerative Disease collection was supported by the UCLH/UCL Biomedical Centre and NIHR Queen Square Dementia Biomedical Research Unit. LASER-AD was funded by Lundbeck SA. The Bonn group was supported by the German Federal Ministry of Education and Research (BMBF), Competence Network Dementia and Competence Network Degenerative Dementia, and by the Alfried Krupp von Bohlen und Halbach-Stiftung. The GERAD1 Consortium also used samples ascertained by the NIMH AD Genetics Initiative.
A proportion of data collection and sharing for this project was also funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Abbott; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Amorfix Life Sciences Ltd.; AstraZeneca; Bayer HealthCare; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory of Neuro Imaging at the University of California, Los Angeles.
