Abstract
This commentary discusses the strengths and limitations of the Mendelian randomization (MR) approach in Parkinson’s disease (PD) studies. Epidemiologists proposed employing MR when genetic instruments are available that represent reliable proxies for modifiable lifelong exposures that elude easy measurement in studies of late-onset diseases like PD. Here, we use smoking as an example. The great promise of the MR approach is its resilience to confounding and reverse causation. Nevertheless, the approach has drawbacks: it is liable to selection and survival bias, it makes strong assumptions about the genetic instruments employed, and it requires very large sample sizes. When interpreted carefully and put into the context of other studies that take both genetics and the environment into consideration, MR studies can help us not only to ask interesting questions but also to support causal inference and provide novel insights.
When in 1986 Katan first suggested using genetic information on polymorphisms of apolipoprotein E (
In the early 2000s, with the advent of affordable genotyping, the MR approach gained momentum in applied research. More recently, it evolved into an approach using combinations of multiple single nucleotide polymorphisms (SNPs) as genetic markers because it became possible to generate genome-wide association (GWA) data for tens of thousands of individuals. Epidemiologists were the early champions of the MR approach [2, 3] as they saw the promise of genetic instruments representing reliable proxies for modifiable exposures that elude easy measurement and are sorely needed to study late-onset diseases like PD. Furthermore, gene-environment interaction studies were already in vogue among epidemiologists and made it easier to conceptually integrate MR in observational studies [4, 5].
The assumptions necessary for MR to allow us to use genes as proxies for environmental exposures, as well as the limitations of doing so, have been widely discussed in the literature and are by now fairly well understood [2, 6–8]. Briefly, MR assumes that 1) the SNPs used as instrumental variables are indeed associated with the exposure and preferably serve as a strong instrument; 2) the SNPs are not associated with (genetic) confounding factors, such as those arising from population stratification; and 3) the SNPs affect the outcome of interest only through the exposure, i.e., there is no pleiotropy in which the genetic variants cause multiple outcomes. The last assumption is the one that recent studies have examined most extensively, both because it is the most likely to be violated and because a number of statistical tests have been developed to help assess whether pleiotropic effects exist, e.g., MR-PRESSO [9], MR GENIUS [10], and MR-RAPS [11]. It is also important to remember that MR studies are data greedy: they need tens of thousands of subjects to generate instruments and as many subjects to test associations with the disease, because the genetic instrument is usually very weak and explains no more than a few percent of the variance of the true exposure [8, 12]. Thus, this approach relies on enormous consortium efforts in data sharing that have only recently become available in PD and are generally limited to the sharing of genetic data, as environmental data may not exist and are also harder to harmonize across studies.
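The link between a weak instrument and the need for very large samples can be made concrete with the first-stage F-statistic, a commonly used summary of instrument strength. The sketch below is illustrative only: the function name, the input numbers, and the conventional rule of thumb that F should exceed roughly 10 are assumptions and are not taken from the studies cited here.

```python
def first_stage_f(r_squared: float, n_subjects: int, n_snps: int) -> float:
    """Approximate first-stage F-statistic for a multi-SNP instrument.

    r_squared  : share of exposure variance explained by the SNPs
    n_subjects : sample size used to estimate the SNP-exposure associations
    n_snps     : number of SNPs combined into the instrument
    """
    if not 0.0 < r_squared < 1.0:
        raise ValueError("r_squared must lie strictly between 0 and 1")
    if n_subjects <= n_snps + 1:
        raise ValueError("need more subjects than SNPs")
    return (r_squared / (1.0 - r_squared)) * ((n_subjects - n_snps - 1) / n_snps)


# Hypothetical instrument: 100 SNPs explaining 2% of the exposure variance.
for n in (5_000, 50_000, 500_000):
    f_stat = first_stage_f(r_squared=0.02, n_subjects=n, n_snps=100)
    print(f"n = {n:>7}: F = {f_stat:.1f}")
```

Under these assumed inputs, an instrument explaining 2% of the variance only approaches the conventional threshold once the sample reaches tens of thousands of subjects, which is the practical meaning of "data greedy".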
The roles that various exposures play in PD have been studied with MR, including but not limited to body mass index [13, 14], urate [15–17], dairy intake [14, 18], alcohol and/or coffee consumption [13, 20], menopausal age [21], and vitamin D [22]. Here, we will discuss the example of smoking and PD, as smoking has been the factor most studied with MR in relation to PD and illustrates some of the issues encountered when interpreting MR results. Observational studies have consistently shown an inverse association between smoking status and PD, where smokers show a decreased PD prevalence, with the strongest associations seen for current smoking [23–26]. Ease of quitting smoking has been suggested as a potential prodromal feature of PD, thus invoking the potential for reverse causation [27]. In addition, there is always a potential for confounding and survival bias, even though these biases seem less likely to explain these negative associations [24]. As conducting a randomized clinical trial of smoking would certainly not be ethical, MR seems a possible solution to gain additional insights from human studies. A majority of MR studies of smoking and PD suggested protective effects for the genetic instruments, with ORs around 0.75 for smoking versus non-smoking status, thus concurring with the observational study results [13, 28–30]. Studies unable to find an association were either underpowered [31] or reported smaller negative associations that were not statistically significant [20, 32].
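To show how SNP-level summary statistics are typically turned into a single odds ratio of the kind quoted above, the following minimal sketch applies a fixed-effect inverse-variance weighted (IVW) estimator, one common choice in two-sample MR. The cited studies may have used other estimators; the function name and all numbers are hypothetical and do not come from any study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_exposure : per-SNP associations with the exposure
    beta_outcome  : per-SNP associations with the outcome (log odds ratios)
    se_outcome    : standard errors of the per-SNP outcome associations
    Returns the pooled causal log odds ratio and its standard error.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    numerator = sum(bx * by / se**2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx**2 / se**2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Made-up summary statistics for three SNPs (illustration only).
log_or, se = ivw_estimate(
    beta_exposure=[0.10, 0.08, 0.12],
    beta_outcome=[-0.03, -0.02, -0.04],
    se_outcome=[0.01, 0.01, 0.02],
)
odds_ratio = math.exp(log_or)
ci_low = math.exp(log_or - 1.96 * se)
ci_high = math.exp(log_or + 1.96 * se)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The estimate is only as trustworthy as the three assumptions listed earlier; in particular, this simple pooling does nothing to detect or correct for pleiotropy.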
So, can we stop here and conclude that being a smoker is causally linked to PD? Not if we want to better understand what about the act of smoking may confer protection. The agent in cigarettes hypothesized as most likely protective against PD is nicotine, as animal studies suggested increased neuronal survival with nicotine treatment [33–36]. However, while some observational studies suggested protective effects for nicotine [37, 38], none of the randomized human trials conducted with PD cases have been encouraging, as treatment with nicotine did not decrease PD symptoms or delay progression [39–43]. Several of the same MR studies that found smoking status to be negatively associated with PD also investigated smoking intensity or initiation but, confusingly, did not find associations between these smoking-related characteristics and PD [13, 32]. One study was unable to exclude non-smokers [13], which may have obscured associations, as we would expect smoking intensity to be linked to PD only among actual smokers; for many PD studies used to conduct MR analyses, however, smoking data do not exist. Also, when conducting subset analyses by smoking status, statistical power is reduced. Moreover, stratification by smoking status may even introduce another (collider) bias, although this is not considered a major problem [6, 44–46]. Additional support for a lack of association between smoking intensity or initiation and PD came from two MR-phenome wide association (PHEWAS) studies performed with genetic instruments for smoking initiation [47] and smoking intensity [48], which also did not see associations with PD. This raises the question of why the implied protection from PD is not dose dependent in smokers, as one would expect if components in cigarette smoke in fact contribute to neuroprotection.
To shed further light on this question, some MR studies also assessed the role of smoking cessation in PD. One study reported negative associations between genetic variants linked to smoking continuation and PD [20]; another found a suggestive positive association between smoking cessation (former vs. current smoking) and PD [32]. A very large observational cohort study found that the inverse association between smoking status and PD became null among those able to quit smoking more than 20 years before diagnosis [25]. Does this suggest that the negative association between smoking status and PD is specific to those unable to quit, or does smoking mainly exert its effects on PD risk in the 20 years before diagnosis? Unfortunately, the two MR studies did not stratify by smoking status, which would help shed light on whether the association between the genetic instrument for smoking cessation and PD is specific to smokers, as we would expect, or is affected by pleiotropy after all.
It has been suggested that smoking status is a proxy for other behavioral traits related to PD risk. Thus, one MR study examined whether smoking reflects a propensity to engage in “risky behaviors” and identified a positive association between a genetic predisposition for high risk tolerance and PD but a negative association between ever smoking and PD [32]. Another study reported both a positive association of neuroticism with smoking initiation and a protective effect of smoking initiation on the risk of PD [29]. However, the authors were unable to identify an association between neuroticism and PD [29]. Thus, overall, these findings do not indicate that individuals who, according to genetic instruments, are more likely to initiate smoking or be ever smokers also have a genetic liability for certain behavioral traits that are associated both with smoking and with PD risk.
MR analysis based on SNPs that are thought to be related to levels of a chemical agent can lead to biased results if exposure to the substance triggers paradoxical avoidance behaviors [49]. For example, a SNP associated with the conversion of alcohol to acetaldehyde (a recognized mutagen and animal carcinogen) that causes high acetaldehyde levels would be expected to lead to a higher risk of head and neck cancer. However, individuals with the genotype that increases acetaldehyde levels are less likely to actually drink alcohol because of the unpleasant effects of high acetaldehyde levels, which would misleadingly suggest that high levels of acetaldehyde are protective against the cancer if one fails to control for the amount of alcohol intake [50]. Here, the genetic variants influence the environment, and the two interact to determine the overall exposure level. Within MR analysis, however, a major assumption is that the genetic variants influence the exposure independently of the environment. This might be especially problematic for PD in a situation where genetic variants influence nicotine and/or dopamine receptor genes [6, 49].
A benefit of performing an MR analysis is that, contrary to most observational studies, it is relatively straightforward to replicate findings as more genetic data are generated, including summary statistics that have become widely available in easily searchable databases (e.g., the NHGRI-EBI catalog) [51]. False-positive findings or methodological errors will become easier to identify as these data repositories grow. Raw data already available to researchers through portals such as NIH’s dbGaP [52] would become even more valuable resources if demographic and other phenotypic or environmental data were available on request for additional subset analyses. While MR studies can help with identifying exposures that are potentially causally related to PD, underlying mechanisms generally cannot be derived from them. This is because MR analyses are best powered with genome-wide summary statistics in which a large number of SNPs represent the exposure of interest. Most of these SNPs, however, have unknown functionality and often lie within non-coding regions. MR analysis restricted to functional SNPs can further the understanding of underlying mechanisms. A recent study relied on SNPs associated with gene expression for druggable proteins to predict potential treatment efficacy in PD [53]. However, this type of analysis needs extensive prior information on molecular and cell-biological mechanisms for each genetic variant employed.
Another issue that may affect MR studies of smoking is that there have been dramatic changes in smoking behavior as policies against smoking have been implemented. Hence, the associations between the genetic variants and smoking status might change over time or remain more valid in certain subgroups of the population.
Has MR helped us lift the smoke on whether or not smoking protects against PD? The answer is unfortunately no, as MR studies have not been able to provide conclusive evidence that smoking is causally related to PD. While MR analysis can address confounding and reverse causation, it is still liable to selection and survival biases as well as paradoxical changes in behavior caused by genetic variations, and the ever-looming problem of pleiotropy. Nevertheless, when employed carefully, and when environmental data become available in addition to genetic data, MR studies can be very helpful in answering interesting questions and providing novel insight. We are looking forward to more studies that combine information on both genes and the environment to enlighten us about modifiable environmental exposures causing PD.
Footnotes
ACKNOWLEDGMENTS
This work was supported by the National Institutes of Health (NIH) grants R01-ES010544, R01-ES013717, U54-ES012078, P01-ES016732, and P50-NS038367; a Community Fast Track grant by the MJFox Foundation; a pilot grant by the American Parkinson Disease Association; and from the Parkinson’s Alliance. CK is supported by the NIH grant F32AG063442.
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
