Abstract
Neuropsychiatric disorders are complex conditions with poorly defined neurobiological bases. In recent years, there have been significant advances in our understanding of the genetic architecture of these conditions and the genetic loci involved. This review article describes historical attempts to identify susceptibility genes for neuropsychiatric disorders, recent progress through genome-wide association studies, copy number variation analyses and exome sequencing, and how these insights can inform the neuroscientific investigation of these conditions.
Introduction
Neuropsychiatric disorders, such as schizophrenia (SZ), bipolar disorder (BD), major depressive disorder (MDD) and attention deficit hyperactivity disorder (ADHD), are cumulatively common but highly debilitating conditions. Although they can be assumed to reflect changes in brain function, they are not characterised by obvious neuropathology, and the underlying biological mechanisms are largely unknown. It is, however, clear that most neuropsychiatric disorders are at least moderately heritable, and it has long been hoped that the identification of susceptibility genes will provide much needed insights into their molecular aetiology, which could lead to more effective treatments. In the past decade, technological developments in genome analysis combined with large sample sizes have led to significant advances in our understanding of the genetic architecture of major neuropsychiatric disorders and the genetic loci involved. This article will describe historical attempts to identify susceptibility genes for these conditions, recent successes in the field, future directions and how these advances can inform neuroscience research.
The heritability of neuropsychiatric disorders
It has been known for over a hundred years that mental illness can run in families. The extent to which this is attributable to genetic factors, rather than familial environment, can be explored through twin studies, which compare the rate of trait sharing between monozygotic, or identical, twins (who share all of their genetic variability) and dizygotic twins (who share half of their genetic variability on average). As environmental effects are assumed to be largely the same for monozygotic and dizygotic twins, the difference in trait concordance between the two types of twin can be used to estimate the ‘heritability’ of the trait; that is, the proportion of trait variance (or disease liability) that is due to genetic factors. Twin studies have decisively shown that most neuropsychiatric disorders have a substantial genetic component: for SZ, BD and ADHD, heritability is approximately 75%–80% (McGuffin et al., 2003; Rietveld et al., 2003; Sullivan et al., 2003), while the heritability of MDD is lower but non-trivial at ~40% (Sullivan et al., 2000). These robustly replicated observations provide a strong empirical foundation for studies seeking to identify genetic variants conferring risk to these disorders.
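As a rough illustration of how the difference between monozygotic and dizygotic twins translates into a heritability estimate, the sketch below applies Falconer's formula, a classical approximation that is not named in the text above and is assumed here for simplicity. The function name and the twin correlations are hypothetical; published estimates are generally derived from more elaborate liability-threshold and structural equation models.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Approximate heritability as twice the MZ-DZ difference in twin correlation.

    r_mz and r_dz are the trait (or liability) correlations within
    monozygotic and dizygotic twin pairs, respectively.
    """
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations, chosen only for illustration.
h2 = falconer_heritability(r_mz=0.80, r_dz=0.42)
print(f"Estimated heritability: {h2:.2f}")
```

The factor of two reflects the assumption that monozygotic twins share all, and dizygotic twins on average half, of their genetic variability, while shared environmental effects are taken to be equal for both types of twin.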
Genetic linkage studies
One of the earliest strategies for identifying genetic risk loci for psychiatric disorders was through genetic linkage. Linkage studies are typically performed in large families in which several individuals are affected and are predicated on the fact that genetic markers that are within a few million nucleotide bases of a disease allele tend to be inherited with it. Co-segregation of the disease and a particular marker allele within a family thus implicates the chromosomal region in which the marker is located in the condition. Linkage studies are best suited for Mendelian diseases where there are one, or few, genetic loci exerting a strong effect on risk, having notable success in localising the Huntington’s disease gene (Gilliam et al., 1987; The Huntington’s Disease Collaborative Research Group, 1993) and those causing early onset forms of Alzheimer’s disease (Goate et al., 1991; Sherrington et al., 1995). However, despite considerable efforts over several decades, linkage studies have not reliably identified risk loci for common neuropsychiatric disorders, indicating that the genetic contribution to these conditions does not adhere to relatively simple monogenic or oligogenic models.
Classic cytogenetic approaches
Another early strategy for exploring genetic causes of neuropsychiatric disorders was to search for major chromosomal abnormalities in affected individuals. Such cytogenetic anomalies are present in ~7% of autism cases (Xu et al., 2004), but are uncommon in neuropsychiatric disorders with a less obviously developmental basis. Perhaps the most notable finding with regard to the latter is of a balanced t(1;11)(q42; q14) translocation disrupting the
Candidate gene association studies
The third main route to risk gene identification is based on association; here the aim is to identify susceptibility variants that are not on their own sufficient to cause the disorder, and which, therefore, elude detection by linkage. The most common design is the ‘case–control’ study, in which the frequency of individual DNA variants is compared between people with and without the condition. Ignoring technical errors and poor study design, significant case–control differences in allele or genotype distributions suggest either direct effects of the associated allele on susceptibility to the disorder (e.g. by altering amino acid sequence or gene expression) or correlation within the population between such a risk allele and the assayed variant (a phenomenon known as ‘linkage disequilibrium’). Technological limitations meant that early studies necessarily focused on a limited number of variants within candidate genes that were selected on the basis of their known biological roles (e.g. genes involved in dopamine function as candidates for SZ). Numerous reports of candidate gene association exist in the literature, but none are sufficiently replicable to be considered robust. In retrospect, this lack of consistency can be readily explained by the small effects on susceptibility that are now known to typify common risk alleles, the low probability that any selected candidate allele is a true risk allele, and small sample sizes.
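The case–control comparison described above can be sketched as an allelic test on a 2x2 table of allele counts. This is a minimal illustration under stated assumptions: the function name and the counts are hypothetical, and real studies additionally control for genotyping quality, population stratification and multiple testing.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds ratio, chi-square statistic, p-value) for a 2x2 allele-count table."""
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    # Pearson chi-square for a 2x2 table (1 degree of freedom, no continuity correction).
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt)
        * (case_ref + ctrl_ref)
    )
    chi2 = numerator / denominator
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1000 cases and 1000 controls, two alleles per person.
odds_ratio, chi2, p_value = allelic_association(
    case_alt=660, case_ref=1340, ctrl_alt=600, ctrl_ref=1400
)
print(f"OR = {odds_ratio:.3f}, chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

With counts of this size, an allele with a modest effect yields only weak statistical evidence, which is consistent with the point that small samples and small effects made early candidate gene findings difficult to replicate.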
Genome-wide association studies
The development of genotyping arrays in the early 2000s made it possible to simultaneously genotype hundreds of thousands of DNA variants, known as single-nucleotide polymorphisms (SNPs), and to do so cheaply in large numbers of individuals. At the same time, increased knowledge of the patterns of linkage disequilibrium in the human genome made it possible to infer (or ‘impute’) genotypes at millions of other SNPs, thereby capturing the majority of common DNA variation (i.e. variants with population allele frequencies > 5%) in each individual’s genome. It thus became possible to perform genome-wide association studies (GWAS) of neuropsychiatric disorders. The scale of coverage, and the ability to test large sample sizes, effectively address the main limitations of candidate gene approaches (bias towards existing hypotheses, low probability of selecting a true risk allele from the millions present in the genome, low statistical power from small sample sizes) and thereby allow comprehensive and unbiased assessments of the genome that might provide new insights into biology.
It is now clear that DNA variants that are common in the general population individually confer only a small increase in risk for neuropsychiatric disorders (odds ratios of associated variants typically <1.1). Very large sample sizes are, therefore, required in order to detect them at a significance threshold that controls for testing millions of DNA variants (based on 1 million independent tests in a comprehensive GWAS, the generally accepted threshold for ‘genome-wide significance’ is
It should be noted that, while GWAS identify genetic risk loci, additional, functional studies are usually required to confidently identify the susceptibility genes within them. The vast majority of common risk variants for neuropsychiatric disorders do not appear to change the protein coding sequence of genes, and are, therefore, likely to alter regulatory regions of the genome (e.g. binding sites for transcription factors) that can be several hundred kilobases (kb) from the genes they regulate. In addition, linkage disequilibrium makes it difficult to distinguish between the functional risk variants and variants that are correlated with them, resulting in association signals that often span multiple genes. Functional interrogation of GWAS risk loci has already yielded important insights; for example, association between SZ and a broad region at the major histocompatibility complex (MHC) locus on chromosome 6 has been shown to partly reflect structural variation at the
Polygenic risk scores and pleiotropy
Since the early years of GWAS, it has been apparent that common risk loci for psychiatric disorders identified at genome-wide levels of significance constitute only the ‘tip of the iceberg’, with thousands of other variants conferring weak effects on risk falling short of this stringent significance threshold. Evidence for the highly polygenic nature of psychiatric disorders was first provided by a study from the International Schizophrenia Consortium (2009), in which the summation into a ‘polygenic risk score’ of thousands of DNA variants exhibiting at least minimal association with SZ was found to account for a significant proportion of the risk in an independent SZ sample.
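The summation described above can be sketched as follows. This is an illustrative outline only: weighting each risk allele by the natural logarithm of its odds ratio is a common convention that is assumed here, and the function name, odds ratios and genotypes are hypothetical. In practice, variants are first filtered for linkage disequilibrium and selected at one or more association p-value thresholds in the discovery sample.

```python
import math


def polygenic_risk_score(dosages, odds_ratios):
    """Sum risk-allele counts (0, 1 or 2), each weighted by the log odds ratio of its variant."""
    if len(dosages) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per genotyped variant")
    return sum(d * math.log(or_) for d, or_ in zip(dosages, odds_ratios))


# Hypothetical odds ratios estimated in a discovery GWAS sample.
odds_ratios = [1.08, 1.05, 0.95, 1.10, 1.03]
# Hypothetical allele counts for one individual in an independent target sample.
dosages = [2, 1, 0, 1, 2]

score = polygenic_risk_score(dosages, odds_ratios)
print(f"Polygenic risk score: {score:.4f}")
```

Scores computed in this way for every member of the independent sample can then be compared between cases and controls to ask how much of the risk they explain.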
The amount of liability captured by polygenic risk scores is a function of the GWAS discovery sample size and the liability to the disorders captured by SNPs on genotyping arrays, which is typically around 30%–50% of the heritability. Although the information content and predictive power of polygenic risk scores are not sufficient to be useful diagnostically, the approach provides the first quantitative biomarker of genetic liability that can be applied to any individual regardless of psychiatric status. The availability of such a biomarker has numerous potential applications in neuroscience, including examining the validity of intermediate cognitive, behavioural and neuroanatomical phenotypes for these conditions (e.g. Riglin et al., 2017; Terwisscha van Scheltinga et al., 2013). However, to date, the most influential application of polygenic risk scores has been in exploring the genetic relationship between neuropsychiatric disorders.
In the earliest study (International Schizophrenia Consortium, 2009), polygenic risk for SZ was shown to be associated with risk for BD, but not non-psychiatric diseases, providing evidence of a genetic overlap between the two disorders. Subsequent studies using risk scores, as well as other polygenic methodologies, have now clearly demonstrated substantial genetic sharing across many psychiatric disorders (Bulik-Sullivan et al., 2015; Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013). For example, the common variant contribution to SZ overlaps with that for ADHD, MDD, autism spectrum disorder, obsessive-compulsive disorder and anorexia nervosa, as well as BD (O’Donovan and Owen, 2016). The clear evidence for pleiotropic effects in psychiatry that has come from common variation is mirrored by similar findings from rare genetic variation. As discussed later, rare variants conferring risk for SZ also increase risk for other disorders of neurodevelopmental origin, and even in individuals with no known clinical syndrome, cognitive function is often affected (Kendall et al., 2017; Stefansson et al., 2014). It is important for neuroscientists to take note of pleiotropy when interpreting studies of endophenotypes in humans and when modelling mutations in animal and cellular systems (O’Donovan and Owen, 2016).
Copy number variants
It is now established that, in addition to common variants of weak effect, the genetic architecture of neuropsychiatric disorders includes rarer variants that potentially have a much greater impact on risk. It has been known since the 1990s that high rates of SZ occur in people with velocardiofacial (or DiGeorge) syndrome, a condition resulting from large deletions on chromosome 22q11.2 (Murphy et al., 1999). These deletions, which occur in roughly 1 in 4000 births and typically encompass at least 40 genes, are now recognised as the first example of copy number variants (CNVs) associated with the disorder. With the development of genotyping arrays, it became clear that CNVs, which are usually defined as deletions, duplications or insertions larger than 1 kb, are far more frequent in the human genome than previously assumed. Genome-wide CNV scans have revealed that rare (population frequencies < 1%) or de novo CNVs occur in people with autism and SZ at a rate that is more than twice that of controls (Kirov et al., 2012; Sebat et al., 2007), and at an even higher rate (approximately 14%) in idiopathic developmental delay/intellectual disability (Cooper et al., 2011). Rare CNVs are also enriched in other neurodevelopmental disorders, including ADHD (Williams et al., 2010) and Tourette Syndrome (Huang et al., 2017), but appear to contribute less to psychiatric disorders that are not commonly conceptualised as neurodevelopmental in origin, such as BD and MDD (Green et al., 2016; O’Dushlaine et al., 2014; Rucker et al., 2016).
Given that pathogenic CNVs are individually rare, tests of association between neuropsychiatric disorders and CNVs affecting any specific region require very large sample sizes. In a recent analysis of 21,094 SZ cases and 20,227 controls, 8 CNV loci (on chromosomes 1q21.1, 2p16.3, 3q29, 7q11.2, 15q13.3, distal 16p11.2, proximal 16p11.2 and the velocardiofacial syndrome region on chromosome 22q11.2) were associated with the disorder at a genome-wide significant threshold (CNV and Schizophrenia Working Groups of the Psychiatric Genomics Consortium, 2017). While most of these SZ-associated CNVs encompass several genes (with effects on some or all contributing to the disorder), pathogenic deletions on chromosome 2p16.3 appear to specifically disrupt the gene encoding the synaptic cell adhesion molecule, neurexin-1 (
Exome sequencing
The past decade has witnessed major developments in sequencing technology, permitting rapid and increasingly economical screens for rare DNA variants (e.g. point mutations) that are not captured by current SNP genotyping arrays. To date, work on psychiatric populations has largely focused on sequencing the ~1% of the genome that encodes proteins (i.e. coding exons), collectively known as the exome. The anticipated benefits of this approach are threefold. First, exonic mutations point to specific genes (cf. GWAS). Second, for mutations that introduce premature stop codons, the consequences for gene function can be largely predicted. Third, like rare CNVs, individual coding mutations that are rare or de novo potentially have large effects on risk. These benefits make rare coding mutations particularly attractive for neuroscientists seeking to generate cellular or animal models.
On average, each individual carries one germline exonic de novo mutation. This de novo rate is increased in people with intellectual disability/developmental delay (Rauch et al., 2012), and to a lesser extent in those with autism spectrum disorder (Sanders et al., 2012). Exome sequencing studies have indicated an increased abundance of very rare (population frequency < 0.01%) disruptive coding mutations in SZ, distributed across many genes (Genovese et al., 2016; Purcell et al., 2014), and there is also evidence that such variants contribute to BD (Goes et al., 2016). However, as with pathogenic CNVs, the rarity of the individual mutations, and their broad distribution, have meant that very large sample sizes are required to implicate specific genes. This approach has proven highly successful for autism spectrum disorder (De Rubeis et al., 2014; Sanders et al., 2015) and has started to yield results for SZ, where a recent analysis of exome sequencing from 4264 SZ cases, 9343 controls and 1077 SZ parent–proband trios revealed a genome-wide significant association between the disorder and rare loss-of-function variants in the
Pathway analyses
Given difficulties in implicating specific genes in neuropsychiatric disorders, a complementary and potentially highly informative approach is to test the extent to which identified risk variants converge on defined biological processes. For example, CNVs associated with SZ have been found to be enriched for genes encoding members of NMDA receptor and GABAA receptor complexes (Pocklington et al., 2015). The importance of synaptic processes in SZ is also highlighted by pathway analyses of smaller de novo mutations identified in patients by exome sequencing and of common variation identified through GWAS, which show particular enrichment at gene loci encoding post-synaptic proteins (Fromer et al., 2014; Network and Pathway Analysis Subgroup of the Psychiatric Genomics Consortium, 2015). In a pathway analysis of GWAS data for SZ, BD and MDD, genes involved in histone methylation processes were found to be enriched for genetic associations with all three conditions, and BD in particular (Network and Pathway Analysis Subgroup of the Psychiatric Genomics Consortium, 2015). Future pathway analyses are likely to benefit from greater understanding of the genes affected by genetic risk variation as well as their biological functions.
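The basic logic of testing whether risk genes converge on a defined gene set can be illustrated with a one-sided hypergeometric test. This is a simplified sketch, not the method of the studies cited above: the function name and gene counts are hypothetical, and published pathway analyses of GWAS, CNV and sequencing data typically use more elaborate procedures that account for factors such as gene size and linkage disequilibrium.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_hits, n_overlap):
    """Probability of observing at least n_overlap pathway genes among n_hits
    genes drawn at random from n_background genes (hypergeometric upper tail)."""
    upper = min(n_pathway, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes in the genome, 100 in the pathway,
# 200 implicated by risk variants, 5 of which fall in the pathway.
p = enrichment_p_value(n_background=20000, n_pathway=100, n_hits=200, n_overlap=5)
print(f"Enrichment p-value: {p:.4g}")
```

Here only one pathway gene would be expected among the implicated genes by chance, so an overlap of five yields a small p-value under this simple model.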
Conclusion
In recent years, there has been considerable progress in our understanding of the genetics of common neuropsychiatric disorders, for which neurobiological leads have been elusive. It is now clear that these disorders are highly polygenic, involving thousands of common as well as rarer genetic variants that, together with environmental risk factors, collectively increase an individual’s chances of developing such a condition. It is also apparent that many of these risk variants are shared between neuropsychiatric diagnoses. As sample sizes have grown, both common and rare genetic risk loci for neuropsychiatric disorders have been identified with high confidence. Associations between neuropsychiatric disorders and common variants identified by GWAS appear to largely reflect regulatory genetic variation, which might operate on specific gene transcripts, in circumscribed cell populations and at particular developmental stages. For some neuropsychiatric phenotypes, particularly those with clear neurodevelopmental features, stronger effects on risk may be conferred by rare and de novo CNVs and exonic mutations that can result in hemizygous loss of gene function. With even greater sample sizes, and comprehensive genotyping through whole genome sequencing, many more genetic risk loci for neuropsychiatric disorders will be identified in coming years. Translating these discoveries into an understanding of molecular, cellular and neurophysiological mechanisms underlying neuropsychiatric conditions will require the expertise of researchers in many areas of neuroscience.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
