Abstract
Problem
Most psychiatric disorders are complex genetic traits involving both genetic and environmental risk factors. Although genetic techniques have been very successful for identifying the genetic and molecular basis of rare, single gene (Mendelian) disorders, these techniques have been slow in determining the genetic basis of complex (non-Mendelian) disorders such as psychiatric illnesses. However, with recent technological advances and a massive increase of publicly available genetic resources (e.g. Human Genome Project [HGP]), several genes have recently been implicated in the susceptibility to psychiatric illness. Psychiatric genetics appears to be poised for significant advances in our knowledge and understanding of the molecular genetic basis of mental illness.
At the end of each year, the News and Editorial staff of Science vote to decide what scientific advances made that year deserve recognition as ‘Breakthrough of the Year’. In December 2003, the top ‘Breakthrough’ for life sciences was ‘genes for mental illness and mood disorders’ [1], [2]. This acknowledgement was in recognition of the recent exciting discovery of several genes that confer risk to various psychiatric illnesses and related mental functions. These included the association reported between the serotonin transporter gene and depression [3], between the catechol-O-methyltransferase (COMT) gene and schizophrenia [4], [5] and between the brain-derived neurotrophic factor (BDNF) gene and memory and hippocampal function [6]. Additional recent reports of note have identified several other genes that appear to confer a moderate risk of schizophrenia. These include the neuregulin 1 (NRG1) gene, identified by deCODE Genetics, a company that is attempting to identify the genetic basis of common diseases in Iceland [7]. Other genes include DISC1, DISC2, G72 and DTNBP1 [8–10]. Some of these gene associations have been supported by follow-up studies, though others have not and may represent false positives. Nevertheless, after many years of fruitless endeavours, these recent reports indicate that the labours of researchers in psychiatric genetics are beginning to show exciting results, with real and reproducible associations of genes with mood disorders. Identification of these susceptibility genes holds great promise, with the unravelling of the molecular and biochemical basis of some conditions now being a more realistic and tangible goal. The increasing number of genes being identified augurs well for the future treatment of psychiatric disorders. The genes identified, and the pathways of genes and proteins that they implicate, will provide potential novel targets for new therapeutic drugs.
Many advances in the techniques of gene discovery, and the increasing resources available, are rapidly being adopted by researchers and applied to the complex problem of identifying susceptibility genes for mental illnesses. Perhaps the single most important advance to date is the HGP and all that has stemmed from the vast quantity of information that this endeavour has provided. In this context, this paper aims to review the gene identification strategies being applied by molecular geneticists in their efforts to elucidate the genetic basis of psychiatric disorders.
Review of genetic techniques for identifying susceptibility genes
Various approaches have been developed for identifying genes for complex traits, which may be used alone or in combination depending upon the clinical resources that are available. These are described here, including a discussion of their advantages and disadvantages.
Cytogenetic abnormalities
Perhaps the most straightforward means of identifying a candidate susceptibility gene for a complex trait is through the identification of a chromosomal abnormality that disrupts one or more genes. Unfortunately, these are rarely found. Nevertheless, one recent example indicates that this approach should not be dismissed. St Clair et al. [11] originally identified a balanced translocation between chromosomes 1 and 11 (t(1;11)(q43;q21)) in a family that included 23 cases of mental and/or behavioural disorders. In particular, the translocation was most significantly associated with family members with schizophrenia, schizoaffective disorder, recurrent major depression and adolescent conduct and emotional disorders. Ten years later, Millar et al. [8] reported the sequencing of the breakpoints on chromosomes 1 and 11 for this translocation. No genes were shown to be disrupted at the chromosome 11 breakpoint; however, two genes were disrupted at the chromosome 1 breakpoint. These two genes were subsequently named ‘Disrupted in Schizophrenia 1 and 2’ (DISC1 and DISC2). Due to their disruption, DISC1 and DISC2 are now major candidates as susceptibility genes for schizophrenia and studies are under way to determine if there is support for this notion.
Another recent example is the identification of a pericentric inversion of chromosome 3 (46N inv(3)(p14q21)) in a child with moderate intellectual disability as well as severe conduct disturbance [12]. This cytogenetic abnormality was subsequently found in other members of the child's extended family who have developmental–behavioural problems. Although no gene has been identified to date, genes that lie at or near the chromosomal breakpoints are candidates for involvement in intelligence and/or behaviour.
Linkage analysis and positional cloning
Positional cloning is an approach to disease gene identification that locates the responsible gene solely on the basis of its chromosomal position. It uses linkage analysis to identify and narrow an interval that harbours a disease gene on a particular chromosome, after which transcripts within the interval are identified and investigated for a pathogenic role in the disease in question. This approach requires no functional information and is used when the underlying disease biology is unknown or poorly understood. Positional cloning has been widely used to identify the disease genes that underlie Mendelian (single gene) disorders. Indeed, since the first disease gene was successfully identified by positional cloning in 1986 (X-linked chronic granulomatous disease) [13], hundreds of Mendelian disease genes have been identified via this strategy [14]. However, it is only in recent years that the techniques of positional cloning have also been applied in the search for genes that underlie non-Mendelian traits, such as psychiatric disorders. This has largely come about as a direct result of the outcomes of the HGP. The progress in genetic mapping, physical mapping and gene hunting technologies arising from the HGP has seen positional cloning become increasingly used as a search strategy for identifying disease genes for complex traits.
Linkage analysis is the first stage of the positional cloning process. If a genetic marker locus on a chromosome is sufficiently close to a gene involved in the disease aetiology, then genetic linkage should exist between the marker and the gene in question. Linkage analysis uses polymorphic DNA markers and their corresponding genetic maps to localize a disease locus. Multi-allelic short tandem-repeat (microsatellite) polymorphisms have proven to be informative and easy to type (genotype) DNA markers for linkage analysis as they can be efficiently analyzed using the polymerase chain reaction (PCR). These polymorphic DNA markers are analyzed in families with disease. In the case of complex genetic traits, these families may be in the form of either extended pedigrees, or affected sib-pairs.
Linkage analysis follows the transmission of marker alleles within families to identify those alleles that cosegregate with the disease. For complex disorders, chromosomal regions that harbour a disease gene will show a higher than expected incidence of shared alleles among affected individuals in a family. A statistical evaluation is carried out to determine the likelihood or probability that the disease and particular marker alleles show linkage. This analysis relies on meiotic recombination events to locate the chromosomal regions that harbour the disease genes. Follow-up studies with additional markers can refine these chromosomal regions. When several markers within a candidate region show linkage to the disease, it may be possible to determine the linkage phase for that chromosome segment. These particular chromosomal combinations of alleles are known as haplotypes. Haplotypes can be viewed as a single unit of transmission, similar to an allele. It therefore becomes possible to trace specific chromosome regions through pedigrees without losing their identity. Haplotype analysis is particularly useful in determining recombination boundaries by observing the minimum haplotype that segregates with the disease.
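The statistical evaluation of linkage is commonly summarized as a LOD score, the base-10 logarithm of the odds that two loci are linked at recombination fraction θ rather than unlinked (θ = 0.5). The following is a minimal sketch only, assuming phase-known, fully informative meioses and a single-locus model; the function names are hypothetical, and analyses of complex traits generally rely on dedicated pedigree or allele-sharing software rather than this simplified calculation.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    Compares the likelihood of the data at recombination fraction `theta`
    with the likelihood under free recombination (theta = 0.5).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int, grid_steps: int = 500):
    """Grid search for the theta in (0, 0.5] that maximizes the LOD score."""
    thetas = [0.5 * i / grid_steps for i in range(1, grid_steps + 1)]
    best = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)


if __name__ == "__main__":
    # Illustrative counts only: 1 recombinant observed in 10 meioses.
    theta_hat, lod = max_lod(1, 10)
    print(f"theta_hat={theta_hat:.3f}, LOD={lod:.3f}")
```

By convention, a LOD score of 3 or more (odds of 1000:1 in favour of linkage) has been taken as significant evidence of linkage for Mendelian traits; more stringent thresholds are usually argued for in genome-wide scans of complex traits.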
Examples of linkage studies for complex traits that have been successfully reproduced include the identification of regions on chromosomes 6 and 8 that show linkage with schizophrenia. Wang et al. [15] and Straub et al. [16] initially described the results of linkage analysis in 186 and 265 families, respectively, which showed evidence for the linkage of a segment of chromosome 6 (6p24–p22) with schizophrenia. Support for this susceptibility locus was provided by the follow-up linkage studies of Moises et al. [17], Maziade et al. [18], Lindholm et al. [19], Schwab et al. [20] and a multicentre analysis [21]. A segment of chromosome 8 (8p21–22) has similarly been implicated by linkage analysis as harbouring a susceptibility gene for schizophrenia. Blouin et al. [22] reported the results of linkage analysis in 54 multiplex pedigrees that implicated this region. The follow-up linkage studies of Gurling et al. [23], Kendler et al. [24], a multicentre analysis [21] and Stefansson et al. [7] all provided support for the presence of a schizophrenia risk gene within this region of chromosome 8. A meta-analysis of these combined linkage analyses was performed by Lewis et al. [25], which provided strong evidence for the presence of schizophrenia susceptibility genes within both of these regions on chromosomes 6 and 8.
Our group has also successfully used linkage analysis to identify a segment of chromosome 4 (4q35) that harbours a susceptibility gene for bipolar disorder. We initially reported evidence for a novel bipolar disorder susceptibility locus on chromosome 4q35 following linkage analysis in a large bipolar pedigree [26]. Analysis was subsequently extended to our cohort of 55 multigenerational bipolar pedigrees and evidence for linkage to chromosome 4q35 was significantly strengthened. Analysis of haplotypes allowed us to define a candidate interval of 5Mb extending to the telomere of chromosome 4q35 [27]. Other groups have now independently published support for this region, including the linkage studies of the NIMH (NIH) [28], Stanford University and Johns Hopkins University [29], [30] and the Wellcome Trust funded UK–Irish Bipolar Sib-pair Study [31].
Once identified, these linked chromosomal segments are investigated to identify the genes that lie within. In the case of complex traits, these regions are typically large and may contain tens to hundreds of genes, most of which require testing for sequence variations that are associated with the disease. Most genes also have multiple alternate transcripts that code for various protein isoforms and these alternate transcripts also require identification and screening. Identifying all genes within a candidate region is a huge task, but one that has been aided by the genome-wide sequence annotation projects that have stemmed from the HGP, including the UCSC's Genome Browser, Sanger's Ensembl and NCBI's Mapviewer (http://genome.ucsc.edu/; http://www.ensembl.org; http://www.ncbi.nlm.nih.gov/). An example of gene identification within a linked chromosomal region is our own study of the bipolar susceptibility locus on chromosome 4q35. In order to identify candidate genes from this region, we used the HGP data in combination with laboratory techniques to establish a comprehensive gene map of the linked region on chromosome 4q35 [32]. This map includes 22 genes and provides a collection of candidate genes for investigation for a pathological association with bipolar disorder.
Although some sequence variations do not produce actual biological changes in individuals, other variants predispose subjects to disease and/or influence their response to therapeutic drugs. The most widely accepted hypothesis for the molecular basis of complex traits is the common disease/common variant (CDCV) hypothesis [33–35]. The CDCV hypothesis proposes that there are a limited number of relatively frequent alleles and that each confers a modest risk of disease susceptibility. However, identifying and testing all the sequence variations within the genes from a candidate region is a huge task. In the case of complex genetic traits, single nucleotide polymorphism (SNP) identification is being aided by another resource that has stemmed from the HGP, namely the SNP database (dbSNP) which catalogues SNPs identified from populations worldwide. However, testing those SNPs is another challenge, as many SNPs probably have no biological effect. Although a tested SNP may show a higher than expected incidence among affected individuals in a family, this test SNP may merely be linked to the causative variation. The biological effect of an SNP may be subtle and an SNP location may provide little indication of its functional effect. It is, therefore, unrealistic to conduct functional assays to determine the direct biological consequences of the hundreds of SNPs typically identified from a linked chromosomal region. However, determining whether an SNP has biological significance can be achieved via another approach, known as association analysis.
Association analysis
Two linked alleles on the same chromosome will continue to be transmitted together with each meiosis until such time as crossing over occurs between them. They will then be distributed into separate gametes and associate independently. This random association will occur rapidly unless the two loci are very closely linked. If the two loci are in very close proximity to each other, random assortment may be very slow, perhaps taking hundreds of generations or more. This non-random association of alleles at linked loci is termed linkage disequilibrium (LD).
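The non-random association described here can be quantified. The sketch below, which is illustrative only, computes three standard measures of LD for two biallelic loci (D, |D'| and r²) from haplotype and allele frequencies, and shows the expected decay of D with each generation as D_t = D_0(1 − θ)^t, where θ is the recombination fraction between the loci. The function names and the example frequencies are assumptions made for illustration; random mating and the absence of selection, mutation and drift are also assumed.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> dict:
    """Return D, |D'| and r^2 for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    abs_d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return {"D": d, "abs_D_prime": abs_d_prime, "r2": r2}


def ld_after_generations(d0: float, theta: float, generations: int) -> float:
    """Expected D after a number of generations of random mating."""
    return d0 * (1.0 - theta) ** generations


if __name__ == "__main__":
    # Complete LD: allele A is always found with allele B.
    print(ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5))
    # Unlinked loci (theta = 0.5) lose half of D each generation,
    # whereas tightly linked loci (theta = 0.001) retain most of it.
    print(ld_after_generations(0.25, 0.5, 10))
    print(ld_after_generations(0.25, 0.001, 100))
```

With θ = 0.5, D halves each generation, whereas with θ = 0.001 roughly 90% of the initial D remains after 100 generations, which is the slow decay between tightly linked loci that association studies exploit.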
Association studies look for marker–disease correlations by testing for LD on a population level. This involves looking for associations between particular marker alleles or haplotypes, and a disease, in unrelated individuals. These associations may exist if the disease in these apparently unrelated individuals has arisen from a common ancestral founding mutation. As LD only occurs between very tightly linked loci, this approach is generally only powerful when markers are particularly close to a disease locus, typically less than a few kilobase pairs away. Consequently, analysis of LD is most effective in fine-scale genetic mapping. In essence, these studies test the association of a marker allele or genotype with the disease. This is achieved by comparing the frequency of the alleles or genotypes in a sample of unrelated affected individuals (cases) with the frequency in a sample of unaffected individuals (controls). Associations between loci are traditionally tested statistically by Fisher's exact test and the chi-squared test [36], [37] although other tests have also been described [38]. Alternatively, a cohort of unrelated cases are studied together with their parents (typically parent-proband trios) in what is known as family-based association analysis. In short, the transmission of SNP alleles from heterozygous parents to affected offspring is determined and the whole cohort analyzed to establish whether any allele is overtransmitted. If there is no linkage, the transmission of each allele from a parent to affected offspring should be equal. If there is linkage and association, then one allele would most likely be on the chromosome with the disease susceptibility allele and they would be inherited together more than 50% of the time. These family-based associations are traditionally tested using the transmission disequilibrium test (TDT) [39].
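Both designs described above reduce, in their simplest form, to a chi-squared statistic with one degree of freedom. The following sketch is a hedged illustration rather than a reproduction of any cited method: it computes an allelic Pearson chi-squared test for a case–control sample (without continuity correction) and the McNemar-type TDT statistic (b − c)²/(b + c), where b and c are the numbers of times heterozygous parents transmit or do not transmit a given allele to affected offspring. The function names and all counts are hypothetical.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def allelic_chi2(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int):
    """Pearson chi-squared test on a 2x2 table of allele counts.

    Rows are cases and controls; columns are counts of alleles a and b.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if min(row_case, row_ctrl, col_a, col_b) == 0:
        raise ValueError("every row and column total must be non-zero")
    stat = (n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
            / (row_case * row_ctrl * col_a * col_b))
    return stat, chi2_sf_1df(stat)


def tdt(transmitted: int, untransmitted: int):
    """Transmission disequilibrium test for one allele.

    Counts are taken over heterozygous parents of affected offspring only.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    stat = (transmitted - untransmitted) ** 2 / total
    return stat, chi2_sf_1df(stat)


if __name__ == "__main__":
    # Hypothetical allele counts: 60/40 in cases versus 40/60 in controls.
    print(allelic_chi2(60, 40, 40, 60))
    # Hypothetical trios: the allele is transmitted 60 times and not 40 times.
    print(tdt(60, 40))
```

Under the null hypothesis of no linkage or association, the TDT statistic is expected to be close to zero because each allele is transmitted about half the time; for small counts an exact test would be preferable to the chi-squared approximation used here.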
Association analysis has a greater power for detecting minor or moderate disease risk alleles than linkage analysis [40]. Hence, it is not surprising that the number of association studies to identify genetic loci for complex traits has increased rapidly over the last decade [41]. In addition, recruitment of large numbers of unrelated affected individuals is an easier task than identifying and collecting large numbers of multigeneration pedigrees that include sufficient numbers of affected individuals for linkage analysis. However, association analysis comes with its own challenges. Because LD only extends short distances, a far greater density of polymorphic markers is required to be genotyped than is necessary for linkage analysis. Single nucleotide polymorphisms are most typically used in association analyses as they are the most frequent polymorphisms in the genome and are relatively simple to analyze. Association analysis has become the mainstay of genetic analysis in complex traits as dbSNP has grown, high throughput genotyping techniques have improved and LD maps have become publicly available. In particular, the International Haplotype Mapping (HapMap) Project has contributed significantly to the content and resolution of publicly available LD maps. The HapMap Project aims to catalogue human SNPs, including their location and how they are distributed among individuals within populations and among populations worldwide [42]. This information is being made publicly available for the purpose of providing researchers with data to be used in linking SNPs to the risk for specific illnesses.
Strategies for association analysis may vary from direct (gene-based) tests for association of SNPs located in candidate genes, to indirect tests for association of multiple SNPs distributed at high density across an entire chromosomal segment or the whole genome. The direct gene-based approach is powerful as it captures all potential risk-conferring variations [43]. However, the indirect approach is also attractive as it does not require the time-consuming process of identifying and characterizing genes and the SNPs therein.
Positional candidate association studies
Gene-based tests of direct association may be applied following linkage analysis in a process known as positional candidate association studies. In this case, linkage analysis has previously identified a chromosomal region and genes within that region have been found. Population-based SNPs within those genes are identified and subsequently analyzed among a cohort of unrelated cases and control individuals. This approach has the benefit of limiting the number of genes to be screened. Furthermore, the genes can be prioritized for investigation on the basis of their expression in tissues affected by the disease, or by their known or inferred biological function.
One of the best examples of the success of the positional candidate association approach was the discovery of APOE as a susceptibility gene for late-onset Alzheimer's disease. Traditional linkage analysis initially identified a region on chromosome 19 as harbouring an Alzheimer's susceptibility gene [44]. Subsequent association analysis of the candidate gene APOE located within the region on chromosome 19 identified the risk allele APOE-4 [45]. It has since been estimated that the proportion of patients with dementia that is attributable to APOE-4 is about 20% [46].
Other examples include the association studies of candidate genes from the regions on chromosomes 6 and 8 that were implicated in schizophrenia following linkage analysis. Straub et al. [10] recently reported evidence of family-based association analysis that supports dysbindin (DTNBP1) as a schizophrenia susceptibility gene in the region implicated on chromosome 6p22.3. Stefansson et al. [7] also presented evidence that NRG1 is a schizophrenia candidate gene in the region implicated on chromosome 8p21–22. Both of these genes are currently the subjects of follow-up studies to determine whether their involvement in schizophrenia can be supported in other patient cohorts.
Functional candidate association studies
An alternative to the linkage-driven selection of candidate genes for association analysis is the functional or biological candidate gene approach. No knowledge of chromosomal location of the gene is necessary. Selection of functional candidate genes generally requires a functional understanding, or at least a limited knowledge of the biochemical defect underlying a disease. With this knowledge, researchers can identify genes that they consider likely to be involved in the disease aetiology. Population-based SNPs are then identified within these genes and subsequently analyzed for association with the disease among a cohort of unrelated cases and control individuals.
An example of the functional candidate association approach is the link reported between genes of the serotonergic system and mood disturbance. The serotonergic system is implicated in the regulation of mood, anxiety responses and aggressive behaviour. Serotonin (5-HT) is a neurotransmitter, the synaptic concentration of which is determined by the functional activity of the serotonin transporter (5-HTT). The serotonin transporter is a known target of antidepressant drugs and was therefore considered a functional candidate for association studies. Initial association studies of the gene encoding 5-HTT showed a tenuous link with depression. However, recent studies have identified that an allele of the 5-HTT gene confers an increased risk of depression only in interaction with stressful life events [3]. Those who suffered bereavement, romantic rejection, or job loss were found to be significantly more likely to become depressed if they carried the particular 5-HTT risk allele. Subsequent studies have supported this link by showing that the 5-HTT risk allele modulates mental and physical response to social stressors and chronic disease burden [47]. The involvement of the 5-HTT risk allele is an example of the significant interaction that may occur between genotype and environment in susceptibility to various psychiatric disorders.
Indirect association analysis
Indirect association analysis relies on a highly dense coverage of SNPs across a chromosomal region, or even the entire genome, to detect LD with a causative sequence variation, if not the causative variation itself. This approach does not differentiate between gene-based and non-gene-based SNPs. The indirect approach is assisted by the existence of definable LD regions within the genome. Studies of the genome-wide structure and extent of LD have indicated that the LD that exists along chromosomes may be broken down into a series of discrete blocks [48–50]. Genes may lie wholly within an LD block, or may span multiple blocks. It is not necessary to genotype every SNP that lies within an LD block. Instead, representative SNPs may be selected and designated as haplotype tag SNPs (tagSNPs) [51], [52]. In this way, fewer SNPs are required than originally predicted to identify the majority of genetic variation along a chromosome. This has particular significance for whole-genome SNP association studies, which require massive, high-throughput genotyping strategies that have proven to be prohibitively costly with current technologies for large-scale association studies.
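One way to picture tagSNP selection is as a covering problem: choose a small set of SNPs such that every SNP in the region is in strong LD with at least one chosen SNP. The sketch below uses a simple greedy heuristic with a pairwise r² threshold. It is an assumption-laden illustration, not the procedure of the cited studies [51], [52]; the function name, the threshold of 0.8 and the toy r² values are all hypothetical.

```python
def greedy_tag_snps(r2: dict, threshold: float = 0.8) -> list:
    """Greedily pick tag SNPs from a table of pairwise r^2 values.

    r2 maps each SNP id to a dict of {other SNP id: r^2}. At each step the
    untagged SNP that captures the most untagged SNPs (itself included) at
    r^2 >= threshold is chosen, until every SNP is captured by some tag.
    """
    untagged = set(r2)
    tags = []

    def covered(snp):
        # SNPs still untagged that this SNP would capture.
        return {s for s in untagged
                if s == snp or r2[snp].get(s, 0.0) >= threshold}

    while untagged:
        # Sorting makes tie-breaking deterministic (first id in sort order).
        best = max(sorted(untagged), key=lambda s: len(covered(s)))
        tags.append(best)
        untagged -= covered(best)
    return tags


if __name__ == "__main__":
    # Toy region: SNPs A-C form one LD block, D-E another, F stands alone.
    block = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 3}
    toy_r2 = {s: {t: (0.9 if block[s] == block[t] else 0.1)
                  for t in block if t != s}
              for s in block}
    print(greedy_tag_snps(toy_r2))  # three tags for six SNPs
```

In this toy region, three tagSNPs suffice to represent six SNPs, which illustrates how block-like LD reduces the genotyping burden. A greedy choice is not guaranteed to give the smallest possible tag set.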
Whole-genome association studies using tagSNPs will require whole-genome association maps. Establishing these maps is a goal of the HapMap Project and the first generation whole-genome association map is expected to be released shortly. This will define a non-redundant set of tagSNPs that cover the entire genome. However, the optimal set of tagSNPs will not necessarily be the same for all association studies because of differences in the expected effect size of a risk allele and the genetic diversity of the study population [53]. Cost will also continue to be a significant factor until SNP genotyping technologies advance to the point where large-scale association studies are feasible. Current estimates for the number of SNPs required to reasonably cover the entire genome vary widely. The HapMap Project aims to genotype at least 600 000 SNPs to define LD patterns across the genome. The number of non-redundant tagSNPs required to achieve a reasonable likelihood of identifying a common risk allele is estimated at between 200 000 and 1 million [42], [54]. Apart from cost, a genome screen of this magnitude poses an enormous statistical problem with regard to multiple testing. Only those risk alleles with the strongest effects will achieve statistical significance [53].
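The scale of the multiple-testing problem can be made concrete with a small calculation. The sketch below applies a Bonferroni correction, one common and conservative adjustment chosen here for illustration (the source does not specify a correction method), to the range of tagSNP numbers quoted above. It treats all tests as independent, which overstates the burden when SNPs are in LD; the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of null tests passing an uncorrected threshold alpha."""
    return alpha * n_tests


if __name__ == "__main__":
    for n_snps in (200_000, 1_000_000):
        print(n_snps,
              bonferroni_threshold(0.05, n_snps),
              expected_false_positives(0.05, n_snps))
```

At an uncorrected threshold of 0.05, a scan of 200 000 to 1 million independent SNPs would be expected to yield 10 000 to 50 000 false positives, whereas the corrected per-SNP thresholds fall to between 2.5 × 10⁻⁷ and 5 × 10⁻⁸, a level that only alleles of relatively strong effect, or very large samples, can reach.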
No examples have been described of whole-genome association analysis of complex traits of the magnitude described earlier. These studies await the release of comprehensive whole-genome association maps. However, recently a whole-genome scan of a complex disease was published that directly compared SNPs with microsatellites [55]. The study involved a whole-genome screen of 157 families with rheumatoid arthritis, using 11 245 SNPs. The SNP analysis detected the major rheumatoid arthritis susceptibility locus, HLA-DRB1, on chromosome 6p21, a result that was consistent with their microsatellite linkage scan.
To alleviate the problems associated with indirect whole-genome analysis, it has been proposed that the direct association approach be applied to whole-genome association studies [14]. These proposals advocate using only functional coding SNPs (cSNPs) that lie within genes and suggest that between 50 000 and 100 000 such SNPs are necessary for entire genome coverage [56]. However, as yet there is no large-scale public effort to identify a comprehensive collection of cSNPs for whole-genome analysis of complex traits.
Conclusion
The techniques for genetic analysis of common complex disorders have been established and continue to be improved. Public resources for genetic analysis of complex disorders are burgeoning. Together, these tools have begun to bear fruit in elucidating the molecular basis of psychiatric disorders. There has been a trickle of genes identified to date and only time will tell if that trickle leads to a flood. Much has been learned from the experience and strategies of identifying genes for rare Mendelian disorders. From a population health perspective, identifying the moderate risk alleles of common complex disorders will be of far higher importance. Once identified, new challenges will arise including the elucidation of gene–gene and gene–environment interactions. Nevertheless, the genes and pathways that will be identified will lead to improved diagnostic and prognostic tests, and offer targets for novel therapeutic drugs.
