Abstract
Alzheimer’s disease (AD) is a devastating disease mainly afflicting elderly people, characterized by decreased cognition, loss of memory, and eventually death. Although risk and deterministic genes are known, major genetics research programs are underway to gain further insights into the inheritance of AD. In recent years, in particular, new developments in genome-wide scanning methodologies have enabled the association of a number of previously uncharacterized copy number variants (CNVs, gain or loss of DNA) with AD. Because of the exceedingly large number of studies performed, it has become difficult for geneticists as well as clinicians to systematically follow, evaluate, and interpret the growing number of (sometimes conflicting) CNVs implicated in AD. In this review, after a brief introduction to this type of structural variation and a description of the available databases, computational analyses, and technologies involved, we provide a systematic review of all published data showing statistical and scientific significance of pathogenic CNVs and discuss the role they might play in AD.
INTRODUCTION
Alzheimer’s disease (AD) is the most common form of dementia, affecting between 24 and 35 million people worldwide [1] and mainly afflicting elderly people. It is a devastating disease characterized by decreased cognition, loss of memory, and ultimately death. Cholinergic neurons, particularly those of the cortical and subcortical areas, including hippocampal areas, are the most affected by this disease process. Since the hippocampus plays a key role in learning processes and memory, neurodegeneration in this area is considered the main cause of memory loss.
Neuropathologically, AD is mainly characterized by the presence of abnormal aggregates of amyloid-β (Aβ) peptide in the form of extracellular senile plaques and hyperphosphorylated tau protein in the form of intracellular neurofibrillary tangles (NFTs), microvascular damage, including vascular amyloid deposits, and pronounced inflammation of the affected brain regions. AD is often preceded by mild cognitive impairment (MCI), a clinical condition characterized by cognitive deficit. It is estimated that the progression from MCI to AD occurs in approximately 10–15% of patients [2].
The disease can be classified into two types, depending on the age of onset: early-onset (EOAD) that occurs before 65 years of age and accounts for less than 5% of all cases [3], and late-onset (LOAD, >65 years) which is the most common form. AD is highly heritable, with heritability estimates ranging from 58 to 79% [4]. Having a family history is the second greatest risk factor for the disease and this occurs both in EOAD and in LOAD.
In about 13% of EOAD familial patients, the disease is inherited in an autosomal dominant manner with full penetrance (at least three cases in three generations) [5] and is caused by mutations in three genes, APP (amyloid precursor protein, chr.21q21) [6], PSEN1 (presenilin-1, chr.14q24) [7], and PSEN2 (presenilin-2, chr.1q42) [8]. The APP gene encodes the transmembrane protein AβPP that can be cleaved by different cellular proteases: α-, β-, and γ-secretases. PSEN1 and PSEN2 encode essential components of the γ-secretase complex.
Overall, the Mendelian form of the disease is very rare and occurs in a small percentage of AD patients (<1%). The majority of cases probably result from a combination of non-genetic factors and genetic susceptibilities. No causative genes have been identified for LOAD, which appears to be heterogeneous and multifactorial. The greatest known risk factor is aging. Other potential non-genetic risk factors include sex, traumatic brain injury, diabetes mellitus, cigarette smoking, and alcohol consumption. Epigenetic mechanisms, such as abnormal DNA methylation and histone modification, can also modulate AD risk.
The main gene involved in AD susceptibility is APOE on chromosome 19q13.2, encoding the protein APOE, which is found in senile plaques, cerebral vessels, and NFTs in AD brains [9]. APOE influences the formation of neuritic plaques in mouse models [10] and binds Aβ in vitro [11]. The APOEɛ4 allele is the strongest genetic risk factor for LOAD in a gene dose-dependent manner [12], as confirmed by several genome-wide association studies (GWAS) (for a meta-analysis see Lambert et al. [13]), and is also associated with an earlier age of onset of the disease [14]. However, this polymorphism accounts for less than half the genetic variance in AD risk, and the presence of the ɛ4 allele is neither necessary nor sufficient by itself for the development of the disease. This evidence strongly suggests the existence of additional genetic risk factors, as supported by several recent large GWAS (see also http://www.alzgene.org/ for a complete list of the candidate genes). Several reviews of the genetics of AD are available [15–18].
Taken together, the previously discussed findings account only for a fraction of the estimated heritability. Recent studies have found that duplications or deletions of DNA fragments, known as copy number variants (CNVs), may play a role in missing heritability. CNVs cause both normal and pathogenic genetic variation [19], modulate gene expression, change gene structure, and promote significant phenotypic variations [20]. Moreover, some CNVs, encompassing genes encoding drug-metabolizing enzymes, cause different responses to certain drugs [21].
This review aims to analyze the currently available literature on CNVs in AD in order to provide a better understanding of the role CNVs may play in this pathology.
COPY NUMBER VARIANTS
CNVs are DNA segments that range from one kilobase (kb) to several megabases (Mb) in size and show a variable copy number in comparison with a reference genome [22]. They include deletions or duplications of DNA (Fig. 1) and represent the most prevalent type of structural variation in the human genome [23]. Deletions affecting certain categories of genes, such as dosage-sensitive genes, are under-represented in CNV regions and could undergo negative selection [24], while duplications are less likely to be pathogenic and are often under positive selection, which favors the evolution of many gene families like those encoding immunoglobulins, globins, and olfactory receptors [25, 26].
CNVs may involve one or more genes and are distributed in a non-uniform manner; in fact, they are found mainly toward centromeres and telomeres [26], probably because these genomic regions have a repetitive nature. CNVs are related to the presence of exons, segmental duplications [27] also called low copy repeats (LCRs), microRNAs [28], and repetitive elements such as Alu sequences.
CNVs constitute approximately 12% of the human genome [27] and are responsible for an important proportion of normal phenotypic variation [29]. They are divided into two main groups: recurrent CNVs and non-recurrent CNVs. Recurrent CNVs are probably due to homologous recombination between repeated sequences during meiosis; non-recurrent CNVs instead are often caused by non-homologous mechanisms that happen throughout the genome and occur at sites of homology of 2 to 15 base pairs [30]. These errors can be either simple, where a segment of DNA is cut from its original position and the ends are joined, or complex, if a deletion is followed by an insertion or a duplication of DNA at breakpoints.
CNVs can be large or small: the former are often found in regions containing large homologous repeats or segmental duplications, while small CNVs may be due to non-homology-driven mutational mechanisms. CNVs can affect gene expression and induce phenotypic variation by altering genome organization itself and gene dosage [31]. Therefore, they can also influence the susceptibility of an individual to disease and drug response [32].
In the human genome, CNVs are also classified as benign CNV (normal genomic variant), likely benign CNV, variant of uncertain significance (VOUS), CNV of possible clinical relevance (high-susceptibility locus/risk factor/likely pathogenic variant), and clinically relevant CNV (pathogenic variant) [33]. CNVs can be familial or de novo, with the de novo mutation rate being higher than the single base-pair mutation rate [34] and contributing to the development of sporadic genomic disorders [35]. CNVs are often associated with several complex and common disorders, including nervous system disorders. Indeed, several studies have shown that susceptibility to late-onset complex diseases such as amyotrophic lateral sclerosis, Parkinson’s disease, and AD is linked to the presence of CNVs that also increase the risk for other diseases such as schizophrenia, autism, and mental retardation [36].
METHODS FOR CNVs ANALYSIS
Methods for CNV analysis include CNV detection, CNV genotyping, and CNV association analysis. CNV detection and genotyping are performed by a mix of biological and data analysis tools, while CNV association analysis can be done by data analysis methods, algorithms, and software. We describe methods for CNV detection and genotyping first, and CNV association analysis afterwards.
Methods for CNV detection and genotyping
CNV detection concerns the identification of CNV loci by comparing multiple genomes. CNV genotyping focuses on uncovering the variations of an individual, usually by comparing it to a reference genome. The most common methods for CNV detection and genotyping can be classified into four main groups, which are based on comparative genomic hybridization (CGH) [37], single nucleotide polymorphism (SNP) genotyping [38], next generation sequencing (NGS), and quantitative PCR, respectively. Next we discuss each of these categories of methods in detail.
CGH-based methods
Array comparative genomic hybridization (aCGH) [37] represents one of the most used methods, although it has recently been superseded by methods based on SNP-arrays. It is based on the quantitative comparison of differentially labeled test and normal reference DNAs, which are co-hybridized to an array (Fig. 2). The fluorescence intensity ratio obtained on each spot provides a locus-by-locus measure of DNA copy number changes. This technique is able to analyze the whole genome in a single test but its resolution is low [39].
Several types of DNA sequences are used to construct arrays. They include bacterial artificial chromosomes (BACs) (40–200 kb in size), small insert clones (1.5–4.5 kb), cDNA clones (0.5–2 kb), genomic PCR products (100 bp–1.5 kb), and oligonucleotides (25–80 bp). Although arrays that use BAC clones provide the most comprehensive coverage of the genome, they cannot identify CNVs smaller than 50 kb. Higher-resolution analysis can be obtained by spotting shorter DNA molecules on the array.
Recent high-throughput techniques for identifying CNVs use CGH arrays with a high number (hundreds of thousands or millions) of probes that cover a large part of the genome [40]. A CNV is recognized when a significant variation with respect to a reference genome is identified on a number (typically more than 5–10) of consecutive probes. A limitation of this method is its low resolution (typically five to tens of kilobases).
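The consecutive-probe rule described above can be illustrated with a minimal sketch. It assumes a log2 ratio threshold of ±0.3 and a minimum run of five probes; both values, the function names, and the intensities are hypothetical and chosen only for illustration, not taken from any specific aCGH pipeline.

```python
import math

def log2_ratios(test, reference):
    """Per-probe log2 ratio of test to reference fluorescence intensity."""
    return [math.log2(t / r) for t, r in zip(test, reference)]

def call_cnvs(ratios, threshold=0.3, min_probes=5):
    """Return (first_probe, last_probe, kind) for each run of at least
    min_probes consecutive probes beyond +/- threshold (assumed values)."""
    calls = []
    run_start, run_kind = 0, None
    # A trailing neutral value acts as a sentinel that closes the last run.
    for i, r in enumerate(list(ratios) + [0.0]):
        kind = "gain" if r > threshold else "loss" if r < -threshold else None
        if kind != run_kind:
            if run_kind is not None and i - run_start >= min_probes:
                calls.append((run_start, i - 1, run_kind))
            run_start, run_kind = i, kind
    return calls

# Hypothetical intensities for 11 consecutive probes.
reference = [100.0] * 11
test = [100, 100, 150, 150, 150, 150, 150, 100, 50, 50, 100]
print(call_cnvs(log2_ratios(test, reference)))
```

With these made-up intensities, the five-probe gain (probes 2 to 6) is reported, while the two-probe loss is discarded because it is shorter than the minimum run length.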
SNP-based methods
Methods based on SNP arrays use probabilistic models to infer CNVs from SNP array data and other available information, such as SNP population allele frequencies, distance between adjacent SNPs, and information from related family members, when available. These methods achieve higher resolution (on the order of kilobases) than CGH-based methods.
SNP arrays are used to compare the intensity of the hybridization signal of genomic DNA with the average value of control DNA. In contrast with aCGH, SNP arrays are designed for genotyping and use single-source hybridization instead of competitive hybridization [41]. Both CGH and SNP arrays are able to detect submicroscopic CNVs that are not recognized by routine karyotype analyses.
There are several algorithms and tools available for inferring CNVs from SNP data [42]. The best known are PennCNV [43], QuantiSNP [44], iPattern [45], and some proprietary software (e.g., CNVpartition, implemented in Illumina BeadStudio, and Affymetrix Genotyping Console). PennCNV is an algorithm based on Hidden Markov Models that is widely used to detect CNVs with high resolution using the Illumina Infinium assay platform, and can be adapted to other platforms (e.g., Affymetrix SNP array). QuantiSNP [44], used to identify chromosome aberrations, is an algorithm that provides high-resolution CNV/aneuploidy detection and improves the accuracy of segmental aneuploidy identification. QuantiSNP can perform joint inference across samples to improve resolution in locating CNV boundaries. iPattern shows better reproducibility in breakpoint estimation for common CNVs by performing clustering across samples [43].
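To give a feel for how a Hidden Markov Model can segment SNP-array intensities, the following is a deliberately simplified sketch. It is not the PennCNV model, which also uses allele frequencies and inter-SNP distances. Here three copy number states emit Gaussian log R ratios, and the most likely state path is found with the Viterbi algorithm. All parameters (state means, standard deviation, probability of staying in a state) and the input values are assumptions made for illustration.

```python
import math

STATES = ("loss", "normal", "gain")
MEANS = {"loss": -0.6, "normal": 0.0, "gain": 0.4}  # assumed log R ratio means
SD = 0.2      # assumed common standard deviation of the emissions
STAY = 0.98   # assumed probability of remaining in the same state

def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def viterbi(lrr):
    """Most likely sequence of copy number states for a list of log R ratios."""
    switch = (1 - STAY) / (len(STATES) - 1)
    log_trans = {(a, b): math.log(STAY if a == b else switch)
                 for a in STATES for b in STATES}
    # Uniform prior over states for the first probe.
    score = {s: math.log(1 / len(STATES)) + log_gauss(lrr[0], MEANS[s], SD)
             for s in STATES}
    back = []
    for x in lrr[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans[(p, s)])
            pointers[s] = prev
            new_score[s] = score[prev] + log_trans[(prev, s)] + log_gauss(x, MEANS[s], SD)
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical log R ratios: four consecutive probes with reduced signal.
lrr = [0.02, -0.05, 0.01, -0.55, -0.62, -0.58, -0.65, 0.03, -0.01]
print(viterbi(lrr))
```

The high probability of staying in a state is what makes the model prefer contiguous segments over isolated single-probe calls.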
NGS-based methods
In recent years, NGS techniques for high-resolution (<10 kb) CNV detection have become popular [46]. The development of NGS platforms, such as Illumina [47], the SOLiD system from Applied Biosystems [48], and Roche 454 Life Sciences [49], has facilitated CNV detection. One of the main advantages of these techniques is that they are able to identify CNV breakpoints at base-pair resolution. Unlike array-based platforms, NGS-based methods are able to detect smaller structural variations (in the range of 10 to 2,000 bp), inversions, intrachromosomal translocations, and de novo CNVs. The main categories of NGS-based CNV detection methods are paired-end mapping (PEM) and depth of coverage (DOC)-based methods [50]. PEM-based methods detect balanced structural variations such as inversions and small CNVs, while DOC methods are the most commonly used by CNV detection tools. The main advantage of NGS is the capacity to sequence many reads in a single run at a low cost compared with traditional Sanger sequencing [51]. For CNV detection, the advantages are higher coverage and resolution, more accurate estimation of copy numbers, a wider range of breakpoint detection, and a higher capability to identify new CNVs [52, 53]. NGS approaches for DNA sequencing can be categorized into whole-genome sequencing (WGS) and whole-exome sequencing (WES). WGS can define the full spectrum of variants in the whole genome, while WES techniques focus on coding regions of the genome with high coverage [54].
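The idea behind DOC-based methods can be sketched as follows: read depth in fixed genomic windows is normalized and compared with a control, and the ratio is scaled to a copy number. This is a minimal illustration that assumes a diploid baseline, positive depths, and median normalization. The function name and the depth values are hypothetical, and real tools additionally correct for GC content and mappability.

```python
from statistics import median

def doc_copy_number(sample_depth, control_depth, ploidy=2):
    """Estimate an integer copy number per window from read depth.

    Each profile is divided by its own median depth to remove differences
    in overall sequencing yield; the sample/control ratio is then scaled
    by the assumed baseline ploidy.
    """
    sample_median = median(sample_depth)
    control_median = median(control_depth)
    estimates = []
    for s, c in zip(sample_depth, control_depth):
        ratio = (s / sample_median) / (c / control_median)
        estimates.append(round(ploidy * ratio))
    return estimates

# Hypothetical mean read depth in nine consecutive windows.
sample = [30, 31, 29, 15, 14, 30, 45, 44, 30]
control = [30] * 9
print(doc_copy_number(sample, control))
```

In this made-up profile, the two windows with roughly half the median depth are assigned one copy and the two windows with roughly 1.5 times the median depth are assigned three copies.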
WES is employed to identify genes associated with Mendelian diseases including AD [55, 56] and is less costly since the exomes represent only about 1% of the genome [57], while WGS is preferred to identify the breakpoints in chromosome translocations and inversions [58].
PCR-based techniques
Copy number variation of specific genomic regions can be assessed by quantitative PCR (qPCR) [59], which allows the user to monitor the amplification in “real time”, as an increase in PCR products is revealed by an increase in fluorescence. qPCR enables the identification of individual deletions or duplications and is a valid tool in large-scale association studies. Other advantages of this technique are the short time (a few hours) required from sample preparation to results, the small DNA quantities needed for high throughput, and the low cost per sample. On the other hand, it is not suitable for detecting CNVs simultaneously in different genome regions. For this purpose the following alternative methods are used: multiplex amplifiable probe hybridization (MAPH) [39], multiplex ligation-dependent probe amplification (MLPA), quantitative multiplex PCR of short fluorescent fragments (QMPSF), and multiplex amplicon quantification (MAQ) [60]. However, they can target only a limited number of regions simultaneously, and hence they cannot be employed for genome-wide CNV studies.
In MAPH, multiple loci can be detected together by using sets of different probes flanked by the same primer binding sites [39]. The test DNA is first denatured and bound to a nylon filter and then hybridized with specific probes of different lengths. These probes have identical tail sequences and can be amplified with universal primers. After amplification, the products are separated according to size and quantified by comparing the fluorescence with that of control regions.
MLPA is performed in solution [61]. Pairs of probes are designed to hybridize to adjacent areas, and a contiguous probe molecule is created. The resulting products can be separated and quantified as in MAPH. This technique allows the use of up to 40 probes in one experiment and, as with MAPH, the probes can be used to screen large cohorts of samples [62]. MAPH and MLPA have different advantages and disadvantages [63]. The generation of probes is simpler for MAPH than for MLPA, but MAPH has a higher contamination risk. Throughput is higher for MLPA. MAPH requires 1 μg of DNA, while MLPA requires 100–200 ng to obtain reproducible results [63].
QMPSF is a method in which short genomic sequences are simultaneously amplified using dye-labeled primers [64]. The fragments obtained after PCR are separated by capillary electrophoresis. Peak areas of patients and controls are compared and variations in the peak areas are evaluated.
The copy number status can be also determined using MAQ, able to determine the copy number status of multiple loci in a single assay [65]. This technique quantifies fluorescently labeled test and control amplicons, obtained by a single multiplex PCR amplification and separated on a capillary sequencer. The comparison of normalized peak areas between the target amplicons and the control amplicons results in a dosage quotient that indicates the copy number of the CNV in the test sample.
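The dosage quotient described above can be written out as a short calculation. The sketch below assumes that peak areas are normalized to the mean of the control amplicons and that the reference sample carries two copies of the target. The function name and the peak areas are hypothetical.

```python
from statistics import mean

def dosage_quotient(test_target, test_controls, ref_target, ref_controls):
    """Ratio of the normalized target peak area in the test sample to the
    normalized target peak area in a reference sample."""
    test_norm = test_target / mean(test_controls)
    ref_norm = ref_target / mean(ref_controls)
    return test_norm / ref_norm

# Hypothetical peak areas (arbitrary fluorescence units).
dq = dosage_quotient(
    test_target=1500, test_controls=[1000, 1000],
    ref_target=1000, ref_controls=[1000, 1000],
)
# Assuming the reference sample has two copies of the target region.
print(dq, round(2 * dq))
```

Under the two-copy assumption, a dosage quotient near 1.5 is consistent with three copies (a duplication) and a quotient near 0.5 with a single copy (a heterozygous deletion).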
Summary of methods for CNVs analysis
A comparison among the various methods for CNV identification and genotyping is reported in Table 1. An exhaustive discussion of the advantages and disadvantages of each method can be found in Cantsilieris et al. [66]. In general, SNP-array-based methods and aCGH are more convenient for identification of CNVs in the whole genome. They have similar characteristics, but SNP-array-based methods achieve a slightly higher resolution. NGS methods have higher resolution, but they are generally more expensive, require more time (2–3 days) to obtain results, and have a low/moderate throughput compared to array-based methods. PCR-based methods have the highest resolution, but they have limited applicability since they can target only a single locus or a small number of loci.
With the exception of NGS, the discussed methods are mainly qualitative, i.e., they are able to identify whether variations in the copy number occur (with some likelihood), but not the exact number of copies. This is often satisfactory, although sometimes a more accurate analysis is needed. In this case, NGS-based methods are better choices.
Methods for CNV association analysis
The software most widely used in the analysis of association studies (including CNVs and AD) is PLINK [67], a free, open-source toolset that performs various types of analyses, including statistics for quality control, population stratification detection, and case/control association testing. It also includes specific tools for CNV analysis. Other software tools include EIGENSTRAT (http://genepath.med.harvard.edu/~reich/EIGENSTRAT.htm), for detecting and correcting for population stratification in GWAS, and the SNP & Variation Suite of GoldenHelix (http://goldenhelix.com/), which contains analytic tools that perform quality-assurance and statistical tests for genetic association studies. Some studies implement their own software by using statistical tools, including ANCOVA [68] and PCA [69] for multivariate analysis, and Bonferroni [70] for multiple testing correction.
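As an illustration of case/control association testing followed by Bonferroni correction, the following sketch implements a two-sided Fisher exact test on a 2×2 table of CNV carriers and non-carriers. This is a generic textbook procedure shown under stated assumptions, not the method implemented in PLINK or in any of the studies reviewed here. The function names and counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table
    [[a, b], [c, d]] = [[case carriers, case non-carriers],
                        [control carriers, control non-carriers]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical counts: 3 carriers among 3 cases, 0 carriers among 3 controls.
print(fisher_exact_two_sided(3, 0, 0, 3))
# Hypothetical uncorrected p-values from three tested CNV regions.
print(bonferroni([0.01, 0.04, 0.2]))
```

The correction shows why many of the associations discussed later are reported with uncorrected p-values: a nominally significant result can lose significance once the number of tested regions is taken into account.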
DATABASES OF CNVs
Generally, for the analysis of CNVs, the following categories of databases are used: “in-house”, “theme”, and “data aggregators” [33]. “In-house” databases are used to analyze the cases treated by each laboratory itself; “theme” databases refer to CNVs related to particular control populations; “data aggregators” integrate collections of data from different sources.
Several public Internet databases can be used for array data interpretation. Human CNVs are mostly catalogued in the Database of Genomic Variants (DGV) (http://projects.tcag.ca/variation/). DGV was created in 2004 and provides a useful catalogue of control data for studies aiming at correlating genomic variations with phenotypic data. It is freely accessible and is continuously updated with new high-quality data, including samples analyzed in different studies. DGV contains a summary of genomic alterations involving segments greater than 50 bp and less than 3 Mb. A new version of DGV, in which the majority of CNVs were detected by NGS platforms and methods, has recently been developed. Zarrei et al. [71] considered recent high-resolution studies that maximize sensitivity and minimize false discoveries. In the new version, uncertain results from previous studies have been removed. These include CNVs detected from platforms such as BAC arrays, which overestimate the breakpoints [72], have low resolution, and miss many small variants. Some individual CNVs were removed since previous studies had stated that they were very rare or due to false discoveries. In the new DGV version, Zarrei et al. have also combined the variants of different studies into merged CNVRs (copy number variation regions) and used a CNVR-clustering algorithm to identify groups of variants that have at least 50% reciprocal overlap [40].
Other available databases are: Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources (DECIPHER) (http://decipher.sanger.ac.uk/), European Cytogeneticists Association Register of Unbalanced Chromosome Aberrations (ECARUCA) (http://www.ecaruca.net), and International Standards for Cytogenomic Arrays (ISCA) (http://www.iscaconsortium.org). DECIPHER provides information on chromosomal microdeletions and duplications and facilitates the search for genes that influence human development and health. ECARUCA collects cytogenetic and clinical data on rare chromosome disorders. ISCA contains whole genome array data from a subset of clinical diagnostic laboratories.
Other tools are often adopted for obtaining information related to genes included in detected CNVs. Diagnostic laboratories primarily use the UCSC Genome Browser (http://genome.ucsc.edu/) since it enables connection to the various databases described above [73]. To improve the classification of CNVs, analytical tools are employed. One of the most common is Genomic Classification of CNVs Objectively (GECCO) (http://sourceforge.net/projects/genomegecco/), which includes functionalities for analyzing genomic characteristics such as repetitive elements inside CNVs and aids in confirming the pathogenicity of de novo CNVs.
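The 50% reciprocal overlap criterion can be made concrete with a small sketch. Two variants overlap reciprocally by at least 50% when the shared segment covers at least half of each variant. The greedy grouping shown here is an assumption made for illustration and is not the CNVR-clustering algorithm used for DGV. The coordinates are hypothetical, given as half-open (start, end) intervals on the same chromosome.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two fractions of each variant covered by the overlap."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))

def cluster_cnvs(cnvs, min_ro=0.5):
    """Greedily group variants so that every pair within a group reaches
    the minimum reciprocal overlap."""
    clusters = []
    for cnv in sorted(cnvs):
        for cluster in clusters:
            if all(reciprocal_overlap(cnv, other) >= min_ro for other in cluster):
                cluster.append(cnv)
                break
        else:
            # No existing group accepts this variant, so it starts a new one.
            clusters.append([cnv])
    return clusters

# Hypothetical variants on one chromosome.
cnvs = [(100, 200), (120, 210), (400, 500), (150, 400)]
print(cluster_cnvs(cnvs))
```

Here (100, 200) and (120, 210) are grouped, whereas (150, 400) overlaps (100, 200) by only 20% of its own length and therefore forms a separate group.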
CNVs AND ALZHEIMER’S DISEASE
Several authors have performed studies to identify the potential role of CNVs in the genetic basis of AD. Most of them focused on CNVs longer than 100 kb. Table 2 lists all the genes reported in the studies reviewed, excluding those whose association with the disease was not found significant (p-value >0.05) in GWAS and summarizes the data obtained by the authors.
Based on encompassed genes, CNVs can be classified into different types: CNVs causing Mendelian EOAD; CNVs in high risk AD groups; CNVs in known AD risk genes; and CNVs in genome-wide studies. We discuss each of these types separately.
CNVs causing Mendelian EOAD
To date, duplications in the APP gene are the only pathogenic CNVs found in EOAD families with autosomal dominant transmission. Since the first report [74], this variation has been found almost exclusively in affected subjects (http://www.molgen.ua.ac.be/ADMutations/) and its frequency has been estimated to be 8% in the Mendelian families [74]. A recent study has suggested that APP duplication makes a limited contribution to familial EOAD and is extremely rare in LOAD [75]. These findings have been confirmed by Chapman et al. [76], who found only one event in 3,260 AD patients (average age at onset = 72.91, SD = 8.49) and no events in 1,290 controls.
Two different small deletions (<10 kb) of exon 9 of another Mendelian gene, PSEN1, first identified by Crook et al. [77] and Smith et al. [78], have been found in some patients with familial EOAD.
CNVs in high risk AD groups
Two studies were performed on EOAD patients for whom mutations in the known genes had been excluded. Rovelet-Lecrux et al. [79] assessed the presence of rare CNVs in familial and sporadic EOAD patients. The genome-wide study detected seven CNVs encompassing several genes, four of which encode proteins involved in Aβ peptide metabolism or signaling: KLK6, SLC30A3, MEOX2, and FPR2. KLK6 (kallikrein related peptidase 6, chr. 19q13) encodes neurosin, localized in senile plaques and NFTs of AD brains. SLC30A3 (solute carrier family 30 member 3, chr. 2p23) encodes the ZnT3 synaptic vesicle zinc transporter. Zn2+ promotes Aβ aggregation in senile plaques or in cerebral amyloid angiopathy. MEOX2 (mesenchyme homeobox 2, chr. 7p21) encodes a regulator of vascular differentiation whose expression is low in AD. FPR2 (formyl peptide receptor 2, chr. 19q13) encodes a receptor used by Aβ42 to chemoattract and activate mononuclear phagocytic cells.
Hooli et al. [80] conducted a genome-wide CNV study on EO familial AD and early/mixed-onset pedigrees and found 12 novel CNV regions co-segregating with the disease within families. The genes involved (see Table 2) take part in neuronal pathways crucial to brain functioning. In addition, they detected CNVs encompassing known frontotemporal lobar dementia genes: CHMP2B (charged multivesicular body protein 2B, chr.3p11.2) and MAPT (microtubule associated protein tau, chr.17q21.1).
Altogether, these findings support a possible causal role of some genes in addition to those already known.
CNVs in known AD risk genes
Some authors found CNVs associated with AD in genes previously identified as risk genes for AD in GWAS (http://www.alzgene.org/). Brouwers et al. [60] evaluated a common LCR-associated CNV in the CR1 gene (complement receptor 1, chr. 1q32) and found that carriers of three LCR1 copies have an increased risk of developing AD compared with individuals with two copies. They also confirmed that LCR1 CNV dosage correlates with the different isoforms (mainly CR1-F and CR1-S) produced by the gene. This study was performed in a Flanders-Belgian cohort and replicated in a French cohort. Chapman et al. [76] identified duplications overlapping CR1 that may be pathogenic in two patients from a large association study of the Genetic and Environmental Risk for Alzheimer’s disease Consortium (GERAD). In contrast, Szigeti et al. [69], who analyzed 375 AD patients and 180 controls from the Texas Alzheimer Research and Care Consortium (TARCC), found rare CNVs overlapping BIN1 (bridging integrator 1, chr. 2q14) and the LCR region of CR1 with opposite dosage in cases and controls.
Swaminathan et al. analyzed the role of CNVs in AD and MCI using data from non-Hispanic Caucasian participants in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [81–83] and the National Institute on Aging-LOAD/National Cell Repository for AD (NIA-LOAD/NCRAD Family Study) [82]. In addition, they used samples from the Translational Genomics Research Institute (TGen) [83]. They examined a set of candidate genes previously related to AD and identified from the AlzGene database. They detected in two patients a CNV in another known AD risk gene, PICALM (phosphatidylinositol binding clathrin assembly protein, chr. 11q14). Furthermore, they found that the genes RELN (reelin, chr. 7q22) and DOPEY2 (dopey family member 2, chr. 21q22.2) were overlapped by CNVs only in cases (AD and/or MCI) and not in controls in the three studies.
CNVs in genome-wide studies
A number of potentially interesting gene regions have been identified by means of CNV GWAS.
Among the CNVs found in their case-control studies using ADNI participants, Swaminathan et al. [81, 82] analyzed only those present in cases (AD and/or MCI) but not controls (Table 2). However, the data obtained were not significant after correction for multiple testing. In a third study, Swaminathan et al. [83] analyzed TGen and ADNI cohorts and selected a number of genes overlapped by CNVs in at least four cases but not in controls. Among them, only the HLA-DRA (major histocompatibility complex, class II, DR alpha, chr.6p21.3) gene showed a significant association with the disease (uncorrected p = 0.0144). Deletions and duplications in the fusion gene CHRFAM7A (CHRNA7 - cholinergic receptor, nicotinic, alpha 7, exons 5–10, chr.15q13.3 - and FAM7A - family with sequence similarity 7A, exon A-E fusion, chr.15q13.1) were found both in cases and in controls (corrected p = 0.0198). A meta-analysis performed for this gene in the same paper, using the findings from the three studies [81–83], revealed a significant association with AD and/or MCI risk (p = 0.006).
Swaminathan et al. [82] also found two AD participants having a CNV > 2 Mb. The first AD participant had a 2.4 Mb deletion on chromosome 11 that overlapped some members of the olfactory receptor genes, a multigene family involved in odorant discrimination [84]. The second AD participant had a 3.2 Mb duplication on chromosome 3 including the GBE1 (glucan (1,4-alpha-), branching enzyme 1, chr.3p12.3) gene. This gene encodes a protein involved in glycogen biosynthesis and has not previously been associated with AD susceptibility.
A study performed by Ghani et al. [42], in a dataset of AD patients and normal controls of Caribbean Hispanic origin, identified a duplication on chromosome 15q11.2 encompassing up to five genes: TUBGCP5, CYFIP1, NIPA2, NIPA1, and WHAMML1. This duplication showed association with the disease (uncorrected p = 0.037). CYFIP1 and NIPA1 may be important in neurological development [85]. NIPA1 encodes a magnesium transporter associated with early endosomes in neuronal and epithelial cells [86], while CYFIP1 forms a complex at synapses with the fragile X mental retardation protein (FMRP) and eIF4E (FMRP-CYFIP1-eIF4E complex) that is involved in synaptic stimulation [87]. In this study a number of rare CNVs were also detected. Furthermore, these authors did not replicate the borderline association (uncorrected p = 0.053) reported by Heinzen et al. [70] between a duplication on chromosome 15q13.3 affecting the CHRNA7 locus and AD.
Chapman et al. [76] carried out a large CNV genome-wide association study on AD patients and normal controls from European countries and the USA. They investigated the loci that had previously been highlighted in other smaller studies [42, 70] but failed to replicate any findings. They only found an excess of CNVs in AD samples in the 15q11.2 region identified by Ghani et al., but this excess was not significant. The lack of significance might depend on the rarity of the CNVs involved, which are observed in a small number of cases, insufficient to establish statistical significance.
Some authors have focused on CNV-Regions (CNVRs), which are union of CNVs that may affect the same biological function. Using gene expression data from pathologically ascertained AD cases, Li et al. [68] identified the following five genes which were both differentially expressed between cases and controls and had over 50% of the variance explained by the cis-CNVstate: ARL17P1 (ADP-ribosylation factor-like 17A, chr.17p21.31), CREB1 (cAMP responsive element binding protein 1, chr.2q34), FAM119A, also known as METTL21A (methyltransferase like 21A, chr.2q33.3), NBPF10 (neuroblastoma breakpoint family, member 10, chr.1q21.1), and SDF4 (stromal cell derived factor 4, chr.1p36.33). The authors identified an 8-kb deletion containing a PAX6-binding site on chr2q33.3 upstream of CREB1, which could explain the altered gene expression. They also performed a case–control study on 1,230 AD subjects and 936 normal controls to test the association of the probes with AD. After multiple testing correction, the 8-kb deletion was found significantly associated with the disease (p = 0.008). It is noteworthy that disruption of CREB1, encoding for a transcription factor, causes neurodegeneration in hippocampus in a mouse model [88]. The potential role in AD of FAM119A, adjacent to CREB1, cannot be excluded.
Guffanti et al. [89] analyzed the distribution of CNVs in an ADNI sample that included 146 AD cases, 313 MCI cases, and 181 controls genotyped using the Human-610 Quad BeadChip. They found large heterozygous deletions in cases (p < 0.0001) and identified 44 copy number variable loci. The number of subjects with more than one CNVR deletion was significantly greater among AD and/or MCI cases than among controls (p = 0.005). Seven of the 44 CNVRs were significantly associated with AD and/or MCI (p < 0.05). These deletions were present in AD and MCI cases and in only one control.
A duplication in the 16p13.11 region and a deletion in the 17p12 region were found in two AD patients [82]. These variants have previously been associated with schizophrenia [90, 91], but not with AD.
Szigeti et al. [69], in their genome-wide study of TARCC participants, also carried out a case-only analysis to test the association of CNVs with age at onset (AAO) of AD. The authors confirmed their previously reported association of the chromosome 14 olfactory receptor cluster with AAO of AD (uncorrected p = 0.03) [92] and identified five CNV regions, ranging in size from 3.6 kb to 24.8 kb, suggesting that small and rare events could contribute to the heritability of AAO of AD. They also attempted to replicate these results in the probands of the NIA-LOAD Familial Study dataset but were unable to do so because the platform used had too few probes in the regions of interest. The CNV regions identified by this study overlap with BIN1 (bridging integrator 1, chr.2q14), the LCR region of CR1 (with opposite dosage in cases and controls), and the gene CPNE4. However, these results are inconclusive, since the findings are supported by CNVs reported in an old version of DGV that were not retained in the newer version.
DISCUSSION
Several studies have highlighted the role of CNVs in the pathogenesis of neurological diseases. Presently, APP duplication is the only recognized CNV causing AD. Many CNVs have been found in patients but not in controls, or in both groups; however, further investigations are needed to verify their actual correlation with the pathology. The only significant results that survived correction for multiple testing included the CREB1, HLA-DRA, and CHRFAM7A genes, together with a duplication on chromosome 15q11.2 encompassing TUBGCP5, CYFIP1, NIPA2, NIPA1, and WHAMML1. Analysis of CNVRs identified seven that were significantly associated with cognitive impairment. Deletions within these CNVRs encompass genes involved in biological pathways such as axonal guidance, neuronal morphogenesis, and differentiation.
In some cases, study findings are not concordant. These discrepancies may be due to differences in study design, clinical ascertainment criteria, stringency of the quality control criteria used for sample selection, and population origin, as well as to small sample sizes. Some results might be biased by batch effects [93], which occur because measurements are affected by laboratory conditions, reagent lots, and personnel differences. Batch effects can be critical in long studies, since conditions might vary during the study.
Furthermore, it is important to note that a direct comparison of CNV calls from different studies is sometimes difficult when different genotyping platforms are used, because the locations of the probes may not correspond.
Some criticism could be made of those studies that have used the same dataset (ADNI) [81–83, 89]. Replication in independent samples would be useful to overcome a possible circularity of the results previously obtained. It is noteworthy that results from GWAS should not be considered definitive. Indeed, they often produce false associations because of multiple testing, and they cannot establish causality [94]. There is no widely recognized multiple-testing correction approach for CNV analysis, and standard thresholds for SNP GWAS are likely to be too stringent because of the strong dependence among overlapping regions in the search space. Correction for multiple testing can discard many false associations, but it can simultaneously exclude many true-positive loci. The conventional threshold of 5×10⁻⁸ for measuring significance in GWAS has recently been criticized [95] and is not applicable here, since it was designed mainly for SNPs (hundreds of thousands to millions tested in one experiment). Further work on designing suitable correction methods for genome-wide CNV analysis is necessary.
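The trade-off between discarding false associations and losing true ones can be sketched with two standard corrections, Bonferroni and Benjamini-Hochberg. This is an illustration only: neither procedure is proposed here as the appropriate method for CNV analysis, the p-values are hypothetical, and the function names are introduced for this example. The first calculation assumes a family-wise error rate of 0.05 spread over about one million tests, which is the usual rationale for the 5×10⁻⁸ SNP threshold.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return True for each hypothesis rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# About one million SNP tests give the conventional GWAS threshold.
snp_threshold = bonferroni_threshold(0.05, 1_000_000)
# A scan of 44 CNV regions needs a far less extreme per-test threshold.
cnvr_threshold = bonferroni_threshold(0.05, 44)

# Hypothetical p-values for five candidate regions.
p_values = [0.001, 0.008, 0.012, 0.03, 0.2]
bonf_hits = [p <= bonferroni_threshold(0.05, len(p_values)) for p in p_values]
bh_hits = benjamini_hochberg(p_values)

print(snp_threshold, cnvr_threshold)
print(sum(bonf_hits), "rejected by Bonferroni;", sum(bh_hits), "by Benjamini-Hochberg")
```

On these hypothetical values the family-wise correction retains fewer candidates than the false discovery rate procedure, which illustrates how the choice of correction alone can change which loci are reported.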
In conclusion, the studies performed so far suggest a link between CNVs and AD, but further investigations, also involving cytogenetic or molecular techniques, are needed to better understand the functional role of these chromosomal structural variations in the development of the disease.
