Newly Identified Genetic Associations of Alzheimer Disease by Conditional Selective Inference: Potential Implications for the Tau Hypothesis

Abstract

Over 6 million people are estimated to have been living with Alzheimer disease (AD) in 2020, with another 12 million living with Mild Cognitive Impairment (MCI). Research has been conducted to evaluate genetic links to AD, but more research is needed to improve early disease detection and patient outcomes. Diagnostic information, demographic information, and single nucleotide polymorphism (SNP) data were collected by the Alzheimer’s Disease Neuroimaging Initiative (ADNI). We performed LASSO regression with conditional selective inference for feature selection on the SNPs and other predictors (which included education, race, and marital status), which reduced the number of SNPs from 55 106 to 13 and removed all non-SNP predictors except years of education and marital status. The included SNPs reside in genes that have clinical significance and may be associated with diseases that affect cognitive performance. The results suggest that the alternative alleles for 7 SNPs are associated with increased risk of AD/MCI diagnosis, while those for 6 SNPs are associated with decreased risk of diagnosis. The results point to a new potential pathway of disease regarding the PAK5 gene and the Tau protein hypothesis, which is supported by previous research. This research may have clinical implications and should be further studied.

Keywords

Alzheimer disease; mild cognitive impairment; conditional selective inference; LASSO

Introduction

Alzheimer disease (AD) is the most common cause of dementia.¹ The disease is characterized by memory loss, delusions, and difficulty with problem-solving and performing basic tasks. People with AD may also experience behavioral issues, and the late stages of the disease may cause seizures and an inability to communicate.² More than 6 million people are estimated to have been living with AD in 2020 in the United States, and that prevalence is expected to more than double by 2060. An estimated 12 million people were living with Mild Cognitive Impairment (MCI) in 2020, with a projection of close to 22 million people living with MCI in 2060.³ MCI may include symptoms of cognitive decline, but the symptoms do not significantly impact the sufferer’s basic functioning.⁴ MCI may lead to AD, but some individuals with MCI will never develop AD.^2,4 Among Americans aged 65 and older, over 11% had clinical AD in 2020.³ With such increases in prevalence projected, the burden on disease sufferers, their families and caretakers, and the health care system may also increase.

Several theories exist in terms of contributing causes of AD. The most studied theory is the amyloid-beta (Aβ) hypothesis. According to this, Aβ, which forms from the breakdown of amyloid precursor protein (APP) in the brain, aggregates to form plaques. These plaques damage neurons, specifically causing damage to the neuronal dendrites.^5,6 However, the timeframe from the formation of Aβ plaques to the presence of AD symptoms is more than 20 years.⁷ Another theory of the cause of AD is the Tau protein hypothesis. Tau protein, usually involved in the stabilization of microtubules in neurons, can become hyperphosphorylated, causing it to misfold and form paired helical filaments,⁸ leading to disruption of neuronal microtubules. This disruption causes neurons to starve, leading to neuron death.⁵

AD may also have genetic risk components. The strongest known genetic risk factor involves the Apolipoprotein E gene. The E4 allele, known as APOE4, is thought to influence Aβ plaques, abnormal Tau protein tangles, and brain inflammation. In contrast, the related APOE2 allele decreases the risk of AD.¹ Also, some recent research has found a connection between APOE4 homozygosity and AD.⁹

The Alzheimer’s Disease Neuroimaging Initiative (ADNI) is a longitudinal study with the goal of improving early detection of AD through the study of imaging, genetic, and other data collected from participants throughout the United States and Canada. The ADNI receives public and private funding and has been ongoing since 2004. The first phase of the study, ADNI-1, was conducted from 2004 to 2009 with the goal of developing biomarkers that could be used as outcomes for clinical trials and included 800 participants.¹⁰

ADNI-GO was a 2-year extension of the project started in 2009, and ADNI-2 was a 5-year extension of the project started in 2011. ADNI-GO and ADNI-2 added 200 and 550 new participants, respectively, with both also including participants from previous study versions. ADNI-GO sought to examine biomarkers in early disease stages, and ADNI-2 sought to use biomarkers as predictors of disease. ADNI-3 was a 5-year extension started in 2016, although its results have not all been released as of February 2024. It added another 371 participants and began studying Tau scans for use in clinical trials.¹⁰

The study overall includes 1921 participants: 483 healthy elderly control participants; 1001 participants with MCI; and 437 participants with AD. Participants were followed over time to study various diagnostic measurements, genetic testing, and various scans, including cognitive tests, magnetic resonance imaging (MRI) scans, and positron emission tomography (PET) scans.¹¹ Over 3700 articles have been published that use ADNI data as of February 2024.¹² This research has indicated various associations with biomarkers and AD and MCI diagnoses.

Current testing for AD (through PET scans and cerebrospinal fluid tests to test for Aβ, for example) is invasive and expensive, with the potential to produce adverse effects, and is not widely available.⁷ Cheaper, more reliable, and less invasive testing for AD and MCI would therefore be beneficial for providing more widespread screening. Since many clinical trials involving participants who already have clinical AD have led to poor results, it would be beneficial to conduct testing and provide treatment as early as possible.⁷ This provides motivation for genetic testing, as it can be done through easier testing (ADNI mainly used peripheral blood tests)¹³ and can be done at any time.

APOE4, as mentioned, is 1 potential risk factor involved in AD diagnosis. However, it is reasonable to expect that AD and MCI have other genetic risk factors that have yet to be characterized. This research seeks to examine relationships between various single nucleotide polymorphisms (SNPs) and diagnosis of AD or MCI, with further review being conducted on genes that contain any associated SNPs.

Methods

Study design

In ADNI-1 and ADNI-GO/2, genetic data were collected through blood samples (ADNI-GO and ADNI-2 had a combined genetic data collection period, with no participant overlap) in addition to various scans and biospecimen collections. These samples were genotyped using Illumina bead-based microarrays, although ADNI-1 used the Illumina Human610-Quad BeadChip while ADNI-GO/2 used the Illumina HumanOmniExpress BeadChip.¹⁴ The presence of A/B alleles for SNPs served to help eliminate confusion when working with the different methods used in the ADNI versions. Each ADNI version includes over 600 000 genetic markers,¹⁵ including SNP and copy number variation (CNV) data. Data regarding chromosome and position, B allele frequency, and other data about SNPs and CNVs were collected. In ADNI-1, participants attended a screening visit, a baseline visit, and follow-up visits at 6, 12, 18, 24, and 36 months postbaseline visit.¹⁶ In ADNI-2, new patients attended a screening visit, a baseline visit, and follow-up visits. For controls and patients with MCI, follow-up visits were conducted at 3 and 6 months postbaseline visit and every 6 months thereafter. For patients with AD, follow-up visits were conducted at 3, 6, 12, 18, and 24 months postbaseline visit and every 6 months thereafter.¹⁷

Diagnostic data were collected through various cognitive tests, including memory tests, the Boston Naming Test, and others, and diagnostic summaries were created for patients throughout the study and categorized as being normal controls, having MCI, or having AD.^16,17 In ADNI-2, MCI was subcategorized into early and late MCI.¹⁷

Genetic data were collected for 1550 participants, 757 from ADNI-1 and 793 from ADNI-GO/2. Diagnostic data were collected from 2920 participants, including ADNI-3. Genetic data from ADNI-3 had not yet been released at the time this research was conducted.

Participant inclusion and exclusion criteria

Participants were required to be between the ages of 55 and 90 at baseline, have a Geriatric Depression Scale score of less than 6, have a study partner who accompanies them to visits, have proper visual and auditory acuity, have good general health, have completed at least 6 grades of education or a sufficient work history, and agree to the collection of blood and other samples for testing. Additional requirements can be found at https://adni.loni.usc.edu/wp-content/uploads/2010/09/ADNI_GeneralProceduresManual.pdf and adni2-procedures-manual.pdf. Further requirements were placed upon participants based on their categorization of diagnosis. For example, normal controls must have been free of memory complaints.^16,17

Certain exclusion criteria were put in place throughout the study. In ADNI-1, the main criteria for exclusion centered around specific medications, including certain antidepressants and analgesics.¹⁶ In ADNI-2, the exclusion criteria included diagnosis of certain conditions, including major depression, as well as a history of alcohol or substance abuse.¹⁷ There were various other criteria for exclusion throughout the study.

Data collection and manipulation

Data used in the preparation of this article were obtained from the ADNI database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The original goal of ADNI was to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. The current goals include validating biomarkers for clinical trials, improving the generalizability of ADNI data by increasing diversity in the participant cohort, and providing data concerning the diagnosis and progression of AD to the scientific community (for up-to-date information, see adni.loni.usc.edu).

All data collected by ADNI abide by all relevant ethical guidelines, and informed consent was obtained from all participants.¹⁸

Permission to access and download ADNI study data was obtained through ADNI directly via an electronic application. Data containing SNP and CNV information were downloaded from the ADNI website, as well as data containing information regarding patient diagnostic and demographic data, including blood pressure, age at the baseline visit, gender, years of education, race, marital status, and handedness.^19,20 Genetic data were separated by ADNI version, with 1 genome-wide association study (GWAS) being held for ADNI-1 and another being held for ADNI-GO/2. Genetic data were filtered to only include SNP data and were used in A/B notation, with the A allele being the reference allele. Data regarding the chromosome and position (location on the relevant chromosome) of each SNP were extracted from the data and kept separately for reference after analysis was completed in order to identify the genes nearest to the SNPs. The SNPs were initially transformed into a numerical representation depending on the number of B alleles (ie, “AA,” “AB,” and “BB” were transformed into 0, 1, and 2, respectively).

The genetic data were then combined with the other demographic and diagnostic data. The various data files used 1 of 2 (or, in some cases, both) versions of the patient ID number, which were matched. The most recent available diagnosis for each participant was used. The SNPs were transformed into a binary format, based on the presence or absence of at least 1 B allele. The SNPs with less than 5% variability (ie, greater than 95% of participants having the same version of the binary SNP) were removed from the data, and only complete cases were kept in the data set. Only SNPs that were present in all ADNI versions were kept in the data set.
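The genotype recoding and variability filter described above can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline (which used Bash and R): the function names, the toy SNP names, and the genotype calls are all hypothetical, and the 95% threshold is taken from the text.

```python
from collections import Counter

# Additive coding described in the text: number of B alleles per genotype.
B_ALLELE_COUNT = {"AA": 0, "AB": 1, "BB": 2}

def to_binary(genotypes):
    """Return 1 where at least one B allele is present, otherwise 0."""
    return [1 if B_ALLELE_COUNT[g] >= 1 else 0 for g in genotypes]

def has_enough_variability(binary_calls, threshold=0.95):
    """Keep a SNP unless more than `threshold` of participants share one value."""
    most_common_count = Counter(binary_calls).most_common(1)[0][1]
    return most_common_count / len(binary_calls) <= threshold

# Hypothetical toy data: 20 participants and two made-up SNPs.
toy_snps = {
    "snp_common": ["AA"] * 12 + ["AB"] * 6 + ["BB"] * 2,  # 40% carry a B allele
    "snp_rare": ["AA"] * 20,                              # no B allele carriers
}

binary = {name: to_binary(calls) for name, calls in toy_snps.items()}
kept = [name for name, calls in binary.items() if has_enough_variability(calls)]
print(kept)
```

In this toy example only `snp_common` survives the filter, because every participant has the same binary value for `snp_rare`.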

Some demographic variables were transformed to a binary format due to a lack of variability of the data, with some factor levels having very few observations. Specifically, race was made binary (white vs non-white), as was marital status (married vs unmarried). Diagnosis was transformed into a binary variable as well, with the presence of AD or MCI diagnosis being considered a “1” and the absence of either diagnosis being considered a “0” in order to simplify the interpretability of results and allow for binomial logistic regression to be performed.

The genetic data were combined with the diagnostic and demographic data using an inner join, so that every retained participant had complete demographic and diagnostic data. After this step, 1465 participants remained, representing about 95% of the 1550 participants in the full genetic data set. In total, 55 106 SNPs remained after the binary SNPs were filtered based on B allele frequency. After removing duplicate SNP entries, 55 040 SNPs remained for conditional selective inference analysis.

Data analysis

Elastic net cross-validation was conducted on the diagnostic and demographic data in order to choose the optimal α and λ values for elastic net regression and variable selection using the glmnetUtils package.²¹ This method was used because it regularly outperforms the LASSO, and it encourages a grouping effect, which is useful in this case, as there are far more predictors than observations.²² Based on the results of the cross-validation, LASSO regression was performed using the glmnet package, which selected a subset of the SNPs and other predictors, specifically 13 SNPs, years of education, and marital status.²³ Simple logistic regression was then performed on that variable subset with diagnosis as the outcome variable. To calculate more accurate P-values for the predictors, a separate LASSO regression was conducted utilizing conditional selective inference. This method involved re-running the data through elastic net cross-validation with the 55 040 SNPs and utilizing those results to perform a separate LASSO regression with conditional selective inference using the selectiveInference package.²⁴ This approach increased accuracy and decreased bias during the P-value calculations. Then, we performed a test for multicollinearity on the model. Adjusted odds ratios and their corresponding 95% confidence intervals were calculated. Finally, the genes in or near which the SNPs were located were determined using the dbSNP database and the bedtools utility software.^25,26

Aside from the data tidying using Bash scripts and the use of the bedtools software to determine some SNP gene locations, all data manipulation and analysis were performed using R version 4.2.0 with RStudio version 2022.07.2.

Results

Participant demographics

The participant demographic data are shown in Table 1. Of the 1465 participants included in the study, more were male than female, and the vast majority of participants were white. The majority of patients had a diagnosis of either AD or MCI, specifically 72.4%.

Table 1.

Participant demographic data, categorical variables.

Categorical variables		N	%
Gender	Female	635	43.3%
Gender	Male	830	56.7%
Race	White	1362	93.0%
Race	Non-white	103	7.0%
Marital status	Married	1114	76.0%
Marital status	Unmarried	351	24.0%
Handedness	Right-handed	1334	91.1%
Handedness	Left-handed	131	8.9%
Diagnosis	AD or MCI	1060	72.4%
Diagnosis	None/control	405	27.6%
Continuous variables		Mean	Range
Age (years)		73.7	54.4-91.4
Education (years)		15.9	4-20
Blood pressure (systolic) (mm Hg)		134.9	83-201
Blood pressure (diastolic) (mm Hg)		74.4	43-108

Demographic data broken down by diagnosis are shown in Table 2. Diagnosis of AD or MCI was more common among males than among females. Diagnosis was also more common among white participants than among non-white participants. A higher percentage of married participants than of unmarried participants were diagnosed with AD or MCI. Diagnosis was more common among right-handed participants than among left-handed participants.

Table 2.

Participant demographic data, by diagnosis, categorical variables.

Categorical variables		Normal control		AD or MCI diagnosis
Categorical variables		N	%	N	%
Gender	Female	203	32.0%	432	68.0%
Gender	Male	202	24.3%	628	75.7%
Race	White	366	26.9%	996	73.1%
Race	Non-white	39	37.9%	64	62.1%
Marital status	Married	277	24.9%	837	75.1%
Marital status	Unmarried	128	36.5%	223	63.5%
Handedness	Right-handed	363	27.2%	971	72.8%
Handedness	Left-handed	42	32.1%	89	67.9%
Continuous variables		Normal control		AD or MCI diagnosis
Continuous variables		Mean	Range	Mean	Range
Age (years)		73.1	55.0-89.3	73.9	54.4-91.4
Education (years)		16.5	6-20	15.7	4-20
Blood pressure (systolic) (mm Hg)		134.1	83-192	135.2	86-201
Blood pressure (diastolic) (mm Hg)		74.3	49-100	74.5	43-108
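As an illustration of how the percentages in Table 2 relate to an odds ratio, the sketch below recomputes the marital status rows from their counts and derives a crude (unadjusted) odds ratio. The helper names are hypothetical, and the crude value is not expected to match the adjusted odds ratio reported later, which comes from a model that includes the other predictors.

```python
# Counts copied from Table 2: (normal control, AD or MCI diagnosis).
table2 = {
    "Married": (277, 837),
    "Unmarried": (128, 223),
}

def percent_diagnosed(controls, cases):
    """Percentage of a group with an AD or MCI diagnosis."""
    return 100 * cases / (controls + cases)

def crude_odds_ratio(exposed, reference):
    """Odds of diagnosis in the exposed group divided by odds in the reference group."""
    (exp_controls, exp_cases), (ref_controls, ref_cases) = exposed, reference
    return (exp_cases / exp_controls) / (ref_cases / ref_controls)

married_pct = percent_diagnosed(*table2["Married"])
unmarried_pct = percent_diagnosed(*table2["Unmarried"])
# Reference level is "Married", matching the convention used in the tables.
crude_or = crude_odds_ratio(table2["Unmarried"], table2["Married"])

print(round(married_pct, 1), round(unmarried_pct, 1), round(crude_or, 2))
```

The crude odds ratio for unmarried versus married participants comes out at roughly 0.58, in the same direction as the adjusted estimate.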

Elastic net cross-validation and LASSO regression

Elastic net cross-validation was conducted, excluding participant ID as a variable. Multiple values of α were tested, as well as multiple values of λ, to minimize the cross-validation error of the data when performing elastic net regression for variable selection. The ideal value of α was 1, which pointed to performing LASSO regression on the data. The ideal value of λ was 0.0412638. Results of the elastic net cross-validation are illustrated in Figure 1.

Figure 1.

Elastic net cross-validation selects optimal α.
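For reference, the penalized objective tuned by this search over α and λ can be written as follows for a binary outcome. This is the standard elastic net form for logistic regression as used by glmnet; the symbols N (number of participants), y_i (binary diagnosis), x_i (predictor vector), β_0 (intercept), and β (coefficients) are notation introduced here for illustration and do not appear in the original text.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
+ \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
  + \alpha\,\lVert\beta\rVert_1\Big]
```

With α = 1 the squared (ridge) term vanishes and only the L1 penalty remains, which is why an optimal α of 1 corresponds to LASSO regression.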

Then, LASSO regression was run on the full data set, excluding participant ID as a variable. The ideal values of α and λ obtained from elastic net cross-validation were used. The LASSO regression left 13 SNPs and 2 other predictors and reduced the coefficients of all other predictors to 0. The 13 remaining SNPs were rs11086694, rs2075650, rs2094277, rs2261682, rs31887, rs4745514, rs4816158, rs4826619, rs6640551, rs6809370, rs7312407, rs919751, and rs9857853. The other remaining predictors were years of education and marital status.

Logistic regression and associations

Simple logistic regression was then performed on the 15 predictors remaining after LASSO regression with the diagnosis of AD or MCI as the outcome variable. The adjusted odds ratios and their corresponding 95% confidence intervals are shown in Table 3.

Table 3.

Adjusted odds ratios and corresponding 95% confidence intervals.

Predictor	Chromosome	Position (GRCh37)	Gene	Location	Adjusted odds ratio	95% confidence interval
rs11086694	20	58653337	LINC02910	Intron	1.59	(1.23, 2.05)
rs2075650	19	45395619	TOMM40	Intron	2.37	(1.81, 3.12)
rs2094277	13	75190387	LINC00347	Intron	0.71	(0.54, 0.93)
rs2261682	2	128620372	AMMECR1L	Exon	1.59	(1.21, 2.09)
rs31887	5	11423592	CTNND2	Intron	0.55	(0.38, 0.79)
rs4745514	9	71629914	PRKACG	Upstream/intergenic	2.03	(1.27, 3.23)
rs4816158	20	9632609	PAK5	Intron	0.31	(0.18, 0.50)
rs4826619	X	50387787	SHROOM4	Intron	0.43	(0.30, 0.60)
rs6640551	X	9872157	SHROOM2	Intron	0.69	(0.53, 0.89)
rs6809370	3	183577904	PARL	Intron	1.95	(1.46, 2.60)
rs7312407	12	68340257	LINC01479	Intron	1.89	(1.22, 2.90)
rs919751	5	149505489	PDGFRB	Intron	1.47	(1.13, 1.90)
rs9857853	3	149838890	LOC105374313	Intron	0.52	(0.37, 0.72)
Education	N/A	N/A	N/A	N/A	0.87	(0.83, 0.91)
Marital status	N/A	N/A	N/A	N/A	0.67	(0.50, 0.89)

The reference level for marital status is “Married.” N/A: not applicable.

Using the predictors found after LASSO regression to inform simple logistic regression is a naïve approach that can lead to biased (artificially low) P-values regarding the observed associations. For this reason, a separate regression was performed to calculate more accurate P-values utilizing conditional selective inference, which conditions the inference on the LASSO variable selection and thereby allows for more robust P-values.²⁷ This method’s cross-validation led to the LASSO regression including 12 extra predictors (all SNPs), which may be due to the data set including 66 fewer SNPs initially, which could affect the results. This occurred due to a lack of variability among 67 SNPs, which the software could not handle. Even with this issue, the extra predictors included in the method with conditional selective inference were not significant. The P-values obtained through simple logistic regression and the P-values obtained through conditional selective inference can be found in Table 4.

Table 4.

P-values, before and after conditional selective inference.

Predictor	Chromosome	Position (GRCh37)	Gene	Location	β (coefficient)(logistic regression)	P (logistic regression)	P (conditional selective inference)
rs11086694	20	58653337	LINC02910	Intron	0.46119	<.001	.442
rs2075650	19	45395619	TOMM40	Intron	0.86299	<.001	<.001
rs2094277	13	75190387	LINC00347	Intron	-0.34303	.013	.709
rs2261682	2	128620372	AMMECR1L	Exon	0.46152	<.001	.473
rs31887	5	11423592	CTNND2	Intron	-0.60574	.001	.192
rs4745514	9	71629914	PRKACG	Upstream/intergenic	0.70905	.002	.573
rs4816158	20	9632609	PAK5	Intron	-1.18076	<.001	.026
rs4826619	X	50387787	SHROOM4	Intron	-0.84584	<.001	.003
rs6640551	X	9872157	SHROOM2	Intron	-0.37642	.004	.504
rs6809370	3	183577904	PARL	Intron	0.66735	<.001	.023
rs7312407	12	68340257	LINC01479	Intron	0.63408	.004	.541
rs919751	5	149505489	PDGFRB	Intron	0.38252	.004	.517
rs9857853	3	149838890	LOC105374313	Intron	-0.65483	<.001	.259
Education	N/A	N/A	N/A	N/A	-0.13689	<.001	.003
Marital status	N/A	N/A	N/A	N/A	-0.40626	.005	.372
rs10942262					(Not included in original LASSO regression or logistic regression)		.513
rs12422895							.557
rs1866361							.844
rs1873442							.871
rs323467							.918
rs4936046							.887
rs4977761							.67
rs714180							.909
rs720202							.754
rs7214481							.687
rs7428265							.97
rs7933268							.621

The reference level for marital status is “Married.” N/A: not applicable.
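The adjusted odds ratios in Table 3 and the logistic regression coefficients in Table 4 are linked by OR = exp(β). The short check below, using a handful of coefficients copied from Table 4, is a sketch of that relationship. The 4-year education contrast at the end is an illustrative assumption, not a quantity reported in the text, and it relies on the model's assumption that education is linear in the log-odds.

```python
import math

# Coefficients copied from Table 4 (simple logistic regression).
betas = {
    "rs2075650": 0.86299,
    "rs4816158": -1.18076,
    "Education": -0.13689,
    "Marital status": -0.40626,
}

# A logistic regression coefficient is a log odds ratio, so exponentiate it.
odds_ratios = {name: math.exp(beta) for name, beta in betas.items()}

for name, value in odds_ratios.items():
    print(f"{name}: OR = {value:.2f}")

# For a continuous predictor, a k-unit difference multiplies the odds by exp(k * beta).
# Hypothetical contrast: 4 additional years of education.
four_year_or = math.exp(4 * betas["Education"])
print(f"Education, 4-year difference: OR = {four_year_or:.2f}")
```

The exponentiated coefficients reproduce the corresponding entries in Table 3 to two decimal places, and the 4-year contrast shows one way to make the per-year education odds ratio easier to read.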

A check for multicollinearity was performed on the simple logistic regression model, using the calculation of the variance inflation factor (VIF) of each model predictor, utilizing the car package.²⁸ After this test, the VIF of each model predictor was close to 1, indicating that there is no multicollinearity in the model. The results of the check for multicollinearity can be found in Table 5.

Table 5.

Variance inflation factors of model predictors.

Predictor	Chromosome	Position (GRCh37)	Gene	Location	Variance inflation factor
rs11086694	20	58653337	LINC02910	Intron	1.010628
rs2075650	19	45395619	TOMM40	Intron	1.017578
rs2094277	13	75190387	LINC00347	Intron	1.018341
rs2261682	2	128620372	AMMECR1L	Exon	1.017602
rs31887	5	11423592	CTNND2	Intron	1.014424
rs4745514	9	71629914	PRKACG	Upstream/intergenic	1.014226
rs4816158	20	9632609	PAK5	Intron	1.016831
rs4826619	X	50387787	SHROOM4	Intron	1.033316
rs6640551	X	9872157	SHROOM2	Intron	1.03213
rs6809370	3	183577904	PARL	Intron	1.01963
rs7312407	12	68340257	LINC01479	Intron	1.015184
rs919751	5	149505489	PDGFRB	Intron	1.022981
rs9857853	3	149838890	LOC105374313	Intron	1.017855
Education	N/A	N/A	N/A	N/A	1.03674
Marital status	N/A	N/A	N/A	N/A	1.02345

The reference level for marital status is “Married.” N/A: not applicable.
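To make the VIF values in Table 5 easier to read, the sketch below applies the textbook definition, VIF = 1 / (1 − R²), in the simplest case of two predictors, where R² is the squared Pearson correlation between them. The data and function names are hypothetical, and the car package's own computation for a logistic model may differ in detail. The example is meant only to show why values near 1 indicate little shared information between predictors.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x, y):
    """VIF of either predictor when the model has exactly these two predictors."""
    r_squared = pearson_r(x, y) ** 2
    return 1 / (1 - r_squared)

# Hypothetical binary predictors for 8 participants.
snp_a = [0, 1, 0, 1, 0, 1, 0, 1]
snp_b = [0, 0, 1, 1, 0, 0, 1, 1]  # unrelated to snp_a
snp_c = [0, 1, 0, 1, 0, 1, 1, 1]  # overlaps heavily with snp_a

vif_unrelated = vif_two_predictors(snp_a, snp_b)
vif_overlapping = vif_two_predictors(snp_a, snp_c)
print(vif_unrelated, vif_overlapping)
```

The unrelated pair gives a VIF of 1, while the overlapping pair gives a clearly inflated value, which is the pattern the check in Table 5 is designed to detect.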

Interpretation of genetic predictors

The model found significant associations between diagnosis and the 13 SNPs, years of education, and marital status. Specifically, the presence of the B allele in the SNPs rs11086694, rs2075650, rs2261682, rs4745514, rs6809370, rs7312407, and rs919751 was associated with an increased risk of AD or MCI, while the presence of the B allele in the SNPs rs2094277, rs31887, rs4816158, rs4826619, rs6640551, and rs9857853 was associated with a decreased risk of AD or MCI. An increase in years of education was associated with a decreased risk of diagnosis, with a 1-year increase in education being associated with an odds ratio of 0.87 for diagnosis. Finally, being married was associated with an increased diagnosis risk, with an unmarried person (compared to a married person) being associated with an odds ratio of 0.67 for diagnosis.

In order to determine the genes in which the SNPs were located, dbSNP was used as a reference, except for rs11086694 and rs2094277, which required the use of the bedtools software. The nearest genes to these intergenic SNPs were identified using SNP coordinates and the GRCh37.75 ENSEMBL genome build using bedtools. Information regarding gene function, gene expression, and potential gene-disease associations was obtained through the National Center for Biotechnology Information’s Gene database.²⁹

The SNP rs2075650 is located in chromosome 19 in the TOMM40 gene.³⁰ The TOMM40 gene has the function of importing protein precursors into mitochondria.³¹ With the relevance of APP in AD, this SNP may play a role in the disease. This is further exemplified by previous research finding an association between rs2075650 and AD.³² This association was found to be significant with conditional selective inference, so its significance is more robust.

The SNP rs4816158 is located in chromosome 20 in the PAK5 gene.³³ This gene induces microtubule stabilization, promotes neurite growth, and regulates cytoskeleton dynamics. It is mostly expressed in the brain.³⁴ Considering the impact of the Tau protein and microtubule stabilization in AD development, this SNP and its gene are particularly in need of further study. Previous research has indicated the role PAK5 plays in microtubule stabilization and its potential impact on AD.^35,36 Notably, this association was found to be significant by conditional selective inference.

The SNP rs6640551, located in the X chromosome, is in the gene known as SHROOM2.³⁷ This gene functions in the formation of new blood vessels and the formation of contractile networks in endothelial cells. It is associated with ocular albinism type 1 syndrome.³⁸ Given the association between AD and the loss of blood flow, a gene that controls the formation of new blood vessels is of interest.

The SNP rs6809370 is located on chromosome 3 and is in the PARL gene.³⁹ It is involved in mitochondrial remodeling and apoptosis, and it has a potential association with Parkinson disease.⁴⁰ This SNP was found to have a significant association under conditional selective inference.

The SNP rs31887 is in chromosome 5 in the CTNND2 gene,⁴¹ which is involved in brain and eye development and is expressed mostly in the brain.⁴² The CTNND2 gene is of interest due to its function in brain development.

The PRKACG gene is the location of the SNP rs4745514 on chromosome 9.⁴³ This gene encodes the gamma form of one of the Protein Kinase A subunits, which recognizes and phosphorylates Tau.^44,45

The SHROOM4 gene contains the SNP rs4826619 on the X chromosome⁴⁶ (the same chromosome as the SHROOM2 gene) and may be involved in cytoskeletal architecture,⁴⁷ similarly to the PAK5 gene. The rs4826619 was found to be significant through the conditional selective inference method used.

Similarly, rs919751 is located in chromosome 5 in the PDGFRB gene,⁴⁸ which is involved in the actin cytoskeleton and in the development of the cardiovascular system. The gene is potentially associated with 5q-syndrome,⁴⁹ which is a condition that affects bone marrow cells and leads to a form of anemia.⁵⁰ The associations this SNP may have with the cardiovascular system make it of interest for further study.

The SNP rs2261682 is located in chromosome 2 in the AMMECR1L gene.⁵¹ This gene is expressed fairly evenly throughout most human tissues, but its highest expression is in testis tissue.⁵² This gene is similar to the AMMECR1 gene, which has an unknown function. However, the AMMECR1 gene is associated with AMME Complex,^53,54 which is shorthand for a condition that includes Alport syndrome, intellectual disability (the second “M” in the abbreviation used to stand for “mental retardation,” although the abbreviation has not changed with the use of “intellectual disability”), midface hypoplasia, and elliptocytosis syndrome.⁵⁵ Elliptocytosis is characterized by red blood cells having an elliptical shape instead of a round shape, which is mediated by cytoskeleton proteins.⁵⁶ The cardiovascular implications of this gene and the similarities between it and the PAK5 and SHROOM4 genes regarding cytoskeletal structure make it worthy of further investigation.

The aforementioned genes have all been protein-coding genes. However, the model included 4 non-coding RNA genes. The rs11086694 is located in chromosome 20 (the same as rs4816158) in or near the LINC02910 gene, and rs2094277 is in chromosome 13 in the LINC00347 gene. Although not much is known about these genes, LINC02910 is expressed mostly in bone marrow.⁵⁷ Notably, LINC00347 is expressed almost entirely in testis tissue.⁵⁸

The SNP rs7312407 is located in chromosome 12 in another non-coding RNA gene, LINC01479.⁵⁹ This gene is expressed mostly in the heart,⁶⁰ which, given the aforementioned role of the cardiovascular system in the development of AD, could make it a significant gene for further study. The rs9857853 is located in chromosome 3 (the same as rs6809370) in the non-coding RNA gene LOC105374313,⁶¹ which is expressed mostly in testis tissue.⁶²

Despite the unknown functions of non-coding RNA genes, they may play a role in gene expression regulation and have potential impacts on AD.⁶³ Thus, they should not be discounted in their relationships to AD due to the lack of knowledge regarding their functions.

Interpretation of non-genetic predictors

Years of education was significantly associated with a decreased risk of diagnosis of AD or MCI, based on the simple logistic regression model. This aligns with current research, which states that higher levels of education are causally associated with reduced risk or delayed onset of AD,⁶⁴ although some research skeptically states that the relationship between education and AD may be affected by intelligence.⁶⁵ The adjusted odds ratio of the variable is difficult to interpret, however, due to its continuous nature. Because this association was found to be significant after performing conditional selective inference, it should be further studied.

Marital status was associated with AD, with unmarried people being at a lower risk of diagnosis compared to married people. However, this is in contrast to existing literature. One study found that unmarried people were at a greater risk of developing dementia compared to married individuals.⁶⁶ The reason for the finding here is unknown. It is possible that marital status is a confounder for another variable not shown in the data. It also may have been due to the transformation of the variable into a binary format. Regardless, this association, having been found to not be significant using the conditional selective inference method, should be interpreted warily.

Discussion

While ADNI was a thoroughly conducted study, it and the research shown here have limitations. ADNI may suffer from selection bias, because some participants who otherwise would have developed AD or MCI could have died or left the study before being diagnosed or showing symptoms. In addition, late-onset AD is defined as having its onset at age 65 or older,⁶⁷ so participants who entered ADNI at age 55 and stayed in the study for fewer than 10 years would not have been old enough to develop late-onset AD while enrolled. Even if they later developed symptoms and were diagnosed, they would no longer have been in the study, which adds to the selection bias. Another consideration is the possibility of issues arising from using different genotyping methods in the different ADNI versions.

The AD disproportionately affects minority groups compared to white individuals in terms of frequency.⁷ Given this, the ADNI study having overwhelmingly white participants (the data here included 93% white participants) points to potential issues with generalizability.

One minor issue in the data is that some of the ages and years of education were slightly outside of the range specified in the inclusion criteria, although those values were not far from that range. The age range was not violated by more than 1.4 years in either direction, and the education range was only violated by 2 years on the lower end.

The education variable (measured in years) was somewhat left-skewed, but due to its uneven nature (most people seemed to complete 0, 2, 4, 6, or 8+ years of higher education), most transformations of the data were not effective at reshaping it into a normal distribution. This posed a potential issue because, while logistic regression does not require normality of predictors, it does require continuous predictors to be linearly related to the logit of the outcome.⁶⁸ However, a Box-Tidwell test, which is an appropriate method of checking that assumption,⁶⁹ showed no evidence that the education variable violated it (P = .83).
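One common form of the Box-Tidwell check for logistic regression adds the term x·ln(x) to the model and tests whether its coefficient differs from zero. The sketch below illustrates that idea on simulated data with a hand-written Newton-Raphson fit. The data, the sample size, the true coefficients, and the helper names (`solve`, `score_and_information`, `fit_logistic`) are all hypothetical, and this is not necessarily the procedure or software used in this work.

```python
import math
import random


def solve(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the information matrix at `beta`."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def fit_logistic(X, y, iterations=25):
    """Newton-Raphson logistic regression; returns coefficients and standard errors."""
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        grad, info = score_and_information(X, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    _, info = score_and_information(X, y, beta)
    se = []
    for j in range(len(beta)):
        unit = [1.0 if i == j else 0.0 for i in range(len(beta))]
        se.append(math.sqrt(solve(info, unit)[j]))  # sqrt of diagonal of inverse
    return beta, se


random.seed(0)
# Simulated (hypothetical) data: the log-odds of diagnosis are truly linear in
# years of education, so the x*ln(x) term has no real effect here.
education = [random.randint(6, 20) for _ in range(400)]
diagnosis = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(1.5 - 0.1 * x))) else 0
    for x in education
]

# Design matrix: intercept, x, and the Box-Tidwell term x * ln(x) (requires x > 0).
X = [[1.0, x, x * math.log(x)] for x in education]
beta, se = fit_logistic(X, diagnosis)

z = beta[2] / se[2]
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Wald test
print(f"x*ln(x) coefficient = {beta[2]:.4f}, P = {p_value:.3f}")
```

A small P-value for the x·ln(x) term would suggest that the logit is not linear in the predictor. A large P-value means only that the check found no evidence of nonlinearity.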

The binary transformations involved throughout this process, while allowing for simpler interpretation, inherently also lead to some loss of information. For example, the distinction between AD and MCI is lost in this analysis, although it is important. Further research into this topic could include multinomial regression.

As mentioned previously, the conditional selective inference method led to 12 extra predictors being included after the comparative LASSO regression. While this is likely because some of the SNPs were removed from the data to allow the method to be run, it does affect the resulting P-values for the SNPs included in the initial LASSO and simple logistic regressions. However, these P-values are still likely more valid than the P-values computed directly through the logistic regression method (and any that were computed through the initial LASSO regression, as those P-values would be inherently biased).

Another issue relates to the LASSO method used. Although the elastic net cross-validation method pointed to an α value of 1 being optimal, there are still inherent issues with the use of the LASSO when analyzing genetic data. Namely, this method may struggle with correlated predictors, meaning that it may have excluded important SNPs that are biologically relevant to the outcome measure or included predictors that are truly irrelevant to AD diagnosis. Further work is needed here to refine the methods used, especially in attempting to determine causal associations between genetic and other predictors and AD diagnosis.
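The difficulty with correlated predictors can be illustrated with a deliberately extreme toy case: two SNPs with identical genotype columns. The sketch below uses a hand-written coordinate descent for the linear-regression LASSO, which is a simplification of the penalized logistic model and not the software used in this work. The genotypes, outcome, penalty value, and function names are hypothetical.

```python
import random


def soft_threshold(value, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def centre(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso(columns, y, lam, sweeps=200):
    """Cyclic coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.
    `columns` is a list of centred predictor columns; `y` is centred."""
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            scale = sum(v * v for v in col) / n
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            updated = soft_threshold(rho, lam) / scale
            change = updated - beta[j]
            residual = [r - v * change for r, v in zip(residual, col)]
            beta[j] = updated
    return beta


random.seed(1)
n = 300
# Hypothetical genotypes coded 0/1/2. snp_b is an exact copy of snp_a (perfect
# correlation), and snp_c is unrelated to the outcome.
snp_a = [random.choice([0, 1, 2]) for _ in range(n)]
snp_b = list(snp_a)
snp_c = [random.choice([0, 1, 2]) for _ in range(n)]
outcome = [a + random.gauss(0.0, 0.5) for a in snp_a]

a, b, c, y = centre(snp_a), centre(snp_b), centre(snp_c), centre(outcome)

beta_ab = lasso([a, b, c], y, lam=0.1)  # snp_a is visited first
beta_ba = lasso([b, a, c], y, lam=0.1)  # snp_b is visited first

print("order a, b, c:", [round(v, 3) for v in beta_ab])
print("order b, a, c:", [round(v, 3) for v in beta_ba])
```

In this toy setting, whichever copy is visited first receives essentially all of the weight and the other is dropped, so which of the two SNPs is "selected" depends only on column order. Real SNPs in strong linkage disequilibrium are rarely exact duplicates, but the example shows how a biologically relevant variant could be excluded in favor of a correlated neighbor.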

This research points to potential new insight into the pathways of disease for AD. These pathways involve the TOMM40, PAK5, SHROOM4, and PARL genes. The potential association between the PARL gene and another neurodegenerative disease (Parkinson disease) hints at a potential association with AD. The TOMM40 gene, influenced by the B allele of the SNP rs2075650, may lead to issues regarding amyloid precursor protein and, potentially, Aβ in the brain, resulting in the formation of Aβ plaques and associated microglial-driven inflammation. However, this pathway is not clear from this research. What is clearer is the potential PAK5-related pathway of disease. It is possible that an issue in the PAK5 gene due to the lack of the B allele in the SNP rs4816158 (as the B allele was shown to be protective in this research) leads to a higher likelihood of Tau proteins within neurons misfolding, which in turn leads to neuronal starvation. Previous research indicates that PAK5 plays a role in preventing the destabilization of microtubules.³⁶ Research has also found that PAK5 inhibits the process by which Tau is phosphorylated and has linked it to AD.³⁵ The SHROOM4 gene has some functions similar to those of the PAK5 gene, which, especially given its significance in this research, further emphasizes the potential of this disease pathway. While this research does not necessarily point to a causal relationship between these genes and AD, these potential pathways of disease, especially the PAK5 pathway, should be further investigated.

Conclusion

Given the detrimental impact of AD on the physical, emotional, and financial aspects of the lives of sufferers and their caregivers, as well as its impact on the health care system in the United States and beyond, the disease should be further studied. In addition, the demand for early, accurate, and inexpensive testing for AD and MCI, and for a better understanding of their genetic associations, further motivates neurodegenerative research, especially considering the recent research determining APOE4 homozygosity to be a genetic form of AD.⁹ The research presented here points mostly to a need for further research on the SNP rs2075650 and its associated TOMM40 gene and on the SNP rs4816158 and its associated PAK5 gene, because the functions of those genes have the potential to significantly impact AD risk. Further study of these and other potential associations could lead to improved early detection and, thus, early treatment of AD, and could reduce the burden of the disease.

Footnotes

Acknowledgements

We would like to acknowledge the members of the thesis committee for which this work was initially conceptualized who are not also authors of this work: Chi Hyun Lee and Jing Qian. Their assistance in this work greatly improved its quality. Data collection and sharing for the Alzheimer’s Disease Neuroimaging Initiative (ADNI) is funded by the National Institute on Aging (National Institutes of Health Grant U19AG024904). The grantee organization is the Northern California Institute for Research and Education. In the past, ADNI has also received funding from the National Institute of Biomedical Imaging and Bioengineering, the Canadian Institutes of Health Research, and private sector contributions through the Foundation for the National Institutes of Health (FNIH) including generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc; Biogen; BristolMyers Squibb Company; CereSpir, Inc; Cogstate; Eisai Inc; Elan Pharmaceuticals, Inc; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc; Fujirebio; GE Healthcare; IXICO Ltd; Janssen Alzheimer Immunotherapy Research & Development, LLC; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co, Inc; Meso Scale Diagnostics, LLC; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics.

ORCID iDs

Scott Hebert

Eric Nels Pederson

Ethical considerations

Ethical approval was not required for this work.

Ethics

No artificial intelligence tools were used to write any portion of this work or in any other way throughout the process of conducting this research. In addition, no artificial intelligence tools were used to generate or modify any data in this work.

Consent to participate

Participant consent was not required for this work, as no participants were recruited specifically for this work.

Consent for publication

Not applicable.

Author contributions

SH and ZO conceptualized the work. SH contributed to the writing, data collection, and data analysis. ENP supplied the work of data pre-processing in the early stages of analysis. ENP and ZO were the main contributors to the editing of the work. SH and ZO developed the methodology of the analysis.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

The data used in this work include data from ADNI, which prohibits the redistribution of individual-level data in any manner.

References

1. Parhizkar, Holtzman DM. APOE mediated neuroinflammation and neurodegeneration in Alzheimer’s disease. Semin Immunol. 2022;59:101594. doi:10.1016/j.smim.2022.101594

2. What are the signs of Alzheimer’s disease? National Institute on Aging. Accessed November 28, 2023. https://www.nia.nih.gov/health/alzheimers-symptoms-and-diagnosis/what-are-signs-alzheimers-disease

3. Rajan, Weuve, Barnes, McAninch, Wilson, Evans DA. Population estimate of people with clinical Alzheimer’s disease and mild cognitive impairment in the United States (2020–2060). Alzheimers Dement. 2021;17:1966-1975. doi:10.1002/alz.12362

4. Gauthier, Reisberg, Zaudig, et al. Mild cognitive impairment. Lancet. 2006;367:1262-1270. doi:10.1016/S0140-6736(06)68542-5

5. Kocahan, Doğan. Mechanisms of Alzheimer’s Disease pathogenesis and prevention: the brain, neural pathology, N-methyl-D-aspartate receptors, tau protein and other risk factors. Clin Psychopharmacol Neurosci. 2017;15:1-8. doi:10.9758/cpn.2017.15.1.1

6. Dorostkar, Zou, Blazquez-Llorca, Herms. Analyzing dendritic spine pathology in Alzheimer’s disease: problems and opportunities. Acta Neuropathol. 2015;130:1-19. doi:10.1007/s00401-015-1449-5

7. Wong. Economic burden of Alzheimer disease and managed care considerations. Am J Manag Care. 2020;26:s177-s183. doi:10.37765/ajmc.2020.88482

8. Muralidar, Ambi, Sekaran, Thirumalai, Palaniappan. Role of tau protein in Alzheimer’s disease: the prime pathological player. Int J Biol Macromol. 2020;163:1599-1617. doi:10.1016/j.ijbiomac.2020.07.327

9. Fortea, Pegueroles, Alcolea, et al. APOE4 homozygozity represents a distinct genetic form of Alzheimer’s disease. Nat Med. 2024;30:1284-1291. doi:10.1038/s41591-024-02931-w

10. ADNI. About. Accessed November 28, 2023. https://adni.loni.usc.edu/about/

11. ADNI. Study design. Accessed November 28, 2023. https://adni.loni.usc.edu/study-design/

12. ADNI. Publications. Accessed November 28, 2023. https://adni.loni.usc.edu/news-publications/publications/

13. ADNI. Genetic data methods. Accessed November 28, 2023. https://adni.loni.usc.edu/methods/genetic-data-methods/

14. ADNI. Genetic data. Accessed November 28, 2023. https://adni.loni.usc.edu/data-samples/data-types/genetic-data/

15. ADNI. Methods and tools. Accessed November 28, 2023. https://adni.loni.usc.edu/methods/

16. Alzheimer’s Disease Neuroimaging Initiative. ADNI procedures manual. Published 2010. Accessed July 2, 2025. https://adni.loni.usc.edu/wp-content/uploads/2010/09/ADNI_GeneralProceduresManual.pdf

17. Alzheimer’s Disease Neuroimaging Initiative. ADNI 2 procedures manual. Published 2008. Accessed July 2, 2025. https://adni.loni.usc.edu/wp-content/uploads/2008/07/adni2-procedures-manual.pdf

18. Alzheimer’s Disease Neuroimaging Initiative. Alzheimer’s Disease Neuroimaging Protocol (ADNI). Accessed June 9, 2025. https://adni.loni.usc.edu/wp-content/themes/freshnews-dev-v2/documents/clinical/ADNI-1_Protocol.pdf

19. University of Southern California Laboratory of Neuro Imaging. Download genetic data. Accessed January 28, 2024. https://ida.loni.usc.edu/pages/access/geneticData.jsp?project=ADNI&page=DOWNLOADS

20. University of Southern California Laboratory of Neuro Imaging. Download study data. Accessed January 28, 2024. https://ida.loni.usc.edu/pages/access/studyData.jsp?project=ADNI

21. Microsoft, Ooi H. glmnetUtils: utilities for “Glmnet.” Published September 10, 2023. Accessed February 22, 2024. https://cran.r-project.org/web/packages/glmnetUtils/index.html

22. Zou, Hastie. Regularization and variable selection via the elastic net. J R Stat Soc. 2005;67:301-320.

23. Friedman, Hastie, Tibshirani, et al. glmnet: lasso and elastic-net regularized generalized linear models. Published August 22, 2023. Accessed February 22, 2024. https://cran.r-project.org/web/packages/glmnet/index.html

24. Tibshirani, Taylor, Loftus, Reid, Markovic. selectiveInference: tools for post-selection inference. Published September 7, 2019. Accessed February 22, 2024. https://cran.r-project.org/web/packages/selectiveInference/index.html

25. Quinlan. bedtools: a powerful toolset for genome arithmetic. Accessed January 28, 2024. https://bedtools.readthedocs.io/en/latest/

26. National Library of Medicine, National Center for Biotechnology Information. dbSNP. Accessed January 28, 2024. https://www.ncbi.nlm.nih.gov/snp/

27. Duy VNL, Takeuchi. More powerful conditional selective inference for generalized lasso by parametric programming. Published May 11, 2021. doi:10.48550/arXiv.2105.04920

28. Fox, Weisberg, Price, et al. car: companion to applied regression. Published March 30, 2023. Accessed February 22, 2024. https://cran.r-project.org/web/packages/car/index.html

29. Home. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/

30. rs2075650 RefSNP Report–dbSNP. NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/snp/rs2075650

31. TOMM40 translocase of outer mitochondrial membrane 40 [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/10452

32. Potkin, Guffanti, Lakatos, et al. Hippocampal atrophy as a quantitative trait in a genome-wide association study identifying novel susceptibility genes for Alzheimer’s disease. PLoS ONE. 2009;4:e6501. doi:10.1371/journal.pone.0006501

33. rs4816158 RefSNP Report–dbSNP. NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/snp/rs4816158

34. PAK5 p21 (RAC1) activated kinase 5 [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/57144

35. Timm, Marx, Panneerselvam, Mandelkow EM. Structure and regulation of MARK, a kinase involved in abnormal phosphorylation of Tau protein. BMC Neurosci. 2008;9:S9. doi:10.1186/1471-2202-9-S2-S9

36. Matenia, Mandelkow EM. The tau of MARK: a polarized view of the cytoskeleton. Trends Biochem Sci. 2009;34:332-342. doi:10.1016/j.tibs.2009.03.008

37. rs6640551 RefSNP Report–dbSNP. NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/snp/rs6640551

38. SHROOM2 shroom family member 2 [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/357

39. rs6809370 RefSNP Report–dbSNP. NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/snp/rs6809370

40. PARL presenilin associated rhomboid like [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/55486

41. rs31887 RefSNP Report–dbSNP. NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/snp/rs31887

42. CTNND2 catenin delta 2 [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/1501

43. rs4745514 RefSNP Report–dbSNP. NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/snp/rs4745514

44. PRKACG protein kinase cAMP-activated catalytic subunit gamma [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/5568

45. Jicha, Weaver, Lane, et al. cAMP-dependent protein kinase phosphorylations on tau in Alzheimer’s disease. J Neurosci. 1999;19:7486-7494. doi:10.1523/JNEUROSCI.19-17-07486.1999

46. rs4826619 RefSNP Report–dbSNP. NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/snp/rs4826619

47. SHROOM4 shroom family member 4 [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/57477

48. rs919751 RefSNP Report–dbSNP. NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/snp/rs919751

49. PDGFRB platelet derived growth factor receptor beta [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/5159

50. Definition of 5q minus syndrome–NCI Dictionary of Cancer Terms. NCI. Published February 2, 2011. Accessed November 28, 2023. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/5q-minus-syndrome

51. rs2261682 RefSNP Report–dbSNP. NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/snp/rs2261682

52. AMMECR1L AMMECR1 like [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/83607

53. GeneCards. AMMECR1 Gene–AMMECR Nuclear Protein 1. Accessed July 2, 2025. https://www.genecards.org/cgi-bin/carddisp.pl?gene=AMMECR1

54. AMMECR1 AMMECR nuclear protein 1 [Homo sapiens (human)]. Gene–NCBI. Accessed March 29, 2024. https://www.ncbi.nlm.nih.gov/gene/9949

55. Alport syndrome-intellectual disability-midface hypoplasia-elliptocytosis syndrome–NIH Genetic Testing Registry (GTR). NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gtr/conditions/C1846242/

56. Jha, Vaqar. Hereditary elliptocytosis. StatPearls Publishing; 2023. Accessed November 28, 2023. http://www.ncbi.nlm.nih.gov/books/NBK562333/

57. LINC02910 long intergenic non-protein coding RNA 2910 [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/284756

58. LINC00347 long intergenic non-protein coding RNA 347 [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/?term=linc00347

59. rs7312407 RefSNP Report–dbSNP. NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/snp/rs7312407

60. LINC01479 long intergenic non-protein coding RNA 1479 [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/?term=linc01479

61. rs9857853 RefSNP Report–dbSNP. NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/snp/rs9857853

62. LOC105374313 uncharacterized LOC105374313 [Homo sapiens (human)]. Gene–NCBI. Accessed November 28, 2023. https://www.ncbi.nlm.nih.gov/gene/?term=LOC105374313

63. Wang, Lemos Duarte, Rothman, Cai, Zhang. Non-coding RNAs in Alzheimer’s disease: perspectives from omics studies. Hum Mol Genet. 2022;31:R54-R61. doi:10.1093/hmg/ddac202

64. Zhang, Tian, Wang, Tan JT. The Epidemiology of Alzheimer’s disease modifiable risk factors and prevention. J Prev Alzheimers Dis. 2021;8:313-321. doi:10.14283/jpad.2021.15

65. Anderson, Howe, Wade, et al. Education, intelligence and Alzheimer’s disease: evidence from a multivariable two-sample Mendelian randomization study. Int J Epidemiol. 2020;49:1163-1172. doi:10.1093/ije/dyz280

66. Liu, Zhang, Choi, won Langa KM. Marital status and dementia: evidence from the health and retirement study. J Gerontol Ser B. 2020;75:1783-1795. doi:10.1093/geronb/gbz087

67. Rabinovici GD. Late-onset Alzheimer disease. Contin Lifelong Learn Neurol. 2019;25:14-33. doi:10.1212/CON.0000000000000700

68. Stoltzfus JC. Logistic regression: a brief primer. Acad Emerg Med. 2011;18:1099-1104. doi:10.1111/j.1553-2712.2011.01185.x

69. Shrestha. Application of binary logistic regression model to assess the likelihood of overweight. Am J Theor Appl Stat. 2019;8:18. doi:10.11648/j.ajtas.20190801.13