Abstract
Background:
Neuroimaging markers provide quantitative insight into brain structure and function in neurodegenerative diseases, such as Alzheimer’s disease, where we lack mechanistic insights to explain pathophysiology. These mechanisms are often mediated by genes and genetic variations and are often studied through the lens of genome-wide association studies. Linking these two disparate layers (i.e., imaging and genetic variation) through causal relationships between biological entities involved in the disease’s etiology would pave the way to large-scale mechanistic reasoning and interpretation.
Objective:
We explore how genetic variants may lead to functional alterations of intermediate molecular traits, which can further impact neuroimaging hallmarks over a series of biological processes across multiple scales.
Methods:
We present an approach in which knowledge pertaining to single nucleotide polymorphisms and imaging readouts is extracted from the literature, encoded in Biological Expression Language, and used in a novel workflow to assist in the functional interpretation of SNPs in a clinical context.
Results:
We demonstrate our approach in a case scenario which proposes KANSL1 as a candidate gene that accounts for the clinically reported correlation between the incidence of the genetic variants and hippocampal atrophy. We find that the workflow prioritizes multiple mechanisms reported in the literature through which KANSL1 may have an impact on hippocampal atrophy such as through the dysregulation of cell proliferation, synaptic plasticity, and metabolic processes.
Conclusion:
We have presented an approach that enables pinpointing relevant genetic variants as well as investigating their functional role in biological processes spanning several diverse biological scales.
INTRODUCTION
As aging populations continue to grow, age-associated disorders such as Alzheimer’s
disease (AD) have become increasingly prevalent [1, 2]. AD is a slow-progressing, complex,
idiopathic disorder in which early diagnosis is challenging because patients do not
initially present symptoms [3]. Emerging
neuroimaging techniques are a versatile, non-invasive approach for the high-resolution,
Neuroimaging techniques quantitatively measure markers of brain structure and function that are considered endophenotypes, i.e., measurable intermediate phenotypes that link molecular changes to organ-specific pathophysiological contexts [4]. One of the numerous neuroanatomical markers considered an endophenotype is medial temporal atrophy. This well-established AD marker is an intermediate phenotype that implicates the aggregation of hyperphosphorylated tau protein (a well-known molecular change) as a causative biological process of memory decline [5]. The diversity of markers prompted the cataloging and organizing of their information in order to better link clinical readouts to underlying molecular changes. As a first attempt at addressing this need, Iyappan et al. curated the terms used in the literature to describe structural and functional brain information in the Neuroimaging Feature Terminology (NIFT) [6].
Elucidating the effect of genes and genetic variations (e.g., single nucleotide polymorphisms (SNPs)) on brain structure and function often begins with genome-wide association studies (GWASs). However, this type of study only calculates statistical associations between SNPs and traits and does not provide mechanistic insights. More robust approaches aimed at addressing the mechanistic shortcomings of GWASs are referred to as imaging genetics [7]. For example, Wachinger et al. [8] studied genetic influences on neuroanatomical shape asymmetries associated with AD progression. Although the association of genetic variants (i.e., in the BIN1, CD2AP, ZCWPW1, and ABCA7 genes) with neuroanatomical structures had been reported in previous studies [9–12], the authors were able to provide an explanation for the observed effect, specifically that alterations in the expression level of the aforementioned genes can affect cellular homeostasis, thus leading to changes in brain symmetry. A common issue facing many imaging genetics approaches is small sample size, which leads to a lack of statistical power, limited replicability, and stratification effects [13, 14]. Alternatively, Stefanovski et al. [15] studied the connection between molecular changes and neuronal population dynamics using differential equations. For example, this study provided a possible mechanistic explanation of how local amyloid beta-mediated synaptic function disinhibition leads to diminishing neural signaling. However, such mathematical models thus far fail to handle the number of variables that are necessary to represent the pathophysiological phenomena involved in a multifactorial disorder such as AD.
The limitations and lack of mechanistic insights provided by these previously mentioned techniques prompted us to develop a new approach to interpret how a particular genetic variant may have an impact on neuroimaging feature changes through sequences of molecular causalities in the context of AD. Our approach captures knowledge from the literature pertaining to SNPs and imaging readouts in a causal model encoded in Biological Expression Language (BEL) [16] to support the functional interpretation of SNPs in a clinical context. In a case scenario, we propose KANSL1 as a candidate gene mediating the connection between the genetic variants and hippocampal atrophy. We then hypothesized that variants of this gene dysregulate biological processes related to cell proliferation, synaptic plasticity, and energy metabolism, ultimately leading to hippocampal atrophy. These dysregulated biological processes are early events in AD, and they have been posited as attractive therapeutic targets for pharmaceutical intervention [17, 18]. Thus, by garnering these mechanistic insights, it may be possible to reveal novel therapeutic options in the future.
MATERIALS AND METHODS
In order to support the interpretation of the functional impact of SNPs on the alteration of neuroimaging features, associations between SNPs and imaging readouts were extracted using natural language processing. Linkage disequilibrium (LD) analysis was used to identify co-occurring SNPs and their corresponding or associated genes. These genes were then ranked by how often they appear in the literature in the context of AD. This workflow is described in Fig. 1A.

The two workflows developed for (A) gene prioritization and for (B) generating the mechanistic knowledge assembly around the effect of genetic variants on neuroimaging features in AD. In workflow A, the first step involves the selection of a corpus of relevant scientific literature. Next, the SNPs extracted from this corpus were subjected to LD block analysis and the subsequently obtained SNPs were mapped to their corresponding or associated genes. KANSL1, a novel AD gene, was selected from this pool of mapped genes for further investigation. In workflow B, a corpus for the selected gene is extracted and translated into BEL to generate a knowledge assembly model for hypothesis generation.
Based on this analysis, one gene (KANSL1) was selected for further investigation and a corpus explaining its role in AD was enriched with knowledge pertaining to multi-scale biological processes. To enable computer-aided reasoning, manually extracted relations from this corpus were encoded in BEL. The resulting KANSL1 knowledge assembly was validated using PyBEL [19] and integrated into NeuroMMSig [20]. Finally, NeuroMMSig was used to investigate the putative role of KANSL1 in neuroimaging feature alteration, namely hippocampal atrophy. For the sake of reproducibility, we have made the workflow publicly available through GitHub (https://github.com/sepehrgolriz/GeVa_NeIF) under the MIT License. This workflow is described in Fig. 1B. Additionally, to investigate the concordance of knowledge around the KANSL1 gene, pathways from three well-known pathway databases were queried to determine those in which the gene is implicated.
Generation of a SNP-neuroimaging corpus
A corpus enriched with neuroimaging features and SNPs in the context of AD was generated
using SCAIView v0.3.3 (https://academia.scaiview.com) on
MEDLINE using the following query: “
Identification of related SNPs via linkage disequilibrium blocks
Over time, dependencies between genetic variants develop across populations [21]. This phenomenon, described as LD, implies that correlations between genetic variants and traits are caused by the aggregated effect of multiple variants [22, 23]. However, SNP-trait associations identified in the literature are obtained by analyzing thousands of SNPs individually (the “single-marker” approach). Therefore, we performed LD block analysis using HaploReg v4.1 [24] to identify a total of 6,070 SNPs that co-occur with the SNPs extracted from the literature and further mapped them to their corresponding or associated genes (Supplementary Table 4).
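For reference, the strength of LD between two biallelic loci is conventionally summarized by the coefficient D and the squared correlation r². In the expressions below, p_A and p_B denote the allele frequencies at the two loci and p_AB the frequency of the haplotype carrying both alleles; these symbols are introduced here for illustration only. The LD cutoff applied in HaploReg is not stated in this section, so none is assumed.

```latex
D = p_{AB} - p_A\,p_B, \qquad
r^2 = \frac{D^2}{p_A (1 - p_A)\, p_B (1 - p_B)}
```

An r² close to 1 indicates that the two variants are almost always inherited together, which is why a SNP reported in the literature can stand in for other variants within the same LD block.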
Gene selection
DisGeNET [25] was used to identify diseases associated with the genes obtained from HaploReg v4.1. After filtering out genes not associated with AD, the remaining genes were categorized as either well-known risk variants (supported by at least five pieces of literature evidence, enriched with observational studies such as case-control studies) or as emerging genetic biomarkers (those supported by little or no published evidence) (Supplementary Table 5). Since the involvement of well-known risk variants has been sufficiently described in the literature, this study investigated novel genes which may contribute to AD development.
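The categorization step can be sketched as a simple threshold on evidence counts. The snippet below is a minimal illustration, not the published pipeline: the function name, the gene labels, and the counts are hypothetical, and only the threshold of five pieces of evidence comes from the text. The additional criterion that the evidence be enriched with observational studies is not modeled.

```python
from typing import Dict, List, Tuple

# Threshold taken from the text: well-known genes are supported by
# at least five pieces of literature evidence.
MIN_EVIDENCE_WELL_KNOWN = 5


def classify_genes(
    evidence_counts: Dict[str, int],
    threshold: int = MIN_EVIDENCE_WELL_KNOWN,
) -> Tuple[List[str], List[str]]:
    """Split genes into (well_known, emerging) lists by evidence count."""
    well_known = sorted(g for g, n in evidence_counts.items() if n >= threshold)
    emerging = sorted(g for g, n in evidence_counts.items() if n < threshold)
    return well_known, emerging


# Hypothetical counts for illustration only; these are not values from the study.
example_counts = {"GENE_A": 12, "GENE_B": 5, "GENE_C": 2, "GENE_D": 0}
well_known, emerging = classify_genes(example_counts)
print("well-known:", well_known)
print("emerging:", emerging)
```

In practice the counts would come from the gene-disease associations retrieved from DisGeNET rather than from a hand-written dictionary.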
The study of genetics in the context of multiple phenotypes, such as physiological traits or diseases, can provide a holistic overview of gene functions in a biological system. For this reason, DisGeNET was used to investigate gene-disease associations of genes that are not clearly linked to AD [26]. Although the genes are associated with a broad range of diseases, from autoimmune disorders to different types of cancer, we focused on enriching the mechanistic context surrounding genes linked to conditions, such as Parkinson’s disease (PD), which have substantial genetic, pathological, and clinical overlap with AD [27]. While it is believed that cancer and autoimmune diseases are less prevalent in AD patients, 25 to 33 percent of AD patients show concomitant PD pathology [28, 29]. Of the 25 PD-associated genes acquired, we selected KANSL1 for further study of its putative pathogenic role in AD as it had the largest amount of literature evidence, so its function can be better understood [30, 31] (Supplementary Table 6).
Corpus generation, relation extraction, and mechanism enrichment for KANSL1
Using the same strategy and resources as for the previous corpus, a new corpus describing
the role of KANSL1 in the context of AD was generated using the following SCAIView query:
“
Knowledge modeling
Manually generating mechanistic hypotheses by linking genetic variants to neuroimaging markers is a daunting task. Therefore, to enable computer-aided reasoning, the extracted knowledge assembly was encoded in BEL. Both the syntax and semantics of BEL encoded in the knowledge assembly were validated using the PyBEL framework.
Knowledge was extracted from the selected corpus using the official BEL curation guidelines from https://biological-expression-language.github.io as well as additional guidelines from https://github.com/pharmacome/curation.
Evidence from the selected corpus was manually translated into BEL statements together with their contextual information (e.g., brain regions, brain cell types). For instance, the evidence “BDNF infusion led to rapid phosphorylation of the mitogen-activated protein (MAP) in the adult hippocampus” corresponds to the following BEL statement:
SET MeSHAnatomy = "Hippocampus"
p(HGNC:BDNF) -- p(HGNCGENEFAMILY:"Mitogen-activated protein kinases", pmod(Ph))
The resulting knowledge assembly was then integrated into NeuroMMSig, a web server for mechanism enrichment that allows querying over genes, SNPs, and neuroimaging features in the context of a specific disease. Finally, NeuroMMSig was used to identify the mechanistic model representing the putative role of KANSL1 in hippocampal atrophy.
Comparison of the mechanistic model to pathway knowledge
Several manually curated and highly cited pathway databases are available to deduce biologically relevant pathways. We used three major ones, namely KEGG [32], Reactome [33], and WikiPathways [34], in order to determine whether knowledge on the KANSL1 gene has already been integrated into these resources. To this end, we queried KANSL1 as well as all other proteins from our mechanistic model in pathways from the three databases.
RESULTS
While KANSL1 has been associated with changes in gene expression levels in the hippocampus [35], its mechanism of action remains elusive. In order to better understand KANSL1’s involvement in hippocampal dysfunction, we queried NeuroMMSig to investigate the downstream effects of this gene. Then, reasoning over the knowledge assembly led us to the interpretation described below. Finally, we report the results of querying KANSL1 and other genes from our mechanistic model in pathways from three major pathway databases to determine which pathways the gene may be implicated in.
The putative role of KANSL1 in hippocampal atrophy
The transcription and expression of genes promoting cell proliferation (e.g., BTG2) and synaptic plasticity (e.g., BDNF), as well as metabolic processes (e.g., cell energy production), are all of paramount importance for hippocampal function [36] (Fig. 3). KANSL1 is a protein-coding gene involved in chromatin modification through histone acetylation [37, 38], one of the mechanisms orchestrating gene transcription and expression [39–44]. While histone acetylation transforms the condensed structure of chromatin into a relaxed architecture that enhances RNA transcription and gene expression, hypoacetylation has the opposite effect [45–47].

This figure shows the results obtained from the LD block analysis and gene mapping. The generation of the SNP-neuroimaging corpus yielded 745 SNPs. Following LD block analysis, 6,070 SNPs that co-occur with the SNPs extracted from the literature were identified and located on 136 unique AD-associated genes. These genes were then classified according to the amount of evidence available in the scientific literature. The first group, incorporating 78 AD-associated genes, comprises well-known genes characterized by a high number of publications in the AD context. The second group, which includes 58 AD-associated genes, comprises emerging genes in the context of AD. From the latter group, KANSL1 was selected.

The putative role of KANSL1 in hippocampal atrophy. A) KANSL1 role in hippocampal neurogenesis. B) KANSL1 function in hippocampal metabolic processes. C) KANSL1 role in hippocampal synaptic plasticity. https://nbviewer.jupyter.org/github/sepehrgolriz/GeVa_NeIF/blob/master/Semi_automatic_developed_pipeline/Exploring%20KANSL1%20putative%20role%20graph%20in%20hippocampal%20atrophy.ipynb.
KANSL1 and hippocampal neurogenesis
KANSL1 is required for the acetylation of p53 [41], a transcription factor modulating BTG2 expression and a vital protein for hippocampal neurogenesis (i.e., while KANSL1-dependent p53 acetylation induces BTG2 expression, p53 hyperacetylation leads to the overexpression of BTG2) [42, 49]. BTG2 negatively controls the cell cycle since its overexpression results in cell growth rate decline [42, 50]. Through BTG2 binding to Ras (the signaling event mediator), the Ras/MAPK signaling cascade is activated, leading to tau hyperphosphorylation [48]. Tau is a microtubule-associated protein that promotes the assembly and stabilization of cytoskeleton microtubules, both of which are required for cell division (i.e., mitosis). However, tau hyperphosphorylation reduces its capability to bind the microtubules, giving rise to dynamic instability, mitosis impairment, cell cycle deterioration, elimination of proliferating newborn neurons, and ultimately to apoptotic processes [51]. In summary, KANSL1 dysfunction disturbs the expression of cell cycle regulatory genes, leading to the perturbation of cell proliferation processes [46, 50] (Fig. 3A).
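The chain of events described in this subsection can be sketched as a small directed graph over which causal paths are enumerated, which is the kind of reasoning a BEL knowledge assembly is meant to support. The snippet is a simplified illustration under stated assumptions: the node names and relation labels are hand-transcribed from the prose above rather than taken from the curated BEL statements, and the helper functions are hypothetical and do not use PyBEL or NeuroMMSig.

```python
from typing import Dict, List, Tuple

# Edges transcribed from the prose; labels are illustrative simplifications,
# not the curated BEL relations.
EDGES: List[Tuple[str, str, str]] = [
    ("KANSL1", "increases", "p53 acetylation"),
    ("p53 acetylation", "increases", "BTG2 expression"),
    ("BTG2 expression", "increases", "Ras/MAPK signaling"),
    ("Ras/MAPK signaling", "increases", "tau hyperphosphorylation"),
    ("tau hyperphosphorylation", "decreases", "tau microtubule binding"),
    ("tau microtubule binding", "decreases", "mitosis impairment"),
    ("mitosis impairment", "increases", "apoptosis"),
]


def build_adjacency(
    edges: List[Tuple[str, str, str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each source node to its outgoing (relation, target) pairs."""
    adjacency: Dict[str, List[Tuple[str, str]]] = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def find_paths(
    adjacency: Dict[str, List[Tuple[str, str]]],
    source: str,
    target: str,
    visited: Tuple[str, ...] = (),
) -> List[List[str]]:
    """Return every simple (cycle-free) path from source to target."""
    visited = visited + (source,)
    if source == target:
        return [list(visited)]
    paths: List[List[str]] = []
    for _relation, neighbor in adjacency.get(source, []):
        if neighbor not in visited:  # skip nodes already on this path
            paths.extend(find_paths(adjacency, neighbor, target, visited))
    return paths


paths = find_paths(build_adjacency(EDGES), "KANSL1", "apoptosis")
for path in paths:
    print(" -> ".join(path))
```

With richer graphs, several alternative paths between a gene and a phenotype would be returned, each one a candidate mechanistic hypothesis to be checked against the supporting evidence.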
KANSL1 and hippocampal metabolic processes
The functional crosstalk between KANSL1 and the metabolic processes occurring in the mitochondria (e.g., oxidative phosphorylation) is key for the regulation of hippocampal synaptic plasticity [52–55]. KANSL1 is highly expressed in the mitochondria, where it regulates mitochondrial DNA (mtDNA) transcription and the subsequent translation of genes involved in oxidative phosphorylation, a set of complex mechanistic processes that form adenosine triphosphate (the cell's energy currency) by oxidizing nutrients [36, 56]. Oxidative phosphorylation produces potentially harmful reactive oxygen species whose production and detoxification are balanced in normal mitochondria [57]. However, KANSL1 deficiency promotes the downregulation of mtDNA transcription and translation of genes involved in oxidative phosphorylation, causing reactive oxygen species accumulation. Oxidative stress then occurs, leading to cholesterol metabolism perturbation [58]. Cholesterol homeostasis dysregulation increases cholesterol concentration in cells, leading to synaptic plasticity impairment and ultimately hippocampal shrinkage [59–63] (Fig. 3B).
KANSL1 and hippocampal synaptic plasticity
Long-term potentiation (LTP) is one of the major cellular processes involved in memory formation [64]. BDNF, a member of the neurotrophin family of growth factors, plays a role in LTP [65–67]. One of the mechanisms governing the regulation of BDNF expression is histone acetylation, to which KANSL1 contributes significantly as a component of a histone acetyltransferase complex. KANSL1 deficiency might severely affect BDNF expression, which in turn impairs LTP and synaptic plasticity. Both are considered to play an important role in memory formation [68] (Fig. 3C).
Pathways implicating genes from mechanistic model
The investigation of the presence, or lack thereof, of KANSL1 in pathways from KEGG, Reactome, and WikiPathways revealed that the KANSL1 gene is largely absent from the major pathway databases. While KANSL1 does participate in the “Chromatin Organization (Homo Sapiens)” and “Pathways Affected in Adenoid Cystic Carcinoma (Homo sapiens)” pathways from WikiPathways, no interaction information for this gene is provided. Moreover, KANSL1 is altogether absent from pathways in KEGG and Reactome. Similarly, we queried pathways from the three databases for all other genes from our mechanistic model (Supplementary Table 7). Unsurprisingly, well-studied genes participated in a higher number of pathways (e.g., BDNF was found in 33 pathways across KEGG, Reactome, and WikiPathways), while genes with less literature evidence were scarcely present (e.g., KAT8 was found in one pathway across KEGG, Reactome, and WikiPathways, but without interaction information). Furthermore, these pathway resources do not yet capture SNPs or imaging features. Accordingly, the mechanism by which KANSL1 may be implicated in hippocampal atrophy can thus far only be inferred through dedicated modeling approaches, such as the one we have presented in this work.
Assessment of putative KANSL1-mediated mechanism with experimental databases
The putative KANSL1-dependent hippocampal atrophy mechanisms identified through systematically harvested knowledge are based on qualitative information. To further support the mechanisms of action exerted by KANSL1 in the nervous system, we screened evidence from experimental databases containing data sets on knockout mouse models in the Mouse Genome Informatics database [69]. In this database, we queried for KANSL1 and nervous system and found two knockout mouse studies which investigated how KANSL1/MAPT dysregulation may cause hippocampal shrinkage [70, 71]. These studies associated tau hyperphosphorylation coupled with impaired microtubule binding of tau with reduction in synaptic transmission and altered synaptic plasticity. Furthermore, the authors argue that these mechanisms may lead to neuronal apoptosis and hippocampal shrinkage (Fig. 3A).
Additionally, with respect to SNPs that occur in the non-coding regions of the gene, we used RegulomeDB [72] to functionally annotate the 60 SNPs associated with KANSL1. RegulomeDB scores SNPs based on transcription factor binding sites, position weight matrix for transcription factor binding, DNase footprinting, open chromatin and chromatin states, expression quantitative trait loci (eQTL), and validated functional SNPs. Moreover, it calculates a score that represents the probability of being a regulatory variant based on functional genomics features along with continuous values such as ChIP-seq signal, DNase-seq signal, information content change, and DeepSEA scores for each SNP [73]. Of the 60 SNPs, our analysis suggested that 11 are located in the functional region of KANSL1 (Supplementary Table 8).
DISCUSSION
While the exact mechanism of action of KANSL1 remains obscure, the proposed methodology was able to identify the mechanisms through which it may have an impact on hippocampal atrophy. This demonstrates how the mechanism enrichment approach offers improved interpretation of molecular mechanisms involved in disease pathobiology. Ultimately, the hypotheses derived from such approaches can foster research by identifying unexplored links that have not been validated in the laboratory.
We observed that the information pertaining to different biological scales is not equally distributed in the literature. For example, there is a paucity of results reported at the phenotypic level, compared to those at the molecular or organ level. Shortcomings in knowledge representation at different scales are also reflected in pathway databases which currently do not contain information on SNPs or neuroimaging features. Consequently, linking molecular mechanisms to clinical readouts is one of the great challenges in biomedical informatics.
The results presented in this work are hypotheses that require further investigation. We have shown that despite the scarcity of knowledge from the literature around KANSL1, our approach was able to reveal interesting hypotheses. This sparsity of information surrounding KANSL1 combined with its manifestation as a novel AD-associated gene motivates future updates of the knowledge assembly as new information becomes available. Furthermore, in our attempt to validate our hypothesis, we did not find any of the genetic variations of KANSL1 in major AD cohorts, such as the Alzheimer’s Disease Neuroimaging Initiative and AddNeuroMed [74, 75]. Thus, future work can include measurements of these genetic variations as well as their expression in these and other independent cohorts. Using these quantitative measurements, if available, several tools can be employed to elucidate pathway signatures in disease as well as drug-perturbed states, which can then be used to prioritize drug candidates relevant to the particular disease under investigation when these signatures are anti-correlated [76]. Similarly, gene expression measurements paired with a network containing prior knowledge on drug-disease data can also be used for drug candidate identification [77]. Finally, looking ahead, the presented strategy can be applied to other AD genes or across disease domains such as psychiatric diseases.
