Sage Journals: Discover world-class research

Abstract

Artificial intelligence (AI)-based drug repurposing is an emerging strategy to identify drug candidates to treat rare diseases. However, cutting-edge algorithms based on deep learning typically do not provide a human-understandable explanation supporting their predictions. This is a problem because it hampers biologists’ ability to decide which predictions are the most plausible drug candidates to test in costly lab experiments. In this study, we propose rd-explainer, a novel AI drug repurposing method for rare diseases that obtains possible drug candidates together with human-understandable explanations. The method is based on graph neural network technology, and explanations are generated as semantic graphs using state-of-the-art explainable AI (XAI). The model learns features from current background knowledge on the target rare disease structured as a knowledge graph, which integrates curated facts and their evidence on different biomedical entities such as symptoms, drugs, genes, and ortholog genes. Our experiments demonstrate that our method has excellent performance that is superior to state-of-the-art models. We investigated the application of XAI on drug repurposing for rare diseases and we show that our method is capable of discovering plausible drug candidates based on testable explanations.

Keywords

rare disease, knowledge graph, drug repurposing, graph neural network, explainable AI

1. Introduction

Developing new drugs is a challenging effort that often ends with the drug not being able to launch. Recent studies have shown that around 90% of drugs fail to be approved during their clinical development (Sun et al., 2022). This leads to a fruitless expenditure of both time and money that will yield no financial returns. The situation is even worse in the case of rare diseases, as pharmaceutical companies may consider it risky to invest large amounts of resources into developing drugs that only a small percentage of the population will need. Nonetheless, in total, human beings are affected by approximately 7,000 rare diseases, of which only 5% have an effective treatment (Haendel et al., 2020), and in Europe alone between 27 and 36 million people suffer from rare diseases (Rare diseases).

In this scenario, drug repurposing strategies have appeared as a possible approach to solve these issues. By reusing drugs that have already been approved, companies can avoid many of the costly and time-consuming steps of clinical trials. In this context, innovative approaches to drug repurposing, such as computational strategies and artificial intelligence (AI)-driven methodologies, have emerged as promising solutions to address these challenges. Graph-based drug repurposing is another noticeable strategy that has gained attention in recent years. By constructing intricate networks of molecular interactions, genes, proteins, and diseases, this approach unveils hidden relationships and connections that might otherwise go unnoticed (Guney et al., 2016).

Still, many people remain skeptical about AI-driven decisions, especially machine learning (ML) and deep learning, as many of them come with no explanation that can help to understand the reason why they should be trusted (also called black-box AI). This issue is especially significant in the healthcare field, where decisions may have an important impact on people’s lives. Also, giving valid explanations can help researchers to point in the right direction in the generation of hypotheses that are testable in labs and enable a solid knowledge discovery. Furthermore, the EU General Data Protection Regulation is requesting the AI industry to fulfill the “right to explanation” (Goodman & Flaxman, 2017). This “right to explanation” implies that when a decision is significantly affected by an automated process/algorithm, the individual can demand an explanation. In recent years, many different tools have appeared to try and cover this gap in the emerging explainable AI (XAI) research area (Huang et al., 2023; Pezeshkpour et al., 2019; Ribeiro et al., 2016).

In this study, we explore whether AI can be used to produce both predictions and explanations in computational drug repurposing for rare diseases and, if so, how helpful these explanations can be for hypothesis generation. The main objective of this work was to develop and implement a pipeline to find marketed drugs that can be used to treat the symptoms of a rare disease. Our approach is based on cutting-edge AI algorithms used in computational drug repurposing such as graph ML using knowledge graphs (KGs) and graph neural networks (GNNs), and XAI methodology to provide the explanations supporting the drug predictions made by AI models. The approach was evaluated by selecting Duchenne muscular dystrophy (DMD) as a case study, a genetic disorder that is the most common form of muscular dystrophy (Szigyarto & Spitali, 2018). We demonstrate the generalizability of our approach by applying the pipeline to different rare diseases.

2. Related Work

2.1. Knowledge Graph-Based Drug Repurposing

The state of the art in computational drug repurposing makes use of graph-based structures and AI techniques to find potential drug candidates. One of the main advantages of using graph structures is that they can easily incorporate information from different sources. This is especially important in the domain of rare diseases, where information is distributed and often scarce. The ability to integrate as much relevant data as possible can confer a significant advantage. An example of this would be the recent study of Al-Saleem et al. (2021), where a knowledge graph was used to discover drug candidates to treat COVID-19.

Different ML algorithms can be used to analyze knowledge graphs, including matrix factorization, random-walk approaches (node2vec; Grover & Leskovec, 2016), geometric embeddings (DistMult; B. Yang et al., 2014), and GNNs (Ferrari et al., 2022; Yue et al., 2020), each with its own advantages and disadvantages (see Table 1). In our study, we used a combination of random-walk approaches and GNNs because, in contrast to other methods (like matrix factorization or geometric embeddings), they can easily incorporate new information without the need to retrain the ML model. This is especially relevant in the field of drug repurposing, where new information about drugs, genes, and diseases is continually being published (Hsieh et al., 2021; Sadeghi et al., 2022; Zhang et al., 2022).

Table 1.
Comparison of Different Graph-Based Machine Learning Methods in Drug Repurposing.

Method	Example	Advantages	Disadvantages	Applications
Matrix factorization	ADA-GRMFC (Zhang & Xie, 2022)	Captures global relationships between entities. Simple and interpretable. Effective for sparse graphs.	Computationally expensive for large graphs. Difficulty in incorporating new data without retraining.	Suitable for large-scale recommendation systems
Random-walks	node2vec (Grover & Leskovec, 2016)	Efficient for large graphs. Easy to implement. Can capture node proximity.	Limited to local information; misses long-range dependencies. Cannot utilize node features or graph structure.	Useful for tasks requiring efficient exploration of graph neighborhoods
Geometric embeddings	DistMult (B. Yang et al., 2014)	Produces interpretable low-dimensional embeddings. Scalable and efficient for sparse graphs. Performs well on link prediction tasks.	Captures only local information, missing complex graph interactions. Cannot handle high-order relationships or complex structures.	Effective in link prediction or node classification tasks with relatively simple graph structures.
Graph neural networks (GNNs)	GraphSAGE (Hamilton et al., 2017)	Aggregates local and global node features. Inductive learning, generalizes to unseen nodes. Scalable and flexible.	Computationally intensive for large graphs. “Black-box” nature hinders interpretability. Sensitive to choice of aggregation function.	Ideal for large, dynamic graphs in drug repurposing, where new entities are constantly introduced.

2.2. XAI on Graph ML

One of the graph-based methods that can provide explanations of the predictions, also called local explanations, is (Graph)LIME (Huang et al., 2023), an adaptation of the popular and more general explainability method LIME (Ribeiro et al., 2016). The idea behind this method is the following: when trying to get an explanation for a given prediction, (Graph)LIME performs small perturbations to the features of nodes, and sees how the predictions vary with respect to the initial prediction. The more the prediction changes, the more the model is relying on that feature to obtain its prediction. This way, explanations in this model are given in the form of a set of node features. Among its drawbacks, this method can only be used in node classification tasks. Another explainability method is CRIAGE (Pezeshkpour et al., 2019) where explanations are given as a set of rules.

Several other explainability methods have been proposed for Graph ML, including PGExplainer (Luo et al., 2024) and GRETEL (Xie et al., 2022). PGExplainer generates explanations by learning a probabilistic mask over graph structures, making it more flexible in terms of capturing various graph features. GRETEL, on the other hand, is designed to provide global explanations, making it different from other methods that focus on local interpretability.

Finally, the method chosen in this work is GNNExplainer (Ying et al., 2019). It works as follows: given an initial prediction (link prediction, node classification, or graph classification) obtained through a GNN, GNNExplainer finds a subset of node features and edges that are responsible for the prediction. This subset is obtained by training an edge mask and a node feature mask. This method was chosen because explanations are provided in the form of a subgraph that can be easily understood. Additionally, it is a posthoc XAI method that is GNN model-agnostic, which means that if more sophisticated GNNs are developed in the future, they can be easily incorporated into the pipeline. These features make it a popular method in the research community (Kim, 2023; Pfeifer et al., 2022; Sun et al., 2023). As a posthoc method, however, its explanations might not always be faithful to the model’s decision-making process: if a GNN has been trained on noisy data, GNNExplainer may highlight irrelevant edges or nodes simply because they correlate with predictions. Another major drawback is that it lacks consistency when obtaining explanations, meaning that explanations for the same prediction can change significantly when GNNExplainer is run several times. A summary of the methods can be found in Table 2.

Table 2.
Summary of Explainability Methods in Graph ML.

Method	Explanation Type	Main Drawback
GraphLIME (Huang et al., 2023)	Feature-based	Limited to node features
CRIAGE (Pezeshkpour et al., 2019)	Rule-based	Requires rule extraction
GNNExplainer (Ying et al., 2019)	Subgraph-based	Inconsistent explanations
PGExplainer (Luo et al., 2024)	Probabilistic mask	Complexity in training
GRETEL (Xie et al., 2022)	Global explanation	Not applicable to local explanations

Note. ML = machine learning.

3. Method

3.1. rd-explainer Method Overview

rd-explainer is a drug repurposing method we developed for rare diseases and its pipeline is illustrated in Figure 1. rd-explainer has three modules: the Knowledge Graph Construction module constructs a KG for the specific rare disease and drug repurposing task, the Prediction module trains a GNN model and predicts drug candidates for the rare disease symptoms, and the Explainer module computes the most important semantic subgraphs that explain the connection between the predicted drug and the symptom. First, disease-related information is gathered from different data sources: Monarch Initiative knowledge base (Monarch Initiative Explorer) for disease pathology, and DrugCentral (Avram et al., 2021) and Therapeutic Target Database (Zhou et al., 2022) for disease druggability. This information is then preprocessed and captured as a knowledge graph. Next, for each node in the graph, a feature vector is obtained that will be used as input for the GNN model. This is done by making use of a method known as edge2vec (Gao et al., 2019) to consider the different edge semantics in the KG for node embedding learning. We used the version extracted from GitHub (accessed in 2021).¹ The next step is to build and train the GNN model, which is done using the GraphSAGE framework for learning graph representation (Hamilton et al., 2017). Next, link prediction is performed for each drug–symptom node embedding pair using the dot product as a scoring function. Finally, we produced prediction explanations as semantic graphs using GNNExplainer (Ying et al., 2019), a recent and, to our knowledge, one of the first XAI methods for obtaining explanations from GNN predictions.

Figure 1.

rd-explainer drug repurposing method pipeline developed in this work.

3.2. Rare Disease-Specific Drug Repurposing Knowledge Graphs

3.2.1. Data Sources

Data were obtained from three different sources: Monarch (Monarch Initiative Explorer; accessed in 2021), DrugCentral (Avram et al., 2021; 2021 version), and Therapeutic Target Database (TTD; Zhou et al., 2022; November 8th, 2021 version). Monarch is a knowledge base built on semantic principles, unifying gene, variant, genotype, phenotype, and disease data across different species. Its primary aim is to establish links between genes and phenotypes, thereby facilitating computational exploration of human disease biology. Monarch was chosen because it contains curated information across different species. Because rare diseases are often less studied than common diseases, incorporating information from other species can maximize the amount of knowledge in the graph. However, Monarch does not specialize in drug information.

Drug information was incorporated from DrugCentral (drug-target information) and from TTD (drug–disease information). DrugCentral is a comprehensive online database that provides information about approved drugs, active ingredients, and other pharmaceutical products. One of its major features is that it is open source and its data are freely available to anyone. For this project, we used only the drug-target information (the main piece of information that is not present in Monarch), downloaded as a TSV file from their site (Zhou et al., 2022).² Similarly, TTD is a database that specializes in drugs and their respective therapeutic targets. This database is also freely accessible and its information can be easily downloaded in CSV format. In this project, we used only the drug–disease information (Zhou et al., 2022),³ again because it is the information missing in Monarch.

3.2.2. Knowledge Graph Construction

To extract information from Monarch, the BioKnowledge Reviewer (Queralt-Rosinach et al., 2020) tool was used. This tool was originally created to collect knowledge from several sources and create a knowledge graph that could be later used for hypothesis generation. It works by using several seeds (node identifiers [IDs]) as input to query the Monarch API and constructing the graph based on the neighborhood of those seeds. After introducing the seeds in the BioKnowledge Reviewer pipeline, the final output is the rare disease research question-specific knowledge graph structured in two dataframes (stored as csv files). One of them contains a list of nodes with their respective name, IDs, semantic entity type, synonyms, and description. The second file contains the list of edges, again containing the IDs of the entities participating in each link and other edge information such as type of edge, supporting evidence, and reference date. Monarch was our main source of information and therefore served as a starting point to create the rest of the graph. This way, data from other data sources were modified to fit Monarch’s standards by unifying the identifiers. Finally, the graphs were constructed using the NetworkX Python library (NetworkX). With this library, the dataframes extracted using the BioKnowledge Reviewer were converted into a Graph object.

We integrated data into two different knowledge graphs to perform the experiments. Each one of them was constructed using different (number of) node seeds to extract information from Monarch. The first one (KG A) uses only two seeds: DMD seed (HGNC:2928), corresponding to the human gene that causes the disease; and DMD seed (MONDO:0010679), corresponding to the disease itself. The second graph (KG B) extends KG A by including as seeds all phenotypes of the rare disease (in total, 27 more seeds). The seeds used for the construction of each graph can be found in Tables S1 and S2 in the Supplemental materials. The idea of creating two different graphs is to find out if the performance of the model and the quality of the explanations increase by incorporating more (phenotypic) information.

3.3. ML Model and XAI

3.3.1. Node Features

At this point, none of the nodes has any specific node features. It is possible to run a GNN relying only on graph information, that is, network topology (this is done, e.g., by using the node degree as a node feature); nonetheless, this resulted in poor performance (results not shown). To improve the performance of the model, edge2vec was used to produce a specific embedding for each node that captures information about its neighborhood. edge2vec (Gao et al., 2019) is a tool that generates node embeddings based on the node neighborhood and the types of edges connecting each node. After executing edge2vec, each node was given a unique feature vector. Since edge2vec is an unsupervised method that does not use task-specific labels, these embeddings serve as general-purpose representations of the graph structure rather than encoding direct knowledge of the downstream task. This approach ensures that the GNN still needs to learn task-relevant patterns, rather than relying only on the precomputed embeddings.

3.3.2. Data Splitting

As in any other machine learning task, data need to be split into training, validation, and test sets. However, when tackling a link prediction task, there are different ways to perform this split. In link or edge prediction tasks, edges can be divided into two groups: message-passing edges and supervision edges. Message-passing edges are the ones that will be used by our GNN to obtain the embeddings, while supervision edges are the ones that will be used to test the performance of our model (CS224W; DeepSNAP). Additionally, when creating the supervision edges it is necessary to include negative examples by applying negative sampling. Negative sample edges are those not present in the original graph, that is, pairs of entities known to be unconnected or for which no link is known. The goal is for the neural network to learn to distinguish true (positive) edges from false (negative) ones. In general, one negative edge is created for each true edge (CS224W; DeepSNAP).
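The negative sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the sampler used by DeepSNAP: the function name `sample_negative_edges`, the toy node names, and the choice to treat edges as undirected when checking whether a pair is already linked are all hypothetical.

```python
import random


def sample_negative_edges(nodes, positive_edges, ratio=1, seed=0):
    """Draw `ratio` negative edges per positive edge from unlinked node pairs."""
    rng = random.Random(seed)
    # Assumption: an edge (u, v) also rules out (v, u) as a negative example.
    known = {frozenset(edge) for edge in positive_edges}
    n_target = ratio * len(positive_edges)
    max_pairs = len(nodes) * (len(nodes) - 1) // 2
    if n_target > max_pairs - len(known):
        raise ValueError("not enough unlinked pairs to sample from")
    negatives = set()
    while len(negatives) < n_target:
        u, v = rng.sample(nodes, 2)  # two distinct nodes
        pair = frozenset((u, v))
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(pair)) for pair in negatives)


# Toy graph with hypothetical node names.
nodes = ["drugA", "drugB", "geneX", "phenoP", "phenoQ"]
positives = [("drugA", "geneX"), ("geneX", "phenoP"), ("drugB", "phenoQ")]
negatives = sample_negative_edges(nodes, positives, ratio=1)
```

With `ratio=1` the supervision set stays balanced, one negative edge per true edge.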

In this work, we selected the all-graph transductive split (CS224W; DeepSNAP). This method divides the data as follows: in the training set, the supervision edges and message-passing edges are the same. In the validation set, the message-passing edges are the same as those in the training set, while the supervision edges are different from the training supervision edges. Finally, in the test set, the message-passing edges consist of the training message-passing edges together with the validation supervision edges, and the supervision edges are distinct from both the training and validation supervision edges.

This method is one of the standard settings for link prediction tasks, as the whole graph can be seen in all dataset splits (CS224W). The split proportion we used was 80% of edges for training set, 10% for validation set, and 10% for test set. The training set was used to train the model, the validation set to select the best hyperparameters, and the test set to obtain the global performance of the model.
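As an illustration of the split described above, the sketch below partitions an edge list 80/10/10 and assigns message-passing and supervision roles. It is a simplified stand-in rather than the DeepSNAP implementation; the function name `transductive_split` is hypothetical, and it assumes the common convention in which the test-time message-passing edges are the training edges plus the validation supervision edges.

```python
import random


def transductive_split(edges, train=0.8, val=0.1, seed=0):
    """Split edges into train/val/test roles for transductive link prediction."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_train = int(train * len(shuffled))
    n_val = int(val * len(shuffled))
    train_edges = shuffled[:n_train]
    val_edges = shuffled[n_train:n_train + n_val]
    test_edges = shuffled[n_train + n_val:]
    return {
        # Training: the same edges are used for message passing and supervision.
        "train": {"message_passing": train_edges, "supervision": train_edges},
        # Validation: message passing over training edges, new supervision edges.
        "val": {"message_passing": train_edges, "supervision": val_edges},
        # Test: validation edges become visible for message passing (assumption).
        "test": {"message_passing": train_edges + val_edges,
                 "supervision": test_edges},
    }


# Toy chain graph with 100 edges.
edges = [(i, i + 1) for i in range(100)]
splits = transductive_split(edges)
```

Negative supervision edges would then be sampled separately for each of the three supervision sets.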

To avoid data leakage, node features were obtained by running edge2vec only in the train split; this way, no information from the validation or test set is seen during the training. This procedure was just used during the evaluation of the model to ensure our experiment was unbiased.

3.3.3. GNN Model

We first utilized a GNN algorithm to learn vector representation embeddings for nodes in our knowledge graphs. Then, we applied these node embeddings for drug–phenotype link prediction. The GNN algorithm that we used in this work is called GraphSAGE (Hamilton et al., 2017). GraphSAGE performs inductive graph representation learning by leveraging rich node attribute information. The main advantage that was brought by GraphSAGE is its scalability: instead of working with full batches (the whole graph is seen during the training) it works with mini-batches. Each mini-batch is a subset of computational graphs (a computational graph is the individual GNN that is built for each node) of $N$ nodes. By applying this technique, the GNN can better manage larger graphs. The GraphSAGE model was created using the DeepSNAP library (DeepSNAP) to obtain the predictions. Hyperparameter optimization was performed using Ray Tune (Liaw et al., 2018), as it is a model-agnostic library that allows multiple trials to be run in parallel, reducing the training time. The list of hyperparameters that needed to be tuned and the optimal values can be found in Table S5 in the Supplemental materials. In total, 30 models were created (each of them containing a random selection of parameters).

The final model consists of a GraphSAGE-based neural network that processes node embeddings through two graph convolutional layers using mean aggregation. The first SAGEConv layer transforms the input features into a 264-dimensional hidden representation, followed by batch normalization, LeakyReLU activation, and dropout (0.2) to prevent overfitting. The second SAGEConv layer maps the hidden representation to a 64-dimensional output space, which serves as the final node embeddings. Link prediction is performed by computing the dot product between the embeddings of node pairs. The model is trained for 150 epochs using the Binary Cross-Entropy with Logits Loss function and optimized with a learning rate of 0.07.

3.3.4. Drug–Phenotype Link Predictions

The GNN model generates embeddings for individual nodes within the graph as its final output. By computing the dot product between distinct node pairs and applying a sigmoid function, we obtain a value that shows the likelihood of a link existing between those nodes. Consequently, we obtain dot products between each drug and every phenotype in the graph, and rank them in descending order. The top-ranked drug–phenotype pairs are considered the most promising candidates. Links that were already present in the graph were removed from the ranking.
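The scoring and ranking step can be sketched in a few lines. The embeddings, node names, and helper names (`link_probability`, `rank_candidates`) below are hypothetical toy values chosen for illustration; in the pipeline the embeddings would be the 64-dimensional outputs of the trained GNN.

```python
import math


def link_probability(z_u, z_v):
    """Sigmoid of the dot product between two node embeddings."""
    score = sum(a * b for a, b in zip(z_u, z_v))
    # Numerically stable sigmoid for large negative scores.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    return math.exp(score) / (1.0 + math.exp(score))


def rank_candidates(drug_emb, pheno_emb, known_links):
    """Score every drug-phenotype pair not already in the graph, best first."""
    ranked = [
        (link_probability(z_d, z_p), drug, pheno)
        for drug, z_d in drug_emb.items()
        for pheno, z_p in pheno_emb.items()
        if (drug, pheno) not in known_links
    ]
    return sorted(ranked, reverse=True)


# Hypothetical 2-dimensional embeddings.
drug_emb = {"drugA": [1.0, 0.0], "drugB": [0.0, 2.0]}
pheno_emb = {"phenoP": [2.0, 0.0], "phenoQ": [0.0, 1.0]}
known_links = {("drugA", "phenoP")}
ranking = rank_candidates(drug_emb, pheno_emb, known_links)
```

Because the sigmoid is monotonic, ranking by probability and ranking by raw dot product give the same order.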

3.3.5. Graph-Based Prediction Explanations

We applied GNNExplainer to generate explanations for every drug–phenotype prediction. To do so, we adapted the pipeline code (from PyTorch Geometric version 2.0.4) to generate explanations for the link prediction task, which was not implemented in the authors’ version (Ying et al., 2019) (see pseudocode in Algorithm 1 in the Supplemental materials). However, this XAI algorithm has a robustness problem in the explanations it produces (Agarwal et al., 2023) and, additionally, it may produce disconnected graphs that affect the interpretability of explanations by domain users. To solve this issue, we developed the following procedure. First, we assume that a complete explanation is one that connects the two targeted nodes: if drug A can treat phenotype B, there must be some common pathway that allows A to interact with B. The procedure therefore starts by running GNNExplainer for several iterations. In each iteration, NetworkX is used to check whether a path exists between both nodes in the subgraph generated by GNNExplainer. If no path is found, the procedure continues with the next iteration; if a path exists, it stops iterating and that subgraph is considered to be the final explanation. If no subgraph is found that satisfies the “pathway” condition, the last subgraph is returned as a possible explanation.
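The retry procedure can be sketched as below. To keep the example self-contained, a plain breadth-first search stands in for the NetworkX path check, and `run_explainer` is a hypothetical callable standing in for one GNNExplainer run that returns the explanation as a list of edges. Treating the explanation subgraph as undirected is also an assumption.

```python
from collections import deque


def has_path(edges, source, target):
    """Breadth-first search over an undirected edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def explain_until_connected(run_explainer, drug, phenotype, max_iterations=10):
    """Rerun the explainer until the subgraph links drug and phenotype."""
    subgraph = []
    for _ in range(max_iterations):
        subgraph = run_explainer(drug, phenotype)
        if has_path(subgraph, drug, phenotype):
            return subgraph, True   # complete explanation
    return subgraph, False          # fall back to the last subgraph


# Fake explainer: first run is disconnected, second run is connected.
outputs = iter([
    [("drugA", "geneX"), ("geneY", "phenoP")],
    [("drugA", "geneX"), ("geneX", "phenoP")],
])
subgraph, complete = explain_until_connected(
    lambda drug, pheno: next(outputs), "drugA", "phenoP"
)
```

The returned flag distinguishes a complete explanation from the fallback case in which the iteration budget is exhausted.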

In total, seven phenotypes were selected to evaluate the explanations (Muscular Dystrophy [HP:0003560], Respiratory Insufficiency [HP:0002093], Arrhythmia [HP:0011675], Congestive Heart Failure [HP:0001635], Dilated Cardiomyopathy [HP:0001644], Cognitive Impairment [HP:0100543], and Progressive Muscle Weakness [HP:0003323]). These phenotypes were selected to cover all the main areas that are affected by the disease (muscular, respiratory, cardiac, and intellectual symptoms). For each prediction obtained in these phenotypes (three drug predictions per phenotype), an explanation was obtained. This process was done for the predictions coming from KG A and for those coming from KG B. This makes a total of 42 explanations (21 for each graph).

Regarding the parameters of GNNExplainer, because the graphs are highly connected, explanations were generated by using the 1-hop neighborhood of the graph. Using a higher $k$-hop neighborhood is not recommended, as the number of nodes in the subgraph increases exponentially, which can make the explanation difficult to understand. This happens because both graphs are scale-free graphs, and thus, by increasing the number of hops there is a higher chance that a “hub-node” is hit, and the number of nodes escalates exponentially (see Section 4.1).

Additionally, the maximum size of the explanations was set to 15 (this means that no more than 15 edges will be part of the explanation). This way, we avoid obtaining overly complex explanations with many edges that might be impossible for researchers to comprehend. This was done by selecting the edges whose contribution values are among the 15 highest values.
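The size cap amounts to a top-$k$ selection over the learned edge mask. A minimal sketch, assuming the mask is available as a mapping from edges to contribution values (the name `top_k_edges` and the toy values are hypothetical):

```python
def top_k_edges(edge_importance, k=15):
    """Keep the k edges with the largest contribution values."""
    ranked = sorted(edge_importance.items(), key=lambda item: item[1], reverse=True)
    return [edge for edge, _ in ranked[:k]]


# Toy mask over 20 edges with increasing contribution values.
edge_importance = {("n%d" % i, "n%d" % (i + 1)): i / 20 for i in range(20)}
kept = top_k_edges(edge_importance)
```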

Finally, the maximum number of iterations was set to 10. In other words, if after 10 iterations GNNExplainer has not found an explanation that connects the drug candidate with the targeted phenotype it will conclude that no “complete” explanation was found, and the last explanation produced by GNNExplainer will be the one that will serve as the final answer. This parameter can be increased or reduced depending on the expectations of the researcher. A large number of iterations increases the chances of finding a complete explanation at the cost of more computational time. In contrast, reducing the number of iterations reduces the computational time, which can be useful if a researcher wants to obtain explanations for a large number of predictions.

3.4. Evaluation and Metrics

3.4.1. Evaluation of GNN Model

Data. We used both graphs KG A and B. Data were split into three sets: training set, validation set, and test set. Baselines. Our baselines include edge2vec (Gao et al., 2019), GraphSAGE (Hamilton et al., 2017), ComplEx (Trouillon et al., 2016), DistMult (B. Yang et al., 2014), and TransE (Bordes et al., 2013). Evaluation metrics. The area under the precision–recall curve (AUPRC) was used to validate and test the performance of the model, as it has been shown to lead to better precision when evaluating link prediction (Y. Yang et al., 2014). Additionally, we also computed the area under the receiver operating characteristic (ROC) curve (AUROC), precision, recall, and the $F_1$-score (the harmonic mean of precision and recall).
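For reference, the threshold-based metrics and a step-wise estimate of AUPRC can be computed as in the sketch below. Average precision is used here as one common estimator of the area under the precision–recall curve; whether the reported AUPRC values were computed this way is not stated in the text, and the labels and scores are toy values.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = link, 0 = no link)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(y_true, scores):
    """Mean of precision values at the rank of each true link (ties ignored)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


precision, recall, f1 = precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```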

Other evaluations were developed to further assess the performance of the model. These evaluations include the testing of different negative sampling sizes ($n = 1$, 5, 10, and 20) to determine the importance of keeping the data balanced. Additionally, a regular 10-fold cross-validation and a biased 7-fold cross-validation were performed. The biased cross-validation consists of the following: in each fold, four phenotypes were removed from the training set, and it was observed how well the model was able to predict the links of the removed phenotypes.
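One possible reading of the biased cross-validation is sketched below. Two details are assumptions, since the text does not specify them: phenotypes are assigned to folds in consecutive groups of four, and every edge touching a held-out phenotype is moved from the training set to the evaluation set. The function name and toy node names are hypothetical.

```python
def phenotype_holdout_folds(phenotypes, edges, fold_size=4):
    """Build folds in which a group of phenotypes is hidden from training."""
    folds = []
    for start in range(0, len(phenotypes), fold_size):
        held_out = set(phenotypes[start:start + fold_size])
        # Edges touching a held-out phenotype are only used for evaluation.
        test_edges = [e for e in edges if e[0] in held_out or e[1] in held_out]
        train_edges = [e for e in edges if e[0] not in held_out and e[1] not in held_out]
        folds.append((held_out, train_edges, test_edges))
    return folds


# Toy example with eight phenotypes, giving two folds.
phenotypes = ["p%d" % i for i in range(8)]
edges = [("drug0", "p0"), ("drug1", "p5"), ("geneX", "p3"), ("drug0", "geneX")]
folds = phenotype_holdout_folds(phenotypes, edges)
```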

3.4.2. Evaluation of Explanations

The evaluation of the explanations was done manually, following a two-step process. First, they were classified as complete or incomplete explanations based on the appearance of a connection between the drug and the phenotype. We developed a function to visualize the explanations as semantic graphs based on PyTorch Geometric’s visualization function⁴ (see the Section “Visualization of Explanations” in the Supplemental materials for further details). This way, if the explanation contains a link between the drug and the phenotype it is considered to be a complete explanation. These explanations are considered to be the most useful as they can be easily understood and interpreted. However, explanations where there is no link between drug and phenotype (where there are two separate clusters) or where one of the target elements (either the drug or the phenotype) is missing are considered incomplete explanations. Several illustrative examples are provided in the Supplemental materials (see the Section “Complete/Incomplete Explanation Example”).

During the second step, we evaluated the explanations using an objective and a subjective approach. First, complete explanations were reviewed and a manual search was performed to check whether the explanation proposed by the model had already been described in the literature (objective evaluation). This process was only performed for predictions that have supporting evidence in the literature and that were classified as complete explanations. The literature examination was performed using PubMed and Google Scholar during the first half of 2022. Finally, each explanation was evaluated for domain knowledge from rare disease researchers (subjective evaluation).

4. Results

4.1. Rare Disease KG Topology and Representation for Drug Repurposing

We generated two different drug repurposing knowledge graphs for the DMD rare disease. KG A contains 10,786 nodes and 93,905 directed edges. The average node degree of the graph ( $\frac{2 \times \text{number of edges}}{\text{number of nodes}}$ ) is 10.83, and the node with the highest degree is the human DMD gene, with a total degree of 1,683. The diameter of the graph is 6, meaning that the longest shortest path between two nodes is 6 (in other words, one can travel from any node to any other in 6 steps or fewer). The final feature that was obtained is the clustering coefficient, which measures the extent to which the neighbors of a node are connected to each other. In a complete graph (where all nodes are connected to all nodes) this clustering coefficient is equal to 1, while in a tree-like graph this coefficient is equal to 0. In KG A, this clustering coefficient is equal to 0.33. A summary of the features can be found in Table 3.
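As a worked illustration of the three topological features, the following sketch computes them in plain Python on toy graphs. The adjacency-dictionary representation and the function names are assumptions for illustration; the text does not state which library was used. The last line applies the average-degree formula to the KG B counts from Table 3, under the assumption that the undirected edge count is the one entering the formula.

```python
from collections import deque
from itertools import combinations


def average_degree(adj):
    """2 * (number of undirected edges) / (number of nodes)."""
    n_edges = sum(len(nbs) for nbs in adj.values()) / 2  # each edge is stored twice
    return 2 * n_edges / len(adj)


def eccentricity(adj, source):
    """Longest shortest-path distance from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return max(dist.values())


def diameter(adj):
    """Longest shortest path over all node pairs (assumes a connected graph)."""
    return max(eccentricity(adj, node) for node in adj)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbors count as 0."""
    total = 0.0
    for nbs in adj.values():
        k = len(nbs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Toy graphs: a complete graph on three nodes and a three-node path (a tree).
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

# Average-degree formula applied to the KG B counts in Table 3 (undirected edges assumed).
kg_b_average_degree = 2 * 1_440_418 / 83_665
```

The triangle yields a clustering coefficient of 1 and the path yields 0, matching the two limiting cases described in the text.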

Table 3.
Table Showing Features of KG A and KG B.

Property	KG A	KG B
Number of nodes	10,786	83,665
Number of directed edges	93,855	1,984,774
Number of undirected edges	58,435	1,440,418
Average degree	10.83	34.43
Highest degree	1,683	4,817
Diameter	6	7
Average clustering coefficient	0.33	0.48
Number of drugs	337	1,565
Number of diseases	5,419	25,636
Number of drug–disease pairs	86	599

KG = knowledge graph.

In the case of KG B (built from 29 nodes: KG A seeds extended by 27 phenotypes of DMD), the total number of nodes is 83,665, with a total of 1,984,774 directed edges. The average degree in this case is 34.43, and the node with the highest degree is the physiological process “Protein Binding,” with a total degree of 4,817. The diameter of the graph is 7, which shows one of the features of scale-free networks: despite increasing the number of nodes roughly 8 times and the number of edges roughly 20 times, the diameter of graph B only increased by one unit with respect to graph A. In this case, the clustering coefficient is equal to 0.48, showing that KG B is more clustered. Table 3 shows a summary of the features of both graphs.

The schema of the knowledge graph, which is the same for KG A and KG B, can be seen in Figure 2 and shows how the eight different node types interact with each other. The schema contains 24 and 29 different edge types for KG A and KG B respectively, which are not included in this figure for clarity, but are listed in the Supplemental materials S3 and S4.

Figure 2.

Schema of the knowledge graph. Node types are: drugs or chemical compounds (DRUG), genes (GENE), symptoms/phenotypes or diseases (DISO), gene variants (VARI), genotypes (GENO), gene orthologs (ORTHO), anatomical structures (ANAT), and biological processes (PHYS).

4.2. GNN Model Performance for Rare Disease-Specific Drug Repurposing

In total, two GNNs were used, one trained on KG A and one trained on KG B. Hyperparameter optimization was performed using Ray Tune, and the optimal values can be found in Table S5 in the Supplemental materials. These hyperparameters were obtained by training several GNN models (Random Search) on graph A and were later used to train a GNN model on graph B.

To measure link prediction performance, the scores obtained were precision, recall, and the $F 1$ -score, and can be found in Table 4 (the threshold used was 0.8). We found that both models (the one trained with KG A and the one trained with KG B) show high performance ( $F 1$ -score = 0.92 and 0.94 in KG A and KG B, respectively in the test set). To visualize the performance of the link prediction task, the ROC curve of KG A and KG B obtained in the test set can be found in Figures 3 and 4, respectively.
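To make the thresholded metrics concrete, here is a minimal sketch of how precision, recall, and the F1-score follow from link scores once a cut-off of 0.8 is applied. It computes the metrics for the positive class only and treats a score equal to the threshold as a positive prediction; both choices are assumptions, since the text does not state how ties or class averaging were handled. The scores and labels are invented toy values.

```python
def precision_recall_f1(scores, labels, threshold=0.8):
    """Positive-class precision, recall, and F1 for scores cut at `threshold`."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example (invented values): three true links and three sampled negatives.
scores = [0.95, 0.85, 0.60, 0.90, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(scores, labels)
```

In this toy case two of the three true links clear the threshold and one negative does as well, so all three metrics equal 2/3.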

Figure 3.

AUROC on the test dataset using KG A. AUROC = area under the ROC curve; KG = knowledge graph.

Figure 4.

AUROC on the test dataset using KG B. AUROC = area under the ROC curve; KG = knowledge graph.

Table 4.

Precision, Recall, and $F 1$ -Score Obtained on Each Dataset, Trained on Each Graph.

	Precision		Recall		$F 1$ -Score
Dataset	KG A	KG B	KG A	KG B	KG A	KG B
Training	0.97	0.95	0.97	0.94	0.97	0.94
Validation	0.92	0.94	0.92	0.94	0.92	0.94
Test	0.92	0.94	0.92	0.94	0.92	0.94

KG = knowledge graph.

4.3. Evaluating rd-explainer with State-of-the-Art Methods

First, we evaluated our GNN model applying different strategies and compared its performance with the state-of-the-art graph embeddings used in drug repurposing methods. Then, we evaluated our approach based on its ability to predict drugs that have already been reported in the literature for a new symptom or phenotype.

We performed a regular 10-fold cross-validation and a biased 7-fold cross-validation evaluation in KG A. The regular 10-fold cross-validation obtained an average AUPRC of 0.98 and an average AUROC of 0.98. For the biased 7-fold cross-validation, in each fold four symptoms (along with the edges connected to those symptoms) were removed from the training set. The performance of the model was then tested on the removed symptoms. In this case, the average AUPRC was 0.75 and the AUROC was 0.8.

The performance of the pipeline was evaluated for a different number of negative edges. This evaluation was only performed in KG A due to the large increase in the number of edges in the evaluation tests (and the consequential increase in the computational time). The results can be seen in Table 5. It is seen that as the number of negative edges increases, the PR curve is affected while the ROC curve remains mostly intact, a result that has been previously reported (Junuthula et al., 2016).
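The different behavior of the two curves can be illustrated with a short calculation. The ROC curve is built from the true-positive rate and the false-positive rate, each computed within a single class, so changing the class ratio leaves it unchanged in expectation. Precision mixes both classes and therefore falls as negatives are added. The sketch below uses hypothetical rates (TPR = 0.92, FPR = 0.02) that are not taken from the paper's experiments.

```python
def precision_at_ratio(tpr, fpr, negatives_per_positive):
    """Precision implied by fixed TPR and FPR when each positive has n sampled negatives."""
    true_pos = tpr  # expected true positives per positive example
    false_pos = fpr * negatives_per_positive  # expected false positives per positive example
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point, held fixed while the negative sample grows.
tpr, fpr = 0.92, 0.02
ratios = [1, 5, 10, 20]
precisions = [precision_at_ratio(tpr, fpr, n) for n in ratios]
```

With the operating point fixed, precision decreases monotonically as the number of negatives per positive grows, which is the qualitative pattern visible in the precision and AUPRC columns of Table 5.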

Table 5.
Performance as the Number of Negative Edge Samples Increases.

Number of Negative Edges	Precision	Recall	$F 1$ -Score	AUROC	AUPRC
1	0.92	0.92	0.92	0.97	0.97
5	0.86	0.92	0.89	0.97	0.90
10	0.87	0.90	0.89	0.97	0.84
20	0.69	0.92	0.75	0.96	0.74

Note. These results were obtained using KG A. AUROC = area under the ROC curve; AUPRC = area under the precision–recall curve; KG = knowledge graph.

Finally, the performance of rd-explainer (tested in KG A) was also compared to other state-of-the-art methods, including edge2vec, GraphSAGE, ComplEX, DistMult, and TransE. Our results can be seen in Table 6 and revealed that rd-explainer outperformed all other methods based on the different evaluation metrics measured.

Table 6.

Prediction Performance Metrics Comparing rd-explainer with Other State-of-the-Art Graph Embedding Methods Including edge2vec, GraphSAGE, ComplEX, DistMult, and TransE.

Method	P	R	$F 1$	AUROC	AUPRC
edge2vec	0.90	0.90	0.90	0.98	0.97
GraphSAGE	0.71	0.65	0.62	0.64	0.87
ComplEX	0.84	0.76	0.74	0.95	0.99
DistMult	0.93	0.93	0.92	0.95	0.98
TransE	0.88	0.87	0.87	0.95	0.95
rd-explainer	0.94	0.94	0.94	0.98	0.98

Note. The best results are highlighted in bold. In the headings, P stands for precision, R for Recall, and $F 1$ for $F 1$ -score. AUROC = area under the ROC curve; AUPRC = area under the precision–recall curve.

4.4. Drug Predictions Validation Based on the Scientific Literature

We also evaluated the prediction performance based on the capacity of our method to discover marketed drugs already reported to be used for a new phenotype. First, we listed for each of the seven selected phenotypes the three drugs with the highest scores. Because the objective is to find new indications for drugs; if any of the reported drugs already appears in the graph as a treatment for the targeted symptom, this drug will be skipped and the next one with the highest score will be selected. For example, if aprindine is selected as the drug with the highest score to treat arrhythmia, but the relation “aprindine is a substance that treats arrhythmia” is already present in our graph, aprindine will not be reported as a possible drug candidate.
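The selection rule described above (rank by score, skip drugs already linked to the phenotype as a treatment, keep the first three) can be sketched as follows. The function name, the dictionary representation, and all scores are hypothetical; only aprindine and arrhythmia come from the example in the text.

```python
def top_new_candidates(scores, known_treatments, k=3):
    """Return the k highest-scoring drugs that are not already known treatments."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [drug for drug, _ in ranked if drug not in known_treatments][:k]


# Hypothetical scores for the phenotype "arrhythmia"; drug_b to drug_e are placeholders.
arrhythmia_scores = {
    "aprindine": 0.99,
    "drug_b": 0.97,
    "drug_c": 0.95,
    "drug_d": 0.90,
    "drug_e": 0.50,
}
# The relation "aprindine treats arrhythmia" is already in the graph, so it is skipped.
candidates = top_new_candidates(arrhythmia_scores, known_treatments={"aprindine"})
```

Filtering before truncating to `k` is what guarantees that three candidates are still reported when a top-ranked drug is already a known treatment.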

For each possible drug candidate, a literature search was performed to find preliminary evidence that the drug had already been used to treat the symptom. If the drug was contraindicated for the symptom (or if it could cause the symptom), this was also annotated. The results for each drug candidate obtained using KG A can be found in Table S9 in the Supplemental materials. Additionally, Table 7 summarizes the percentage of drugs that had supporting evidence, contraindication evidence, or no evidence at all. We found that only about a fifth of drug candidates had supporting evidence in the literature (20.99%), and that the majority of candidates (65.43%) did not have any evidence at all. A small percentage of them (13.58%) are actually contraindicated for the targeted symptom/phenotype. Finally, the amount of supporting/contraindicating evidence is summarized in Table S7.

Table 7.
Percentage of Drugs Containing Supporting Evidence, Contraindication Evidence, or No Evidence at All for Both graphs A and B.

Property	KG A (%)	KG B (%)
Supporting evidence	20.99	27.16
Contraindication evidence	13.58	14.82
No evidence	65.43	58.02

KG = knowledge graph.

The same approach was followed for KG B. Information regarding the drug candidates for each symptom (as well as supporting evidence) can be found in Table S10 in the Supplemental materials. Additionally, the percentage of drugs with supporting evidence, contraindication evidence, or no evidence at all can be seen in Table 7. In this case, the proportion of drug candidates with supporting evidence increased relative to the drug candidates obtained with KG A (27% in B vs. 21% in A), and the proportion of drug candidates without evidence decreased (58% in B vs. 65% in A). The proportion of drug candidates with contraindications remains almost the same (14% in A vs. 15% in B).

4.5. Evaluating Drug Repurposing Explanations as Semantic Graphs

Evaluating an explanation is a tough task and many different benchmarks have recently appeared to evaluate them (Markus et al., 2021). In this work, we followed two different approaches to evaluate the explanations: a more subjective one, where the explanation was evaluated with our own biological knowledge; and a more objective one, where a manual literature search and curation was performed to check if the suggested explanation has already been reported. We selected seven phenotypes (muscular dystrophy, respiratory insufficiency, arrhythmia, dilated cardiomyopathy, congestive heart failure, progressive muscle weakness, and cognitive impairment) and their top three predictions, then explanations were produced from the models trained in both KGs. The selection of these phenotypes aimed to cover the diverse systems affected by the disease. Each explanation was analyzed and, if possible, compared to the one found in the literature.

Explanations were classified into complete and incomplete explanations. Complete explanations are those that show a connection (path) between the drug candidate and the targeted symptom/phenotype (Figure S3 in the Supplemental materials). They are considered complete because they allow for an easy human-understandable interpretation. However, incomplete explanations are those where the explanation is made up of two separate clusters (one for the drug and one for the phenotype) (Figure S4) or by a unique cluster where either the drug or the phenotype is missing (Figure S5).
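The classification in this study was performed manually, but the criterion itself is simple enough to express in code. The sketch below is an assumed formalization, not the authors' procedure: it treats the explanation as an undirected edge list and calls it complete when a path joins the drug and the phenotype. The example edges follow the doxorubicin explanation reported in this section; the second example uses placeholder gene names.

```python
from collections import deque


def classify_explanation(edges, drug, phenotype):
    """Return 'complete' if a path links drug and phenotype, else 'incomplete'."""
    adj = {}
    for u, v in edges:  # edges are treated as undirected (assumption)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if drug not in adj or phenotype not in adj:
        return "incomplete"  # one of the target nodes is missing from the subgraph
    seen, queue = {drug}, deque([drug])
    while queue:
        node = queue.popleft()
        if node == phenotype:
            return "complete"
        for nb in adj[node] - seen:
            seen.add(nb)
            queue.append(nb)
    return "incomplete"  # drug and phenotype sit in separate clusters


connected = [
    ("doxorubicin", "CHRM1"),
    ("CHRM1", "DAG1"),
    ("DAG1", "respiratory insufficiency"),
]
# Placeholder genes: two clusters with no edge between them.
two_clusters = [("doxorubicin", "gene_x"), ("gene_y", "respiratory insufficiency")]

label_connected = classify_explanation(connected, "doxorubicin", "respiratory insufficiency")
label_split = classify_explanation(two_clusters, "doxorubicin", "respiratory insufficiency")
```

Both incomplete cases named in the text are covered: the early return handles a missing target node, and the exhausted search handles two separate clusters.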

The global analysis of the completeness of the explanations generated can be seen in Table 8 (amount of complete and incomplete explanations in each type of supporting evidence) and Table 9 (amount of supporting evidence in each type of explanation). This analysis was performed taking into account the explanations from both graphs. As can be seen in Table 8, in total the same number of complete and incomplete explanations were obtained (21 each). However, when looking at each category separately, it is seen that when there is evidence GNNExplainer tends to produce complete explanations (68%), and conversely when there is no supporting evidence or when the drug is contraindicated the resulting explanation is usually incomplete (62% and 70%, respectively). As can be seen in Table 9, when a complete explanation is created, almost two-thirds of the time the explanation contains supporting evidence (62%); while when the explanation is incomplete, only one-fourth of the time it contains supporting evidence (28%).
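The two tables normalize the same counts in different directions: Table 8 divides by the total for each evidence type, whereas Table 9 divides by the total for each explanation type. As a minimal check, the sketch below recomputes the Table 8 percentages from the counts; the dictionary layout and function name are illustrative choices.

```python
# Counts of complete and incomplete explanations per evidence type (from Table 8).
counts = {
    "Supporting evidence": {"complete": 13, "incomplete": 6},
    "Contraindication evidence": {"complete": 3, "incomplete": 7},
    "No evidence": {"complete": 5, "incomplete": 8},
}


def row_percentages(row):
    """Percentage of each explanation type within one evidence category."""
    total = sum(row.values())
    return {kind: round(100 * n / total) for kind, n in row.items()}


table8 = {evidence: row_percentages(row) for evidence, row in counts.items()}
total_complete = sum(row["complete"] for row in counts.values())
total_incomplete = sum(row["incomplete"] for row in counts.values())
```

The column totals come out to 21 complete and 21 incomplete explanations, in agreement with the text.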

Table 8.
Number and Percentage of Complete and Incomplete Explanations in Each Evidence Type.

	Complete Explanations	Percentage of Complete Explanations (%)	Incomplete Explanations	Percentage of Incomplete Explanations (%)
Supporting evidence	13	68	6	32
Contraindication evidence	3	30	7	70
No evidence	5	38	8	62
Total	21	50	21	50

Table 9.

Number and Percentage of Explanations with No Evidence, with Supporting Evidence, and with Contraindications in Each Type of Explanation.

	Supporting Evidence	Percentage with Evidence (%)	Contraindication Evidence	Percentage with Contraindications (%)	No Evidence	Percentage of No Evidence (%)
Complete explanations	13	62	3	14	5	24
Incomplete explanations	6	28	7	33	8	38

An additional analysis was performed, this time considering each graph separately. This can be seen in Table 10 and Table S7 in the Supplemental materials. There is a clear difference between the explanations obtained in graphs A and B. First, KG A explanations are more likely to be complete (72% in A vs. 28% in B), while KG B produces more incomplete explanations (72% in B vs. 28% in A) (Table 10).

Table 10.

Number and Percentage of Complete and Incomplete Explanations in Each Evidence Type and in Each Graph.

		Complete Explanations		Incomplete Explanations
	Evidence Type	Number	Percentage	Number	Percentage
KG A	Supporting evidence	9	100	0	0
	Contraindication evidence	1	17	5	83
	No evidence	5	83	1	17
	Total	15	72	6	28
KG B	Supporting evidence	4	40	6	60
	Contraindication evidence	2	50	2	50
	No evidence	0	0	7	100
	Total	6	28	15	72

KG = knowledge graph.

An example of an explanation produced by rd-explainer can be seen in Figure 5. This explanation is classified as complete and suggests why doxorubicin should be considered to treat respiratory insufficiency: it is a drug that targets CHRM1, a gene that interacts with DAG1, which causes the disease. Throughout this section, explanations have been classified into complete and incomplete. However, an explanation being complete does not make it a good explanation. For example, an explanation of the type “Drug A targets Gene B, Gene B interacts with Gene C, and Gene C causes Disease D” can make biological sense, as in Figure 5. On the other hand, an explanation of the type “Drug A treats Disease B, Disease B is caused by Gene C, Gene C causes Disease D” does not make full biological sense (Drug A could treat Disease B by targeting a gene other than Gene C, in which case the same treatment could not be applied for Disease D). In fact, this is what is observed in Figure S6 in the Supplemental materials, where disopyramide is said to treat muscular dystrophy through the following explanation: disopyramide treats urinary incontinence, alteration of the DMD gene can cause urinary incontinence, and the DMD gene has muscular dystrophy as a phenotype. In this case, a person may have urinary incontinence for several reasons, and disopyramide may be able to treat one of them, but not necessarily the one caused by alteration of the DMD gene.

Figure 5.

Explanation of drug candidate doxorubicin as possible treatment for respiratory insufficiency. Classified as complete explanation.

The objective evaluation is undoubtedly more unbiased and equitable. Nonetheless, subjective evaluations are also significant since there are drug–phenotype interactions that are not fully understood (especially when a certain drug is producing an undesired side effect), and so are not well established in the literature. However, analyzing the proposed explanations based on expert domain knowledge might shed light on the interaction and help to formulate a hypothesis that can be clearly designed to be tested in a wet laboratory.

After applying the objective evaluation, only one explanation (levosimendan—progressive muscle weakness) was found to have supporting evidence (where levosimendan treats the disease by increasing the troponin C affinity for calcium), and two links’ explanations (doxorubicin—respiratory insufficiency and sorafenib—respiratory insufficiency) contained unclear interactions (both were of type contraindications). The results after applying this evaluation can be found summarized in Table S6 in the Supplemental materials. Regarding subjective evaluations, 17 out of 21 explanations were found to be good explanations (they were in accordance with biological reasoning) such as the one illustrated by doxorubicin—respiratory insufficiency in Figure 5; and 4 were considered bad explanations (they did not make biological sense), the previously mentioned disopyramide—muscular dystrophy in Figure S6, and the explanations in Figures S7 to S9.

4.6. Generalizability of rd-explainer Tested on Other Case Studies

To show that this method can be extended to other rare diseases, it was also tested on Alzheimer’s disease (AD) and amyotrophic lateral sclerosis (ALS) type 1. Although Alzheimer’s disease is not a rare disease, there are different subtypes of Alzheimer’s disease with very low prevalence. Therefore, for the Alzheimer’s knowledge graph we used the general disease (MONDO:0004975) and all its causal genes that were present in Monarch (APP [HGNC:620], APOE [HGNC:613], PSEN1 [HGNC:9501], and PSEN2 [HGNC:9509]) as seeds. The result is a knowledge graph that specializes in Alzheimer’s disease and that we can use to focus on the symptoms of the rare subtypes of the disease. For the ALS type-1 knowledge graph we used the seed for the disease (MONDO:0007103) and the causal gene according to Monarch (SOD1 [HGNC:11179]). Table 11 shows the GNN performance in both diseases, again showing a high AUROC and AUPRC.

Table 11.
Table Showing Different Performance Metrics Tested in AD and ALS.

	Precision	Recall	$F 1$ -Score	AUROC	AUPRC
AD	0.95	0.95	0.95	0.98	0.97
ALS	0.94	0.93	0.94	0.97	0.97

AD = Alzheimer’s disease; ALS = amyotrophic lateral sclerosis; AUROC = area under the ROC curve; AUPRC = area under the precision–recall curve.

The same approach that was followed for DMD was followed for both diseases: for each symptom we analyzed the three drug candidates with the highest score (these drug candidates should not appear in the knowledge graph); then a literature search was performed to check if the drug candidates had been reported by the scientific community. The complete list of phenotypes as well as drug candidates and scores for each phenotype can be found in Tables S11 and S12 in the Supplemental materials. These tables also contain whether drug candidates had supporting evidence in the literature.

Among the predictions, it is worth mentioning pexidartinib, a drug candidate that was proposed by the model to treat memory impairment in AD and that is currently undergoing a clinical trial as a drug that could potentially be beneficial to treat the disease (Ancidoni et al., 2021).

5. Discussion

We integrated disease-specific knowledge graphs in combination with GNN and XAI for interpretable drug repurposing. We found that state-of-the-art XAI methods based on GNNs support in silico predictions of candidate repurposable drugs for rare diseases by providing interpretable reasoning paths of mechanism of action. We developed rd-explainer, a method for performing computational drug repurposing specifically for rare diseases. It utilizes cutting-edge deep learning methods such as edge2vec and GNNs to provide drug–symptom/phenotype predictions with high-performance scores, and a modified version of GNNExplainer to provide explanations as semantic graphs for the interpretability of the results. We also found that these explanations have different levels of usefulness to generate testable hypotheses: paths linking drug and phenotype nodes are more understandable than isolated clusters since they are similar to human reasoning; adding semantics to relations adds biological meaning that helps to formulate a hypothesis and design the experiment in the laboratory; and removing relations that do not contribute to the learning process yields clearer semantic graphs. We tested the generalizability of our method by running it on two additional diseases: ALS and AD. ALS type 1 was selected to test the pipeline in another monogenic disease with less available information. AD was selected as it is a common disease with rare subtypes that can be caused by several genes, and we wanted to test the pipeline in a polygenic and multifactorial disease. We demonstrated that our pipeline performs well on mono- and polygenic rare diseases. rd-explainer is a researcher-centered drug repurposing method that has been demonstrated to be an innovative AI-based method for rare disease drug research. rd-explainer’s main advantage is its interpretability. The main motivation of this study was to provide explanations underlying AI predictions. 
rd-explainer provides explanations as semantic graphs, a type of explanation that resembles human reasoning. This is in line with current research on user-centric XAI (Wang et al., 2023). Not only does this have high value to support rare disease researchers to formulate evidence-based hypotheses testable in the wet laboratory (and reduce cost, time, and risk), but to gain new disease knowledge and speed up robust drug research. Our approach was to use state-of-the-art AI and XAI methods used in drug repurposing such as knowledge graphs to naturally represent known associations among biological entities with expressive semantics and supporting curated evidence, graph learning, and graph-based XAI methods. The advance in the field of rare diseases is that we provide interpretable predictions thanks to a pipeline that seamlessly integrates a graph learning model with an explainer, combining results of both model performance and explanation accuracy to mitigate the black-box problem and promote XAI adoption in the field (Borile et al., 2023a). BioKnowledge Reviewer tool provides rare disease-specific knowledge graphs for disease biology data collection using the Monarch knowledge base API (Queralt-Rosinach et al., 2020) and, thus, disease context. We argue that a tool or approach that can collect associations from a virtual, federated knowledge graph via APIs could extend this feature to any biomedical associations such as for drug data collection, and improve data and knowledge-driven research. Another great advantage of the rd-explainer method is its modular implementation; this means that different parts of the workflow (data, features, GNN, and explanations) can be independently modified and the pipeline can still be run. For example, if one is interested in using another node feature embedding algorithm instead of edge2vec, one can just modify that component of the pipeline and still run the rest of the workflow.

Our results showed that rd-explainer is a highly performant graph ML-based drug repurposing method. Our method builds rare disease-specific models trained in newly generated KG for the disease of focus and enriched with data for the prediction task. Compared with state-of-the-art AI-based drug repurposing approaches, rd-explainer demonstrates outstanding performance. Throughout this paper, we have compared rd-explainer with various AI methods that employ different techniques for their predictions, including GNNs such as GraphSAGE, random-walk embeddings such as edge2vec, and geometric embeddings using models like ComplEX, DistMult, and TransE. By combining random-walk models (edge2vec) with GNNs (GraphSAGE), rd-explainer achieves superior results in the link prediction task. In particular, edge2vec outperforms GraphSAGE, suggesting that the exceptional performance of rd-explainer is primarily attributed to the random-walk model, with the GNN providing an additional performance boost. This level of performance rivals other models developed for drug repurposing, such as deepDR (AUROC $=$ 0.908) (Zeng et al., 2019). Although there are benchmarks and frameworks to evaluate the performance of GNNs (Dwivedi et al., 2023; Hu et al., 2020; Li et al., 2024; Zhang et al., 2021; Zheng et al., 2024; Zhou et al., 2024), to the best of our knowledge there is no standard for drug repurposing, and this makes it challenging to directly compare rd-explainer with other methods due to one of its key features: the creation of high-quality disease-specific knowledge graphs. These knowledge graphs are enriched with data from a wide array of sources including domain expert knowledge via the seed nodes, and curated known relations among genes, anatomical structures, biological processes, and diseases not only from humans, but also importantly numerous other species to fill the lack of molecular knowledge. 
This comprehensive approach significantly boosts the graph’s richness and diversity, making it a valuable resource for tackling rare diseases, which often suffer from limited research attention. By maximizing the information available, rd-explainer enhances our ability to identify potential treatments for these understudied conditions and ultimately enable more effective and faster translation. In contrast, Huang et al. (2024) recently proposed a clinician-centered drug repurposing foundation model pretrained in a medical KG composed of 17,000 diseases and transfer learning based on disease mechanism similarity. It would be interesting to combine both approaches and investigate the effect of extending our KGs with similar disease networks from well-known diseases.

Our new predictions are valid drug candidates since they are consistent with recent findings in the literature. We demonstrated that rd-explainer can provide new interesting drug–phenotype predictions. For example, sunitinib, one of the drugs that appears to be a good candidate to treat disease symptoms according to both models (using KG A and KG B), has been considered a good drug candidate for treating DMD and in 2019 appeared to be in preclinical trials (Vitiello et al., 2019). This drug belongs to the group of tyrosine kinase inhibitors, and many other drugs that belong to this category have been proposed by our model (fedratinib, sorafenib, bosutinib, ruxolitinib, and midostaurin). Similarly, mezlocillin, an antibiotic used to treat Gram-negative bacterial infections, has also been proposed by our model, while gentamicin, another Gram-negative antibiotic, was in clinical trials to treat DMD in 2019 (Vitiello et al., 2019). Thus, even where the model does not propose drug candidates that are themselves undergoing a clinical trial or already treating the disease, it proposes candidates that participate in similar biological processes (i.e., tyrosine kinase inhibitors, Gram-negative antibiotics).

Importantly, explanations for hypothesis generation may enable one to move toward a lab-in-the-loop framework. Regarding the interpretability and utility of the explanations, one of the 21 explanations examined was supported by evidence in the literature. Nonetheless, this does not mean that the explanations are useless. A good example of this would be the explanation for the methylprednisolone–muscular dystrophy link (Figure S10 in the Supplemental materials). The explanation is simple: “Methylprednisolone treats DMD, DMD has Muscular Dystrophy as a phenotype; thus, methylprednisolone can treat Muscular Dystrophy.” In this case, the explanation does not contain supporting evidence but the explanation still makes sense. In the literature, methylprednisolone is said to be a good candidate for the treatment of muscular dystrophies because it interacts with the glucocorticoid receptor and this leads to the activation of anti-inflammatory signaling and the inhibition of proinflammatory signaling (Quattrocelli et al., 2021). The explanation proposed by rd-explainer does not provide the underlying causative mechanism that relates methylprednisolone and muscular dystrophy, but a researcher can still see that muscular dystrophies and methylprednisolone are interrelated. This illustrates how even though an explanation may lack comprehensive supporting evidence, it can still provide valuable directional cues for further more precise investigation. Another important aspect is that rare disease findings in a lab can be introduced back in the knowledge graph to update and improve the disease-specific AI model for continual learning and enabling precise experimental design. In addition, this synergy fosters collaboration between computational and wet lab researchers to increase efficiency for disease-specific drug research (Queralt-Rosinach et al., 2020).

Finally, we found that knowledge graph topology has an impact on explainability. KG A usually produces more complete explanations, while in KG B incomplete explanations are more numerous. This could be due to the difference in the graph structure itself: graph A has a smaller clustering coefficient than graph B (see Section 4.1), which leads to more edges being present in the subgraphs produced by GNNExplainer. Because the 15 edges with the highest scores are selected, it is more likely to find a path between drug and phenotype in KG A than in KG B. Another interesting difference is that explanations generated with KG A tend to have a higher “sensitivity,” while explanations generated with KG B tend to have a higher “specificity.” When an incomplete explanation is produced using graph A, it is very unlikely that the explanation will contain supporting evidence (0 explanations were found to have evidence if the explanation was incomplete in KG A). Similarly, when a complete explanation is produced in KG B, it is very likely that the explanation has supporting evidence or contraindication evidence (67% of complete explanations had supporting evidence and 33% of complete explanations had contraindication evidence). For this reason, if one remains skeptical about the explanations themselves, this quality of the explanations might be used as a filter or validation step. For example, if an incomplete explanation is obtained with KG A, it is unlikely that it is trustworthy (none of the incomplete explanations had supporting evidence). Similarly, if a complete explanation is obtained using KG B, it is likely that there is some interaction between the drug and the phenotype (all complete explanations generated with graph B had either supporting or contraindication evidence). 
Our findings are aligned with recent studies in which the influence of clustering coefficient and topology has been observed on embedding-based predictions (Gupta & Sardana, 2015; Robledo et al., 2022); here we extend these observations to their impact on graph-based explanations.
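The edge-selection step mentioned in this discussion can be sketched as a simple top-k filter over edge importance scores. This is an illustrative assumption about how the highest-scoring edges might be picked from an explainer's edge mask, not the modified GNNExplainer code itself; the function name and the toy scores are hypothetical.

```python
def top_k_edges(edge_scores, k=15):
    """Keep the k edges with the highest importance scores."""
    ranked = sorted(edge_scores.items(), key=lambda item: item[1], reverse=True)
    return [edge for edge, _ in ranked[:k]]


# Toy edge mask (invented): 20 edges along a chain, with scores 0.00, 0.05, ..., 0.95.
edge_scores = {(f"n{i}", f"n{i + 1}"): i / 20 for i in range(20)}
explanation_edges = top_k_edges(edge_scores)
```

Because the cut-off is a fixed number of edges rather than a score threshold, whether the retained edges happen to form a path between the drug and the phenotype depends on how importance is distributed over the graph, which is one way topology can influence completeness.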

5.1. Limitations and Future Directions

An important limitation of this study is that we utilized only one XAI method, which is not model agnostic. XAI is an active research topic in AI, and new and more sophisticated methods are published frequently (Saranya & Subhashini, 2023). Our study should be extended to other types of XAI methods to check how applicable they are given the unique characteristics of rare diseases, including limited data, lack of knowledge, and lack of a drug–phenotype gold standard. Additionally, the data used in the pipeline date from 2021, so rerunning the pipeline on more recent data could further strengthen its applicability. Another important limitation is the lack of standard benchmarks and metrics to systematically evaluate explainers and explanations. Some initial efforts are underway in this direction (Agarwal et al., 2022, 2023; Borile et al., 2023b; Daza et al., 2024; Fel et al., 2022; Yuan et al., 2023), but a common standard is still lacking (Alam et al., 2023).

A further limitation is the known reproducibility issue of our explainer (Agarwal et al., 2023): the explanations may differ each time it is run, which may reduce confidence in and reliance on them. We ran several experiments to try to make the explanations consistent, for example, executing GNNExplainer several times and using the mean mask as the final mask, or increasing the number of epochs (results not shown). However, this did not solve the issue. This experience leads us to strongly recommend that the XAI community work on the standard evaluation of explanations to foster trust in the application of AI in bioinformatics and biomedicine. Additionally, the explanation often consisted of a subgraph in which the two targeted nodes were disconnected from each other, which might cause confusion and could be seen as a “bad” explanation. We therefore recommend work toward methods that prioritize, or focus on providing only, connecting paths, such as metapath-based ones (Fu et al., 2020; Himmelstein et al., 2017; Jiménez et al., 2024; Mayers et al., 2022; Noori et al., 2023), and toward improving path visualization for user interpretation (Himmelstein et al., 2022; Wang et al., 2021, 2023).

More broadly, while we focused primarily on integrating a graph ML model with an explainer, a clear line of research will be to work on the interpretability and reproducibility of explanations in the context of the drug repurposing task. The inconsistency of the explanations could be affected by the size and complexity of our data. It could also make users of this pipeline skeptical about its explanations and, for this reason, this element of the pipeline should be investigated further to make the model more robust. To improve this, ontologies could be incorporated into knowledge graphs to increase the quality and interpretability of our data. Ontologies help to standardize data according to the meaning shared by a community, thereby enhancing interpretability for domain users. Importantly, the formal description of knowledge embedded in ontologies can be used to verify data consistency and to infer implicit knowledge in the graph (Alshahrani et al., 2017). Nonetheless, changes in knowledge graphs and ontologies pose a great interoperability challenge to the community, which has to keep downstream bioinformatics and data science workflows and analyses up to date (Hegde et al., 2024; Unni et al., 2023). Finally, our work would become more “FAIR” (Wilkinson et al., 2016), that is, understandable not only by humans but also by machines, if we provided our DMD drug repurposing KG through a FAIR Data Point (da Silva Santos et al., 2023) and rd-explainer through WorkflowHub (da Silva et al., 2020).
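Two of the checks discussed in this section, averaging the masks of repeated explainer runs and testing whether an explanation subgraph connects the two target nodes, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in our experiments: masks are assumed to be plain lists with one importance score per edge, the top-k Jaccard overlap is one possible stability measure that we introduce here only for illustration, and all function names, edges, and scores are hypothetical.

```python
from collections import deque


def mean_mask(masks):
    """Average edge-importance masks from repeated explainer runs, edge by edge."""
    n_runs = len(masks)
    return [sum(values) / n_runs for values in zip(*masks)]


def top_k_edges(mask, k):
    """Indices of the k highest-scoring edges (ties broken by lower index)."""
    ranked = sorted(range(len(mask)), key=lambda i: (-mask[i], i))
    return set(ranked[:k])


def mean_pairwise_jaccard(masks, k):
    """Average Jaccard overlap of the top-k edge sets over all pairs of runs.

    Requires at least two runs and k >= 1.
    """
    tops = [top_k_edges(m, k) for m in masks]
    scores = []
    for i in range(len(tops)):
        for j in range(i + 1, len(tops)):
            scores.append(len(tops[i] & tops[j]) / len(tops[i] | tops[j]))
    return sum(scores) / len(scores)


def connects(edges, source, target):
    """Breadth-first search over undirected edges: is target reachable from source?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical candidate edges around a drug-phenotype prediction.
edges = [
    ("drug", "gene_a"),
    ("gene_a", "phenotype"),
    ("drug", "gene_b"),
    ("gene_c", "phenotype"),
]
# Hypothetical edge masks from three explainer runs (one score per edge).
runs = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.1, 0.9, 0.3],
    [0.8, 0.9, 0.2, 0.1],
]

averaged = mean_mask(runs)
selected_edges = [edges[i] for i in sorted(top_k_edges(averaged, 2))]
print(averaged)
print(mean_pairwise_jaccard(runs, 2))
print(connects(selected_edges, "drug", "phenotype"))
```

A low overlap between runs signals unstable explanations even when the averaged mask looks reasonable, and the connectivity check flags explanations in which the drug and the phenotype end up in separate components.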

6. Conclusion

We present an application of XAI to state-of-the-art computational drug repurposing for rare diseases. Our knowledge graph-based deep learning method provides human-understandable explanations for the drug–symptom/phenotype link prediction task, and we demonstrated that graph XAI can be applied to rare diseases. The rd-explainer method is an innovative approach that makes the most of the available disease-specific knowledge and generates context-aware predictions with explanations. Our GNN-based method performs well, and its drug predictions are often supported by evidence. The key contribution of our study is that our method provides explanations in the form of semantic graphs that can help rare disease researchers make informed decisions about which candidate drugs to validate experimentally. However, we found that data topology affects explanations, which highlights the importance of further investigating how best to represent knowledge as a graph for robust model performance and explanation accuracy. rd-explainer is generalizable to other rare diseases and provides computer-aided guidance for biologists to accelerate translational research. Finally, future research should advance the standard mechanisms needed to evaluate explainability, to foster adoption by domain experts, and to mitigate the black-box problem of trust in AI, especially in biomedicine, where decisions can have an important impact on people’s lives.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the European projects BIND (https://bindassociation.org/; Grant no. 847826) and EJPRD (https://www.ejprarediseases.org/; Grant no. 825575).

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Code Availability

The code is freely accessible with an open MIT license at: .

Supplemental Material

Supplemental material for this article is available online.

ORCID iDs

Pablo Perdomo-Quinteiro

Katherine Wolstencroft

Marco Roos

Núria Queralt-Rosinach

Notes

References

1. Agarwal, Krishna, Saxena, Pawelczyk, Johnson, Puri, Zitnik, Lakkaraju (2022). OpenXAI: Towards a transparent evaluation of post hoc model explanations. In Proceedings of the 36th international conference on neural information processing systems, NIPS ’22, Red Hook, NY, USA. Curran Associates Inc.

2. Agarwal, Queen, Lakkaraju, Zitnik (2023, March). Evaluating explainability for graph neural networks. Scientific Data, 10, 1–18. https://doi.org/10.1038/s41597-023-01974-x

3. Alam, van Harmelen, Acosta (2023, July). Towards semantically enriched embeddings for knowledge graph completion.

4. Al-Saleem, Granet, Ramakrishnan, Ciancetta, N. A., Saveson, Gessner, Zhou (2021). Knowledge graph-based approaches to drug repurposing for COVID-19. Journal of Chemical Information and Modeling, 61(8), 4058–4067. https://doi.org/10.1021/acs.jcim.1c00642

5. Alshahrani, Khan, M. A., Maddouri, Kinjo, A. R., Queralt-Rosinach, Hoehndorf (2017, April). Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics, 33(17), 2723–2730. https://doi.org/10.1093/bioinformatics/btx275

6. Ancidoni, Bacigalupo, Remoli, Lacorte, Piscopo, Sarti, Corbo, Vanacore, Canevelli (2021, December). Anticancer drugs repurposed for Alzheimer’s disease: A systematic review. Alzheimer’s Research & Therapy, 13, 96. https://doi.org/10.1186/S13195-021-00831-6

7. Avram, Bologa, C. G., Holmes, Bocci, Wilson, T. B., Nguyen, D.-T., Curpan, Halip, Bora, Yang, J. J., Knockel, Sirimulla, Ursu, Oprea, T. I. (2021). DrugCentral 2021 supports drug discovery and repositioning. Nucleic Acids Research, 49, D1160–D1169. https://doi.org/10.1093/nar/gkaa997

8. Bordes, Usunier, Garcia-Duran, Weston, Yakhnenko (2013). Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems (Vol. 26). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html

9. Borile, Perotti, Panisson (2023a). Evaluating link prediction explanations for graph neural networks. CoRR, abs/2308.01682. https://doi.org/10.48550/arXiv.2308.01682

10. Borile, Perotti, Panisson (2023b, August). Evaluating link prediction explanations for graph neural networks.

11. CS224W Machine Learning with Graphs: Home. http://web.stanford.edu/class/cs224w/

12. da Silva, R. F., Pottier, Coleman, Deelman, Casanova, H. (2020). WorkflowHub: Community framework for enabling scientific workflow research and development. In 2020 IEEE/ACM workflows in support of large-scale science (WORKS) (pp. 49–56). https://api.semanticscholar.org/CorpusID:221396881

13. da Silva Santos, L. O. B., Burger, Kaliyaperumal, Wilkinson, M. D. (2023, December). FAIR Data Point: A FAIR-oriented approach for metadata publication. Data Intelligence, 5, 163–183. https://doi.org/10.1162/dint_a_00160

14. Daza, Chu, C. X., Tran, T.-K., Stepanova, Cochez, Groth (2024, July). Explaining graph neural networks for node similarity on graphs.

15. Dwivedi, V. P., Joshi, C. K., Luu, A. T., Laurent, Bengio, Bresson (2023). Benchmarking graph neural networks. Journal of Machine Learning Research, 24(43), 1–48.

16. DeepSNAP documentation: DeepSNAP 0.2.0 documentation. https://snap.stanford.edu/deepsnap/

17. Fel, Hervier, Vigouroux, Poche, Plakoo, Cadene, Chalvidal, Colin, Boissin, Bethune, Picard, Nicodeme, Gardes, Flandin, Serre (2022). Xplique: A deep learning explainability toolbox.

18. Ferrari, Frisoni, Italiani, Moro, Sartori (2022, November). Comprehensive analysis of knowledge graph embedding techniques benchmarked on link prediction. Electronics, 11, 3866. https://doi.org/10.3390/ELECTRONICS11233866

19. Fu, Zhang, Meng, King (2020, April). MAGNN: Metapath aggregated graph neural network for heterogeneous graph embedding. In The Web Conference 2020: Proceedings of the World Wide Web Conference, WWW 2020 (pp. 2331–2341). https://doi.org/10.1145/3366423.3380297

20. Gao, Ouyang, Tsutsui, Liu, Yang, Gessner, Foote, Wild, Ding (2019). edge2vec: Representation learning using edge semantics for biomedical knowledge discovery. BMC Bioinformatics, 20(1), 306. https://doi.org/10.1186/s12859-019-2914-2

21. Goodman, Flaxman (2017, September). European Union regulations on algorithmic decision-making and a “Right to Explanation”. AI Magazine, 38(3), 50–57. https://doi.org/10.1609/aimag.v38i3.2741

22. Grover, Leskovec (2016). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’16, New York, NY, USA (pp. 855–864). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939754

23. Guney, Menche, Vidal, Barabási, A.-L. (2016). Network-based in silico drug efficacy screening. Nature Communications, 7(1), 10331. https://doi.org/10.1038/ncomms10331

24. Gupta, A. K., Sardana (2015, August). Significance of clustering coefficient over Jaccard index. In 2015 eighth international conference on contemporary computing (IC3). IEEE.

25. Haendel, Vasilevsky, Unni, Bologa, Harris, Rehm, Hamosh, Baynam, Groza, McMurry, Dawkins, Rath, Thaxon, Bocci, Joachimiak, M. P., Köhler, Robinson, P. N., Mungall, Oprea, T. I. (2020). How many rare diseases are there? Nature Reviews Drug Discovery, 19(2), 77–78. https://doi.org/10.1038/d41573-019-00180-y

26. Hamilton, W. L., Ying, Leskovec (2017, June). Inductive representation learning on large graphs. Advances in Neural Information Processing Systems, 2017, 1025–1035. https://doi.org/10.48550/arxiv.1706.02216

27. Hegde, Vendetti, Goutte-Gattat, Caufield, J. H., Graybeal, J. B., Harris, N. L., Karam, Kindermann, Matentzoglu, Overton, J. A., Musen, M. A., Mungall, C. J. (2024). A change language for ontologies and knowledge graphs.

28. Himmelstein, D. S., Lizee, Hessler, Brueggeman, Chen, S. L., Hadley, Green, Khankhanian, Baranzini, S. E. (2017, September). Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife, 6, e26726.

29. Himmelstein, D. S., Zietz, Rubinetti, Kloster, Heil, B. J., Alquaddoomi, Nicholson, D. N., Hao, Sullivan, B. D., Nagle, M. W., Greene, C. S. (2022, December). Hetnet connectivity search provides rapid insights into how biomedical entities are related. GigaScience, 12, giad047.

30. Hsieh, Wang, Chen, Zhao, Savitz, Jiang, Tang, Kim (2021, November). Drug repurposing for COVID-19 using graph neural network and harmonizing multiple evidence. Scientific Reports, 11, 1–13. https://doi.org/10.1038/s41598-021-02353-5

31. Fey, Zitnik, Dong, Ren, Liu, Catasta, Leskovec (2020). Open graph benchmark: Datasets for machine learning on graphs. In Proceedings of the 34th international conference on neural information processing systems, NIPS ’20, Red Hook, NY, USA. Curran Associates Inc.

32. Huang, Chandak, Wang, Havaldar, Vaid, Leskovec, Nadkarni, G. N., Glicksberg, B. S., Gehlenborg, Zitnik (2024, September). A foundation model for clinician-centered drug repurposing. Nature Medicine, 30, 3601–3613.

33. Huang, Yamada, Tian, Singh, Chang (2023). GraphLIME: Local interpretable model explanations for graph neural networks. IEEE Transactions on Knowledge and Data Engineering, 35(7), 6968–6972. https://doi.org/10.1109/TKDE.2022.3187455

34. Jiménez, Merino, M. J., Parras, Zazo (2024, July). Explainable drug repurposing via path based knowledge graph completion. Scientific Reports, 14(1), 16587.

35. Junuthula, R. R., Xu, K. S., Devabhaktuni, V. K. (2016, July). Evaluating link prediction accuracy on dynamic networks with added and removed edges. In Proceedings: 2016 IEEE international conferences on big data and cloud computing (pp. 377–384). https://doi.org/10.1109/BDCloud-SocialCom-SustainCom.2016.63

36. Kim, S. Y. (2023, June). Personalized explanations for early diagnosis of Alzheimer’s disease using explainable graph neural networks with population graphs. Bioengineering, 10, 701. https://doi.org/10.3390/BIOENGINEERING10060701/S1

37. Shomer, Mao, Zeng, Shah, Tang, Yin (2024). Evaluating graph neural networks for link prediction: Current pitfalls and new benchmarking. In Proceedings of the 37th international conference on neural information processing systems, NIPS ’23, Red Hook, NY, USA. Curran Associates Inc.

38. Liaw, Liang, Nishihara, Moritz, Gonzalez, J. E., Stoica (2018). Tune: A research platform for distributed model selection and training. arXiv preprint arXiv:1807.05118.

39. Luo, Zhao, Cheng, Han, Liu, Chen, Zhang (2024). Towards inductive and efficient explanations for graph neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence.

40. Markus, A. F., Kors, J. A., Rijnbeek, P. R. (2021, January). The role of explainability in creating trustworthy artificial intelligence for health care: A comprehensive survey of the terminology, design choices, and evaluation strategies. Journal of Biomedical Informatics, 113, 103655. https://doi.org/10.1016/J.JBI.2020.103655

41. Mayers, Steinecke, T. S., Queralt-Rosinach, A. I. (2022, April). Design and application of a knowledge network for automatic prioritization of drug mechanisms. Bioinformatics, 38(10), 2880–2891.

42. Monarch Initiative Explorer. https://monarchinitiative.org/

43. Noori, M. M., Tan, A. L. M., Zitnik (2023, May). Metapaths: Similarity search in heterogeneous knowledge graphs via meta-paths. Bioinformatics, 39(5), btad297.

44. NetworkX: NetworkX documentation. https://networkx.org/

45. Pezeshkpour, Tian, Singh (2019). Investigating robustness and interpretability of link prediction via adversarial modifications. arXiv:1905.00563. https://doi.org/10.48550/arXiv.1905.00563

46. Pfeifer, Saranti, Holzinger (2022, September). GNN-SubNet: Disease subnetwork detection with explainable graph neural networks. Bioinformatics, 38, ii120–ii126. https://dx.doi.org/10.1093/bioinformatics/btac478

47. Quattrocelli, Zelikovich, A. S., Salamone, I. M., Fischer, J. A., McNally, E. M. (2021). Mechanisms and clinical applications of glucocorticoid steroids in muscular dystrophy. Journal of Neuromuscular Diseases, 8, 39. https://doi.org/10.3233/JND-200556

48. Queralt-Rosinach, Stupp, G. S., T. S., Mayers, Hoatlin, M. E., Might, Good, B. M., A. I. (2020). Structured reviews for data and knowledge-driven research. Database (Oxford), 2020, baaa015. https://doi.org/10.1093/database/baaa015

49. Ribeiro, M. T., Singh, Guestrin (2016). “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’16 (pp. 1135–1144). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939778

50. Robledo, O. F., Zhan, X.-X., Hanjalic, Wang (2022, December). Influence of clustering coefficient on network embedding in link prediction. Applied Network Science, 7(1), 35.

51. Rare diseases. https://ec.europa.eu/health/non-communicable-diseases/steering-group/rare-diseases_en

52. Sadeghi, Ngom (2022, July). An integrative heterogeneous graph neural network-based method for multi-labeled drug repurposing. Frontiers in Pharmacology, 13, 908549. https://doi.org/10.3389/FPHAR.2022.908549/BIBTEX

53. Saranya, Subhashini (2023, June). A systematic review of explainable artificial intelligence models and applications: Recent developments and future trends. Decision Analytics Journal, 7, 100230. https://doi.org/10.1016/J.DAJOUR.2023.100230

54. Sun, Gao, Zhou (2022, July). Why 90% of clinical drug development fails and how to improve it? Acta Pharmaceutica Sinica B, 12, 3049. https://doi.org/10.1016/j.apsb.2022.02.002

55. Sun, Lin, Zheng, Ren, Tao, Wang, Zhao, Bai, Wang, Huang, Chen (2023, December). A graph neural network-based interpretable framework reveals a novel DNA fragility–associated chromatin structural unit. Genome Biology, 24, 90. https://doi.org/10.1186/S13059-023-02916-X

56. Szigyarto, C. A.-K., Spitali (2018, January). Biomarkers of Duchenne muscular dystrophy: Current findings. Degenerative Neurological and Neuromuscular Disease, 8, 1–13. https://doi.org/10.2147/DNND.S121099

57. Trouillon, Welbl, Riedel, Gaussier, Bouchard (2016). Complex embeddings for simple link prediction. In M. F. Balcan, & K. Q. Weinberger (Eds.), Proceedings of the 33rd international conference on machine learning. Proceedings of machine learning research (Vol. 48, pp. 2071–2080), New York, New York, USA, 20–22 June 2016. PMLR. https://proceedings.mlr.press/v48/trouillon16.html

58. Unni, Touré, Krauss, Crameri, Österle (2023, December). SPHN strategy to unravel the semantic drift between versions of standard terminologies.

59. Vitiello, Tibaudo, Pegoraro, Bello, Canton (2019). Teaching an old molecule new tricks: Drug repositioning for Duchenne muscular dystrophy. International Journal of Molecular Sciences, 20(23), 6053. https://doi.org/10.3390/ijms20236053

60. Wang, Huang, Chandak, Gehlenborg, Zitnik (2021). Interactive visual explanations for deep drug repurposing. Retrieved October 8, 2024, from https://icml.cc/virtual/2021/workshop/8358

61. Wang, Huang, Chandak, Zitnik, Gehlenborg (2023). Extending the nested model for user-centric XAI: A design study on GNN-based drug repurposing. IEEE Transactions on Visualization and Computer Graphics, 29(1), 1266–1276. https://doi.org/10.1109/TVCG.2022.3209435

62. Wilkinson, M. D., Dumontier, Aalbersberg, I. J., Appleton, Axton, Baak, Blomberg, Boiten, J. W., da Silva Santos, L. B., Bourne, P. E., Bouwman, Brookes, A. J., Clark, Crosas, Dillo, Dumon, Edmunds, Evelo, C. T., Finkers, Mons (2016, March). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3(1), 1–9. https://doi.org/10.1038/sdata.2016.18

63. Xie, Huang, Saha, Ananiadou (2022, October). GRETEL: Graph contrastive topic enhanced language model for long document extractive summarization. In N. Calzolari, C.-R. Huang, H. Kim, J. Pustejovsky, L. Wanner, K.-S. Choi, P.-M. Ryu, H.-H. Chen, L. Donatelli, H. Ji, S. Kurohashi, P. Paggio, N. Xue, S. Kim, Y. Hahm, Z. He, T. K. Lee, E. Santus, F. Bond, & S.-H. Na (Eds.), Proceedings of the 29th international conference on computational linguistics, Gyeongju, Republic of Korea (pp. 6259–6269). International Committee on Computational Linguistics. https://aclanthology.org/2022.coling-1.546/

64. Yang, Yih, Gao, Deng (2014). Embedding entities and relations for learning and inference in knowledge bases. In International conference on learning representations. https://api.semanticscholar.org/CorpusID:2768038

65. Yang, Lichtenwalter, R. N., Chawla, N. V. (2014, October). Evaluating link prediction methods. Knowledge and Information Systems, 45, 751–782. https://doi.org/10.1007/S10115-014-0789-0

66. Ying, Bourgeois, You, Zitnik, Leskovec (2019). GNNExplainer: Generating explanations for graph neural networks. Curran Associates Inc.

67. Yuan, Gui (2023, May). Explainability in graph neural networks: A taxonomic survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5), 5782–5799.

68. Yue, Wang, Huang, Parthasarathy, Moosavinasab, Huang, Lin, S. M., Zhang, Sun (2020, February). Graph embedding on biomedical networks: Methods, applications and evaluations. Bioinformatics, 36, 1241–1251. https://doi.org/10.1093/bioinformatics/btz718

69. Zeng, Zhu, Liu, Zhou, Nussinov, Cheng (2019, December). deepDR: A network-based deep learning approach to in silico drug repositioning. Bioinformatics, 35, 5191–5198. https://dx.doi.org/10.1093/bioinformatics/btz418

70. Zhang, Xie (2022, December). Graph regularized non-negative matrix factorization with prior knowledge consistency constraint for drug–target interactions prediction. BMC Bioinformatics, 23(1), 564. https://doi.org/10.1186/S12859-022-05119-6

71. Zhang, Sheng, Jiang, Xia, Gao, Yang, Cui (2021). Evaluating deep graph neural networks.

72. Zhang, Lei, Pan, F. X. (2022, May). Drug repositioning with GraphSAGE and clustering constraints based on drug and disease networks. Frontiers in Pharmacology, 13, 872785. https://doi.org/10.3389/FPHAR.2022.872785/BIBTEX

73. Zheng, Zhang, Chen, Molaei, Zhou, Pan (2024). GNNEvaluator: Evaluating GNN performance on unseen graphs without labels. In Proceedings of the 37th international conference on neural information processing systems, NIPS ’23, Red Hook, NY, USA. Curran Associates Inc.

74. Zhou, Zhang, Lian, Wang, Zhu, Qiu, Chen (2022). Therapeutic Target Database update 2022: Facilitating drug discovery with enriched comparative data of targeted agents. Nucleic Acids Research, 50, D1398–D1407. https://doi.org/10.1093/nar/gkab953

75. Zhou, Mao, Zhou, Chen, Tan, Zha, Feng, Chen, Wang (2024). OpenGSL: A comprehensive benchmark for graph structure learning. In Proceedings of the 37th international conference on neural information processing systems, NIPS ’23, Red Hook, NY, USA. Curran Associates Inc.

Knowledge Graphs and Explainable Artificial Intelligence (AI) for Drug Repurposing on Rare Diseases




Table 5.
Performance as the Number of Negative Edge Samples Increases.

| Number of Negative Edges | Precision | Recall | $F_1$-Score | AUROC | AUPRC |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.92 | 0.92 | 0.92 | 0.97 | 0.97 |
| 5 | 0.86 | 0.92 | 0.89 | 0.97 | 0.90 |
| 10 | 0.87 | 0.90 | 0.89 | 0.97 | 0.84 |
| 20 | 0.69 | 0.92 | 0.75 | 0.96 | 0.74 |

Table 7.
Percentage of Drugs Containing Supporting Evidence, Contraindication Evidence, or No Evidence at All for Both Graphs A and B.

| Property | KG A (%) | KG B (%) |
| --- | --- | --- |
| Supporting evidence | 20.99 | 27.16 |
| Contraindication evidence | 13.58 | 14.82 |
| No evidence | 65.43 | 58.02 |

Table 11.
Table Showing Different Performance Metrics Tested in AD and ALS.

| Disease | Precision | Recall | $F_1$-Score | AUROC | AUPRC |
| --- | --- | --- | --- | --- | --- |
| AD | 0.95 | 0.95 | 0.95 | 0.98 | 0.97 |
| ALS | 0.94 | 0.93 | 0.94 | 0.97 | 0.97 |