Abstract
Background
Drug repurposing is a cost-effective strategy to identify drugs with novel effects. We searched for drugs exhibiting inhibitory activity against Herpes Simplex Virus 1 (HSV-1). Our strategy paired gene expression data generated from HSV-1-infected cell cultures with data on drug effects on gene expression. Gene expression data from HSV-1-infected and uninfected neurons were analyzed using BaseSpace Correlation Engine (Illumina®). Based on the general Signature Reversing Principle (SRP), we hypothesized that the effects of candidate antiviral drugs on gene expression would be diametrically opposite (negatively correlated) to those induced by HSV-1 infection.
Results
We initially identified compounds capable of inducing changes in gene expression opposite to those caused by HSV-1 infection. The most promising negatively correlated drugs (Valproic acid, Vorinostat) did not significantly inhibit HSV-1 infection in African green monkey kidney epithelial cells (Vero cells). Next, we tested Sulforaphane and Menadione, which showed effects similar to those caused by viral infection (positively correlated). Intriguingly, Sulforaphane caused a modest but significant inhibition of HSV-1 infection in Vero cells (IC50 = 180.4 µM,
Conclusions
These results reveal the limits of the commonly used SRP strategy when applied to the identification of novel antiviral drugs and highlight the necessity to refine the SRP strategy to increase its utility.
Background
Herpes Simplex Virus 1 (HSV-1) infection remains a massive public health problem. HSV-1 causes common infections as well as more severe morbidities including blindness and encephalitis. Encephalitis can lead to long-term neurological sequelae and/or death, even if treated with current antivirals. HSV-1 can cause recurrent infections at the same sites as a result of its reactivation from the neuronal latent state that was established upon the initial infection. Latency is a poorly understood, untreatable process that involves a dynamic interplay between the maintenance of viral DNA with a silenced or limited gene expression program and the host cell chromatin remodeling processes. Currently, only the lytic productive infection can be treated with antivirals, despite vigorous efforts to target the latent state. The most effective drugs are derivatives of Acyclovir (ACV) and Penciclovir, and include the esterified prodrugs Valacyclovir (VAL) and Famciclovir (Famvir), which have higher oral bioavailability. These drugs inhibit lytic infections by acting as nucleoside analogues that are selectively activated in virus-infected cells and then block viral DNA replication. However, breakthrough infections can still occur, particularly among recipients of long-term antiviral prophylaxis, which may lead to the development of ACV-resistant mutants. Mutations arising in the viral thymidine kinase (TK) and/or in the DNA polymerase change the affinity for the antiviral, resulting in resistance. Importantly, TK is required for the growth of HSV in neurons during reactivation; therefore, TK mutations are selected against. A higher prevalence of HSV infections accompanied by a decreased response to ACV is observed in immunocompromised patients, including HIV-positive patients. 1 The second-line agent for ACV-resistant infections is foscarnet, which requires intravenous administration and is considerably more toxic than ACV or VAL.
Currently, no antivirals are effective at eradicating latent infection or removing the latent human reservoir that gives rise to recurrent disease. Several strategies that target the latent genome for cleavage using meganucleases or gRNA-directed CRISPR-Cas9 are under development, but they are far from clinical trial evaluation. The search for new antivirals that act differently from ACV remains a paramount goal and an urgent public health necessity.
Compound or drug repurposing (also known as repositioning or reprofiling) refers to the process of developing an existing compound/drug for a new use beyond its predetermined indication. Compared to the traditional process of developing an entirely new drug for a specific purpose, repurposing carries a lower risk of failure, especially concerning safety, and offers a much more rapid return on investment. 2 For instance, using the traditional method, DiMasi et al. searched commercial databases to identify 1,914 investigational compounds between 1995 and 2017, intended for treating a wide array of infectious diseases. 3 Of those, 1,323 entered clinical testing. After an average of 100 months of testing, only 20.7% were finally approved. While almost half of them are still in active clinical testing, 29.9% of them have since been abandoned. 3 If a compound is approved by the drug regulatory authorities for one indication and is found to be effective for another indication by reliable repurposing processes, the prior approval enables investigators to bypass pre-clinical studies and phase I clinical trials. As such, the time to apply for the approval of a novel clinical use is greatly shortened. 4 Even if a compound is not approved by regulatory authorities, the source platform or pharmacological database provides copious physicochemical information and correlated experiments that may have already been performed. Pre-clinical cost per approved drug is estimated to comprise, on average, 32% of total out-of-pocket costs. Moreover, considering that the average time from synthesis to initial human testing is 31.2 months, 5 repurposing investigational compounds is extremely cost-effective.
Drug repurposing can be opportunistic or serendipitous. Typical successful examples of off-target effects or newly discovered on-target effects include the use of sildenafil for erectile dysfunction, 6 and thalidomide for multiple myeloma. 7 However, with rapidly developing computational approaches to probe the large amounts of drug-related research data that are available, systematic methods can be used to identify reliable target compounds in an organized, data-driven way. 8 Considering the urgent need for new broad-spectrum anti-viral agents, integrative drug repurposing methods can serve as an approach to explore novel anti-viral drugs in a cost-efficient way. Various successful discoveries of novel anti-viral drugs have been achieved through drug repurposing approaches. 9 However, this approach has possible pitfalls. While failures due to toxicity are less likely to happen with pre-characterized compounds, failures at late stages of clinical trials are not unusual. 10 Such late-stage failures might result from limited primary screening methods (concentrations, delivery method, etc.) and/or the cell types/model systems employed to determine toxicity. Other concerns include: i) for compounds with poly-pharmacology, different indications may require different dosages; ii) once a compound has failed for one desired purpose, significant resources are likely to be required for re-profiling the compound, and a risk of lag time might also develop during this period; 8 iii) repurposing processes could be restricted by legal and regulatory forces concerning exclusivity and marketing protection, because repurposing focuses on existing agents. 11
Despite these pitfalls, the obvious benefit of drug repurposing is well recognized and prompts the need to develop systematic approaches to investigate potential candidates. In this study, we propose a new strategy to generate anti-viral candidates based on the comparison of gene expression profile datasets, using a platform called BaseSpace Correlation Engine (Illumina®). Gene expression profile changes of human induced pluripotent stem cell (hiPSC)-derived neurons after exposure to HSV-1 infection were chosen as inputs for comparison, and a set of compounds that induce opposite or similar gene expression changes were selected as contrasts. We started from two negatively correlated compounds based on the traditional signature reversion principle (SRP), which assumes that if a drug could reverse the expression pattern of a set of hallmark genes for a particular disease phenotype, the drug might be able to reverse the phenotype itself. 10 While this principle has been successfully applied to various therapeutic scenarios, 12–16 in the case of virus infections, it could be argued that the gene expression effects in cell culture models of infection represent host-defense systems. Therefore, we also investigated drugs with gene expression patterns similar to those observed in the infection cell culture models. We next investigated the antiviral efficacy of the putative antiviral drugs in two cell culture models. Finally, we investigated mechanisms of action for Sulforaphane (SFN), the most promising candidate, by studying its effects on canonical pathways; we also compared the upstream regulators suggested in its most significantly correlated bioset to those responsible for gene profile changes in our original input dataset.
Results
Candidates from negatively and positively correlated compounds
2339 negatively and 3734 positively correlated biosets were ranked based on the significance of correlation (i.e. the rank

The process of applying BaseSpace Correlation Engine to find potential compounds. hiPSC: human induced pluripotent stem cell; MOI: Multiplicity of infection; FC: fold change.
Top 10 negatively (
GSE: Series number from Gene Expression Omnibus; P: significance level for overlapping genes provided by Correlation Engine; (−)/(+): negative/positive correlations.
The values shown in bold in Table 1 are name strings of datasets (GSE datasets) and not data. The names of the compounds tested are in bold. For each GSE dataset, the significance levels of overlapping genes are listed in column 3 and column 7.
To verify the feasibility and reliability of this novel bioinformatic strategy, we started from negatively correlated compounds based on the aforementioned SRP for subsequent
Negatively correlated valproic acid and vorinostat both failed to produce an anti-HSV effect in Vero cell culture even at low MOI
Valproic acid is an anti-convulsant medication whose mechanism is not fully understood but has traditionally been attributed to the blockade of voltage-gated sodium channels and increased brain levels of gamma-aminobutyric acid (GABA). Vorinostat is a histone deacetylase inhibitor used for the treatment of cutaneous T cell lymphoma. These two candidates were applied to Vero cell cultures acutely infected with HSV-1 at an MOI of 0.3, at concentrations ranging from 0.0005 µM to 50 µM. The percentage of EGFP-positive cells (EGFP+%), with EGFP driven by the promoter of the viral immediate early gene ICP0, was acquired via flow cytometry. Forty-eight hours of treatment with 5 µM of the anti-HSV-1 drug Acyclovir decreased EGFP+% to 47.87%, compared to 97.15% in the infected group without ACV (Figure 2(c)). This was achieved without causing any apparent cell death, as indicated by a live cell percentage of 99.3% in uninfected cells cultured with 50 µM ACV for the same time period (Figure 2(d)).

Effects of negatively correlated compounds on HSV-1 acute infection compared with Acyclovir. Vero cells were infected with HSV-1 at MOI = 0.3. After incubation with compounds from 0.0005 to 50 µM for 48 hours, percentages of total EGFP positive (EGFP+) cells were acquired via flow cytometry: (a) Valproic acid; (b) Vorinostat; (c) Acyclovir. (d) Live cell percentages of uninfected cells after culture with the same concentrations of compounds for 48 hours.
Though Valproic acid was not toxic to Vero cell culture at 50 µM (Figure 2(d)), it did not decrease EGFP+% at the highest concentration tested (Figure 2(a)). Vorinostat showed some toxicity on uninfected Vero cells at concentrations higher than 5 µM (live cell percentage 19.63%, Figure 2(d)). No significant anti-viral effects of Vorinostat were observed at lower concentrations (Figure 2(b)).
Positively correlated sulforaphane achieved moderate but significant anti-HSV effect in Vero cell culture
Next, we selected the most significantly positively correlated compound, Sulforaphane (SFN, a plant extract widely used as a nutraceutical). We also selected Menadione (also known as vitamin K3) based on the aforementioned criteria. In HSV-1 infected Vero cells (MOI of 1), 72 hours of treatment with the compounds revealed that Menadione failed to show antiviral activity against HSV-1 lytic infection, even at 50 µM, with an EGFP+ cell percentage of 96.8% (Figure 3(b)). In contrast, 50 µM sulforaphane treatment resulted in a moderate but statistically significant reduction in the percentage of infected cells compared to untreated cultures (from 94.5% to 82.1%,

Effects of positively correlated compounds on HSV-1 acute infection compared with Acyclovir. Vero cells were infected with HSV-1 at MOI = 1. After incubation with compounds from 5 to 50 µM for 72 hours, percentages of total EGFP positive (EGFP+) cells were acquired via flow cytometry: (a) Sulforaphane; (b) Menadione; (c) Live cell percentages of uninfected cells after culture with 50 µM of compounds for 4 days; (d) Acyclovir; (e) Half-maximal inhibitory concentration of Sulforaphane was calculated using non-linear fit with variable slope; (f) Half-maximal inhibitory concentration of Acyclovir was calculated using non-linear fit with variable slope.
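The "non-linear fit with variable slope" named in the caption is commonly implemented as a four-parameter logistic (Hill) curve. The following is a minimal sketch of that model, assuming this is the curve family used; the fitting software is not stated here. The function names `four_pl` and `fit_ic50`, the coarse grid search, and the synthetic dose-response values are all illustrative assumptions and are not data or code from this study. A real analysis would use a proper non-linear least-squares optimizer.

```python
import math
from typing import Sequence, Tuple


def four_pl(conc: float, top: float, bottom: float, ic50: float, hill: float) -> float:
    """Four-parameter logistic (variable-slope) inhibition curve.

    Returns the response (e.g. EGFP+ %) at a given concentration; the response
    is halfway between top and bottom when conc == ic50.
    """
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def fit_ic50(
    concs: Sequence[float],
    responses: Sequence[float],
    top: float,
    bottom: float,
) -> Tuple[float, float]:
    """Coarse least-squares grid search for (IC50, Hill slope).

    Top and bottom plateaus are held fixed (an assumption made for brevity).
    """
    best_sse, best_ic50, best_hill = math.inf, float("nan"), float("nan")
    for i in range(-300, 301):      # log10(IC50) from -3 to 3 in steps of 0.01
        ic50 = 10 ** (i / 100)
        for j in range(1, 41):      # Hill slope from 0.1 to 4.0 in steps of 0.1
            hill = j / 10
            sse = sum(
                (four_pl(c, top, bottom, ic50, hill) - r) ** 2
                for c, r in zip(concs, responses)
            )
            if sse < best_sse:
                best_sse, best_ic50, best_hill = sse, ic50, hill
    return best_ic50, best_hill


# Synthetic, noise-free example (hypothetical values, concentrations in µM).
concs = [0.5, 5.0, 50.0, 500.0]
responses = [four_pl(c, top=95.0, bottom=0.0, ic50=10.0, hill=1.5) for c in concs]
ic50_est, hill_est = fit_ic50(concs, responses, top=95.0, bottom=0.0)
print(ic50_est, hill_est)
```

Because the synthetic responses are generated from the model itself, the grid search recovers the generating parameters; with real, noisy measurements the estimate would carry uncertainty that this sketch does not quantify.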
Toxicity overrode the anti-viral effect of sulforaphane when applied to human NPC cell culture
A range of concentrations (0.5 nM to 500 µM) of SFN and ACV was applied to two-dimensional cultures of neural progenitor cells (NPCs), derived from human pluripotent stem cells, that were acutely infected with HSV-1. ACV was still protective in NPC cultures, with a decrease of EGFP+ percentage to 25.61% at 5 µM, while EGFP+ cells were barely observed under the microscope at 50 µM (flow cytometry showed 1.02%), (Figure 4(b),

Human iPSC-derived NPCs were cultured with compounds overnight and then infected with HSV-1 at MOI = 0.1 in the presence of compounds. After incubation with compounds from 0.5 nM to 500 µM for 72 hours, percentages of total EGFP positive (EGFP+) cells and cell viability were acquired via flow cytometry: (a) Live cell percentages of uninfected cells after culture with different concentrations of compounds for 4 days; (b) Total EGFP+ percentage, left: Acyclovir, right: Sulforaphane; (c) Half-maximal lethal concentration of Sulforaphane was calculated using non-linear fit with variable slope; (d) Half-maximal inhibitory concentration of Acyclovir was calculated using non-linear fit with variable slope.
Upstream regulator analysis using ingenuity pathway analysis
To cross-validate the correlation between gene expression changes observed in our input dataset and those resulting from sulforaphane, the most significantly correlated bioset from sulforaphane-related studies was chosen (Table 1.
Correlation Engine identified 4508 matching genes between our input dataset and the sulforaphane dataset. To demonstrate the strong correlation between these two datasets, we report here only the up- and down-regulated genes with absolute fold changes greater than or equal to 3. Their original ranks and their functions as indicated by Correlation Engine are also shown (Table 2).
Up- and down-regulated genes from both our input bioset and sulforaphane bioset with |fold change| ≥ 3.
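The selection behind Table 2 can be sketched as a simple filter over two signed fold-change lists. This is a minimal illustration under stated assumptions: fold changes are signed (negative for down-regulation), a gene is reported only if it changes in the same direction in both biosets, and the |fold change| ≥ 3 cutoff is applied to each bioset. The function name `concordant_strong_genes` and the gene labels and values are hypothetical, not taken from the study.

```python
from typing import Dict, List, Tuple


def concordant_strong_genes(
    input_fc: Dict[str, float],
    compound_fc: Dict[str, float],
    threshold: float = 3.0,
) -> List[Tuple[str, float, float]]:
    """Return genes found in both biosets that change in the same direction
    with |fold change| >= threshold in each, sorted by gene name."""
    hits = []
    for gene in sorted(input_fc.keys() & compound_fc.keys()):
        a, b = input_fc[gene], compound_fc[gene]
        same_direction = (a > 0) == (b > 0)
        if same_direction and abs(a) >= threshold and abs(b) >= threshold:
            hits.append((gene, a, b))
    return hits


# Hypothetical signed fold changes (negative = down-regulated); not study data.
hsv_bioset = {"GENE_A": 4.2, "GENE_B": -3.5, "GENE_C": 1.8, "GENE_D": 5.0}
sfn_bioset = {"GENE_A": 3.1, "GENE_B": -6.0, "GENE_C": 7.0, "GENE_D": -3.3, "GENE_E": 9.9}

selected = concordant_strong_genes(hsv_bioset, sfn_bioset)
print(selected)
```

In this toy input, GENE_C is dropped because it falls below the cutoff in one bioset, GENE_D because the two biosets disagree in direction, and GENE_E because it is absent from the input bioset.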
All matching genes identified from both biosets were input into IPA for canonical pathway analysis. Among the top 10 significant canonical pathways identified from our dataset, 6 of them are also significantly (

Ingenuity pathway analysis. (a) Top 10 significantly altered pathways identified from our dataset, compared to their corresponding significance levels when analyzed with the SFN dataset; (b) list of the 15 upstream transcriptional regulators that are significantly altered (p < 0.001) in both biosets; (c) brief functional analysis of the 15 regulators from (b), suggested by IPA.
Discussion
We tested a simplistic repurposing strategy to identify novel agents with potential antiviral (more specifically anti-HSV-1) characteristics based on BaseSpace® Correlation Engine developed by Illumina®. We identified 10 compounds that induced opposite or similar changes in the gene expression profile compared to what we had observed in HSV-1 infected neurons from previous work. Importantly, we tested the anti-HSV-1 activity of the compounds selected
Most of the drug-repurposing strategies use an initial data-driven
Strict filtration processes were applied to the exported list of compounds and their corresponding biosets (Figure 1). We considered it crucial to acquire the specific biosets and check their original design and experimental processes in detail. We identified several compounds significantly correlated with the gene profile changes noted in HSV-1 infected neurons (Top 10 listed in Table 1). Further sorting was performed manually (Figure 1,
The failure of negatively correlated compounds to show antiviral effects suggests that relying simply on the signature reversion principle might not always lead to the desired novel antiviral compounds, especially when it comes to developing antiviral agents using transcriptomics data. Our partial experimental validation of sulforaphane, a positively correlated drug, further supports this assumption. Sulforaphane is a phytochemical extracted from cruciferous vegetables such as broccoli, which has been shown to have protective effects against a wide range of viruses such as HIV, 34 Respiratory syncytial virus, 35 and Hepatitis C virus, 36 and to inhibit Epstein-Barr virus reactivation. 37
Due to the complex input dataset, a large number of common genes were identified by the ranking and mapping process with the help of Correlation Engine. To link the genes to their functional pathways, we conducted canonical pathway analysis and compared the common canonical pathways indicated by our bioset and the one acquired from SFN in IPA. Surprisingly, a majority of the top 10 canonical pathways identified from our dataset are also significantly altered in the SFN dataset (
Our analysis highlights important limitations of the SRP when utilized to identify anti-herpetic drugs. These limitations may be partially overcome by considering the transcriptomic signal from host-defense mechanisms. Testing the candidate drugs using disease- or tissue-specific cell types would also be of significant importance.
There are several limitations to this study: i) limited number of compounds from the top 10 list were tested
Methods
Data processing with correlation engine
The process of identifying a potential antiviral is based on the comparison of gene expression signature changes. To identify compounds that can induce gene expression changes opposite or similar to the source gene expression signature, the original dataset was compared to a large number of other expression profiles with statistically filtered genes in a comprehensive database. A commercially available genomic knowledgebase called BaseSpace Correlation Engine (Illumina®) facilitated the screening process (https://www.illumina.com/products/by-type/informatics-products/basespace-correlation-engine.html). The Correlation Engine database contains over 169,000 lists of statistically filtered genes from over 22,000 studies carried out in 16 species (as of April, 2019). The expression signature we input into the Correlation Engine is a list of genes with fold changes that reflect expression changes between normal hiPSC-derived neurons and those exposed to HSV-1 and undergoing acute infection (MOI = 0.3, data available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111656). 24
This list is referred to as a ‘bioset’ hereafter. The bioset can be considered the primary entity in subsequent analyses: a ranked list of elements (genes, probes, proteins, compounds, single-nucleotide variants [SNVs], sequence regions, etc.) that corresponds to a given experimental factor or condition in an experiment or an assay. For a gene expression experiment, the biosets consist of gene lists with associated change values and statistical information for each relevant experimental factor. Significance
The aforementioned signature is compared to all other biosets in the database using a specialized gene set enrichment algorithm, a fold-change rank-based statistical test called the Running Fisher test. Its general design is analogous to the Gene Set Enrichment Analysis (GSEA) method. It dynamically detects the most significant enrichment signal in a ranked signature set, allowing the signature set to contain relatively more comprehensive collections of genes at a preselected statistical cutoff. It allows us to assess the overlap in regulated genes and determine whether those genes are regulated in a similar or opposite manner. 22 However, the Running Fisher algorithm differs from GSEA in the assessment of statistical significance, where p-values are computed by a Fisher’s exact test rather than by permutations. Overall, the advantage of this approach is the flexibility of being able to compute correlation scores for data of different sizes and filter thresholds. The directional relationship between the two signatures from two biosets is captured by the sign of the correlation score. Upregulated genes and downregulated genes are separated into directional subsets, and correlation scores are computed for each directional subset from one signature against each subset from the other signature. The overall correlation score is the sum of the directional subset scores, and the sign of the sum determines whether the two signatures are positively or negatively correlated.
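The directional scoring described above can be sketched as follows. This is a simplified illustration, not the Correlation Engine implementation: it scans the top-ranked prefixes of one directional subset, scores the overlap with the other subset by a one-sided Fisher's exact (hypergeometric) p-value, keeps the best −log10(p), and sums the four subset scores with concordant pairs counted as positive and discordant pairs as negative. The function names, the sign convention, the one-directional scan, the gene universe size, and the toy biosets are all assumptions made for this sketch.

```python
import math
from typing import Dict, List, Set, Tuple


def fisher_right_tail(overlap: int, size_a: int, size_b: int, universe: int) -> float:
    """One-sided Fisher's exact test: P(X >= overlap) under the hypergeometric null."""
    total = math.comb(universe, size_b)
    upper = min(size_a, size_b)
    tail = sum(
        math.comb(size_a, k) * math.comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def running_score(ranked: List[str], target: Set[str], universe: int) -> float:
    """Scan top-k prefixes of a ranked gene list; return the best -log10(p)
    for the overlap between the prefix and the target set."""
    best, hits = 0.0, 0
    for k, gene in enumerate(ranked, start=1):
        if gene in target:
            hits += 1
            p = fisher_right_tail(hits, k, len(target), universe)
            best = max(best, -math.log10(p))
    return best


def split(bioset: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split a signed fold-change bioset into up and down lists ranked by |fold change|."""
    ranked = sorted(bioset, key=lambda g: abs(bioset[g]), reverse=True)
    return [g for g in ranked if bioset[g] > 0], [g for g in ranked if bioset[g] < 0]


def correlation_score(b1: Dict[str, float], b2: Dict[str, float], universe: int) -> float:
    """Sum of directional subset scores; the sign gives the direction of correlation."""
    up1, down1 = split(b1)
    up2, down2 = split(b2)
    same = running_score(up1, set(up2), universe) + running_score(down1, set(down2), universe)
    opposite = running_score(up1, set(down2), universe) + running_score(down1, set(up2), universe)
    return same - opposite


# Toy biosets with hypothetical gene labels and fold changes; not study data.
infection = {"A": 5.0, "B": 4.0, "C": -3.0, "D": -2.0}
mimic = {"A": 3.0, "B": 2.0, "C": -4.0, "E": 1.5}
reverser = {"A": -3.0, "B": -2.0, "C": 4.0}
print(correlation_score(infection, mimic, universe=1000))     # positive score
print(correlation_score(infection, reverser, universe=1000))  # negative score
```

Under the SRP, a compound such as the toy `reverser` (negative score) would be the preferred candidate, whereas the approach taken later in this study also considers compounds like the toy `mimic` (positive score).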
Specifically,
Cell lines and virus strain
All cells were cultured in standard conditions (37 °C, 5% CO2, and 100% humidity).
African green monkey kidney (Vero) cells (CCL-81; ATCC) were maintained in Dulbecco’s Modified Eagle Medium (Gibco) supplemented with 10% fetal bovine serum (FBS; HyClone) and 5% antibiotic/antimycotic (Gibco).
Human NPCs were derived from human induced pluripotent stem cell (hiPSC) line 73–56,010–02 using the previously described method. 41 hiPSC 73–56,010–02 were established at the National Institute of Mental Health (NIMH) Center for Collaborative Studies of Mental Disorders-funded Rutgers University Cell and DNA Repository (http://www.rucdr.org/mental-health) (RUCDR).
The HSV-1 strain used in this study is a previously described KOS-based (VR-1493; ATCC) recombinant, which expresses enhanced green fluorescent protein (EGFP) from the immediate early ICP0 promoter and monomeric red fluorescent protein (RFP) from the true late promoter of Glycoprotein C. 42
Cell culture infection
Vero cells were seeded in 96-well flat-bottom cell culture plates at the recommended density and incubated in media until approximately 80–85% confluency. Cells were then infected as previously described using the KOS-based HSV-1 at a multiplicity of infection (MOI) of 0.3. The infecting virus was removed 1 hour post infection. Cells were washed and then further incubated in media with or without compounds, so that compounds were applied after adsorption. Vero cells were then detached with Trypsin-EDTA solution (0.25%, Cat#T4049) at 48 hours post infection and prepared for flow cytometry.
NPCs were seeded in 96-well cell culture plates coated overnight with Matrigel (Corning, REF#356234) at the recommended density and cultured in STEMdiff™ Neural Progenitor medium (NP medium) until cells were approximately 80–85% confluent. The cells were infected with the same virus strain at MOI = 0.1 in the presence or absence of tested compounds supplemented into STEMdiff™ NP medium. In our previous study, this MOI was shown to be high enough to induce lytic infection in NPC culture. 43 NPCs infected in the presence of compounds were also pretreated with the antiviral for 24 hours. After one hour, the infection medium was removed, and cells were washed with PBS and maintained in STEMdiff™ NP medium with or without antiviral. NPCs were dissociated with Accutase (Biolegend, Cat#423201) 72 hours after infection and prepared for flow cytometry.
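As a rough guide to what the MOIs used here imply, the short sketch below applies the standard Poisson approximation, under which the expected fraction of cells receiving at least one infectious particle at the time of inoculation is 1 − e^(−MOI). It is an illustration only: the helper names and the cell number per well are hypothetical, and the approximation describes the initial round of infection, not the EGFP+ percentages measured 48 to 72 hours later after viral spread.

```python
import math


def fraction_infected(moi: float) -> float:
    """Expected fraction of cells receiving at least one infectious particle,
    assuming particles are distributed over cells according to a Poisson law."""
    return 1.0 - math.exp(-moi)


def inoculum_pfu(moi: float, n_cells: int) -> float:
    """Plaque-forming units needed to reach a given MOI for n_cells cells."""
    return moi * n_cells


# MOIs used in this study; the cell number per well below is a hypothetical value.
for moi in (0.1, 0.3, 1.0):
    print(f"MOI {moi}: {fraction_infected(moi):.1%} of cells initially infected")

print(inoculum_pfu(0.3, 20_000))
```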
Flow cytometry
Flow cytometry was carried out using a BD LSRFortessa (Becton Dickinson) for quantitative analysis of fluorescent cells, and cell viability was assayed using the Viability Dye 780 (BioGems.Ltd).
Statistical analysis
All experiments were performed in biological triplicates or quadruplicates. When one experimental group was compared with the untreated group, an unpaired Student’s t test was applied and two-tail
Conclusions
Based on the transcriptomics data from HSV-1 infected neurons, we identified a list of candidate antiviral compounds with opposite or similar signatures. Although the negatively correlated compounds were ineffective against acute HSV-1 infection in Vero cell cultures, a positively correlated compound (Sulforaphane) achieved moderate anti-HSV-1 effect in Vero cell culture, but its toxicity overrides its antiviral effect in NPC culture. More comprehensive methods should be developed to balance the effect of virus and the defense from the host which are reflected in transcriptomics data acquired from infected
Footnotes
Availability of data and materials
Authors’ contributions
WZ performed the experiments with great help from AC (
Acknowledgements
We sincerely thank Jacquelynn Jones for her kind edits to improve the readability of the text.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by Stanley Medical Research Foundation 07 R-1712, Veterans Administration Pittsburgh Health System Start-Up Funds (Vishwajit L Nimgaonkar, M.D., PhD.), and by the NINDS R01 NS115082-01A1. We also thank China Scholarship Council (CSC) for providing support to WZ (CSC ID: 201806370301).
