Abstract
Introduction
An outbreak of coronavirus disease 2019 (COVID-19), caused by a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become an extraordinary event and a public health threat owing to its global transmission. 1 The SARS-CoV-2 main protease (Mpro; also known as 3-chymotrypsin-like protease) is involved in the processing of viral polyproteins to produce mature active proteins. 2 The inhibition of SARS-CoV-2 Mpro might be a therapeutic intervention for COVID-19. 3 Hence, this protease presents an attractive target for drug-discovery efforts.
A large amount of activity data are available for existing inhibitors of SARS-CoV-2 Mpro that belong to diverse structural classes.4–14 Several FDA-approved protease inhibitors such as ritonavir and danoprevir were found to inhibit SARS-CoV-2 Mpro at low micromolar concentrations. 15 The pharmacophore models of SARS-CoV-2 Mpro inhibitors based on the structure of ligand-receptor complexes have been extracted.16–19 However, to the best of our knowledge, to date, there is little information available regarding a highly predictive pharmacophore for SARS-CoV-2 Mpro inhibitors. Therefore, we aimed to generate chemical feature-based pharmacophores for SARS-CoV-2 Mpro inhibitors to guide the discovery of structurally novel inhibitors with improved efficacy.
In this study, we developed chemical feature-based pharmacophores for SARS-CoV-2 Mpro inhibitors. The best-validated pharmacophore model (Hypo1) was used as a query for virtual screening against natural molecules from an in-house database. The hit compounds were then filtered using Lipinski's rule of five as the filtration criteria. The study identified a potential hit molecule through an extensive activity assay in vitro and molecular docking studies.
Results and Discussion
The three-dimensional (3D)-QSAR pharmacophore protocol was used for pharmacophore model generation. The top 10 pharmacophore models were obtained, which contained two hydrogen bond acceptor features (A1 and A2), one hydrogen bond donor feature (D), and one general hydrophobic feature (H). The characteristics of the 10 generated pharmacophores, such as total cost, cost differences, root mean square (RMS) deviation, and correlation coefficient, are listed in Table 1. Cost analysis was used to determine the significance of the pharmacophore models. 20 For a significant pharmacophore model, the cost difference must be sufficiently large. Other parameters, such as the correlation coefficient and RMS deviation value, were used to evaluate the ability of a pharmacophore model to predict the activity of training set molecules. 21 The first model (Hypo1) among the 10 generated pharmacophore models had the highest cost difference (735.01), lowest RMS deviation (1.5433), and best correlation coefficient (0.982). Therefore, Hypo1 was selected as the signified pharmacophore model. Figure 1 shows the 3D spatial arrangement of Hypo1 and its alignment with the most active compound (IC50 = 0.0076 μM) in the training set.

Best pharmacophore model Hypo 1. (A) Hypo 1 illustrating hydrophobic (H), hydrogen bond donor (D), and hydrogen bond acceptor (A). Distance-1, 10.386 Å; Distance-2, 5.491 Å; Distance-3, 7.343 Å; Distance-4, 5.552 Å; Distance-5, 6.232 Å; and Distance-6 6.538 Å. (B) Hypo 1 mapping with training set compound 1 (IC50 = 0.0076 μM).
Summary of Hypotheses Run. a
Null cost of 10 top-scored hypotheses is 843.473. Fixed cost value is 77.1876. Configuration cost is 19.2586.
The significance of the best pharmacophore model, Hypo1, was evaluated based on its ability to predict the biological activity of the compound. First, Hypo1 showed a reliable ability to predict the activities of the compounds from the training set (Table 2). Of the 24 compounds in the training set, 23 had errors of less than 5%. Second, the predictive ability of Hypo1 was validated using a test set of 24 compounds (Figure 2). Most of the IC50 values were predicted accurately, indicating that Hypo1 has excellent predictive validity. The experimental and estimated activities of the test set are listed in Table 3. Third, the crystal structure of GC-376 (an effective inhibitor of SARS-CoV-2 Mpro) was mapped to the pharmacophore Hypo1. The ligand GC-376 was mapped satisfactorily to the four essential features of Hypo1 with a crystallized conformation (PDB ID: 7D1M) (Figure 3A). The predicted orientation of GC-376 generated by Hypo1 was similar to the experimental binding orientation characterized by X-ray crystallography, with an RMS deviation of 0.8708 (Figure 3B). The ligand GC-376 was used to assess whether Hypo1 could accurately predict the experimental activity of the compound. Our results indicated that the experimental and predicted IC50 values of GC-376 were 0.15 μM and 0.19 μM, respectively, with an error factor of 1.25%. These results further indicate that our pharmacophore model is reliable.

Molecular structures and activity data (IC50 values, μM) of the 24 test set compounds.

Validation of the best pharmacophore model Hypo 1 with the crystal structure of compound GC-376. (A) Hypo 1 mapping with the crystal structure of compound GC-376. Pharmacophore model illustrating hydrophobic (H), hydrogen bond donor (D), and hydrogen bond acceptor (A). (B) The conformation of GC-376 proposed by Hypo 1 (C1) mapping with that taken from the crystal structure (C2).
Experimental Activity and Estimated Activity for the Training Set Based on the Pharmacophore Model Hypo 1.
Experimental Activity and Estimated Activity for the Test Set Based on the Pharmacophore Model Hypo 1.
We used Hypo1 as a 3D query to screen an in-house library of natural compounds, and 1502 hits were captured. To assess drug-like and pharmacokinetic properties, these hit compounds were further subjected to Lipinski's rule of five. The results showed that 255 compounds met the standards. Subsequently, the 10 compounds with the highest ranking were selected for determination of their inhibitory activity against SARS-CoV-2 Mpro in vitro using a fluorescence resonance energy transfer protease assay. The results showed that one compound (W-7) demonstrated a significant IC50 value of 75 μM.
Docking was performed using the CDOCKER algorithm to predict the binding mode between the hit compound W-7 and SARS-CoV-2 Mpro. The co-crystallized compound GC-376 was first re-docked into the active site of SARS-CoV-2 Mpro. The results showed that CDOCKER was able to correctly dock compound GC-376 to the active center of SARS-CoV-2 Mpro with an RMS deviation of 0.8708. Subsequently, the hit compound W-7 was docked into the active site of SARS-CoV-2 Mpro. As shown in Figure 4A, compound W-7 was found to dock well and was located at the center of the binding pocket. The docking results (Figure 4B) also indicated that two of the hydroxyl groups of the hit compound and Glu 166 of SARS-CoV-2 Mpro formed optimal hydrogen bonds. In addition, a hydrogen bond was formed between one of the hydroxyl oxygens of the hit compound and Gln 189. Furthermore, compound W-7 satisfied the expected hydrophobic interaction, as defined by our query Hypo1, with the surrounding residues as His41, Met49, Phe140, Ser144, and Met165. This binding motif is in agreement with the results of a previous study. 22 The alignment of Hypo1 with compound W-7 was performed. The results indicated that W-7 could map Hypo1 well (Figure 4C). The RMS deviation between the orientation of compound W-7 mapped with Hypo1 and its orientation after docking was 0.807 (Figure 4D). These results clearly confirmed that the binding mode between SARS-CoV-2 Mpro and W-7 is consistent with that proposed by Hypo1.

The hit compound W-7. (A) A 3D structure of W-7 in SARS-CoV-2 Mpro binding groove (PDB ID: 7D1M). SARS-CoV-2 Mpro is represented as a molecular surface colored by electrostatic potential. The negative, neutral, and positive regions are represented by opaque red, white, and blue, respectively. (B) Model of W-7 bound to SARS-CoV-2 Mpro (PDB ID: 7D1M). Hydrogen bonds shown in dotted lines. (C) Hypo 1 mapping with W-7. Pharmacophore model illustrating hydrophobic (H), hydrogen bond donor (D), and hydrogen bond acceptor (A). (D) Conformation of W-7 proposed by Hypo 1 (C1) mapping with that created by the docking (C2).
Conclusions
In summary, pharmacophore models for SARS-CoV-2 Mpro inhibitors were constructed based on 24 diverse compounds with varying levels of activity. The best pharmacophore (Hypo1) was selected based on the highest cost difference and correlation coefficient, and the lowest total cost and RMS value. The validated Hypo1 was then employed as a query for virtual screening of molecules from an in-house natural product database, and 1502 hits were identified. The hit compounds were further analyzed using drug-like filters and SARS-CoV-2 Mpro-inhibitory activity analysis. Finally, one hit (W-7) showed effective biological activity, with an IC50 value of 75 μM. Molecular docking was performed to understand the interaction between this hit compound and SARS-CoV-2 Mpro. Taken together, these results show that the hit W-7 may act as a good lead against SARS-CoV-2 Mpro.
Materials and Methods
Generation of Pharmacophore Models
A set of 24 structurally diverse compounds with IC50 values ranging from 0.0076 μM to 635 μM was used as the training set compounds (Figure 5). The structures of the selected molecular datasets were built using Discovery Studio 2019 (DS). Every compound was minimized to a local energy minimum using the CHARMM force field. 23 A maximum of 250 conformers within a threshold of 20 kcal/mol above the global energy minimum was generated for each compound using the Poling algorithm. 24 Three-dimensional QSAR pharmacophore methodologies in DS were used to create hypotheses. The chemical features used to construct the hypotheses were the hydrogen bond acceptor (A), hydrogen bond donor (D), and general hydrophobic features (H). 18

Molecular structures and activity data (IC50 values, μM) of the 24 training set compounds.
Database Search
The validated pharmacophore Hypo1 was utilized as a 3D query to search an in-house library containing 34 439 natural compounds to obtain hit molecules that matched the chemical features of the queried pharmacophore. The Search 3D database protocol implemented within DS was employed with a best/flexible search approach. All hit compounds were further evaluated using Lipinski's rule of five.
Biological Testing
A fluorescence resonance energy transfer protease assay (Beyotime Biotechnology) was used to determine the inhibitory activity of the compounds against SARS-CoV-2 Mpro. The recombinant SARS-CoV-2 Mpro was mixed with an assay buffer, serial dilutions of each compound, or the positive control inhibitor ebselen with a final volume of 96 μL, and incubated. The reaction was initiated by the addition of 4 μL fluorogenic substrate. Subsequently, the fluorescence signal at 325 nm (excitation) and 393 nm (emission) was measured every 30 s for 10 min using a SpectraMax i3x multifunctional enzyme marker (Molecular Devices). The relative fluorescence units of compounds at various concentrations compared to those added with DMSO were used to generate IC50 curves.
Molecular Docking
The molecules were subjected to docking in the SARS-CoV-2 Mpro active site using CDOCKER within DS. 25 The 3D structure of SARS-CoV-2 Mpro (PDB ID: 7D1M) was obtained from the Protein Data Bank (http://www.rcsb.org/pdb/). The active site was defined as a sphere of 12 Å around the geometric centroid of the co-crystal ligand GC-376. Each ligand was allowed to generate 10 different orientations, and the most desirable orientation was selected based on the scoring function (-CDOCKER interaction energy) and binding interaction.
Footnotes
Authors’ Notes
Jing Wang, Yu Jiang, Zhanli Wang, and Yuheng Ma contributed equally to this work.
Authors’ Contributions
JW and YJ conducted the study, performed data analysis, and prepared the manuscript. YW and HY performed experiments. ZW and YM designed and supported the study, and edited the manuscript. All authors approved the final version of the manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article. This study was supported by the National Natural Science Foundation of China (grant number 82170296), and Scientific Research Fund Project of Baotou Medical College (grant number BYJJ-XG 202001).
