Development of cardiotoxicity model using ligand-centric and receptor-centric descriptors

Abstract

Background:

Bioinformatics and statistical analysis have been employed to develop a classification model to distinguish toxic and non-toxic molecules.

Aims:

The primary objective of this study is to determine the cut-off values of various physico-chemical (ligand-centric) and target-interaction (receptor-centric) descriptors that form the basis for classifying molecules as cardiotoxic or non-toxic. We also sought correlations among molecular docking scores, absorption, distribution, metabolism, excretion, and toxicology (ADMET) parameters, Lipinski rules, and physico-chemical parameters of drugs associated with human cardiotoxicity.

Methods:

A combined training and test set of 91 compounds was subjected to linear discriminant analysis (LDA), using 2D and 3D descriptors that represent various molecular modeling parameters as discriminating variables, to identify which descriptor types are responsible for cardiotoxicity. Internal validation by leave-one-out cross-validation gave good results, supporting the stability of the discriminant function (DF).

Results:

The statistical parameters of the DF, Fisher Discriminant Analysis (FDA) and Wilks’ λ, showed reliable statistical significance, and the prediction success rate for both the training and the test set exceeded 93% accuracy, with 87.50% sensitivity and 94.74% specificity.

Conclusion:

The predictive model was built with a hybrid approach that combines docking against organ-specific targets with ADMET properties of FDA (Food and Drug Administration)-approved and withdrawn drugs. Classifiers were developed by linear discriminant analysis, and the cut-off was determined by receiver operating characteristic (ROC) curve analysis to achieve reliable specificity and sensitivity.

Keywords

Linear discriminant analysis (LDA), cardiotoxicity, leave-one-out, discriminant function (DF), Wilks’ lambda, statistical model

Introduction

The National Cancer Institute defines cardiotoxicity as toxicity that affects the heart.¹ It is associated with deviations in cardiac electrical activity and contractile dysfunction leading to heart failure.^2,3 The development of effective medication is also challenging because of the high investment and research needed to obtain safe drugs with no potential for adverse side-effects.^4,5 Drug-induced cardiotoxicity is mainly caused by anthracyclines, antiarrhythmics, antidepressants and beta-blockers, among others.^6–8 A high dose of cardiotoxic drugs is reported to cause apoptosis and deregulation of myo-contractility.^9,10 These drugs also induce cardiac muscle injury and alter the normal functioning of ion channels and pumps (voltage-gated sodium and potassium ion channels and the Na⁺-K⁺ ATPase pump).^11–13

Various drugs, including antidepressants, antipsychotics, antihistamines, analgesics, opioids, beta-blockers, antiarrhythmics, ACE inhibitors and diuretics, are known to cause side-effects and toxicity by interacting with numerous receptors both centrally and peripherally, resulting in cardiac deaths.^8,14–16 For instance, psychotropic drugs directly affect cardiac repolarization, contributing to heart muscle disease and high-risk mental disorder.¹⁴ Dopaminergic, serotonergic, histaminergic, beta-adrenergic, and muscarinic receptors are examples of psychotropic targets that produce adverse cardiovascular side-effects such as orthostatic hypotension, syncope and related medical ailments.¹⁷

A drug discovery cycle takes 10 years or more, and its research and development cost is estimated at $1,000 million for a pharmaceutical company.¹⁸ The production of some existing drugs for the treatment of various diseases is threatened with discontinuation because of limited markets and suspicion of long-term adverse effects.¹⁹ Computational approaches reduce these risks by learning from the available toxicity data and supporting informed decisions on whether to pursue new chemical entities or optimized leads in in vitro and in vivo testing.²⁰ There is a need to develop toxicity prediction models that use in silico methods to forecast the blocking or reversal of toxic effects, which can subsequently be validated through in vitro and in vivo methods.^14,20,21 Lee et al. showed hERG (human ether-a-go-go related gene)-related cardiotoxicity prediction using neural network models, which achieved 80% accuracy, 60% sensitivity and 100% specificity.¹² Structure-based drug design efforts on Angiotensin II receptor type 1 (AT1) antagonists identified high-affinity hERG1 blocking compounds as potential AT1 inhibitors.¹³ These computational approaches derive relationships between the physicochemical properties of different compounds and their toxicological endpoints (e.g. LD50), popularly known as the Quantitative Structure-Activity Relationship (QSAR) approach.^22,23 Like physico-chemical descriptors, absorption, distribution, metabolism, excretion, and toxicology (ADMET) properties of compounds are also treated as variables in QSAR model building.^22,24

In this article, we propose a toxicity prediction model for probable cardiotoxic drugs using a multi-parametric approach that combines drug status, calculated physico-chemical and ADMET properties (ligand-centric descriptors) and docking scores obtained from protein target interactions (receptor-centric descriptors).^20,25 All these properties were then used to derive the discriminant function using Linear Discriminant Analysis (LDA) for distinguishing cardiotoxic and non-toxic compounds. Fisher Discriminant Analysis (FDA) classified the compound data into two classes based on independent and dependent variables.^26–28 The classification of drugs as toxic or non-toxic relied on the coefficients of the linear discriminant function, and the selected targets from each group play a crucial role in probability generation.²⁹

The objective of LDA is to derive a function that classifies cardiotoxic and non-toxic compounds by determining the “cut-off” value for each independent variable. The LDA classification performance was evaluated using statistical measures derived from the Receiver Operating Characteristic (ROC) curve, including sensitivity, specificity and accuracy^30,31 (Figure 1).

Figure 1.

The flow chart of the proposed method combining ligand-centric and receptor-centric descriptors in LDA modeling.

Materials and methods

Compounds collection and descriptors calculations

We chose four drug classes with implications for the cardiac system: antidepressants, antihistamines, antiarrhythmics and beta-blockers. A total of 91 approved and withdrawn drugs, comprising 72 non-cardiotoxic compounds (79%) and 19 cardiotoxic compounds (21%), were compiled from databases and resources such as Drugbank (https://www.drugbank.ca/),³² Drugs.com (https://www.drugs.com/),³³ and LiverTox (https://www.ncbi.nlm.nih.gov/books/NBK547852/)³⁴ (Table S1). The 3D structures of the drugs were retrieved from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) in structure data format (SDF). Subsequently, drug-likeness, Lipinski’s rule of five and physico-chemical properties were calculated using the drug-likeness tool (DrugLiTo)³⁵ and Osiris property explorer.^36–41 We then computed ADMET properties such as logP, topological polar surface area (TPSA, Å²), ADMET_solubility and Perm_cornea using ADMET Predictor (Simulation Plus Inc.).
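The rule-of-five screen mentioned above can be illustrated with a minimal sketch. It assumes the commonly stated thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors and at most 10 acceptors); the function name is hypothetical, the molecular weight and logP values are taken from Table 1, and the donor and acceptor counts are placeholder assumptions, not DrugLiTo or Osiris output.

```python
def lipinski_violations(mol_weight: float, logp: float,
                        h_donors: int, h_acceptors: int) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        mol_weight > 500.0,   # molecular weight above 500 Da
        logp > 5.0,           # octanol-water partition coefficient above 5
        h_donors > 5,         # more than 5 hydrogen-bond donors
        h_acceptors > 10,     # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


# MW and logP from Table 1; donor/acceptor counts are assumed placeholders.
examples = {
    "Amitriptyline": (277.412, 4.633, 0, 1),
    "Amiodarone": (645.3, 1.389, 0, 4),
}

for name, props in examples.items():
    print(name, lipinski_violations(*props))
```

A compound is usually regarded as passing the screen when it has at most one violation, so a count rather than a yes/no flag keeps that decision with the caller.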

Virtual screening against selected protein targets

We selected protein targets by first gathering the primary targets of the 91 drugs that have evidence of pharmacological action and then choosing the four most common targets implicated in cardiotoxicity. For example, Timolol antagonizes the B1-adrenergic receptor. The targets were retrieved from the Protein Data Bank (PDB) based on resolution and the availability of ligand-bound complexes: histamine H1 receptor (PDB ID: 3rze)⁴² for antihistamines, cytochrome P450 2D6 (PDB ID: 3tbg)⁴³ for antidepressants and cytochrome P450 3A4 (PDB ID: 4d6z)⁴⁴ for antiarrhythmics. A homology model of the B1-adrenergic receptor for beta-blockers was built using the Modeller program because no crystal structure suitable for virtual screening was available. First, we retrieved the primary sequence of the B1-adrenergic receptor from UniProtKB (P08588)⁴⁵ in FASTA format, followed by template identification as part of the Modeller modeling protocol.⁴⁶ The best protein model was selected based on GA341, a composite score of compactness, statistical distance potential and sequence identity between target and template, and the zDOPE score (z-statistic applied on Discrete Optimized Protein Energy), a distance-dependent potential function native to the Modeller program.⁴⁷ The stereochemistry of the homology model was further evaluated using the Ramachandran plot.⁴⁸

All four ligand-bound protein complexes (3 PDB protein-ligand complexes and 1 homology model with the ligand-bound template) were prepared using YASARA Structure (academic license)⁴⁹ with the AMBER03 force field.⁵⁰ Hydrogens were added, and bond orders and hybridization states were assigned using the Clean utility.⁵¹ The ligand-bound site was defined as the docking site (simulation cell) for virtual screening using YASARA Structure. The simulation cell was placed around the centroid of the bound ligand, and the cell boundaries were set at a distance of 8 Å in each direction.²⁵ We chose the YASARA Vina docking module for virtual screening, in which Vina applies a stochastic search for docked poses and the YASARA scoring function scores each docked pose in energy units (kcal/mol). According to YASARA conventions, a more positive energy score indicates higher affinity, in direct contrast to other known scoring functions, where more negative energy values represent the best binding pose.^52,53
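The placement of the simulation cell can be pictured with a short sketch. It assumes the cell is a cube centered on the ligand centroid and extended 8 Å in each direction; this reading of the 8 Å distance, the function name and the coordinates are illustrative assumptions and do not represent the YASARA interface.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def docking_box(ligand_atoms: List[Coord], half_width: float = 8.0):
    """Return (centroid, lower corner, upper corner) of a cubic docking cell."""
    n_atoms = len(ligand_atoms)
    if n_atoms == 0:
        raise ValueError("ligand has no atoms")
    # Centroid: arithmetic mean of the atom coordinates along each axis.
    centroid = tuple(sum(atom[axis] for atom in ligand_atoms) / n_atoms
                     for axis in range(3))
    # Extend the cell by half_width (in Å) on both sides of the centroid.
    lower = tuple(c - half_width for c in centroid)
    upper = tuple(c + half_width for c in centroid)
    return centroid, lower, upper


# Hypothetical ligand coordinates in Å (not taken from any PDB entry).
example_ligand = [(10.0, 4.0, -2.0), (12.0, 6.0, 0.0), (14.0, 5.0, 2.0)]
centroid, lower, upper = docking_box(example_ligand)
print(centroid, lower, upper)
```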

Linear discriminant analysis of descriptors

Descriptors were selected by LDA so as to better classify cardiotoxic and non-toxic compounds and to shed light on the mechanism of toxicity. The selection was based on the toxic and non-toxic compound classes through a linear discriminant function.⁵⁴ The discriminant function (DF) was defined as follows:^54,55

Y = a_{0} x_{0} + a_{1} x_{1} + a_{2} x_{2} + \cdots + a_{n} x_{n}

where Y is the discriminant score (the dependent variable); x_0, x_1, …, x_n are the selected descriptors (independent variables); and a_0, a_1, …, a_n are the coefficients, or weights, of the descriptors calculated by the least-squares method.

We applied the rule of maximum likelihood to pick a DF of high importance while keeping the number of descriptors small, so as to obtain a statistically significant DF. At the same time, over-fitting and chance correlations among descriptors were avoided to create a meaningful DF with better discriminating power. We assessed the following parameters to identify the best DF: Wilks’ λ, the Fisher ratio (F) and the p-level (p) at the 95% confidence limit, together with leave-one-out cross-validation (LOOCV).^56–58 To examine over-fitting, we analyzed whether any subset of the compound data had an undue influence on the created DF. We also examined the predicted classification in both classes (cardiotoxic and non-toxic). The consistency of the LDA model was further studied by evaluating the selected DF on test data (left-out data) that were not used to create it.
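The leave-one-out procedure can be sketched in a few lines. This is not the SPSS implementation: a nearest-class-mean rule on a single score is used as a stand-in for the fitted discriminant function, and the scores and labels are synthetic values chosen only for illustration.

```python
from statistics import mean
from typing import Dict, List


def fit_class_means(scores: List[float], labels: List[int]) -> Dict[int, float]:
    """Stand-in classifier: remember the mean score of each class."""
    return {c: mean(s for s, l in zip(scores, labels) if l == c)
            for c in sorted(set(labels))}


def predict(model: Dict[int, float], score: float) -> int:
    """Assign the class whose mean score is closest to the given score."""
    return min(model, key=lambda c: abs(score - model[c]))


def leave_one_out_accuracy(scores: List[float], labels: List[int]) -> float:
    """Refit on all cases but one, classify the held-out case, repeat."""
    hits = 0
    for i in range(len(scores)):
        train_scores = scores[:i] + scores[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = fit_class_means(train_scores, train_labels)
        hits += predict(model, scores[i]) == labels[i]
    return 100.0 * hits / len(scores)


# Synthetic, well-separated scores: 1 = non-toxic, 2 = cardiotoxic.
toy_scores = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
toy_labels = [1, 1, 1, 2, 2, 2]
print(leave_one_out_accuracy(toy_scores, toy_labels))
```

Because each case is classified by a model that never saw it, the resulting accuracy is typically lower than the resubstitution accuracy, which is the comparison used to judge over-fitting.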

Performance of the LDA model

To assess how well the DF distinguishes between the two classes of compounds, we used SPSS (Statistical Package for the Social Sciences)⁵⁹ to generate LDA classification models. After model generation, ROC curve analysis was carried out to determine cut-offs that balance two ROC parameters: sensitivity (true positive rate) and specificity (true negative rate).^30,31,60 The DF was chosen by assessing the area under the ROC curve (AUC), a widely used measure of the discriminatory power of a diagnostic test.⁶¹ Among the candidate DFs, we accepted the one superior in statistical significance (p < 0.01)⁶² and Wilks’ λ, with high sensitivity (most non-toxic compounds classified as non-toxic in LDA class prediction) and the best possible specificity (few toxic compounds predicted as non-toxic). From the regulatory perspective of the study, priority was given to sensitivity over specificity, which enabled correct classification of non-toxic compounds while allowing a few toxic compounds to be classified as non-toxic, slightly affecting the accuracy of the LDA model. The evaluative measures of the ROC curve can be expressed mathematically⁶³ as follows:

\mathrm{Accuracy} = \frac{TP + TN}{\mathrm{Total}} \times 100

\mathrm{Sensitivity} = \frac{TP}{TP + FN} \times 100

\mathrm{Specificity} = \frac{TN}{TN + FP} \times 100

where TP = true positive, TN = true negative, FP = false positive, and FN = false negative.
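As a worked check of these three measures, the sketch below applies them to the counts in the "Original" rows of Table 4, treating non-toxic (drug status 1) as the positive class as the ROC analysis in this article does. The function name is illustrative.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity and specificity in percent from confusion counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


# Table 4, original classification: 66 of 72 non-toxic and 16 of 19
# cardiotoxic compounds were assigned to their own group.
metrics = classification_metrics(tp=66, fn=6, tn=16, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

The rounded values reproduce the 90.1% overall, 91.7% non-toxic and 84.2% cardiotoxic rates reported with Table 4.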

Results and discussion

Target selection and virtual screening of compound set

We chose the most common targets of the cardiovascular system to understand the interactions of the compound set, viz. the B1-adrenergic receptor, histamine H1 receptor, cytochrome P450 2D6 and cytochrome P450 3A4. Only four targets were selected because every compound in the set is able to bind to one of them. There are other interesting targets, such as the A2A adenosine receptor, which has a myriad of clinical applications related to cardiovascular diseases. Further, the protein targets were sought after the compound set had been compiled; the set falls into four classes: antidepressants, antihistamines, beta-blockers and antiarrhythmics. It would be desirable to build a more comprehensive classification model by expanding the compound set and the target set.

Because a crystal structure of the human B1-adrenergic receptor was unavailable, we modeled its structure using the Modeller program. The template, turkey B1-adrenergic receptor (PDB ID: 2y00), was chosen for its best alignment with the query protein sequence: an E value of 1e-166 with 68% sequence identity and 73% sequence coverage (Figure S1). The template is complexed with dobutamine (a β1-agonist), which offered a reliable ligand-bound conformation to be modeled in the query structure and subsequently used for virtual screening. Moreover, the modeled protein obtained a GA341 score of 1.0 and a zDOPE score of 1.03, indicating a reliable model for structure-based studies (Figure S2). The Ramachandran plot showed 98% of amino acid residues in the favored region and 2% in the allowed region, with no outliers (Figure S3).

The remaining three targets were downloaded from the PDB. The protein structures and compound set were prepared for virtual screening. The ligand-bound site in each structure was defined as the simulation cell for docking with YASARA Structure (academic license). The binding energy (dock score; kcal/mol) of each compound with its respective protein target, obtained from virtual screening, is given in Table 1. The best binding poses of the top five ranked compounds, along with the crystal ligand, for each of the four targets are illustrated in Supplementary Figures S4 to S7. A variety of intermolecular interactions were observed, including electrostatic interactions, hydrogen bonds, van der Waals contacts and pi-sigma interactions, which contributed to the dock score. For example, Triprolidine secured a dock score of 10.808 kcal/mol (more positive means better affinity) with the histamine H1 receptor. The aromatic moieties of Triprolidine are stacked between the Tyr408 and Phe432 residues, along with hydrophobic contacts with Asn19, Trp428, Phe435 and other pocket residues.

Table 1.

The computed physico-chemical descriptors and dock score of compounds set with respective protein targets.

Sr. No.	Drug Name	Molecular Weight (Da)	logP	TPSA (Å²)	Docking Score (kcal/mol)
1	Amitriptyline	277.412	4.633	4.44	8.93
2	Amineptine	337.5	2.504	4.44	9.74
3	Bupropion	239.747	2.412	33.68	7.13
4	Clomipramine	314.861	2.807	7.68	8.06
5	Desipramine	266.4	2.473	7.68	7.91
6	Dimetacrine	294.443	3.618	7.68	8.7
7	Dosulepine	295.449	2.575	29.74	8.46
8	Doxepin	279.384	2.383	29.74	8.7
9	Iproclozide	242.7	2.441	7.68	8.2
10	Lofepramine	419	30.91	24.75	10.37
11	Nortriptyline	263.4	0.677	76.97	7.37
12	Phenoxypropazine	166.22	4.665	16.61	8.79
13	Protriptyline	263.4	4.699	16.61	8.64
14	Trimipramine	294.443	2.738	7.68	8.542
15	Imipramine	280.4	2.872	41.3	7.395
16	Astemizole	458.6	1.348	16.8	5.432
17	Azatadine	290.4	3.727	8.88	10.069
18	Benactyzine	327.4	1.437	8.88	10.084
19	Cinnarizine	368.5	4.318	4.44	7.136
20	Cyclizine	266.4	1.014	16.8	8.728
21	Cyproheptadine	287.4	1.763	13.67	9.249
22	Dexbrompheniramine	319.24	2.306	13.67	10.199
23	Diphenylpyraline	281.4	3.011	6.48	10.845
24	Bupranolol	271.78	2.686	13.67	9.226
25	Olopatadine	337.4	0.886	29.27	8.829
26	Ketotifen	309.4	1.279	20.04	9.449
27	Clemastine	343.4	2.594	16.8	10.808
28	Cetirizine	388.9	1.718	0	5.687
29	Flunarizine	404.5	2.189	59.89	7.571
30	Terfenadine	471.7	0.497	122.77	7.52
31	Buclizine	433	5.47	94.52	7.86
32	Doxylamine	270.37	2.239	42.77	9.356
33	Phenylpropanolamine	151.21	1.076	72.37	6.939
34	Thiethylperazine	399.6	4.096	64.17	9.1
35	Propafenone	341.4	3.518	79.22	7.597
36	Pyrilamine	285.4	3.362	59.7	8.073
37	Loratadine	382.9	1.18	33.54	7.038
38	Sotalol	272.37	1.847	36.87	6.348
39	Meclizine	390.9	1.417	58.2	8.467
40	Xylometazoline	244.37	0.187	59.56	7.054
41	Triprolidine	278.4	2.61	63.14	7.961
42	Carbinoxamine	290.79	0.857	46.26	8.618
43	Ajmaline	326.4	0.237	91.39	7.144
44	Amiodarone	645.3	1.389	108.56	7.06
45	Bretylium	243.16	0.924	56.74	6.358
46	Disopyramide	339.5	−2.367	136.26	6.449
47	Dronedarone	556.8	1.292	48.14	8.744
48	Encainide	352.5	5.95	39.97	7.709
49	Esmolol	295.37	0.62	75.17	6.88
50	Flecainide	414.348	2.196	76.56	8.16
51	Ibutilide	384.6	1.412	95.48	7.5
52	Indecainide	308.4	5.256	46.07	6.02
53	Lidocaine	234.34	−0.341	119.19	7.02
54	Lorcainide	370.9	1.076	72.37	7.55
55	Mexiletine	179.26	3.025	97.92	7.58
56	Phenytoin	252.27	2.072	100.16	8.52
57	Procainamide	235.33	1.625	58.1	7.45
58	Prenylamine	329.5	0.814	55.3	6.15
59	Quinidine	324.4	0.518	86.53	6.67
60	Timolol	316.42	0.908	75.53	7.45
61	Tocainide	192.26	1.848	55.3	5.61
62	Nimesulide	308.31	3.568	46.07	7.08
63	Acebutolol	336.4	1.389	58.1	6.34
64	Alprenolol	249.35	0.572	75.17	6.61
65	Atenolol	266.3	0.455	106.66	7.92
66	Bitoterol	557.7	2.135	46.07	7.08
67	Bisoprolol	325.4	0.237	91.39	5.85
68	Bopindolol	380.5	2.457	87.2	7.22
69	Carteolol	292.37	2.201	71.37	6.51
70	Carvedilol	406.5	1.189	92.24	6.83
71	Cerivastatin	459.5	2.241	46.07	7.07
72	Cloranolol	292.2	−0.353	89.16	6.94
73	Epanolol	369.4	1.552	55.3	6.69
74	Betaxolol	307.4	1.81	64.53	6.42
75	Fenproporex	188.27	1.349	64.53	6.47
76	Isradipine	371.4	3.844	64.17	9.031
77	Labetalol	328.4	3.4	99.9	8.136
78	Mepindolol	262.35	3.8	53.1	8.18
79	Metoprolol	267.36	2.8	49.3	7.382
80	Nadolol	309.4	3	121	6.172
81	Nebivolol	405.4	2.6	110	7.847
82	Oxprenolol	265.35	2.3	68.8	7.667
83	Penbutolol	291.4	2.6	94.6	8.321
84	Pindolol	248.32	4.1	63.3	7.927
85	Practolol	266.34	3.7	91.7	9.092
86	Prazosin	383.4	2.8	57.6	8.362
87	Propranolol	259.339	2.2	82.6	7.573
88	Sparfloxacin	392.4	2.2	146	8.279
89	Talinolol	363.5	5.1	20.2	6.191
90	Celiprolol	379.5	−1.7	113	7.243
91	Zimelidine	317.22	1.6	38.3	6.02

Examination of the dock poses of the top 5-ranked antidepressants against cytochrome P450 2D6 revealed that Lofepramine and Nortriptyline exhibited a pi-pi stack with the Phe120 residue, similar to the co-crystal ligand, a thioridazine derivative. In the case of the histamine H1 receptor, doxepin, a compound in the set that is also the co-crystal ligand of the target (PDB ID: 3rze), secured a dock score of 8.4 kcal/mol with three pi-pi stacks established by residues Trp428, Phe432 and Phe435. Interestingly, all the top 5-ranked compounds had a score of ∼10.00 kcal/mol and reproduced the three pi-pi stack pattern of the co-crystal ligand. The higher affinity of the hits relative to the co-crystal ligand is attributed to additional hydrophobic contacts.

For the B1-adrenergic receptor, we first validated the ligand-bound pocket conformation of the human B1-adrenergic receptor model built on the bird B1-adrenergic template by placing the template co-crystal ligand (dobutamine) in the model and re-docking it using YASARA Vina. We obtained a few best poses with a root mean square deviation (RMSD) of less than 2 Å, illustrating the reliability of the docking protocol in generating conformations close to the crystal pose (results not shown). The best poses of the top 5-ranked beta-blockers, selected by high dock score, did not share common intermolecular interaction patterns. For example, bopindolol and esmolol established a pi-pi stack with the Phe218 residue, whereas prazosin formed a hydrogen bond with this residue. Although different types of contacts were formed, the dock scores of the top 5 beta-blockers were below 10 kcal/mol. Similarly, the top 5 antiarrhythmics obtained dock scores below 10 kcal/mol. The pyridine-carbamate co-crystal ligand of cytochrome P450 3A4 (PDB ID: 4d6z) produced pi-pi stacks with the Phe108 and Phe304 residues. Although the same interaction pattern was not observed in the top 5-ranked compounds, the target pocket is rich in phenylalanine residues (Phe57, Phe108, Phe213, Phe215 and Phe304), which enable several pi-pi stacking modes of interaction.

Descriptors calculations

We used ligand-centric and receptor-centric descriptors to develop the cardiotoxicity LDA classification model. The former were calculated from the 2D and 3D geometry of the ligands, whereas the latter were taken from virtual screening of the library molecules against the four selected protein targets. The physico-chemical properties of all selected compounds were computed using the Osiris property explorer and DrugLiTo programs. The computed 2D and 3D descriptors belong to major descriptor families, including topology, fingerprints, fragment-based fingerprints and other properties such as clogP, molecular weight and topological polar surface area. Most clogP values in the compound set were positive and below 5.0, with a few negative values, suggesting that most cardiac drugs are well absorbed and permeable. In addition, fragment-based drug-likeness calculations highlighted other important properties such as mutagenic, tumorigenic, irritant and reproductive effects. The computed physico-chemical descriptors are listed in Table 1.

Toxicity risks can now be predicted easily and in parallel on web servers. We used ADMET Predictor to calculate toxicity risk parameters such as ADMET_Solubility, ADMET_Solubility_Level, ADMET_AlogP98, CMR_Discriminant_Score and FRC_Discriminant_Score, among others. Compounds can also be classified as high-risk or low-risk based on the distribution of the different parameter values (Table S2). However, we chose to classify the compound set by supplying the computed descriptors (both physico-chemical and ADMET properties) as input to LDA based on actual drug status (cardiotoxic or non-toxic), rather than by interpreting the software-generated distributions of the properties used to create the descriptors.

Development of cardiotoxicity LDA classification model

Similar to multiple linear regression, discriminant functions (DF) are generated by calculating differences between groups on each of the independent variables using group means and one-way ANOVA results. Table 2 lists the results of the tests of equality of group means. All the variables attained Wilks’ λ in the range of 0.889 to 1.000, demonstrating that the 27 variables individually have nearly similar discriminating potential between groups. The standardized and unstandardized canonical DF coefficients for each variable are given in Table 3. Further, the pooled within-group matrices revealed that the intercorrelations were low. The eigenvalue of 1.094, accounting for 100% of the variance, shows the discriminating ability of the DF (function = 1), with a canonical correlation of 0.723 between variables and groupings (Table S3). We generated one DF with a 100% cumulative proportion, which precludes other possible DFs. Hereafter, we discuss DF function = 1 for its ability to distinguish the cardiotoxic and non-toxic groups. The multivariate Wilks’ λ statistic is 0.478; because values nearer to 0 indicate stronger separation, this highlights the discriminating power of the DF (Table S4). Additionally, the p-value of 0.01 obtained from the Chi-square statistic rejected the null hypothesis of no discrimination between the DF’s canonical correlation and all smaller canonical correlations.
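The eigenvalue, canonical correlation and Wilks’ λ reported here are tied together by standard identities, which a short sketch can verify. With a single discriminant function, λ = 1/(1 + eigenvalue) and the canonical correlation is the square root of eigenvalue/(1 + eigenvalue); the function names below are illustrative.

```python
import math
from typing import Iterable


def wilks_lambda(eigenvalues: Iterable[float]) -> float:
    """Wilks' lambda as the product of 1 / (1 + eigenvalue) over all functions."""
    result = 1.0
    for eig in eigenvalues:
        result *= 1.0 / (1.0 + eig)
    return result


def canonical_correlation(eigenvalue: float) -> float:
    """Canonical correlation of one discriminant function."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


eigenvalue = 1.094  # reported for the single DF (function = 1)
print(round(wilks_lambda([eigenvalue]), 3))
print(round(canonical_correlation(eigenvalue), 3))
```

Both rounded results agree with the reported values of 0.478 and 0.723, so the three statistics are mutually consistent.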

Table 2.

Group statistics and tests of equality of group means of the descriptors of LDA model.

Property	Mean	Std. Deviation	Valid N (listwise), unweighted	Valid N (listwise), weighted	Wilks’ Lambda	F	df1	df2	Sig.
logP	2.46	3.36	91	91	0.998	0.196	1	89	0.659
MolecularDocking	7.77	1.21	91	91	0.999	0.052	1	89	0.82
ADMET_Solubility	−2.58	1.61	91	91	0.934	6.252	1	89	0.014
ADMET_Solubility_Level	3.20	0.90	91	91	0.958	3.896	1	89	0.051
ADMET_AlogP98	1.79	1.51	91	91	0.975	2.278	1	89	0.135
CMRDiscriminantScore	−3.24	9.58	91	91	0.986	1.283	1	89	0.26
FRCDiscriminantScore	−4.66	10.57	91	91	0.993	0.665	1	89	0.417
FRRDiscriminantScore	−6.88	19.72	91	91	1	0	1	89	0.985
FMCDiscriminantScore	−9.10	11.43	91	91	0.966	3.137	1	89	0.08
FMRDiscriminantScore	−9.58	20.68	91	91	1	0.023	1	89	0.879
DEVDiscriminantScore	0.56	12.60	91	91	1	0.003	1	89	0.959
RT31er95ConfidenceLimits	228.77	253.86	91	91	0.976	2.161	1	89	0.145
RT3ComputedRatOralLD50Log1Moles	2.52	0.93	91	91	0.951	4.617	1	89	0.034
MTFComputedMaximumToleratedDose	125.97	190.69	91	91	0.999	0.057	1	89	0.812
MTFComputedLog1mol	3.34	3.50	91	91	0.994	0.545	1	89	0.462
MTGComputedMaximumToleratedDose	165.64	230.14	91	91	0.994	0.537	1	89	0.465
MTG1er95ConfidenceLimits	84.93	153.38	91	91	0.999	0.057	1	89	0.811
MTGUpper95ConfidenceLimits	128.18	228.59	91	91	0.998	0.134	1	89	0.715
MTGComputedLog1mol	3.07	3.45	91	91	0.994	0.527	1	89	0.47
RILComputedLog1molh	2.27	3.41	91	91	0.982	1.675	1	89	0.199
OMNDiscriminantScore	3.82	14.58	91	91	0.943	5.343	1	89	0.023
FT3ComputedFatheadMin1wLC501Moles	5.56	2.12	91	91	0.889	11.155	1	89	0.001
VMcGowan	395.94	87.51	91	91	0.994	0.534	1	89	0.467
CYP1A2_Inh	−56.13	60.51	91	91	0.925	7.216	1	89	0.009
CYP1A2_Km	57.48	118.44	91	91	0.991	0.798	1	89	0.374
CYP1A2_Vmax	11.52	35.04	91	91	0.999	0.053	1	89	0.819
Perm_Cornea	735.80	3080.35	91	91	0.993	0.592	1	89	0.444

Table 3.

The coefficients of standardized and unstandardized canonical discriminant function (DF) of LDA model.

Sr. No.	Properties	Standardized function coefficient	Unstandardized function coefficient
1	logp	0.115	0.034
2	MolecularDocking	−0.046	−0.038
3	ADMET_Solubility	1.658	1.058
4	ADMET_Solubility_Level	−0.097	−0.109
5	ADMET_AlogP98	0.575	0.384
6	CMRDiscriminantScore	−0.467	−0.049
7	FRCDiscriminantScore	−0.287	−0.027
8	FRRDiscriminantScore	0.022	0.001
9	FMCDiscriminantScore	−0.111	−0.01
10	FMRDiscriminantScore	0.057	0.003
11	DEVDiscriminantScore	−0.128	−0.01
12	RT31er95ConfidenceLimits	−0.224	−0.001
13	RT3ComputedRatOralLD50Log1Moles	−0.066	−0.072
14	MTFComputedMaximumToleratedDose	−0.26	−0.001
15	MTFComputedLog1mol	−0.896	−0.255
16	MTGComputedMaximumToleratedDose	0.227	0.001
17	MTG1er95ConfidenceLimits	0.162	0.001
18	MTGUpper95ConfidenceLimits	−0.081	0
19	MTGComputedLog1mol	1.348	0.389
20	RILComputedLog1molh	0.035	0.01
21	OMNDiscriminantScore	−0.534	−0.037
22	FT3ComputedFatheadMin1wLC501Moles	1.03	0.511
23	VMcGowan	0.168	0.002
24	CYP1A2_Inh	−0.351	−0.006
25	CYP1A2_Km	−0.282	−0.002
26	CYP1A2_Vmax	0.191	0.005
27	Perm_Cornea	−0.041	0
	Constant		−1.465

The regression equation of DF is expressed as:

Y = -1.465 + (0.034*logp) - (0.038*MolecularDocking) + (1.058*ADMET_Solubility) - (0.109*ADMET_Solubility_Level) + (0.384*ADMET_AlogP98) - (0.049*CMRDiscriminantScore) - (0.027*FRCDiscriminantScore) + (0.001*FRRDiscriminantScore) - (0.010*FMCDiscriminantScore) + (0.003*FMRDiscriminantScore) - (0.010*DEVDiscriminantScore) - (0.001*RT31er95ConfidenceLimits) - (0.072*RT3ComputedRatOralLD50Log1Moles) - (0.001*MTFComputedMaximumToleratedDose) - (0.255*MTFComputedLog1mol) + (0.001*MTGComputedMaximumToleratedDose) + (0.001*MTG1er95ConfidenceLimits) + (0.000*MTGUpper95ConfidenceLimits) + (0.389*MTGComputedLog1mol) + (0.010*RILComputedLog1molh) - (0.037*OMNDiscriminantScore) + (0.511*FT3ComputedFatheadMin1wLC501Moles) + (0.002*VMcGowan) - (0.006*CYP1A2_Inh) - (0.002*CYP1A2_Km) + (0.005*CYP1A2_Vmax) + (0.000*Perm_Cornea)
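For readers who wish to evaluate the function programmatically, the following sketch encodes the unstandardized coefficients of Table 3. The function name and the input dictionary are illustrative assumptions, and because the coefficients are rounded to three decimals the resulting scores are approximate.

```python
# Unstandardized canonical DF coefficients and constant copied from Table 3.
CONSTANT = -1.465
COEFFICIENTS = {
    "logp": 0.034, "MolecularDocking": -0.038, "ADMET_Solubility": 1.058,
    "ADMET_Solubility_Level": -0.109, "ADMET_AlogP98": 0.384,
    "CMRDiscriminantScore": -0.049, "FRCDiscriminantScore": -0.027,
    "FRRDiscriminantScore": 0.001, "FMCDiscriminantScore": -0.010,
    "FMRDiscriminantScore": 0.003, "DEVDiscriminantScore": -0.010,
    "RT31er95ConfidenceLimits": -0.001,
    "RT3ComputedRatOralLD50Log1Moles": -0.072,
    "MTFComputedMaximumToleratedDose": -0.001, "MTFComputedLog1mol": -0.255,
    "MTGComputedMaximumToleratedDose": 0.001, "MTG1er95ConfidenceLimits": 0.001,
    "MTGUpper95ConfidenceLimits": 0.000, "MTGComputedLog1mol": 0.389,
    "RILComputedLog1molh": 0.010, "OMNDiscriminantScore": -0.037,
    "FT3ComputedFatheadMin1wLC501Moles": 0.511, "VMcGowan": 0.002,
    "CYP1A2_Inh": -0.006, "CYP1A2_Km": -0.002, "CYP1A2_Vmax": 0.005,
    "Perm_Cornea": 0.000,
}


def discriminant_score(descriptors: dict) -> float:
    """Return the constant plus the coefficient-weighted sum of descriptors."""
    missing = [name for name in COEFFICIENTS if name not in descriptors]
    if missing:
        raise KeyError(f"missing descriptors: {missing}")
    return CONSTANT + sum(coef * descriptors[name]
                          for name, coef in COEFFICIENTS.items())


# Hypothetical input with every descriptor at zero: the score is the constant.
baseline = dict.fromkeys(COEFFICIENTS, 0.0)
print(round(discriminant_score(baseline), 3))  # -1.465
```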

The group centroids table revealed that the difference between the mean DF scores of the non-cardiotoxic group (drug status = 1) and the cardiotoxic group (drug status = 2) is −2.545 (Table S6), indicating that the two groups were satisfactorily separated by the DF, which has high discriminating ability. The DF of the LDA model correctly classified 91.7% of non-toxic compounds and 84.2% of cardiotoxic compounds (Table 4). Leave-one-out cross-validation revealed that 73.6% of grouped cases were correctly classified.

Table 4.

LDA classification of drug statuses of compound set.

Classification Results^b,c
Set	Measure	Actual DrugStatus	Predicted group 1	Predicted group 2	Total
Original	Count	1	66	6	72
	Count	2	3	16	19
	%	1	91.7	8.3	100.0
	%	2	15.8	84.2	100.0
Cross-validated^a	Count	1	56	16	72
	Count	2	8	11	19
	%	1	77.8	22.2	100.0
	%	2	42.1	57.9	100.0

a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.

b. 90.1% of original grouped cases correctly classified.

c. 73.6% of cross-validated grouped cases correctly classified.

The cut-off points between the two groups were predicted through the ROC curve. The DF-calculated class, along with the drug status (cardiotoxic/non-toxic), was supplied as input to the ROC curve analysis. The true positive rate was defined as the correct identification of non-toxic compounds as non-toxic by the LDA model based on the discriminant function (DF) values. Similarly, the true negative rate was defined as the correct identification of cardiotoxic compounds as cardiotoxic (Table S6). The best discrimination was obtained with an area under the curve of 0.958 (Table S7), where sensitivity reached 87.50% and specificity 94.74% (cut-off value criterion: >0.1) (Table 5 and Figure 2). The positive likelihood ratio (+LR) of 16.62 provides very strong evidence for correctly classifying non-cardiotoxic compounds. According to this cut-off value, greater class values indicated more toxic effects, while lesser values indicated non-toxic effects, of any cardiotoxic drug on the aforesaid targets (Table S8). We then explored whether there is any correlation between the receptor-centric descriptor (dock score) and the ligand-centric descriptors (physico-chemical/ADMET properties). The correlation between dock score and ADMET_AlogP (atom-based log P calculation) in the pooled within-groups table (not shown) was 0.461, while that for its counterpart ClogP (fragment-based log P calculation) was 0.330. Interestingly, both dock score and AlogP are computed from atom contributions that are summed into a single score. Both descriptors take positive values that are normally distributed, which may explain the apparent correlation.
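The likelihood ratios follow directly from sensitivity and specificity: +LR = sensitivity/(1 − specificity) and −LR = (1 − sensitivity)/specificity. The sketch below recomputes them from the percentages in Table 5 and also compares the criteria by the Youden index (sensitivity + specificity − 1). The Youden index is added here only as an illustrative way of ranking cut-offs; the article does not state that it was the selection criterion.

```python
# (sensitivity %, specificity %) for each criterion, copied from Table 5.
ROC_POINTS = {
    ">-1": (98.61, 68.42),
    ">0.1": (87.50, 94.74),
    ">1": (65.28, 100.00),
}


def likelihood_ratios(sens_pct: float, spec_pct: float):
    """Return (+LR, -LR); a ratio with a zero denominator is reported as inf."""
    sens, spec = sens_pct / 100.0, spec_pct / 100.0
    pos_lr = sens / (1.0 - spec) if spec < 1.0 else float("inf")
    neg_lr = (1.0 - sens) / spec if spec > 0.0 else float("inf")
    return pos_lr, neg_lr


def youden_index(sens_pct: float, spec_pct: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1 (as fractions)."""
    return (sens_pct + spec_pct) / 100.0 - 1.0


for criterion, (sens, spec) in ROC_POINTS.items():
    pos_lr, neg_lr = likelihood_ratios(sens, spec)
    print(criterion, round(pos_lr, 2), round(neg_lr, 3),
          round(youden_index(sens, spec), 3))

best = max(ROC_POINTS, key=lambda c: youden_index(*ROC_POINTS[c]))
print("highest Youden index:", best)
```

Small differences from the tabulated +LR arise because the inputs are percentages rounded to two decimals. Among the listed criteria, the >0.1 cut-off gives the highest Youden index.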

Table 5.

Criterion values and coordinates of the ROC curve of the selected discriminant function (DF) of LDA model.

Criterion	Sensitivity	95% CI	Specificity	95% CI	+LR	95% CI	−LR	95% CI
≥−2.66	100.00	95.0–100.0	0.00	0.0–17.6	1.00	—	—	—
>−1	98.61	92.5–100.0	68.42	43.4–87.4	3.12	2.3–4.2	0.020	0.003–0.2
>0.1*	87.50	77.6–94.1	94.74	74.0–99.9	16.62	14.5–19.1	0.13	0.02–1.0
>1	65.28	53.1–76.1	100.00	82.4–100.0	—	—	0.35	—
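
The likelihood ratios in Table 5 follow directly from the sensitivity and specificity at each criterion. The short Python sketch below recomputes them from the tabulated percentages and also ranks the criteria by Youden's index (J = sensitivity + specificity − 1). Youden's index is used here only as an illustrative assumption, since the text does not state which rule selected the >0.1 cut-off, and the names `ROC_ROWS`, `likelihood_ratios` and `youden_index` are hypothetical.

```python
# Sensitivity and specificity (%) copied from Table 5; names are illustrative.
ROC_ROWS = {
    ">=-2.66": (100.00, 0.00),
    ">-1": (98.61, 68.42),
    ">0.1": (87.50, 94.74),
    ">1": (65.28, 100.00),
}


def likelihood_ratios(sens_pct, spec_pct):
    """Return (+LR, -LR); None where the ratio would divide by zero."""
    sens, spec = sens_pct / 100.0, spec_pct / 100.0
    pos_lr = sens / (1.0 - spec) if spec < 1.0 else None
    neg_lr = (1.0 - sens) / spec if spec > 0.0 else None
    return pos_lr, neg_lr


def youden_index(sens_pct, spec_pct):
    """Youden's J = sensitivity + specificity - 1 (assumed selection rule)."""
    return sens_pct / 100.0 + spec_pct / 100.0 - 1.0


def fmt(value):
    """Format a ratio to two decimals, or 'n/a' when it is undefined."""
    return "n/a" if value is None else f"{value:.2f}"


for criterion, (sens, spec) in ROC_ROWS.items():
    pos_lr, neg_lr = likelihood_ratios(sens, spec)
    j = youden_index(sens, spec)
    print(f"{criterion:>8}  +LR={fmt(pos_lr)}  -LR={fmt(neg_lr)}  J={j:.3f}")

# Criterion with the largest Youden index among the four tabulated rows.
best = max(ROC_ROWS, key=lambda c: youden_index(*ROC_ROWS[c]))
print("Largest Youden index at criterion", best)
```

Because the inputs are rounded percentages, the recomputed ratios can differ from the tabulated ones in the last decimal place.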

Figure 2.

ROC curve analysis of the selected discriminant function (DF) of the LDA model. The pink line represents the data points relating the sensitivity and specificity of the DF. The blue dotted lines indicate the cut-offs associated with each data point of the selected DF model. The straight dotted diagonal line extending from the lower left corner to the upper right corner represents the no-performance boundary.

We further examined how the selected LDA model performed in each drug class. Classification of the antidepressants achieved the best performance in assigning the correct drug status (10 non-toxic compounds predicted as non-toxic and all 4 toxic compounds as toxic). Among the antihistamines, 20 non-cardiotoxic compounds were correctly identified as non-toxic, boosting the sensitivity, but only 2 of the 5 cardiotoxic antihistamines were correctly predicted as cardiotoxic. As with the antidepressants, 20 non-cardiotoxic beta-blockers were predicted as non-cardiotoxic and all 6 cardiotoxic compounds were correctly identified as cardiotoxic. The LDA model correctly predicted 14 non-toxic antiarrhythmics, but its prediction of the cardiotoxic antiarrhythmics was sub-standard, with 2 compounds classified as toxic and 2 as non-toxic. Collectively, the presented LDA model performed consistently well in sensitivity across the four drug classes (Table 6). The LDA model will be available upon request.

Table 6.

Number of compounds classified by the LDA model in each drug class.

Drug Class	Cardiotoxic compounds		Non-cardiotoxic compounds		Total
Drug Class	Correctly Predicted as Cardiotoxic	Incorrectly Predicted as Non-Cardiotoxic	Incorrectly Predicted as Cardiotoxic	Correctly Predicted as Non-Cardiotoxic	Total
Antidepressants	4	0	1	10	15
Antihistamines	2	3	2	20	27
Beta-blockers	6	0	3	20	29
Antiarrhythmics	2	2	2	14	20
Total	14	5	8	64	91
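
The per-class rates implied by Table 6 can be recomputed from its counts. The Python sketch below does this under the convention of the ROC analysis, where sensitivity refers to non-cardiotoxic compounds predicted as non-cardiotoxic and specificity to cardiotoxic compounds predicted as cardiotoxic. The tuple layout and the names `TABLE6` and `class_rates` are illustrative choices rather than part of the original analysis, and these count-based rates need not coincide with the cut-off-based values in Table 5.

```python
# Counts from Table 6, per drug class, in the order:
# (cardiotoxic predicted cardiotoxic, cardiotoxic predicted non-cardiotoxic,
#  non-cardiotoxic predicted cardiotoxic, non-cardiotoxic predicted non-cardiotoxic)
TABLE6 = {
    "Antidepressants": (4, 0, 1, 10),
    "Antihistamines": (2, 3, 2, 20),
    "Beta-blockers": (6, 0, 3, 20),
    "Antiarrhythmics": (2, 2, 2, 14),
}


def class_rates(tox_ok, tox_missed, nontox_missed, nontox_ok):
    """Return (sensitivity, specificity, accuracy) for one row of counts.

    Sensitivity: fraction of non-cardiotoxic compounds predicted non-cardiotoxic.
    Specificity: fraction of cardiotoxic compounds predicted cardiotoxic.
    """
    total = tox_ok + tox_missed + nontox_missed + nontox_ok
    sensitivity = nontox_ok / (nontox_ok + nontox_missed)
    specificity = tox_ok / (tox_ok + tox_missed)
    accuracy = (tox_ok + nontox_ok) / total
    return sensitivity, specificity, accuracy


# Column totals across the four drug classes (the "Total" row of Table 6).
totals = tuple(sum(column) for column in zip(*TABLE6.values()))

for name, counts in list(TABLE6.items()) + [("All classes", totals)]:
    sens, spec, acc = class_rates(*counts)
    print(f"{name:<16} sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Laying the counts out this way makes the contrast described in the text explicit: the non-cardiotoxic compounds are recovered at a similar rate in every class, whereas recovery of the cardiotoxic compounds varies between classes.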

Conclusion

The presented LDA model strongly supports incorporating both receptor-centric and ligand-centric descriptors in modeling cardiotoxicity. We obtained 90.1% correct classification of actual drug status and 73.6% correct classification of left-out samples during cross-validation. The ROC analysis showed an AUC of 0.958, with the best cut-off giving 87.50% sensitivity and 94.74% specificity. These statistics support extending this approach to multiple drug classes and their respective protein targets in LDA modeling. We found that physico-chemical and ADMET properties contribute about equally to the discriminant function of the LDA, as indicated by their close Wilk’s lambda values. Interestingly, the Dock score of the compound set showed a correlation of 0.461 with the AlogP property. The YASARA Vina scoring function partially distinguished, with low scores, the dock poses of compounds that had no common intermolecular interaction pattern, especially in the cases of beta-blockers and antiarrhythmics. The LDA model can be made more comprehensive by introducing more drug classes and targets and subsequently validating it through in vitro testing.

Supplemental material

6-10-20-Suppliment for Development of cardiotoxicity model using ligand-centric and receptor-centric descriptors by Chirag N Patel, Sivakumar Prasanth Kumar, Rakesh M Rawal, Manishkumar B Thaker and Himanshu A Pandya in Toxicology Research and Application

Footnotes

Acknowledgments

The authors gratefully acknowledge the Department of Botany, Bioinformatics and Climate Change Impacts Management, Gujarat University for providing an opportunity to access the bioinformatics research facilities.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was financially supported by the Financial Assistance Programme – Gujarat State Biotechnology Mission, Gujarat, India [grant number GSBTM/FAP/1443] and Department of Science and Technology [grant number GSBTM/MD/JDR/1409/2017-18].

ORCID iD

Himanshu A Pandya

Supplemental material

Supplemental material for this article is available online.

