Abstract
A quantitative structure-activity relationship (QSAR) model has been developed between cytotoxic activity and structural properties by considering a data set of 119 podophyllotoxin analogs based on 2D and 3D structural descriptors. A systematic stepwise searching approach of zero tests, a missing value test, a simple correlation test, a multicollinearity test, and a genetic algorithm method of variable selection was used to generate the model. A statistically significant model (
Keywords
Introduction
P
QSAR is one of the most important methods in chemometrics, which gives information that is useful for drug design and medicinal chemistry. 6,7 It correlates the biological activity of the molecules to their physical or chemical parameters. 8,9 There are many examples available in literature in which QSAR models have been used successfully for the screening of compounds for biological activity. 10-13 Although comparative molecular field analyses (CoMFA) are statistically excellent and offer good predictive performance, they are inherently limited to the need to align with the database molecules correctly within 3D space. Nevertheless, especially for structurally diverse molecules, unambiguous 3D alignment to initiate the CoMFA process is still a difficult task. The determination of the “active” conformation that each compound will retain is a critical issue due to the unavailability of X-ray structure. We should have some knowledge or hypothesis regarding active conformations of the molecules under study as a prerequisite for structural alignment. Hence, the developed models based on CoMFA may not suit the drug design because of a false conformational hypothesis. However, we were motivated to explore possible alternatives that would use alignment free descriptors derived from 2D or 3D molecular topology and thus alleviate frequent ambiguity of structural alignment typical of 3D QSAR methods. Accordingly, in this QSAR study, we have applied E-state, electronic, structural, topological, quantum mechanics, and physicochemical-based descriptors, which can be calculated without structural alignments. The behavior of QSAR model is examined with a variety of statistical parameters, 14 and the contribution of various descriptors is analyzed.
Materials and Methods
Core structure of podophyllotoxin
The scaffold structure of the natural podophyllotoxin consists of 5 rings—namely, the A, B, C, D, and E rings (

The scaffold structure of podophyllotoxin.
Data set
A total of 119 podophyllotoxin analogs were used in the study and were taken from various sources belonging to different ring modifications. These molecules were divided randomly into 81 molecules in the training set and 38 molecules in the test set. The analogs included in
Podophyllotoxin Derivatives (Tetraline Lactones) with Cytotoxic Activities against the P-388 Cell Line
Podophyllotoxin Derivatives (Nonlactonic Tetralines) with Cytotoxic Activities against the P-388 Cell Line and New Proposed Structural Derivatives with Unknown Cytotoxic Activity
Podophyllotoxin Derivatives (Pyrazolignans and Isoxazolignans) with Cytotoxic Activities against the P-388 Cell Line and New Proposed Structural Derivatives with Unknown Cytotoxic Activity
Podophyllotoxin Derivatives (Lactones and Nonlactonic Naphthalene) with Cytotoxic Activities against the P-388 Cell Line
Aza-Podophyllotoxin Derivatives with Cytotoxic Activities against the P-388 Cell Line
To use the QSAR model developed in this study for prediction of tubulin polymerization inhibition (TPI), we used an external data set of 16 podophyllotoxin derivatives that is included in
Experimental IC50 Value for In Vitro Tubulin Polymerization Inhibition by Podophyllotoxin Analogs (External Validation Set)
Building of molecular structures
All these podophyllotoxin analogs were built from the scaffolds by different ring modification and substitution of functional groups, as mentioned in
Descriptor calculation
All the molecular descriptors such as E-state indices; log P; superpendentic index; structural, symmetrical, topological, lead likeness, electronic Wang-Ford atomic charge, and extended Hückel partial charge functions; bulk; moments; orbital energies; molecular connectivity indices; gravitational indices; hydrophobicity; steric and thermodynamic factors; and topological descriptors were calculated using ADME Model Builder software package (version 4.5).
20
These descriptors help differentiate the molecules mostly according to their size, degree of branching, flexibility, and overall shape. Some of the descriptors included in the study are listed and described in
List of Descriptors Used in the Study
Screening of descriptors and development of the QSAR model
A set of 372 molecular descriptors was calculated using the ADME Model Builder software package (version 4.5). A systematic search in the order of missing value test, zero test, correlation coefficient, multicollinearity, and genetic algorithm was performed to determine significant descriptors using the ADME Model Builder (version 4.5) software package. Any parameter that was not calculated (missing value) for any number of the compounds in the data set was rejected in the first step. Some of the descriptors were rejected because they contained a zero value for all the compounds (zero tests). To minimize the effect of collinearity and to avoid redundancy, a correlation matrix was developed with a cutoff value of 0.6 and the variables physically removed from the analysis that showed exact linear dependencies between subsets of the variables and multicollinearity (high multiple correlations between subsets of the variables). From the descriptors, the set of descriptors that would give the statistically best QSAR models was selected by using a genetic function approach implemented in the ADME Model Builder (version 4.5) software package. The genetic algorithm (GA) starts with the creation of a population of randomly generated parameter sets. The usage probability of a given parameter from the active set is 0.5 in any of the initial population sets. The sets are then compared according to their objective functions. The parameters set used for the GA includes mutation 0.1, crossover 0.9, population 300, number of generations 1000,
Validation of the QSAR model
The predictive capability of the QSAR equation was determined using the leave-one-out cross-validation method. The cross-validation regression coefficient (
where
where
where
where
Further statistical significance of the relationship between activity and the descriptors was checked by randomization test (Y-randomization) of the models. The Y column entries were scrambled and new QSAR models were developed using same set of variables as present in the unrandomized model. We have used a parameter,
This parameter,
To check the intercorrelation of descriptors, variance inflation factor (VIF) analysis was performed. The VIF value is calculated from 1/(1 −
Results and Discussion
The 119 active compounds with their acute cytotoxicity (IC50 values in µM) to the P-388 cell line were randomly divided into a training set of 81 compounds and a test set of 38 compounds. With the wide range of difference between the IC50 values and the large diversity in the structures, the combined data set of 81 molecules and 38 molecules is ideal as a training and test set, as both sets do not suffer from bias due to the similarity of the structures. The various molecular descriptors (372 in total) as described in
where
Statistical Assessment of Quantitative Structure-Activity Relationship (QSAR) Equations with Varying Number of Descriptors
Observed and Predicted Cytotoxic Activity to the P-388 Cell Line of the Training Set of Podophyllotoxin Derivatives
pIC50 = −log IC50.
Observed and Predicted Inhibitory Activity to the P-388 Cell Line of the Test Set of Podophyllotoxin Derivatives
pIC50 = −log IC50.

Relationship between predicted and experimental activities as per equation (1) of the training set compounds.

Relationship between predicted and experimental activities as per equation (1) of the test set compounds.
The intercorrelation of the descriptors used in the QSAR model (1) was very low (below 0.6), which is in conformity to the study that, for a statistically significant model, it is necessary that the descriptors involved in the equation should not be intercorrelated with each other. 14 To further check the intercorrelation of descriptors, VIF analysis was performed. In this model, the VIF values of these descriptors are 1.337 (SHDW5), 1.527 (DIP), 1.091 (V5CH), 1.143 (SNMN), 1.302 (L/B2), 1.091 (SRMX), and 1.727 (GEOM4), which are less than the threshold value of 10. 24,25
Satisfied with the robustness of the QSAR model developed using the training set, we have applied the QSAR model to an external data set of podophyllotoxin analogs constituting the test set. As the experimental values of IC50 for these inhibitors are already available, this set of molecules provides an excellent data set for testing the prediction power of the QSAR model for new ligands.
Podophyllotoxin is a clinically effective anticancer agent that represents perhaps the most significant addition to the pharmacopoeia of cancer chemotherapeutic agents in the past decade.
26
However, new findings related to its activities, mechanism of action, and pharmacological properties have been unveiled. The accepted mechanism of action of podophyllotoxin and its structural derivatives revealed that these molecules preferentially inhibit tubulin polymerization, which leads to arrest of the cell cycle in the metaphase and thus induces cytotoxicity to cancer as well as normal cells. Mechanistically, TPI is the cause, and the cytotoxicity is its response, and one can be predicted from the other. Therefore, a separate data set consisting of 16 analogs of podophyllotoxin (
Observed and Predicted Inhibitory Activity to Tubulin Polymerization of the Validation Set of Podophyllotoxin Derivatives
pIC50 = −log IC50.
Descriptors interpretation
Based on the developed QSAR model, it is observed that the important parameter that contributes to the potentiating activity is SHDW5, which is a set of geometrical descriptors. It indicates standardized shadow area 5, relating to the size and shape of molecules. It is calculated by projecting the molecular surface on 3 mutually perpendicular planes,
V5CH is the fifth-order chain molecular connectivity index; this descriptor contains information about the size and the degree of branching in a molecule. The descriptors SNMN and SRMX measure minimum nucleophilic superdelocalizability and maximum free radical superdelocalizability of the compounds (calculated using the Hückel calculation). Superdelocalizability is an indication of the electronic “richness” or “poorness” of a specific atom. Nucleophilic superdelocalizability looks at the concentration of unoccupied orbitals at each atom, and the radical superdelocalizability measures the concentration of all orbitals, occupied and unoccupied, at each atom. Occurrence of nucleophilic and radical superdelocalizabilities indicates that 2 reactivity mechanisms can occur in the data set and may be the possible causes of the cytotoxicity. L/B2 is the length-to-breadth ratio of compounds calculated by rotating the molecule in the
The descriptors used for the constructed QSAR model in this work encoded electronic, geometrical, and topological aspects of molecules. Appearances of these descriptors in the model reveal the role of electronic and steric interactions in inducing cytotoxicity to cancer cells.
Conclusion
We have compiled a virtual library of podophyllotoxin analogs built through structural modification of scaffold structure of natural podophyllotoxin. We have demonstrated that the QSAR model developed in this study can be applied to estimate the cytotoxic activity with a high level of accuracy for a diverse set of podophyllotoxin analogs. Using a combination of topological and electrotopological state indices, as well as electronic and thermodynamic descriptors of chemical structures, we have built several robust QSAR models with high values of
Footnotes
Acknowledgements
We are thankful to the support by the Department of Science and Technology, India under the project SR/FT/L-44/2005. We thank Scube Scientific Software for supplying ADME Model Builder for a trial period.
