Abstract
Computer-aided drug design methods could nearly halve the cost of drug design, including the design of immunotherapeutic agents, when they are integrated into discovery and research. Drug discovery is an interdisciplinary, time-consuming, and expensive process; it is improved by advances in drug design, the technique of creating or finding a molecule that exerts stimulatory or inhibitory activity in various biological organisms. Advances in this field over the last couple of decades have significantly changed the way the pharmaceutical industry produces new bioactive molecules. In particular, improvements in hardware and computational techniques, together with their efficient integration with biological processes, have revolutionized in silico methods and sped up lead identification and optimization. In this review, we discuss recent computer-aided drug design techniques that form the basis of modern drug discovery projects.
Introduction
There are seven basic steps in the drug discovery pipeline that take a drug from the initial idea to the market: target selection; disease selection; identification of a lead compound; optimization of the lead; preclinical testing; clinical testing; and pharmacogenomic optimization. In practice, repeated success in the last five steps is absolutely required. Test compounds can come from chemical synthesis or from natural sources (microorganisms, plants, and animals). A compound may be rejected as unpromising because of toxicity or carcinogenicity, low or absent activity, inefficiency, complexity of synthesis, and so on. For these reasons, only about one in 100,000 investigated compounds may ultimately reach the market, which raises the average cost of developing a new drug to a whopping ~2.6 billion dollars.1 The major efforts to improve drug development efficiency have targeted the discovery and ligand optimization phases. The number of novel potential targets has grown tremendously with the decoding of the genomes of different organisms, the identification of the molecular mechanisms of various diseases, proteomic investigation, and advances in protein chemistry. Over the last few decades, drug discovery has undergone something of a revolution through the merger of computer technology with experimental processes and through the emergence and development of bioinformatics, all of which now fall within the purview of rational drug design. Rational drug design includes both computer-based methods (computer-aided drug design [CADD]) and experimental procedures.2 High-throughput screening and combinatorial chemistry form the major part of these experimental methods.3
Drug design based on computer methods relies on the theory that a pharmacological compound acts through its interaction with macromolecular targets such as nucleic acids or proteins. Critical factors that facilitate such interaction include steric complementarity with the interacting surface of the target, formation of hydrogen bonds, hydrophobic interactions, and electrostatic forces.
CADD approaches are now routinely employed to identify new inhibitors or stimulators via dynamic or molecular modeling, pharmacophore modeling, structural interaction fingerprints, docking, virtual screening, grid technologies, statistical learning methods, quantum chemical methods, and combined molecular mechanics and quantum chemical methods, supported by highly efficient and selective programs and large commercially available databases. This review discusses recent developments in CADD methods along with their potential applications.4,5
Identification of drug target
Over the last two decades, medicinal chemistry has made significant progress that has aided the discovery of new drug targets. Drug target identification would be costly and time-consuming if it depended on experimental procedures alone. Furthermore, systems-based approaches can be used to identify polypharmacology-based drug targets, so that more than one protein can be targeted at a time with better efficiency. These approaches need to be supplemented with high-quality protein annotation at both the functional and structural levels, which is ultimately used to identify central proteins by constructing interaction graphs. The central hub proteins can then be selected in combination, based on their expression profiles, structural features, and localization, to identify the best pair for polypharmacology.
Target discovery therefore involves many computational algorithms, such as those for inhibitor design, identification of drug-target interaction networks, classification of body fluids, multiple drug target prediction, identification of recombination spots based on pseudo dinucleotide composition, classification of drugs according to the Anatomical Therapeutic Chemical classification, and classification of carcinoma and hepatic cirrhosis.5,6 Both drug development and medicinal chemistry research have gained significant insights from these computational approaches.7 In prioritizing drug targets, it is of the utmost importance to know where a protein is located. This information supports the understanding of biological processes through pathway analysis, protein-ligand interaction, and protein-protein interaction. Multiple software solutions have been designed over the years to predict the location of a specific protein within a particular compartment of the cell. The predictions also take into account the different cell types (eukaryotic, prokaryotic, bacterial, plant, or human).8 In addition, algorithms have been developed to predict virulence factors, proteases and their types, HIV cleavage sites in proteins, and antimicrobial peptides, all of which help in predicting potential drug targets.9 Drug discovery also benefits greatly from detailed studies of well-known protein families such as kinases, nuclear receptors, and G protein-coupled receptors (GPCRs). Among drugs approved by the Food and Drug Administration (FDA), 50% target GPCRs while 13% target nuclear receptors. Proper characterization of these protein families can thus provide in-depth knowledge that significantly aids the understanding of drug side effects and ligand-protein interactions.10,11
Chemoinformatics – paving of the beaten path
Chemoinformatics, a new field with a long tradition, applies computational methods to chemical problems.12 Studies related to QSAR, chemical structure representation and searching, molecular modeling, synthesis design, and computer-aided structure elucidation have been combined to form a discipline of its own.13,14 Many areas of chemistry, including drug design and analytical chemistry, can benefit significantly from chemoinformatics procedures, although solutions to many challenging chemical questions remain to be found. Chemoinformatics covers topics such as the representation of chemical compounds and reactions, data, data sources and databases, methods for calculating chemical and physical data, calculation of structure descriptors, and data analysis methods. It therefore has a very wide range of applications, such as determining which structure is needed to obtain a required property, how that structure can be synthesized, and what the product of a reaction will be. In its typical form, chemoinformatics can be applied to many areas of chemistry: storage and retrieval of chemical structures and related data; management of the vast amount of generated data; prediction of chemical, physical, and biological properties; structure elucidation of compounds from spectroscopic data; design of organic syntheses; prediction of the course and products of organic reactions; and prediction of the origin, quality, and age of investigated objects through the analysis of analytical chemistry data.14
In addition, it aids in planning and comparing chemical libraries, establishing QSAR, defining and analyzing structural diversity, high-throughput data analysis, de novo ligand design, ligand-receptor docking, ADME-Tox modeling, analysis of biochemical pathways, and prediction of xenobiotic metabolism.13
Pharmacophore modeling
Drug discovery and design often involve the CADD technique called pharmacophore modeling.15,16 To retrieve novel active compounds, pharmacophore modeling has been employed alongside structure-based design (SBD) to screen three-dimensional (3D) databases. In contrast to SBD, pharmacophore modeling is usually used when information such as the 3D structure of the target is not readily available.17 However, pharmacophore modeling is not ruled out when target structures are available. In fact, a researcher with structural information can develop better pharmacophore-based models by using information on the 3D active site of the receptor. A number of articles describe recent advances in and the methodology of pharmacophore modeling. Besides screening 3D databases, pharmacophore modeling is used in predictive ADME/Tox to develop quantitative models.4,16
Such models ultimately help to predict active compounds that are not present in the training set.10 Pharmacophore modeling has also found application in 3D-QSAR, in which the 3D relationships between pharmacophore features are used as descriptors alongside different statistical treatments. The recently developed conformationally sampled pharmacophore modeling has been suggested as an enhancement to the 3D-QSAR process.17
In a related approach, the anchor-GRIND method, chemical and biological knowledge is efficiently combined with alignment-independent molecular descriptors. Anchor-GRIND can presumably fill the gap between GRid-INdependent Descriptors (GRIND) and standard 3D-QSAR, and is particularly helpful when diverse substituents and ligands share a common scaffold. Anchor-GRIND descriptors avoid the bias introduced by alignment, are easy to interpret and statistically sound, and can discriminate between low- and high-affinity inhibitors.18
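The pharmacophore-based screening of 3D databases described in this section can be illustrated with a minimal sketch. It assumes that a pharmacophore is a small set of typed feature points and that a molecule matches when some of its features reproduce the query's feature types and inter-feature distances within a tolerance. The query, the feature names, the coordinates, and the 0.5 Å tolerance are all hypothetical values chosen for illustration; they are not taken from any of the tools cited here.

```python
from itertools import permutations
from math import dist

# Hypothetical three-point pharmacophore query: (feature type, xyz in angstroms).
QUERY = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("hydrophobe", (0.0, 4.0, 0.0)),
]


def pairwise_distances(points):
    """Return the distances between every pair of points, in index order."""
    return [
        dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]


def matches_query(features, query=QUERY, tolerance=0.5):
    """Return True if some ordered subset of the molecule's features has the
    same feature types as the query and all inter-feature distances agree
    with the query distances within `tolerance` angstroms."""
    query_types = [ftype for ftype, _ in query]
    query_dists = pairwise_distances([xyz for _, xyz in query])
    for candidate in permutations(features, len(query)):
        if [ftype for ftype, _ in candidate] != query_types:
            continue
        cand_dists = pairwise_distances([xyz for _, xyz in candidate])
        if all(abs(c - q) <= tolerance for c, q in zip(cand_dists, query_dists)):
            return True
    return False


# Illustrative (made-up) conformers described only by their feature points.
mol_hit = [
    ("donor", (1.0, 1.0, 1.0)),
    ("acceptor", (4.2, 1.0, 1.0)),
    ("hydrophobe", (1.0, 5.0, 1.0)),
    ("acceptor", (10.0, 10.0, 10.0)),
]
mol_miss = [
    ("donor", (1.0, 1.0, 1.0)),
    ("acceptor", (7.0, 1.0, 1.0)),
    ("hydrophobe", (1.0, 5.0, 1.0)),
]

print(matches_query(mol_hit))   # True
print(matches_query(mol_miss))  # False
```

Matching on distances alone makes the test independent of how the molecule is positioned in space, but it cannot distinguish mirror-image arrangements, and a practical screen would also have to enumerate conformers of each database compound.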
Chemical clustering
Clustering of chemicals plays a critical role in computational chemistry. Chemical clustering has been used to identify common scaffolds, to understand the behavior of specific functional groups, to identify outliers in a given dataset, and so on. It can be based on a number of different methods, such as maximum common substructure, graph properties, and binary fingerprints.19,20 This has led to the development of a number of open source as well as commercial software packages. ChemBioServer, a free web-based application, performs clustering by two methods: affinity propagation (AP) and hierarchical clustering. For each cluster, the web server displays the cluster graphically together with its representative scaffold. The server also analyzes and screens compounds based on geometry, vdW energy, toxic or undesired moieties, and physico-chemical properties. ChemMine Tools performs clustering, comparison, and searching of chemicals. It offers three clustering algorithms: multi-dimensional scaling, binning, and hierarchical clustering. Numerical clustering data are provided as well, and the web server has a built-in property calculation module. However, it cannot compare more than two chemicals at a time or identify a representative chemical for a cluster. ChemMineR is an open source tool that can cluster entire compound libraries and visualize compound structures and clustering results, thereby providing a number of options for chemical clustering of multiple compounds.9,21,22 The JKlustor suite by ChemAxon is used to calculate diversity, search by similarity, compare structures, and cluster chemical compounds based on molecular descriptors.
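As an illustration of fingerprint-based clustering, the sketch below computes the Tanimoto coefficient between binary fingerprints and groups compounds by single-linkage hierarchical clustering cut at a fixed similarity threshold, which is equivalent to finding the connected components of the graph of compound pairs at or above that threshold. It is a minimal sketch and not the algorithm of any of the tools named above; the compound names, the fingerprint bits, and the 0.5 threshold are made up for the example, and fingerprints are assumed to be given as sets of on-bit indices.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def single_linkage_clusters(fingerprints, threshold=0.5):
    """Group compounds so that any two compounds whose similarity is at least
    `threshold` end up in the same cluster (single linkage at a fixed cut)."""
    names = list(fingerprints)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Made-up fingerprints: each set lists the indices of the bits that are on.
library = {
    "cpd_a": {1, 2, 3, 4},
    "cpd_b": {1, 2, 3, 5},
    "cpd_c": {10, 11, 12, 13},
    "cpd_d": {10, 11, 12, 14},
    "cpd_e": {20, 21},
}

print(single_linkage_clusters(library))
# [['cpd_a', 'cpd_b'], ['cpd_c', 'cpd_d'], ['cpd_e']]
```

Single linkage is used here only because it is short to write; it tends to chain loosely related compounds together, so other linkage criteria or a different threshold may suit a real library better.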
Molecular docking
Molecular docking is widely used in hit identification and lead optimization.23 A docking algorithm generates a large number of candidate structures, and the most favorable geometry is then selected on the basis of a scoring function. Docking can be subdivided into two categories depending on the interacting partner of the protein: protein-ligand docking (to study the binding of small molecules to a protein) and protein-protein docking (to study the interaction between two protein molecules). Protein-ligand docking can be studied with AutoDock, an open source molecular modeling software package. Its companion program AutoGrid precalculates interaction energy grid maps for the various atom types, and AutoDock then docks the ligand within the predefined grid using a genetic algorithm-based search.24
Another docking program, DOCK, is based on the anchor-and-grow approach and can be employed for both flexible and rigid body docking. The latest version of DOCK predicts binding poses with the addition of new features such as receptor flexibility and solvation-enhanced force field scoring.25 AutoDock Vina is a newer open source program used for virtual screening and protein-ligand docking. It is fast and improves on AutoDock version 4 in the accuracy of its binding mode predictions.24
DNA-protein docking can be performed with Hex, a program that is free for academic use and can also be used for protein-ligand docking. HADDOCK is a docking program that uses bioinformatic predictions, mutagenesis data, and biophysical interaction data for protein-protein docking; it has also found application in protein-ligand docking. FTDock (Fourier Transform Dock), which is based on a Fourier correlation algorithm, performs a rotational and translational search between two molecules over the possible orientations and is therefore used for rigid body docking.24–27
A combination of molecular dynamics simulations and docking
The problem of hitting a moving target in current drug design arises mainly from protein flexibility, and most advanced database mining and CADD methods have therefore incorporated this aspect.27 Besides providing proper results, these techniques critically analyze the growing amount of structural genomics information before applying it. The pharmaceutical and biotechnology industries make extensive use of a number of different docking programs. Most docking algorithms assume the protein to be rigid because of the high computational demand that flexibility implies.28 Ligands, however, are treated as flexible molecules by a number of current docking programs. Some docking algorithms use force field-based methods, such as Monte Carlo simulations or molecular dynamics, which allow both targets and ligands to move. ICM and QXP are examples of docking programs built on such force field-based sampling. Evolutionary docking methods include evolutionary programming, Tabu search, and genetic algorithms. Shape complementarity-based methods use Gaussian functions to fit the ligand shape to the negative image of the protein. A variety of scoring functions have been developed in the past decade to estimate the binding affinity of fragments or novel structures at a precise position within the receptor pocket. Force field-based scoring functions use molecular mechanics energy functions, taking the sum of van der Waals and electrostatic interactions as an approximation of the binding free energy of the ligand-protein complex. The aim of protein-ligand docking is to predict and rank the structure(s) that arise from the association of a given ligand with a target protein of known 3D structure.27 Ligand-macromolecule complexes can be predicted more reliably by combining accurate (although costly) molecular dynamics techniques with fast docking protocols.
This combination is supported by the complementary strengths of the two approaches. Docking simulations explore a vast conformational space within a very short time, which allows large collections of drug-like compounds to be scrutinized at a reasonable cost. Molecular dynamics-based docking protocols can treat both protein and ligand flexibility. Molecular dynamics simulations also allow the effects of explicit water molecules to be investigated directly and help in obtaining accurate binding free energies.29 Besides inhibitor docking and explicit solvent molecular dynamics, another important consideration in computational drug design is the presence of cryptic drug binding sites within the protein receptor. Many structural studies have revealed alternate binding sites on protein receptors. These situations pose a significant challenge to CADD, because such sites appear only when an inhibitor binds to the receptor protein.28
Side chain movements that create new binding sites have been identified repeatedly and successfully in explicit solvent molecular dynamics simulations of proteins. Ligand-docking calculations on the structural snapshots generated during molecular dynamics indicate that the sampled conformations can competently bind the inhibitor, with favorable docked energies compared with other positions. Docking studies directed at a particular inhibitor-binding site suggest the possibility of developing hybrid inhibitors that simultaneously target the cryptic and regular binding sites. Molecular dynamics simulations can thus efficiently predict cryptic binding sites of protein receptors before their experimental discovery and confirmation.
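The force field-based scoring discussed in this section, in which the sum of van der Waals and electrostatic interactions approximates the binding free energy, can be sketched as follows. This is a minimal illustration under strong simplifying assumptions, not the scoring function of any program named above: every atom pair shares one Lennard-Jones well depth and radius, the dielectric constant is fixed, atoms are assumed never to coincide, and all coordinates, charges, and parameter values are hypothetical. The constant 332.06 converts charge products in units of e² per ångström to kcal/mol.

```python
from math import dist

COULOMB_CONSTANT = 332.06  # kcal * angstrom / (mol * e^2)


def interaction_energy(ligand_atoms, protein_atoms,
                       epsilon=0.1, sigma=3.5, dielectric=4.0):
    """Sum 12-6 Lennard-Jones and Coulomb terms over all ligand-protein pairs.

    Each atom is a tuple (xyz, partial_charge). `epsilon` (kcal/mol), `sigma`
    (angstroms), and `dielectric` are illustrative values applied uniformly
    to every pair; a real force field uses per-atom-type parameters.
    """
    total = 0.0
    for lig_xyz, lig_charge in ligand_atoms:
        for prot_xyz, prot_charge in protein_atoms:
            r = dist(lig_xyz, prot_xyz)  # assumed to be non-zero
            sr6 = (sigma / r) ** 6
            van_der_waals = 4.0 * epsilon * (sr6 * sr6 - sr6)
            electrostatic = COULOMB_CONSTANT * lig_charge * prot_charge / (dielectric * r)
            total += van_der_waals + electrostatic
    return total


def best_pose(poses, protein_atoms):
    """Return the pose with the lowest (most favorable) interaction energy."""
    return min(poses, key=lambda pose: interaction_energy(pose, protein_atoms))


# Toy system: one negatively charged protein atom and three one-atom poses.
protein = [((0.0, 0.0, 0.0), -0.5)]
pose_near = [((4.0, 0.0, 0.0), 0.5)]
pose_far = [((8.0, 0.0, 0.0), 0.5)]
pose_clash = [((1.0, 0.0, 0.0), 0.5)]

ranked = sorted(
    [("near", pose_near), ("far", pose_far), ("clash", pose_clash)],
    key=lambda item: interaction_energy(item[1], protein),
)
print([name for name, _ in ranked])  # ['near', 'far', 'clash']
```

The steep repulsive term penalizes the clashing pose heavily, which is the behavior that lets such a function rank the many candidate geometries a docking algorithm generates. Terms for solvation and entropy, which contribute to a true binding free energy, are deliberately left out of this sketch.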
Absorption, distribution, metabolism, excretion, and toxicity (ADMET) techniques
In recent years, there has been increasing awareness of the need to predict the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of small chemical compounds when developing a computational model.30 A number of in silico methods for evaluating and rapidly screening the ADMET properties of small molecules have been developed in the past on the basis of simple empirical rules. These rules cannot capture the full complexity of the in vivo system. However, they can help in decision-making and provide valuable information. The toxic potential of a chemical can be estimated with Toxtree, a user-friendly, open source software package based on the decision tree method. PASS (Prediction of Activity Spectra for Substances) can predict nearly 1244 types of biological activity, including pharmacological effects, toxic and adverse effects, influence on gene expression, mechanisms of action, and interaction with transporters and metabolic enzymes. Large compound databases can therefore be screened easily within a very short time. admetSAR is the most comprehensive manually curated database of ADMET profiles for diverse chemical compounds. Besides the usual database search, admetSAR can predict around 50 ADMET endpoints through ADMET-Simulator, a chemoinformatics-based toolbox that integrates high-quality predictive QSAR models. This web server also provides in silico screening of the ADMET profiles of drug candidates and environmental chemicals. The OncoLogic™ application analyzes chemical structures to determine their potential to cause cancer.31 Its empirical rules incorporate knowledge of the carcinogenicity of chemicals in humans and animals and analysis of structure-activity relationships (SAR), thereby mimicking the decision logic of human experts.
The freely available web server OSIRIS Property Explorer can predict toxicological and physico-chemical molecular properties, which need to be optimized in the course of designing pharmaceutically active compounds. The tool also calculates important considerations such as drug score and drug likeness. Another freely available web server, DrugLogit, predicts whether a compound is likely to be a drug or a non-drug on the basis of simple, readily available molecular properties.2,15,28
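A simple empirical rule of the kind mentioned in this section can be expressed in a few lines. The sketch below uses Lipinski's rule of five, a widely known drug-likeness filter that is chosen here purely as an illustration; this review does not name it, and it is not claimed to be the rule set used by any of the tools above. The descriptor values are assumed to have been computed elsewhere, the compound names and numbers are made up, and allowing at most one violation is a common convention taken here as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors for one compound."""
    mol_weight: float        # molecular weight in daltons
    logp: float              # octanol-water partition coefficient (log P)
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d):
    """Count how many of the four rule-of-five limits a compound exceeds."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def screen(library, max_violations=1):
    """Return the names of compounds with at most `max_violations` violations."""
    return [
        name
        for name, descriptors in library.items()
        if rule_of_five_violations(descriptors) <= max_violations
    ]


# Hypothetical compounds with made-up descriptor values.
candidates = {
    "cpd_1": Descriptors(320.4, 2.1, 2, 5),
    "cpd_2": Descriptors(712.9, 6.3, 7, 12),
    "cpd_3": Descriptors(540.0, 3.0, 1, 6),
}

print(screen(candidates))  # ['cpd_1', 'cpd_3']
```

A filter like this is fast enough to run over a large compound database, but, as noted above for empirical rules in general, it cannot capture the full complexity of the in vivo system and is best treated as a decision aid.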
Strategies undertaken
In general, the detailed 3D structure of the drug target should be the primary consideration before a CADD project is initiated. New lead compounds can be generated through either a structure-based (docking, de novo ligand design) or a ligand-based (pharmacophore, CoMFA, QSAR) approach; the candidates are then evaluated to select the best ones, which are subsequently synthesized or purchased. The leads are then tested for activity, and the results are fed back into CADD in an iterative process. However, a strict distinction between structure- and ligand-based CADD methods has a number of drawbacks.32 Most ligand-based methods conserve the 3D structural arrangement of functional groups on a scaffold. This is crucial for maintaining the activity of existing ligands, but it precludes the discovery of novel ligands that interact with the target protein in different ways. Evaluating induced fit for both the protein and the ligand makes docking methods computationally more expensive, and the large-scale changes brought about by ligand-protein binding are often ignored. Structure-based methods are thus limited by the availability of detailed target structures with and without bound ligands, ideally in different conformations. An exciting prospect is the proposed integration of structure- and ligand-based CADD methods, which capture distinct facets of the natural system. With this approach, all the available information can be applied to a particular drug design problem in an objective and quantitative way. Integrated CADD methodologies use the crystal structures of complexes to produce structure-based pharmacophores. Such a pharmacophore represents the features of the ligand that are directly involved in the interaction with the target protein, along with the space around the ligand that the protein occupies, and thus provides information on the relevant interactions and the size of the binding cavity.
Protein-ligand complexes treated as pharmacophores in this way yield a ‘superligand’.32 Exploring energy landscapes is a major challenge in structural biology for problems pertaining to protein folding, aggregation, and dynamics. To accelerate conformational sampling and improve the understanding of structure, kinetics, and thermodynamics, two strategies have been followed: reducing the number of degrees of freedom by moving to internal coordinate space, and developing better sampling methods in Cartesian coordinates.
Consequently, there is considerable interest in activated methods that operate in internal coordinate space for sampling protein energy landscapes. ARTIST is one of the first applications to use an activated method in internal coordinate space for sampling all-atom protein conformations. The trajectories produced by this activation-relaxation technique in internal coordinate space differ markedly from those of other internal coordinate-based approaches that aim to refine or fold protein structures. Here, conformational changes result from identifying and crossing the saddle points that connect well-defined energy minima. The method explores conformational space efficiently in both densely and sparsely packed environments and therefore offers prospects for CADD applications. For instance, resistance in infectious diseases arises primarily through gene mutations in the infectious organism that alter the interaction profile of the target protein. A number of protein structure- and sequence-based computational methodologies have now been applied to resistance mutation prediction and mechanistic studies. The prediction of drug resistance mutations in proteins has therefore become an important aspect of drug design and is of particular relevance in the field of immunotherapeutics.33–35 The availability of 3D structures of drug targets that are important in disease pathology has led to the use of structure-based approaches, which aid in evaluating solvation, molecular interactions, and the dynamic properties of drug-protein binding, along with their potential correlation with resistance-conferring mutations. Volume-based binding site fitness models and structure-derived binding energies have also been used. Because structural information is often not available, resistance mutations are also predicted from sequences.
Statistical learning methods such as decision trees, support vector machines (SVM), and neural networks are also useful for predicting resistance-conferring mutations.
Major projects
In addition to the growing number of open source software tools, there are organizations that work with industry or public partners to develop affordable medicines for otherwise neglected diseases such as malaria and tuberculosis (TB). One example of such a collaborative, non-profit, open source initiative is DNDi (Drugs for Neglected Diseases Initiative), which aims to develop new treatments for some of the most neglected diseases, with a major thrust on sleeping sickness and malaria. So far, the initiative has delivered one compound for the treatment of sleeping sickness and two compounds against malaria. A similar non-profit organization is IDRI (Infectious Disease Research Institute), which focuses mainly on infectious diseases such as malaria and TB. The major goals of these organizations are the diagnosis, treatment, and prevention of infectious diseases. A TB vaccine developed in collaboration with GlaxoSmithKline (GSK) is now in Phase I clinical trials. The collaborative project OpenTox was undertaken to develop in silico toxicology models for use in predictive toxicology. The initial OpenTox framework was designed and built through the collaborative efforts of government research groups, universities, and enterprises. Blue Obelisk, which is built on open source software development, promotes reusable chemistry and is used by diverse Internet groups. Its three primary areas are: (1) open data; (2) open standards; and (3) open source. OSDD (Open Source Drug Discovery) is a translational platform for drug discovery that connects research organizations, clinicians, experimental biologists, and informaticians to provide affordable medicines against malaria and TB.
The basis for all such initiatives has been the computational method of drug designing.
Conclusions
CADD methods for identifying potential inhibitors or stimulators, including dynamic, molecular, and pharmacophore modeling, molecular mechanics methods, QM and combined QM and molecular mechanics methods, grid technologies, structural interaction fingerprints, docking, statistical learning methods, and virtual screening, have been used to develop highly efficient and selective programs and large commercially available databases. Based on the CADD techniques discussed here, novel drugs have been designed against a number of diseases, including cancer, malaria, TB, sleeping sickness, neurodegenerative diseases, and AIDS.31,36 Further refinement of these CADD strategies should enhance the efficiency of these methods and further reduce the costs of drug development.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The Heilongjiang University of Science and Technology Young Researcher Training Plan (Q20120104) financially supported this work.
