
Abstract

Background

Modeling protein-protein interactions (PPIs) using docking algorithms is useful for understanding biomolecular interactions and mechanisms. Typically, a docking algorithm generates a large number of docking poses, and it is often challenging to select the best native-like pose. A further challenge is to recognize key residues, termed as hotspots, at protein-protein interfaces, which contribute more in stabilizing a protein-protein interface.

Results

We had earlier developed a computer algorithm, called PPCheck, which ascribes pseudoenergies to measure the strength of PPIs. Native-like poses could be successfully identified in 27 out of 30 test cases, when applied on a separate set of decoys that were generated using FRODOCK. PPCheck, along with conservation and accessibility scores, was able to differentiate native-like and non-native-like poses from 1883 decoys of Critical Assessment of Prediction of Interactions (CAPRI) targets with an accuracy of 60%. PPCheck was trained on a 10-fold mixed dataset and tested on a 10-fold mixed test set for hotspot prediction. We obtained an accuracy of 72%, which is on par with other methods, and a sensitivity of 59%, which is better than most existing hotspot prediction methods that use similar datasets. Other relevant tests suggest that PPCheck can also be reliably used to identify conserved residues in a protein and to perform computational alanine scanning.

Conclusions

PPCheck webserver can be successfully used to differentiate native-like and non-native-like docking poses, as generated by docking algorithms. The webserver can also be a convenient platform for calculating residue conservation, for performing computational alanine scanning, and for predicting protein-protein interface hotspots. While PPCheck can differentiate the generated decoys into native-like and non-native-like decoys with a fairly good accuracy, the results improve dramatically when features like conservation and accessibility are included. The method can be successfully used in ranking/scoring the decoys, as obtained from docking algorithms.

Keywords

algorithm protein-protein interactions protein hotspots computational alanine scanning residue conservation protein-protein docking

Background

Protein-protein interactions (PPIs) are critical for the normal functioning of a cell. Proteins interact with one another and carry out biological processes such as signal transduction, gene regulation, and immune response. Since interactions of two proteins often result in one or more biological processes, it is important to gain knowledge on interacting partners of proteins and hence their functions. We now have several databases^1–3 that record PPI data from high-throughput experimental methods for thousands of proteins.

Protein-protein docking, the method of assembling two identical/different proteins together using physics-based computational algorithms, is still a challenging problem. Although we now have the methods that can explore all the possible rigid-body conformational states for the two proteins in a bound state, selection and ranking of the native-like pose is still a major problem. It is well understood that only when two proteins bind with one another in the correct conformation, they perform the designated function, and hence, selecting the native-like pose assumes central importance.

Several methods have been developed both to sample the conformational space of the two interacting proteins and to select the best native-like pose from the pool of generated complexes. Many of these algorithms perform grid-based searches to sample the conformational space. These algorithms inherently use shape/size complementarity, desolvation, or electrostatic interactions,^4,5 physical force fields,^6,7 empirical functions,^8–10 and knowledge-based potentials derived from existing determined structures^11–14 to dock the two proteins. Scoring of the docked poses is generally performed using either residue-level potentials or atomic potentials. While most residue-level potentials are simple to construct, easy to use, and computationally very fast, atomic potentials are more accurate but computationally more demanding. Several methods have been developed to computationally identify and/or predict the interacting partners for the proteins. These methods predict/identify interactions using features such as hydrophobic patch size,¹⁵ evolutionary-conserved positions,^15,16 position-specific sequence profiles and residue neighbor list,¹⁷ surface accessibility,^17–19 distance,²⁰ structural similarity,¹⁶ secondary structure contributions,²¹ amino-acid composition,^21,22 dipeptide composition, and biochemical tripeptide composition.²² PISA,²³ one of the most popular tools for predicting the stability of a complex, uses several features such as free energy of formation, solvation energy gain, interface area, hydrogen bonds, salt-bridges across the interface, and hydrophobic specificity.

Earlier, in our lab, we developed COILCHECK²⁴ and COILCHECK+,²⁵ which can be used for structural analyses and validation of a special class of PPIs, namely, coiled-coils. PPCheck is an improved version of these tools with many significant new features (like consideration of only interface residues in calculation of normalized energy per residue, inclusion of interface waters in hydrogen bond energy calculations, and implementation of an optimum distance cutoff that can be used in a generalized way for calculating electrostatic interactions). PPCheck can be used to analyze a diverse set of protein-protein interfaces, where simple distance criteria are employed to screen for interface residues followed by a quantitative view of the strength of interactions at the interface. PPCheck has been applied on a benchmark dataset of protein-protein complexes, and standard values for the number of interface residues and the strength of PPIs had been obtained earlier.²⁶ PPCheck can be used to analyze a set of docking decoys to recognize the native-like and the non-native-like poses. It uses a combination of energy, conservation, and accessibility scores to rank the models that are generated by the docking algorithm(s). When applied, separately, on a dataset of 30 dimers and decoys from six targets of Critical Assessment of Prediction of Interactions (CAPRI),^27–31 it showed promising results in differentiating the native-like and the non-native-like poses.

Experimental alanine scanning mutagenesis is a time-consuming and expensive way of finding structurally important residues. FoldX³² is one of the popular webservers used for predicting the important interactions that stabilize protein complexes, but it is not very accurate. Thus, there is a need for a tool that can reliably predict the change in binding energy of a protein complex when one of its residues is mutated to alanine. When applied on a set of 40 mutations from experimental studies, PPCheck performed well in calculating the changes in binding free energy of the complexes.

Hotspots are the interface residues that contribute maximally toward the stability of the complex, and when mutated to alanine, they impart an appreciable decrease in binding strength (difference of 2 kcal/mol or more in the binding energy). They are generally seen to exist in clusters called hot regions³³ and are more conserved than other interface residues. Protein hotspots, apart from providing stability to the complex, also contribute to the specificity at the binding sites. Studies have also shown that hotspots mainly remain buried in the interface.³³ Only a few databases, such as Alanine Scanning Energetics database (ASEdb)³⁴ and Binding Interface Database (BID),³⁵ contain information about a handful of experimentally determined hotspots, and hence, there is a need to develop computational tools to predict them. Several methods employ energy-based models,^36,37 knowledge-based models,^38–42 and molecular dynamics-based models.^43–45 Graph theoretical approaches have also been applied to identify and analyze protein hotspots.⁴⁶ PPCheck attributes pseudoenergies to protein-protein interface as a sum of non-covalent interaction energies (van der Waals, electrostatic, and hydrogen bond energies). Graph theoretical parameters, such as degree or extent of spatial residue interaction (ESRI) (an alternative term used in the present study for simplicity) for all the interface residues, are employed as features to predict hotspots. PPCheck reported high accuracy on the test dataset when compared with other existing methods of hotspot prediction.

PPCheck is freely available as a webserver at http://caps.ncbs.res.in/ppcheck/.

Methods

Identification of non-covalent interactions

Simple distance criteria are employed for the preliminary identification of non-covalent interactions, such as van der Waals, electrostatic, and hydrogen bonding. The respective energies are calculated using standard force fields as described in the following.

If the atoms of amino acids from two neighboring chains come within a distance of 7Å, then they are considered to be interacting and contribute to van der Waals interaction energy.

Van der Waals interaction energy is calculated as

V = 4.184 \times \sqrt{E_{i} \times E_{j}} \left[ \left( \frac{R_{i} + R_{j}}{r} \right)^{12} - 2 \left( \frac{R_{i} + R_{j}}{r} \right)^{6} \right] \ \mathrm{kJ\,mol^{-1}}

(1)

V is the van der Waals energy; R_i and R_j are the van der Waals radii of atoms i and j, respectively; E_i and E_j are the corresponding van der Waals well depths; and r is the distance between the atoms.
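As a minimal sketch of Equation 1, the Python function below evaluates the van der Waals term for a single atom pair. The function and parameter names are illustrative and are not taken from PPCheck. The well depths are assumed to be in kcal/mol (so that the factor 4.184 converts the result to kJ/mol), and treating the 7Å boundary as inclusive is likewise an assumption.

```python
import math

KCAL_TO_KJ = 4.184  # conversion factor that appears in Equation 1


def vdw_energy(eps_i, eps_j, radius_i, radius_j, r):
    """Van der Waals energy (kJ/mol) of one atom pair, in the form of Equation 1.

    eps_i, eps_j       -- well depths of atoms i and j (assumed kcal/mol)
    radius_i, radius_j -- van der Waals radii of atoms i and j (in angstrom)
    r                  -- distance between the two atoms (in angstrom)
    """
    if r <= 0:
        raise ValueError("interatomic distance must be positive")
    ratio = (radius_i + radius_j) / r
    return KCAL_TO_KJ * math.sqrt(eps_i * eps_j) * (ratio ** 12 - 2 * ratio ** 6)


def within_vdw_cutoff(r, cutoff=7.0):
    """True if the pair lies within the 7 angstrom cutoff (inclusive by assumption)."""
    return r <= cutoff
```

With this functional form, the energy reaches its minimum when r equals R_i + R_j, where the bracketed term evaluates to -1.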

Electrostatic interactions have been reported, and the corresponding energies are calculated using CHARMM package,⁴⁷ if the charged residues are within or equal to an optimum distance cutoff of 10Å. Coulomb's equation was used to quantify these interactions as follows:

E = 4.184 \times \left( \frac{q_{1} \times q_{2}}{D \times r} \right) \times 332 \ \mathrm{kJ\,mol^{-1}}

(2)

E is the electrostatic energy; q₁ and q₂ are partial atomic charges for the charged amino acids as obtained from CHARMM package; r is the distance between atoms; and D is the distance-dependent dielectric constant (D = 2r).
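The Coulomb term of Equation 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the PPCheck source code: the function name is hypothetical, charges are given in units of the elementary charge, and distances are in angstrom. The distance-dependent dielectric D = 2r is taken directly from the definition above.

```python
def electrostatic_energy(q1, q2, r):
    """Electrostatic energy (kJ/mol) of one charged atom pair, as in Equation 2.

    q1, q2 -- partial atomic charges (units of e)
    r      -- distance between the atoms (in angstrom)
    """
    if r <= 0:
        raise ValueError("interatomic distance must be positive")
    dielectric = 2.0 * r  # distance-dependent dielectric constant, D = 2r
    return 4.184 * (q1 * q2) / (dielectric * r) * 332.0
```

Because D grows with r, the energy in this sketch decays as 1/r² rather than 1/r, so distant charge pairs contribute comparatively little.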

Hydrogen bonds are identified, and the corresponding energy is calculated using Kabsch and Sander's equation as used in the DSSP program as follows:

E = 4.184 \times q_{1} \times q_{2} \left( \frac{1}{r_{ON}} + \frac{1}{r_{CH}} - \frac{1}{r_{OH}} - \frac{1}{r_{CN}} \right) \times f \ \mathrm{kJ\,mol^{-1}}

(3)

where q₁ = 0.42e and q₂ = 0.20e are the partial charges on the C, O (+q₁, -q₁) and N, H (-q₂, +q₂) atoms, respectively; f = 332; and r_XY is the distance between atoms X and Y.

Water molecules, when present at the interface, are considered when they form bridging hydrogen bonds with amino acids from two interacting protein chains. The single point charge model of water is considered,⁴⁸ and hence, the values of the charges are chosen as follows for the water-amino acid interactions in the Kabsch and Sander's equation⁴⁹:

q₁ = 0.42e, q₂ = 0.41e, when water acts as hydrogen bond donor.

q₁ = 0.82e, q₂ = 0.20e, when water acts as hydrogen bond acceptor.
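The hydrogen bond energy of Equation 3, together with the charge choices listed above for water-mediated bonds, can be sketched as follows. The function name and the two dictionaries are hypothetical helpers introduced for illustration; only the numerical constants come from the text.

```python
def hbond_energy(r_on, r_ch, r_oh, r_cn, q1=0.42, q2=0.20, f=332.0):
    """Kabsch-Sander style hydrogen bond energy (kJ/mol), as in Equation 3.

    r_on, r_ch, r_oh, r_cn -- O...N, C...H, O...H and C...N distances (angstrom)
    q1, q2                 -- partial charges (units of e); defaults are the
                              values given for amino acid to amino acid bonds
    """
    geometry = 1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn
    return 4.184 * q1 * q2 * geometry * f


# Charge sets quoted in the text for water to amino acid hydrogen bonds.
WATER_AS_DONOR = {"q1": 0.42, "q2": 0.41}
WATER_AS_ACCEPTOR = {"q1": 0.82, "q2": 0.20}
```

For a water-mediated bond, the same geometry is evaluated with the appropriate charge set, for example `hbond_energy(3.0, 3.0, 2.0, 4.0, **WATER_AS_DONOR)`.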

All the non-covalent interaction energies are summed up to total energy, and the ratio of total energy to the number of interface residues is termed as normalized energy per residue.

PPCheck, like COILCHECK,²⁴ also reports residues involved in hydrophobic interactions, strong electrostatic interactions (salt-bridges), and short contacts, based on the distance between specific atoms and the nature of the amino acids. Hydrophobic amino acids, such as Leu, Ile, Val, Trp, Phe, and Tyr, are reported for hydrophobic interactions if they are found within or equal to a C^β-C^β distance of 7Å. If oppositely charged amino acids are observed within or equal to a C^β-C^β distance of 4Å, then they are considered and reported as potential salt-bridges. Atoms are reported to be engaged in short contacts if their spatial distance at the interface is less than the allowed van der Waals distance.

Short contacts are calculated as

D = r - (R - 0.40)

(4)

where R is the sum of van der Waals radii of the two atoms and r is the distance between the atoms.
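A short sketch of Equation 4 is given below. The function names are illustrative. Reading a negative D as a short contact is an assumption that follows from the statement that short contacts are closer than the allowed van der Waals distance.

```python
def short_contact_deviation(r, radius_a, radius_b, tolerance=0.40):
    """D of Equation 4: distance minus the allowed van der Waals distance.

    r                  -- distance between the two atoms (angstrom)
    radius_a, radius_b -- van der Waals radii of the two atoms (angstrom)
    tolerance          -- the 0.40 angstrom allowance in Equation 4
    """
    allowed = (radius_a + radius_b) - tolerance
    return r - allowed


def is_short_contact(r, radius_a, radius_b):
    """Assumed rule: atoms are in short contact when D is negative."""
    return short_contact_deviation(r, radius_a, radius_b) < 0
```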

Implementation

PPCheck has been developed using a combination of HTML, PERL, and PHP. The webserver works fine on all the browsers and the platforms. It takes ~22 seconds to identify interactions between two protein chains having ~150 amino acids each.

Selection of right cutoff for electrostatic interactions

Electrostatic interactions are long-range forces. In order to select the right cutoff for these kinds of interactions, the number of charged residues at the interfaces with C^β-C^β distance within various cutoff distances (7Å, 8Å, 10Å, 12Å, and 15Å) was calculated in the training dataset.

Dataset for prediction of native-like docking pose

In an earlier study,²⁶ PPCheck was applied on 270 non-redundant, well-characterized, and high-quality protein-protein interfaces where crystal structures were determined with resolution better than 2.5Å. It was observed that in most of the stable PPIs, the number of residues at the interface ranges from 51 to 150 and the normalized energy per residue is better than -2 kJ/mol (ie, less than -2 kJ/mol). These values were used as standardized criteria to distinguish the native-like docking pose from the non-native-like ones.
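The two standardized criteria can be expressed as a simple filter, sketched below. The function names are hypothetical, and requiring both criteria to hold simultaneously is an assumption, since the text does not state how they are combined.

```python
def normalized_energy_per_residue(total_energy, n_interface_residues):
    """Total non-covalent energy (kJ/mol) divided by the number of interface residues."""
    if n_interface_residues <= 0:
        raise ValueError("at least one interface residue is required")
    return total_energy / n_interface_residues


def meets_native_like_criteria(total_energy, n_interface_residues,
                               min_residues=51, max_residues=150,
                               energy_threshold=-2.0):
    """Apply the interface-size range and the energy threshold from the text.

    Assumption: a pose must satisfy both criteria to be flagged as native-like.
    """
    energy = normalized_energy_per_residue(total_energy, n_interface_residues)
    size_ok = min_residues <= n_interface_residues <= max_residues
    return size_ok and energy < energy_threshold
```

For example, a pose with 100 interface residues and a total energy of -300 kJ/mol has a normalized energy of -3 kJ/mol per residue and passes both checks.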

As studied earlier,⁵⁰ a set of six CAPRI targets (T24, T25, T26, T29, T32, and T36) with a total of 1883 decoys (best-predicted models by CAPRI participants) was collected from the CAPRI website maintained at the European Bioinformatics Institute (EBI) (http://www.ebi.ac.uk/msd-srv/capri/capri.html). In all, 132 of 1883 decoys were deemed as near-native-like models.

Residue conservation as an additional parameter with PPCheck to predict accuracy of native-like docking decoys

To improve the ability of PPCheck to differentiate native-like docking poses from non-native-like ones, residue conservation and solvent accessibility were used as additional features. To determine the extent of residue conservation of a protein, homologous sequences were identified using Position-Specific Iterative (PSI)-BLAST⁵¹ against the non-redundant (NR) database (March 2013) with one iteration (Blastp), at an e-value cutoff of 10^-2. The resultant hits were collected and clustered at 45% sequence identity using CD-HIT⁵² (word size = 2). Multiple sequence alignment was then performed using ClustalW,⁵³ and the extent of residue conservation at each position was scored using the structure-based substitution matrix of Johnson and Overington.⁵⁴ The conservation score for each residue is calculated as

C_{i} = \frac{\sum_{j = 1}^{n} (S_{ab})}{n}

(5)

where C_i is the total conservation score for the residue at the ith position, a is the residue type present in the query sequence at the ith position in the multiple sequence alignment, b is the residue type present in the jth homologous sequence at the ith position in the multiple sequence alignment, S_ab is the amino-acid conservation score from the structure-based matrix when residue type a in the query sequence is substituted by residue type b in the homologous sequence at the ith position, and n is the number of homologs for the query sequence in the multiple sequence alignment.

C_{norm_{i}} = \frac{C_{i}}{100}

(6)

Conservation score for each residue was further normalized to a range between 0 and 1 by dividing it by 100 (maximum amino-acid substitution score in structure-based matrix is 100 for cysteine-cysteine substitution).
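Equations 5 and 6 amount to averaging substitution scores down one alignment column and rescaling by 100. The sketch below illustrates this. The function names are hypothetical, and the small score table is a placeholder: only the cysteine-cysteine value of 100 comes from the text, whereas the cysteine-serine value is invented for illustration and is not taken from the Johnson and Overington matrix.

```python
def conservation_score(query_residue, homolog_residues, matrix):
    """C_i of Equation 5: mean substitution score over the n homologs in a column.

    query_residue    -- residue type a in the query at this alignment position
    homolog_residues -- residue types b, one per homologous sequence
    matrix           -- dict mapping (a, b) to the substitution score S_ab
    """
    n = len(homolog_residues)
    if n == 0:
        raise ValueError("at least one homolog is required")
    return sum(matrix[(query_residue, b)] for b in homolog_residues) / n


def normalized_conservation(score, max_score=100.0):
    """C_norm of Equation 6: rescale by the maximum matrix score (100)."""
    return score / max_score


# Placeholder scores: C-C = 100 is from the text; C-S = 40 is a made-up value.
TOY_MATRIX = {("C", "C"): 100.0, ("C", "S"): 40.0}
```

With these placeholder scores, a column holding C, C, S, and C in four homologs gives C_i = 85 and a normalized score of 0.85.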

Solvent accessibility is calculated using protein surface accessibility (PSA) module of JOY⁵⁵ program. A residue is treated as exposed if the relative accessibility as compared to a model tripeptide is more than 7%. If the relative accessibility of a residue is less than 7%, then it is considered as buried.

PPCheck was applied on a previously studied dataset²⁶ of 270 well-characterized protein-protein complexes in order to select the optimum values of solvent accessibility and number of conserved residues (conservation score > 0.65) present at the interface, which can help in clear differentiation of native-like poses from the non-native-like ones. It was observed that more than 75% of the protein-protein interfaces (209/270 protein-protein interfaces) (Supplementary File 1) fell either in the strong interactor or the medium interactor category. An interface was termed as strong interactor type if both the interacting chains had 10 or more exposed-conserved residues in their monomeric form. However, if one of the chains at the interface had 5 or more exposed-conserved residues in the monomeric form, while its interacting partner had 10 or more exposed-conserved residues in the monomeric form, then such a complex was termed as medium interactor type. Since the majority (209/270 protein-protein interfaces) of the complexes belonged to these (strong interactor or medium interactor) types, they were treated as gold standards for selecting/predicting the native-like docking pose.
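The interactor categories can be sketched as a small classification routine. The names below, including the "other" label for complexes that fall in neither category, are hypothetical. A residue is counted as exposed-conserved here when its relative accessibility exceeds 7% and its normalized conservation score exceeds 0.65, following the thresholds stated in this section.

```python
def count_exposed_conserved(residues, accessibility_cutoff=7.0,
                            conservation_cutoff=0.65):
    """Count residues that are both exposed and conserved.

    residues -- iterable of (relative_accessibility_percent, conservation_score)
    """
    return sum(1 for accessibility, conservation in residues
               if accessibility > accessibility_cutoff
               and conservation > conservation_cutoff)


def interactor_type(n_chain_a, n_chain_b):
    """Classify an interface from the exposed-conserved counts of its two chains."""
    low, high = sorted((n_chain_a, n_chain_b))
    if low >= 10:
        return "strong"
    if low >= 5 and high >= 10:
        return "medium"
    return "other"  # hypothetical label for the remaining complexes
```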

Studies on an additional dataset for evaluating the effectiveness of methodology

In order to assess the prediction accuracy of PPCheck, a non-redundant set of 30 dimers (15 homodimers and 15 heterodimers) was selected, as studied earlier,⁵⁶ to constitute the current test dataset. For the 30 dimers, the two interacting chains were separated and then allowed to dock using FRODOCK.⁵⁷ For all the generated docked poses for each of the 30 pairs of chains (30 complexes), PPCheck was applied and its ability to predict best native-like docking pose was evaluated. Top-5, top-10, and top-20 models ranked by PPCheck were compared with the already available native pose (in PDB) with respect to (i) root mean square deviation (rmsd) of C-alpha atoms, as obtained from MMalign,⁵⁸ and (ii) percentage of common interface residues (pcir) as observed in the native pose.

Similarly, rmsd values and pcir were calculated and compared between the top-1, top-5, top-10, and top-20 models of the test dataset, when conservation and solvent accessibility scores were used to assist PPCheck in differentiating the native-like docking poses from the non-native-like docking poses.

Dataset for hotspot prediction

A set of 192 residue mutations from ASEdb and 126 residue mutations from BID were considered for training and testing PPCheck for hotspot prediction. The two datasets ASEdb and BID are mutually exclusive and independent of each other. Also, the same datasets have been largely used by almost all the existing methods of hotspot prediction; thus, consistency in the datasets is maintained. This also ensures a fair comparison between the existing methods.

ASEdb contains information about differences in binding energy of the complex when a single residue is mutated to alanine. A residue is considered as a hotspot if a gain of 2 kcal/mol or more of free energy of interaction is obtained when it is mutated to alanine. Non-hotspots are the residues that, when mutated to alanine, cause a change in the binding energy of less than 0.4 kcal/mol. A total of 77 hotspots and 115 non-hotspots from ASEdb and 39 hotspots and 87 non-hotspots from BID were mixed and randomly rotated 10 times to form mutually exclusive 10-fold training and 10-fold test datasets. All further studies were carried out using these datasets.

Extent of spatial residue interaction

The number of residues from the partner chain with which a particular residue (say A) in the protein chain interacts is defined as the ESRI represented as D_i of that residue (A). For example, if a residue at the 100th position in a protein is present at the interface and is spatially interacting with 10 other residues from the partner chain, then the ESRI of this residue (present at the 100th position) is said to be 10.

Normalized extent of spatial residue interaction

The ratio of the number of residues that interact with an interface residue (say x) and the average number of residues that all the interface residues interact with is known as the normalized extent of spatial residue interaction (NESRI) of that residue (x). The definition of NESRI has been illustrated in a better way using an example in Supplementary File 2.
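ESRI and NESRI can be computed from a list of inter-chain residue contacts, as sketched below. The function names and residue labels are illustrative. Averaging over the interface residues of both chains is an assumption, since the definition above does not specify whether the average is taken per chain or over the whole interface.

```python
from collections import defaultdict


def esri(contacts):
    """ESRI per residue: number of distinct partner-chain residues it contacts.

    contacts -- iterable of (residue_in_chain_a, residue_in_chain_b) pairs
    """
    partners = defaultdict(set)
    for res_a, res_b in contacts:
        partners[res_a].add(res_b)
        partners[res_b].add(res_a)
    return {res: len(found) for res, found in partners.items()}


def nesri(esri_values):
    """NESRI per residue: ESRI divided by the mean ESRI of all interface residues."""
    mean = sum(esri_values.values()) / len(esri_values)
    return {res: value / mean for res, value in esri_values.items()}


# Hypothetical contacts between chain A and chain B residues.
contacts = [("A100", "B10"), ("A100", "B11"), ("A100", "B12"), ("A101", "B10")]
degrees = esri(contacts)        # A100 contacts three partner residues
normalized = nesri(degrees)     # mean ESRI here is 8 / 5 = 1.6
```

In this toy interface, residue A100 has an ESRI of 3 and a NESRI of 3 / 1.6 = 1.875, so it interacts with more partner residues than the average interface residue.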

Extent of energy contribution

The energy contributed by a residue (say A), present at the interface, while interacting with residues present at the interface of the interacting partner is known as the extent of energy contribution (EEC), represented as D_e of that residue (A). For example, if a residue at the 100th position is present at the interface and while interacting with other residues from the interacting chain, it contributes a total of -10 kJ/mol energy, then the EEC of this residue (present at the 100th position) is said to be -10.

Normalized extent of energy contribution

Normalized extent of energy contribution (NEEC) of a residue is the ratio of EEC of a residue to the average EEC of all the residues present at the interface. The definition of NEEC is explained in a detailed manner, using an example, in Supplementary File 2.
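EEC and NEEC follow the same pattern as ESRI and NESRI, with pairwise energies in place of contact counts. The sketch below assumes that pairwise pseudoenergies (in kJ/mol) between residues of the chain of interest and its partner are already available. The names and the example values are hypothetical, and the average is taken over the residues of the chain being analyzed.

```python
from collections import defaultdict


def eec(pair_energies):
    """EEC per residue: sum of its pairwise energies with partner-chain residues.

    pair_energies -- dict mapping (residue, partner_residue) to energy in kJ/mol
    """
    totals = defaultdict(float)
    for (residue, _partner), energy in pair_energies.items():
        totals[residue] += energy
    return dict(totals)


def neec(eec_values):
    """NEEC per residue: EEC divided by the mean EEC of the interface residues."""
    mean = sum(eec_values.values()) / len(eec_values)
    if mean == 0:
        raise ValueError("mean EEC is zero; NEEC is undefined")
    return {res: value / mean for res, value in eec_values.items()}


# Hypothetical pairwise energies for two interface residues of chain A.
pair_energies = {("A100", "B10"): -6.0, ("A100", "B11"): -4.0, ("A101", "B10"): -2.0}
contributions = eec(pair_energies)      # A100 contributes -10 kJ/mol in total
normalized = neec(contributions)        # mean EEC here is -6 kJ/mol
```

Because both the EEC of a stabilizing residue and the mean EEC are negative, the ratio is positive, and a NEEC above 1 marks a residue that contributes more than the average interface residue.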

Comparison of hotspot prediction performance of various methods

In order to evaluate the performance of PPCheck, parameters such as sensitivity, specificity, accuracy, and F-score were compared with some of the other available methods for hotspot prediction. For this study, all the methods were applied on 126 mutations of test dataset (from BID) and the various parameters were computed and compared. The various parameters are calculated as

Sensitivity = RECALL = True Positive Rate = \frac{TP}{TP + FN}

(7)

Specificity = \frac{TN}{TN + FP}

(8)

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(9)

False Positive Rate = \frac{FP}{FP + TN}

(10)

PRECISION = \frac{TP}{TP + FP}

(11)

F-score = 2 \times \frac{PRECISION \times RECALL}{PRECISION + RECALL}

(12)

where TP (true positive) is a hotspot that is predicted as a hotspot, TN (true negative) is a non-hotspot that is predicted as a non-hotspot, FP (false positive) is a non-hotspot that is predicted as a hotspot, and FN (false negative) is a hotspot that is predicted as a non-hotspot.
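The performance measures defined above can be computed from the four confusion-matrix counts, as in the following sketch. The function name is illustrative, and no guard against empty classes (zero denominators) is included. The example call uses the counts reported in Table 1 for PPCheck alone on the CAPRI decoys and reproduces the accuracy listed there.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return the measures defined above from the confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    false_positive_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "false_positive_rate": false_positive_rate,
        "precision": precision,
        "f_score": f_score,
    }


# Counts for PPCheck alone on the 1883 CAPRI decoys, as listed in Table 1.
metrics = classification_metrics(tp=91, tn=758, fp=993, fn=41)
print(f"accuracy    = {100 * metrics['accuracy']:.2f}%")     # 45.09%
print(f"sensitivity = {100 * metrics['sensitivity']:.2f}%")  # 68.94%
```

Note that specificity and the false positive rate always sum to 1, so reporting either one determines the other.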

Dataset for performing computational alanine scanning

ASEdb records information about changes in binding energy when a residue is mutated to alanine. A set of randomly selected 40 such mutations (Supplementary File 3) was extracted from the database, and computational alanine scanning was performed using PPCheck.

Results and Discussion

Selection of optimum cutoff for electrostatic interaction

The number of charged residues at protein-protein interfaces was calculated between C^β and C^β atoms at various distance cutoffs of 7Å, 8Å, 10Å, 12Å, and 15Å within the training dataset. An optimum distance cutoff is recognized as a value beyond which a significantly large number of residues are spuriously included as interface residues. A slight increase (of maximum up to two residues) was observed when the cutoff was increased from 7Å to 8Å in all the 262 complexes in the training dataset. When the distance cutoff was increased from 8Å to 10Å, the increase was still up to two residues, in all except one complex. However, when the cutoff was increased from 10Å to 12Å, we observed that the number of interface residues in complexes started increasing by 10 extra residues. Hence, a cutoff of 10Å was selected as an optimum distance cutoff for identifying electrostatic interactions. The chosen cutoff is also important since higher distances include a large number of charged residues at the interface that do not contribute significantly to the stability. Figure 1 shows the pictorial representation of the increase in interface residues (number of interface charged residues in bins) when the distance cutoff is increased gradually, in bins, from 7Å to 15Å.

Figure 1.

Increase in the number of charged interface residues at various C^β-C^β atom distance cutoffs in the training dataset. An increase in the number of interface residues (charged residues) is recorded in various bins (0-2, 3-4, 5-6, 7-8, and >8) in the 262 protein-protein interfaces (where water is a part of the interface) when the distance cutoff between C^β and C^β atoms of charged residues for calculating electrostatic interactions is increased from 7Å to 8Å, 8Å to 10Å, 10Å to 12Å, and 12Å to 15Å.

PPCheck as a reliable tool for predicting native-like docking pose out of many decoys

In an earlier study,²⁶ PPCheck was applied on 270 non-redundant, high-quality protein-protein interfaces, and it was observed that the number of residues in a stable protein-protein interacting complex ranges from 51 to 150, whereas the normalized energy per residue is better (less) than -2 kJ/mol. These values can be considered as gold standard for optimal normalized energy at protein-protein interfaces and hence were used to differentiate the native-like docking poses from the non-native ones.

Results on a dataset of 30 dimers

PPCheck was applied on 30 dimers (15 homodimers and 15 heterodimers) that were earlier used to train DockScore,⁵⁶ an in-house algorithm for ranking docking decoys. For these 30 complexes, each chain was separated and then redocked using FRODOCK after altering its orientation. The aim was to check the consistent efficiency of PPCheck in predicting the native-like models in top-1, top-5, top-10, and top-20 ranks, respectively, out of 99 generated poses.

We observed that PPCheck and conservation and solvent accessibility scores could successfully rank native-like docking pose in top-1 position, from the 99 generated poses by FRODOCK (with an average C-alpha rmsd <4Å from the crystal structure of the complex) in 27 and 29 complexes out of 30. When poses within the top-5 PPCheck ranks are considered, 26 out of 30 complexes could be identified, but with increasing structural deviations from the native crystal structure.

Similarly, PPCheck could successfully rank native-like docking pose in top-1 position (>60% of common interface residues) between native pose and FRODOCK-generated pose in 25 out of 30 complexes. Figure 2 shows the performance of PPCheck in ranking the native-like poses in top-1, top-5, top-10, and top-20 positions.

Figure 2.

Performance of PPCheck in ranking docking decoys. Top-1, top-5, top-10, and top-20 decoys, as ranked by PPCheck, and their similarity with native pose with respect to rmsd (A) and percentage of common interface residues (B).

For analyzing the performance of PPCheck in differentiating native-like and non-native-like docking pose, CAPRI dataset was selected. A total of 1883 decoys, which were all submitted as best models by the respective CAPRI participants, corresponding to six CAPRI targets⁵⁰ (T24, T25, T26, T29, T32, and T36), were selected. When PPCheck alone was applied on these 1883 decoys/models, it could successfully identify 91 (out of 132; 68.9%) near-native-like models (Table 1).

Table 1

Performance of PPCheck (with and without conservation and accessibility) in identifying the native-like (TP) and non-native-like (TN) docking poses from six CAPRI targets.

METHOD(S)/SOURCE(S)	# OF TOTAL MODELS	TP	TN	FP	FN	ACCURACY (%)
PPCheck	1883	91	758	993	41	45.09
PPCheck + Conservation + Accessibility	1883	71	1057	694	61	59.90

Residue conservation, solvent accessibility, and PPCheck

Although PPCheck showed significant capabilities in differentiating the native-like docking poses from the non-native-like ones in a stringent dataset such as CAPRI decoys, we included residue conservation and solvent accessibility as additional parameters to further improve the accuracy of differentiation. Conservation score for all the residues of the interacting chains was obtained by collecting homologous sequences followed by multiple sequence alignment (please see the Methods section for greater details).

Out of the 1883 decoys from the six CAPRI targets, the CAPRI assessment team recognized 132 decoys as native-like poses and the remaining 1751 decoys as non-native-like poses. PPCheck, along with residue conservation and solvent accessibility, could successfully differentiate 71 native-like poses and 1057 non-native-like poses. Thus, an overall accuracy of ~60% was achieved in differentiating the native-like (71 out of 132) and non-native-like (1057 out of 1751) docking poses.

PPCheck as computational alanine scanning tool

PPCheck was applied on a set of randomly selected 40 mutations from ASEdb and the change in binding energy of the complex when a specific residue was mutated to alanine was recorded. A correlation of 0.716 was observed (Fig. 3) between the changes in binding energy as recorded from experimental studies and as obtained from PPCheck. This correlation should be considered with caution as it has been obtained from a comparatively small dataset and the proportionality constant between the two axes is not 1. Further, there appears to be better correlation for mutations with large changes in binding energy.

Figure 3.

Scatter plot showing the correlation between changes in binding energy of the protein complex (measured in kcal/mol) as measured from PPCheck and experimental studies. A correlation of 0.716 was observed for 40 mutations, selected randomly from ASEdb. Please note that the regression value is about 0.513 and the proportionality constant between the two terms is not 1.

Selection of optimum ESRI

Hotspots are interface residues that are generally seen to occur in clusters,⁵⁹ and they contribute to the stability of the complex,²⁷ along with providing specificity to the complex. Therefore, interface residues are expected to interact with a large number of residues within the protein and from the partner chains. PPCheck was therefore applied on the 10-fold mixed training dataset (192 mutations obtained from ASEdb; 126 mutations obtained from BID), and the ESRI (please see the Methods section for explanation) of all the interface residues was calculated using residue-centric normalized PPCheck energies. We then checked how well the top-5 to top-15 residues, having the highest normalized ESRI (more than 1) and NEEC more than 1, were observed as hotspots in various protein-protein complexes in the training dataset. Figure 4 shows how the various parameters, such as true positive rate (TPR) and false positive rate (FPR), vary when top-5 to top-15 residues were considered as hotspots from the training dataset. The best results were obtained when top-9 residues having the highest ESRI were considered as hotspots.

Selection of optimum EEC

Hotspots are the residues that bring a change of more than 2 kcal/mol when mutated to alanine,⁵⁹ and they contribute more energy than an average interface residue. Thus, we believe that those interface residues that contribute high pseudoenergies while interacting with residues present at the interface will also have the maximum tendency to act as a hotspot. In order to support our assumption, PPCheck was applied on the 10-fold mixed training dataset (192 mutations obtained from ASEdb; 126 mutations obtained from BID) and the EEC of all the interface residues was calculated. We then checked how well the top-5, top-6, top-7, and top-15 residues having the highest EEC (NEEC more than 1) and NESRI more than 1 were observed as hotspots in the various protein-protein complexes in the training dataset. Supplementary File 4 shows that the optimum results for hotspot prediction on 10-fold mixed training dataset were obtained when the EEC is selected as 8.

Figure 4.

TPR versus true negative rate curve for the ESRI, represented as D_i, for 10-fold mixed training dataset. Residues with various ESRI (from 5 to 15) were treated as hotspots in a 10-fold mixed dataset (192 mutations from ASEdb and 126 mutations from BID) in order to select an optimum ESRI that can be used in a generalized manner to predict hotspots. Accuracy, obtained at the respective cutoffs, is shown in square brackets.

Selection of optimum ESRI for predicting hotspots

A comparison between ESRI and EEC when the top-5 to top-15 residues were treated as hotspots revealed that ESRI gave better results than EEC at almost all cutoff values. Hence, the top-9 residues having the highest ESRI (NESRI more than 1) and NEEC more than 1 were selected as hotspots.

How does PPCheck predict hotspots? Selection of ESRI or EEC for prediction

For every cutoff from top-5 to top-15, the ESRI method gave better results than the EEC method, ie, the ratio of TPR to FPR obtained with the ESRI method was better than that obtained with the EEC method. Hence, the ESRI method was used for the prediction of hotspots, ie, the top-9 residues having the highest ESRI (NESRI more than 1) and NEEC more than 1 were selected as hotspots.
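The cutoff comparison described above amounts to computing TPR and FPR for each top-N selection and repeating this for each ranking (ESRI or EEC). The sketch below shows one way to do this; the function names and the toy residue labels are hypothetical, and the sketch assumes that every known hotspot appears somewhere in the ranked list.

```python
def tpr_fpr(predicted, hotspots, all_residues):
    """Return (TPR, FPR) for a set of predicted hotspots."""
    predicted, hotspots = set(predicted), set(hotspots)
    non_hotspots = set(all_residues) - hotspots
    tp = len(predicted & hotspots)       # predicted and experimentally a hotspot
    fp = len(predicted & non_hotspots)   # predicted but not a hotspot
    tpr = tp / len(hotspots) if hotspots else 0.0
    fpr = fp / len(non_hotspots) if non_hotspots else 0.0
    return tpr, fpr


def scan_cutoffs(ranked, hotspots, cutoffs=range(5, 16)):
    """Map each top-N cutoff to (TPR, FPR) for one ranked residue list.

    `ranked` is ordered from highest to lowest score (ESRI or EEC).
    """
    return {n: tpr_fpr(ranked[:n], hotspots, ranked) for n in cutoffs}


# Toy example: 20 residues ranked by some score, 4 of them true hotspots.
ranked_by_esri = [f"r{i}" for i in range(1, 21)]
true_hotspots = {"r1", "r2", "r7", "r18"}
for n, (tpr, fpr) in scan_cutoffs(ranked_by_esri, true_hotspots).items():
    print(n, tpr, fpr)
```

Running `scan_cutoffs` once with an ESRI ranking and once with an EEC ranking gives two sets of (TPR, FPR) pairs that can be compared cutoff by cutoff.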

PPCheck as a hotspot prediction tool

PPCheck and other hotspot prediction tools, such as Robetta,⁶⁰ FOLDEF,³⁰ KFC,⁶¹ MINERVA,⁶² HotPoint,⁶³ KFC2a, and KFC2b,⁶⁴ were tested on 126 mutations from the BID dataset, and sensitivity, specificity, accuracy, F-score, and Matthews correlation coefficient were computed and compared in order to evaluate how well each program predicts hotspots. Table 2 shows that PPCheck achieved 58.8% sensitivity, which is higher than that of existing programs such as FOLDEF, KFC, Robetta, HotPoint, and MINERVA. It also reported an F-score of 0.556, which is on par with other existing methods of hotspot prediction. Only KFC2a performed better (sensitivity-wise) than PPCheck in predicting hotspots. KFC2 may have performed better than PPCheck (raw data for comparison were collected from an earlier study)⁶⁴ because it considers the solvent accessibility of residues. However, there were some cases where PPCheck outperformed KFC2b (PPCheck results were a subset of those of KFC2a, and hence, the two were not compared with each other). We discuss two such cases below.

Table 2

Performance of various hotspot prediction programs on an independent test dataset.

METHOD	SENSITIVITY (%)	SPECIFICITY (%)	ACCURACY (%)	F-SCORE	MATTHEWS CORRELATION COEFFICIENT
FOLDEF	26.36	93.51	73.37	0.369	0.272
Robetta	44.85	92.21	77.99	0.548	0.432
KFC	47.27	89.61	76.91	0.549	0.410
MINERVA	51.82	93.51	81.00	0.619	0.517
KFC2b	56.06	90.91	81.73	0.631	0.509
HotPoint	57.58	83.12	75.46	0.583	0.410
PPCheck	58.79	77.92	72.18	0.556	0.356
KFC2a	78.49	83.12	81.73	0.720	0.590
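
The measures reported in Table 2 follow the standard confusion-matrix definitions. The sketch below computes them from counts of true/false positives and negatives; the function name and the counts are hypothetical and do not reproduce any row of Table 2. It assumes a non-degenerate confusion matrix (no zero denominators).

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from raw counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_score": f_score,
        "mcc": mcc,
    }


# Hypothetical counts: 30 experimental hotspots, 70 non-hotspots.
for name, value in classification_metrics(tp=20, fp=10, tn=60, fn=10).items():
    print(f"{name}: {value:.3f}")
```

Sensitivity rewards finding hotspots, whereas the Matthews correlation coefficient also penalizes false positives, which is why a method can rank high on one measure and lower on the other.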

Successful cases

Complex between soluble tissue factor protein and blood coagulation factor VII-A protein (PDB ID - 1FAK; interacting chains - T and L): Alanine scanning mutagenesis results show that two residues in soluble tissue factor, T-LYS-20 and T-ASP-58, act as hotspots. Of these two, KFC2b could predict only T-LYS-20 as a hotspot, whereas PPCheck could identify both residues as hotspots (Fig. 5A).

Complex between ATP-dependent HSLU protease ATP-binding subunit HSLU and ATP-dependent protease HSLV (PDB ID - 1G3I; interacting chains - A and G/H): Experimental studies reported six residues, A-ASP-438, A-LEU-439, A-ARG-441, A-PHE-442, A-ILE-443, and A-LEU-444, as hotspots. KFC2b could successfully predict only three of them (A-PHE-442, A-ILE-443, and A-LEU-444), whereas PPCheck could predict all six hotspots (Fig. 5B).

Figure 5.

Experimental hotspots that were successfully predicted by PPCheck but were missed by KFC2b. (A) PPCheck could successfully predict both the hotspots (T-ASP-58 and T-LYS-20) in the complex between soluble tissue factor protein and blood coagulation factor VII-A protein (PDB ID - 1FAK; interacting chains - T and L), whereas the KFC2b method could not predict T-ASP-58 as a hotspot. (B) PPCheck could successfully predict all six hotspots (A-ASP-438, A-LEU-439, A-ARG-441, A-PHE-442, A-ILE-443, and A-LEU-444) in the complex between ATP-dependent HSLU protease ATP-binding subunit HSLU and ATP-dependent protease HSLV (PDB ID - 1G3I; interacting chains - A and G/H), whereas KFC2b could predict only three of them (A-PHE-442, A-ILE-443, and A-LEU-444).

Cases where both the programs fared equally well

Complex between EPO receptor and EPO mimetics peptide 1 (PDB ID - 1EBP; interacting chains - A and C/D): Residues A-PHE-93, A-MET-150, A-PHE-205, and C-TRP-13 were found to be hotspots as per alanine scanning mutagenesis results. Both KFC2b and PPCheck could successfully predict three residues each as hotspots for this complex. While KFC2b reported A-MET-150, A-PHE-205, and C-TRP-13 residues as hotspots, PPCheck reported A-PHE-93, A-MET-150, and C-TRP-13 as hotspots (Supplementary File 5).

Unsuccessful cases

Either one or all of the available hotspot prediction programs could successfully predict the majority of the hotspots. However, there are some experimentally determined hotspots that could not be predicted by any of the available programs. H-HIS-76 in the complex between DES-GLA factor VII-A (heavy chain) and peptide E-76 (PDB ID - 1DVA; interacting chains - X and H) (Supplementary File 6A), A-ASP-427 in the complex between nidogen-1 and basement membrane-specific heparan sulfate proteoglycan core protein (PDB ID - 1GL4; interacting chains - A and B) (Supplementary File 6B), and B-LYS-345 in the complex between beta-catenin and adenomatous polyposis coli protein (PDB ID - 1JPP; interacting chains - B and D) (Supplementary File 6C) are some of the hotspots that none of the presently available programs could successfully predict. When analyzed in detail, it was observed that all these residues contributed more energy than an average interface residue (normalized energy per residue was greater than 1), but they were found to be exposed (solvent accessibility more than 7%) in the complexed form. These residues were also found to interact with fewer interface residues from the interacting partner (ESRI less than 8) and were among the moderately or less-conserved amino acids in the proteins. High solvent accessibility (exposed nature of residues in the complexed form) and moderate/low conservation could be the reason why these residues could not be predicted by any of the available methods for hotspot prediction. These examples also show that it is informative to apply multiple algorithms for the identification of hotspots and that the method with the best statistical measures may not perform best in every case.

Conclusion

PPCheck is an objective energy scoring scheme to analyze PPIs. It is a valuable resource that can be used for various purposes, such as the identification of non-covalent interactions at a protein-protein interface, given the coordinates of the two (interacting) chains in a single PDB file. An average docking algorithm generates hundreds of decoys for a given pair of proteins. Most of these generated conformations are incorrect, ie, they do not show any similarity with the native complex. Other successful scoring schemes, such as DockScore, provide scores on a relative basis within the sampled poses and are not meant to recognize cases where all the poses could be incorrect. If more than one software/algorithm is used for obtaining docked complexes, then the complexity of determining the correct pose increases further, as the best models predicted by each algorithm can be entirely different. In such cases, PPCheck can be reliably used to differentiate native-like docking poses from non-native-like decoys, since universal energy ranges have been obtained by studying a large number of protein-protein complexes.²⁶ PPCheck reported an accuracy of ~60% in differentiating native-like and non-native-like docking poses over a range of CAPRI targets. Prediction of PPIs and recognition of hotspots at the interface region form the central focus for understanding biochemical pathways and for bioengineering/drug design experiments, respectively. PPCheck can also be used to predict the critical residues (hotspots) at the interface, which provide stability and specificity to the complex. PPCheck also finds application in calculating residue conservation and performing computational alanine scanning.
Thus, PPCheck is the only webserver, to the best of our knowledge, which can be reliably used for identifying non-covalent interactions, predicting hotspots at the interface, calculating residue conservation, performing computational alanine scanning, and differentiating native-like and non-native-like docking poses. The availability of PPCheck, which provides objective measures of the strength of interactions, should be valuable.

Author Contributions

Designed the project: RS. Carried out the computational work, including scripting and data analysis: AS. Interpreted the results and included additional features: RS, AS. Wrote the first draft of the manuscript: AS. Provided critical comments to improve the manuscript: RS. Both authors reviewed and approved of the final manuscript.

Supplementary Materials

Supplementary File 1

Table showing best values for strong interactors + medium interactors at CD-HIT word-size of 2, threshold of 0.45, and conservation score of 0.65 or more when applied on the earlier studied dataset of 270 protein-protein complexes. An interface is termed as strong interactor type if the two interacting proteins have 10 or more conserved, exposed residues in their monomeric form. An interface is termed as medium interactor type if one of the proteins has 10 or more conserved, exposed residues in its monomeric form while the other has more than five (but less than 10) such residues.

Supplementary File 2

Detailed explanation of normalized extent of spatial residue interaction (NESRI) and normalized extent of energy contribution (NEEC).

Supplementary File 3

Table showing a set of 40 mutations, selected from Alanine Scanning Energetics database, and their corresponding changes in binding energy as recorded from experimental studies and PPCheck.

Supplementary File 4

True positive rate (TPR) versus true negative rate (TNR) curve for extent of energy contribution (EEC) for 10-fold mixed training dataset. Residues with various “extent of energy contribution” (from 5 to 15) were treated as hotspots in a 10-fold mixed dataset (192 mutations from ASEdb and 126 mutations from BID) in order to select an optimum degree of interaction that can be used in a generalized manner to predict hotspots. Optimum results for hotspot prediction were obtained when top-8 residues with the highest degree of energy (normalized degree of energy more than 1) and normalized degree of interaction more than 1 were treated as hotspots.

Supplementary File 5

Experimental hotspots that were predicted with equal success by both PPCheck and KFC2b. Both PPCheck and KFC2b could successfully predict three of the four experimental hotspots in the complex between EPO receptor and EPO mimetics peptide 1 (PDB ID - 1EBP; interacting chains - A and C/D). While PPCheck predicted A-PHE-93, A-MET-150, and C-TRP-13 as hotspots, KFC2b reported A-MET-150, A-PHE-205, and C-TRP-13 residues as hotspots.

Supplementary File 6

Examples of unsuccessful cases, where some experimentally determined hotspots could not be predicted by any of the available programs. (A) H-HIS-76 in the complex between DES-GLA factor VII-A (heavy chain) and peptide E-76 (PDB ID - 1DVA; interacting chains - X and H), (B) A-ASP-427 in the complex between nidogen-1 and basement membrane-specific heparan sulfate proteoglycan core protein (PDB ID - 1GL4; interacting chains - A and B), and (C) B-LYS-345 in the complex between beta-catenin and adenomatous polyposis coli protein (PDB ID - 1JPP; interacting chains - B and D).

Footnotes

Acknowledgments

This work and AS are supported by the Centres of Excellence (CoE), Department of Biotechnology (DBT), India. AS thanks the vice chancellor of SASTRA University for encouragement and support. The authors also thank Anu G. Nair and Oommen K. Mathew for technical assistance and U.S. Raghavender, Prashant Shingate, and Malini Manoharan for useful discussions.

References

1. Xenarios, Rice D.W., Salwinski, Baron M.K., Marcotte E.M., Eisenberg. DIP: the database of interacting proteins. Nucleic Acids Res. 2000; 28: 289–91.

2. Snel, Lehmann, Bork, Huynen M.A. STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res. 2000; 28: 3442–4.

3. Stark, Breitkreutz B-J, Reguly, Boucher, Breitkreutz, Tyers. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006; 34(Database issue): D535–9.

4. Andre, Cedex E.V. A coarse-grained protein-protein potential derived from an all-atom force field. Proteins. 2007; 111: 9390–9.

5. Ferna, Totrov, Abagyan. ICM-DISCO docking by global energy optimization with fully flexible side-chains. Proteins. 2003; 117: 113–7.

6. Cheng T.M., Blundell T.L., Fernandez-Recio. pyDock: electrostatics and desolvation for effective scoring of rigid-body protein-protein docking. Proteins. 2007; 515(April): 503–15.

7. Bertonati, Honig, Alexov. Poisson-Boltzmann calculations of nonspecific salt effects on protein-protein binding free energies. Biophys J. 2007; 92: 1891–9.

8. Vajda, Sippl, Novotny. Empirical potentials and functions for protein folding and binding. Curr Opin Struct Biol. 1997; 7: 222–8.

9. Pierce, Weng. ZRANK: reranking protein docking predictions with an optimized energy function. Proteins. 2007; 67: 1078–86.

10. Andrusier, Nussinov, Wolfson H.J. FireDock: fast interaction refinement in molecular docking. Proteins. 2007; 69: 139–59.

11. Sippl M.J. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol. 1990; 213: 859–83.

12. Moont, Gabb H.A., Sternberg M.J.E. Use of pair potentials across protein interfaces in screening predicted docked complexes. Proteins. 1999; 373: 364–73.

13. Glaser, Steinberg D.M., Vakser I.A., Ben-Tal. Residue frequencies and pairing preferences at protein-protein interfaces of known high-resolution residue - residue contact preferences. The residue statistical strength of the data set. Differences between amino acid dist. Proteins. 2001; 102: 89–102.

14. Huang S-Y, Zou. An iterative knowledge-based scoring function for protein-protein recognition. Proteins. 2008; 72: 557–79.

15. Neuvirth, Raz, Schreiber. ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J Mol Biol. 2004; 338: 181–99.

16. Keskin, Nussinov, Gursoy. PRISM: protein-protein interaction prediction by structural matching. Methods Mol Biol. 2008; 484: 505–21.

17. Chen, Zhou H-X. Prediction of interface residues in protein-protein complexes by a consensus neural network method: test against NMR data. Proteins. 2005; 61: 21–35.

18. Porollo, Meller. Prediction-based fingerprints of protein-protein interactions. Proteins. 2007; 645: 630–45.

19. Negi S.S., Schein C.H., Oezguen, Power T.D., Braun. InterProSurf: a web server for predicting interacting sites on protein surfaces. Bioinformatics. 2007; 23(24): 3397–9.

20. Tina K.G., Bhadra, Srinivasan. PIC: protein interactions calculator. Nucleic Acids Res. 2007; 35(Web Server issue): W473–6.

21. Reynolds, Damerell, Jones. ProtorP: a protein-protein interaction analysis server. Bioinformatics. 2009; 25: 413–4.

22. Rashid, Ramasamy, Raghava G.P.S. A simple approach for predicting protein-protein interactions. Curr Protein Pept Sci. 2010; 11: 589–600.

23. Krissinel, Henrick. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007; 372: 774–97.

24. Alva, Syamala Devi D.P., Sowdhamini. COILCHECK: an interactive server for the analysis of interface regions in coiled coils. Protein Pept Lett. 2008; 15: 33–8.

25. Sunitha M.S., Nair A.G., Charya, Jadhav, Mukhopadhyay, Sowdhamini. Structural attributes for the recognition of weak and anomalous regions in coiled-coils of myosins and other motor proteins. BMC Res Notes. 2012; 5: 530.

26. Sukhwal, Sowdhamini. Oligomerisation status and evolutionary conservation of interfaces of protein structural domain superfamilies. Mol Biosyst. 2013; 9: 1652–61.

27. Henrick, Moult, Eyck L.T. CAPRI: a critical assessment of predicted interactions. Proteins. 2003; 9: 2–9.

28. Joe. Assessing predictions of protein-protein interaction: the CAPRI experiment. Protein Sci. 2005; 14: 278–83.

29. Janin. Protein-protein docking tested in blind predictions: the CAPRI experiment. Mol Biosyst. 2010; 6: 2351–62.

30. Lensink M.F., Wodak S.J. Docking and scoring protein interactions: CAPRI 2009. Proteins. 2010; 78: 3073–84.

31. Lensink M.F., Wodak S.J. Docking, scoring, and affinity prediction in CAPRI. Proteins. 2013; 81: 2082–95.

32. Schymkowitz, Borg, Stricher, Nys, Rousseau, Serrano. The FoldX web server: an online force field. Nucleic Acids Res. 2005; 33(Web Server issue): W382–8.

33. Keskin, Ma, Nussinov. Hot regions in protein-protein interactions: the organization and contribution of structurally conserved hot spot residues. J Mol Biol. 2005; 345: 1281–94.

34. Thorn K.S., Bogan A.A. ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics. 2001; 17: 284–5.

35. Fischer T.B., Arunachalam K.V., Bailey. The binding interface database (BID): a compilation of amino acid hot spots in protein interfaces. Bioinformatics. 2003; 19: 1453–4.

36. Guerois, Nielsen J.E., Serrano. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol. 2002; 320: 369–87.

37. Kortemme, Kim D.E., Baker. Computational alanine scanning of protein-protein interfaces. Sci STKE. 2004; 2004: l2.

38. Darnell S.J., Page, Mitchell J.C. Predicting protein interaction hot spots. Proteins. 2007; 68: 813–23.

39. Guney, Tuncbag, Keskin, Gursoy. HotSprint: database of computational hot spots in protein interfaces. Nucleic Acids Res. 2008; 36(Database issue): D662–6.

40. Lise, Archambeau, Pontil, Jones D.T. Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods. BMC Bioinformatics. 2009; 10: 365.

41. Ofran, Rost. Protein-protein interaction hotspots carved into sequences. PLoS Comput Biol. 2007; 3: e119.

42. Tuncbag, Gursoy, Keskin. Identification of computational hot spots in protein interfaces: combining solvent accessibility and inter-residue potentials improves the accuracy. Bioinformatics. 2009; 25: 1513–20.

43. Gonzalez-Ruiz, Gohlke. Targeting protein-protein interactions with small molecules: challenges and perspectives for computational binding epitope detection and ligand finding. Curr Med Chem. 2006; 13: 2607–25.

44. Huo, Massova, Kollman P.A. Computational alanine scanning of the 1:1 human growth hormone-receptor complex. J Comput Chem. 2002; 23(1): 15–27.

45. Rajamani, Thiel, Vajda, Camacho C.J. Anchor residues in protein-protein interactions. Proc Natl Acad Sci USA. 2004; 101: 11287–92.

46. Brinda K.V., Kannan, Vishveshwara. Analysis of homodimeric protein interfaces by graph-spectral methods. Protein Eng. 2002; 15: 265–77.

47. Brooks B.R., Bruccoleri R.E., Olafson B.D., States D.J., Swaminathan, Karplus. Program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem. 1983; 4: 187–217.

48. Forces, Company R.P. Interaction models for water in relation to protein hydration For molecular dynamics simulations of hydrated proteins a simple yet reliable model for the intermolecular potential for water is required. Such a model must be an effective pair potential val. Intermol Forces. 1981; 331: 331–8.

49. Kabsch, Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983; 22: 2577–637.

50. Oliva, Vangone, Cavallo. Ranking multiple docking solutions based on the conservation of inter-residue contacts. Proteins. 2013; 81: 1571–84.

51. Altschul S.F., Madden T.L., Schäffer A.A. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997; 25: 3389–402.

52. Li, Godzik. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006; 22: 1658–59.

53. Larkin M.A., Blackshields, Brown N.P. Clustal W and Clustal X version 2.0. Bioinformatics. 2007; 23: 2947–8.

54. Johnson M.S., Overington J.P. A structural basis for sequence comparisons. An evaluation of scoring methodologies. J Mol Biol. 1993; 233: 716–38.

55. Mizuguchi, Deane C.M., Blundell T.L., Johnson M.S., Overington J.P. JOY: protein sequence-structure representation and analysis. Bioinformatics. 1998; 14: 617–23.

56. Malhotra, Sankar, Sowdhamini. Structural interface parameters are discriminatory in recognising near-native poses of protein-protein interactions. PLoS One. 2014; 9: e80255.

57. Garzon J.I., López-Blanco J.R., Pons. FRODOCK: a new approach for fast rotational protein-protein docking. Bioinformatics. 2009; 25: 2544–51.

58. Mukherjee, Zhang. MM-align: a quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming. Nucleic Acids Res. 2009; 37: 1–10.

59. Bogan A.A., Thorn K.S. Anatomy of hot spots in protein interfaces. J Mol Biol. 1998; 280: 1–9.

60. Kortemme, Baker. A simple physical model for binding energy hot spots in protein-protein complexes. Proc Natl Acad Sci USA. 2002; 99(22): 14116–21.

61. Darnell S.J., LeGault, Mitchell J.C. KFC Server: interactive forecasting of protein interaction hot spots. Nucleic Acids Res. 2008; 36(Web Server issue): W265–9.

62. Cho, Kim, Lee. A feature-based approach to modeling protein-protein interaction hot spots. Nucleic Acids Res. 2009; 37: 2672–87.

63. Tuncbag, Keskin, Gursoy. HotPoint: hot spot prediction server for protein interfaces. Nucleic Acids Res. 2010; 38(Web Server issue): W402–6.

64. Zhu, Mitchell J.C. KFC2: a knowledge-based hot spot prediction method based on interface solvation, atomic density, and plasticity features. Proteins. 2011; 79: 2671–83.

PPCheck: A Webserver for the Quantitative Analysis of Protein-Protein Interfaces and Prediction of Residue Hotspots
