Abstract
In this review, we summarize the progress on coarse-grained elastic network models (CG-ENMs) in the past decade. Theories were formulated to allow study of conformational dynamics in time/space frames of biological interest. Several highlighted models and their underlined hypotheses are introduced in physical depth. Important ENM offshoots, motivated to reproduce experimental data as well as to address the slow-mode-encoded configurational transitions, are also introduced. With the theoretical developments, computational cost is significantly reduced due to simplified potentials and coarse-grained schemes. Accumulating wealth of data suggest that ENMs agree equally well with experiment in describing equilibrium dynamics despite their distinct potentials and levels of coarse-graining. They however do differ in the slowest motional components that are essential to address large conformational changes of functional significance. The difference stems from the dissimilar curvatures of the harmonic energy wells described for each model. We also provide our views on the predictability of ‘open to close’ (open→close) transitions of biomolecules on the basis of conformational selection theory. Lastly, we address the limitations of the ENM formalism which are partially alleviated by the complementary CG-MD approach, to be introduced in the second paper of this two-part series.
Introduction
Protein has a dynamic nature. Dynamics encoded in the evolutionally optimized (Leo-Macias et al. 2005) structures are coupled with catalytic chemistry in facilitating protein functions (Eisenmesser et al. 2002; Wolf-watz et al. 2004; Yang et al. 2005a). The ‘jiggling and wiggling’ atoms were understood in further depth when Koshland turned the ‘lock-and-key’ paradigm (Fischer, 1894) into an ‘induced-fit’ fever (Koshland, 1958) along with the first determined protein structure of sperm whale myoglobin by X-ray crystallography (Kendrew et al. 1958), in the same year. However, theoretical physics in long-wait could not characterize such an intrinsic property of proteins, microscopically, despite accumulated X-ray-solved structures in the 60's and early 70's, until computational facilities were mature enough to accomplish the first Molecular Dynamics (MD) simulation by Karplus and coworkers (McCammon et al. 1977). This seminal work described atomic motions following Newton's 2nd law with an empirical potential energy function and suggested a fluid-like nature in the interior of the protein (McCammon et al. 1977).
Not before long, in early 80's, Noguti T and Gō first examined the fluctuations of globular protein by a set of collective variables (Noguti T and Gō N, 1982). The new application of Normal Mode Analysis (NMA; Goldstein, 1950) to the protein BPTI deciphered the mode compositions of protein fluctuations (motions in the range <30 cm– 1 dominate the fluctuations) and described crystallographic temperature (B-) factors (the degree of uncertainty in atomic positions) surprisingly well (Gō et al. 1983; Brooks and Karplus, 1983). Domain motions at the active sites of lysozyme and ribonuclease were seen to occur in low frequency normal modes (Levitt et al. 1985). Within the small fluctuations at the equilibrium (reached after energy-minimizing the crystal structure according to a given potential energy function), NMA approximates the complicated potential (comprised of multiple contributions including bond stretching, angle bending, dihedral, electrostatics and van der Waals) surface harmonically. The second derivatives of the potential (with respect to atom displacements), the Hessian, is singular-value decomposed to obtain the normal mode shapes (eigenvectors) and frequencies (the square root of the eigenvalues). The analytical approach, solving the eigen-problem of the 3Na by 3Na Hessian matrix (Na is the number of atoms in the protein) greatly reduces the computation time for obtaining the equilibrium dynamics of protein, as compared to MD. The low-frequency (slow) modes containing a certain degree of anharmonicity (Gō et al. 1983) not only are able to describe functional, configurational changes (Brooks and Karplus, 1985) but also help in the refinement of X-ray structures (Kidera and Gō 1992a, b).
NMA usually yields robust results, especially in the low frequency regime, because the results are not subject to statistical errors or sampling inaccuracies (unlike those retrieved from MD). However, MD, which makes no assumption about the underlying potential surfaces and allows transitions across energy barriers (which anharmonic motions include), is dearly needed to describe non-equilibrium dynamics involved in biologically important large conformational transitions, given a sufficient duration of simulations (Kitao et al. 1998; Arkhipov 2006a, b). However, the heavy computation of MD has limited its applicability to large biomolecular systems. The renaissance and further developments of coarse-grained MD (CG-MD) models, able to overcome the computational limit at a decreased resolution while maintaining key dynamic features of described systems (Tozzini, 2005), made possible simulations up to tens of microseconds (see our review on CG-MD in this series).
NMA gained unprecedented popularity in the late 90's along with two simplified schemes that resulted in a huge reduction of computational cost. One is the introduction of the Elastic Network (EN) concept, using a much simplified potential, being introduced by Tirion (Tirion, 1996) who proposed modeling molecules with their atoms within an interaction range being connected by Hookean springs of a universal strength. However, the first use of the word ‘network’, interpreting protein as junctions and elastic connections, was pioneered by Bahar and coworkers (Bahar et al. 1997) who took the idea from polymer science (Flory, 1976), using only Cα atoms to represent the protein. The description of proteins in reduced presentations is the so-called Coarse-Grained (CG) approach. Slow modes derived from both schemes were found to agree well with slow modes obtained from the standard NMA that uses a much more detailed potential. The saving of computational cost is tremendous: GNM, Bahar's model, required the diagonalization of a dimension-reduced Hessian, Γ (see below), which took 8.2 sec for T4-lysozyme (164 residues) on a single workstation (Bahar et al. 1997), as compared to 3 days by NMA (see Table 1) and much longer for nanosecond MD simulation for proteins of the same size to capture similar structural deformations.
Main features of CG-ENM.
Note that standard NMA has Hessian of the same size as Tirion's hence same diagonalization time.
with user-defined number of atoms.
when the structure is in the energy well of the conformer m at a given external parameter λ.
c is constant; ab = ij if not stated otherwise; i and j denote residues if n = N, or atoms if n = Na; eq 1 is not applied for RTB.
the dimension of the square matrix
tH and tΓ are the time taken to diagonalize the
Since then, the ease of programming and reduced computational cost due to the use of simplified potentials and smaller number of degrees of freedom resulted in wide-spread application of CG-EN models to deduce both the conformational dynamics of large structures and assemblies, including hammerhead ribozyme (Van Wynsberghe and Cui, 2005), CDK2/cyclin A (Dror and Bahar 2005), citrate synthase (Hinsen, 1999), hemoglobin (Xu et al. 2003), HIV reverse transcriptase (Bahar et al. 1999; Hinsen, 1999), hemagglutinin A (Doruker et al. 2002), aspartate transcarbamylase (Hinsen, 1999), F1-ATPase (Cui et al. 2004), an actin segment (Ming et al. 2003), GroEL-GroES (Keskin et al. 2002), the ribosome (Wang et al. 2004; Yang et al. 2006; Cui and Bahar, 2006) and viral capsids (Rader et al. 2005). Many intriguing biological systems as such, in a variety of sizes and extended applications (Tama et al. 2004; Leo-Macias et al. 2005) of ENMs have been carefully reviewed (Case, 1994; Kitao and Gō, 1999; Ma, 2005; Rader and Bahar, 2005; Tozzini, 2005). However, in-depth comparisons of the theories that underline the ENMs and their offshoots have been lacking.
In this review, ENMs (highlighted on Tirion's model, GNM, ANM and RTB/BNM) are illustrated in sufficient theoretical details: the basic hypotheses, the physical grounds, mathematical treatments and consequently achieved computational efficiency. The well comprehended ENM foundations serve to interpret data obtained from comparisons between predictions and experimental results, namely the observed equilibrium and non-equilibrium dynamics and those within ENMs themselves. Slow normal modes derived from different potentials and molecular resolutions are found robust within a subspace spanned by 5–6 dimensions (Nicolay and Sanejouand, 2006) but not on a one-to-one basis between the models.
NMA-based models show different levels of accuracy (Tama and Sanejouand, 2002) when examining the agreement between experimentally characterized large conformational changes and single slow-mode-driven structural deformations. These results together can be understood by motions taking place in harmonic energy wells with different curvatures approximated by different potentials and coarse-grained levels in the models. The profile of potential of mean force also helps in understanding the origin of a better predicted open → close transition than the closed → open counterpart by ENMs, using the concept of conformational selections (Ma et al. 1999; Dror and Bahar, 2006). Lastly, benefiting from the statistical study on comparisons of X-ray B-factors, RMSDs of NMR ensembles and GNM, we recently reported how experimentally characterized dynamics can comprise (and be affected by) motional components in different frequencies (Yang et al. 2007) and how the findings can be of use to understand the frequency dispersions of the models themselves. A separate review article on coarse-grained molecular dynamics simulations is presented back-to-back so as to address the non-equilibrium structural transitions that are beyond the reach of herein introduced EN models, given their basic hypotheses.
Theory—The ENM Models
Atomic-ENM
Tirion's model
To dispense with the problematic energy-minimization process prior to NMA while gaining computer efficiency, Tirion proposes a model to connect atom pairs with Hookean springs with a universal force constant γ (Tirion, 1996). The equilibrium structures are taken from the experimentally (X-ray or NMR) characterized structures assuming a zero energy. The resulting potential is a harmonic approximation that is much simplified than sophisticated potentials (Fig. 1a) used in NMA involving multiple bonded and nonbonded terms, which may or may not be harmonic depending on the instantaneous configurations of biomolecules in question. The total energy E of a molecule is

(a) The effect of simplified potentials. The energy landscape outlined by the conventional force field (detailed potential, as used by standard NMA) is drawn in thick lines. Simplified potential, (in thin lines) as used in Tirion's or CG-EN models, approximates the rugged potential surface crossing the local energy barriers. At coarse-grained level, the rugged potential can be described by RTB/BNM and the smoothed-out one by XNM {X = A, βG, C, D and HE …}. Despite the difference between the two potentials, equilibrium dynamics characterized by X-ray and NMR can be well described by both potentials from which the derived slowest modes cover the slowest ends of experimentally observed dynamics (see Discussion). In contrast, the slowest modes derived from the smoothed-out potential are slower than those derived from the detailed potential due to the narrower energy wells in the latter. As a result, large conformational transitions with high anharmonicity could be better predicted by the slowest modes derived from the simple elastic potential than by force-field-based potential. The blue long dashed line joins the equal energy points of two CG energy wells as described by PNM (see Supplementary Material).
where
CG-ENM
GNM
GNM, developed by Bahar and coworkers (Bahar et al. 1997), differs from Tirion's ENM in the following aspects. It was the first ENM that represents proteins with interacting ‘nodes’ at the amino-acid level (the CG scheme) while successfully reproducing X-ray B-factor data (Bahar et al. 1997), H/D exchange free energy costs (Bahar et al. 1998b) and 15 N-NMR relaxation order parameters (Haliloglu and Bahar, 1999). Its potential employs the vector form of the displacement for node pair i and j under the isotropic assumption (<ΔX ΔX T > = <ΔY ΔY T > = <ΔZ ΔZ T > = (1/3) <ΔR ΔRT >, T is transpose, see the Supplementary Material):
or
where
We can easily see the difference in the potentials of Tirion's and GNM. The inner product of vector differences, instead of the scalar difference of the i-j pair separations, penalizes not only the translational but also the rotational displacement, which partially accounts for its better B-factor agreement over other ENM models (Cui and Bahar, 2006).
The nodes in GNM are usually the Cα atoms of amino acid residues. Rc is generally set near or above 7 Å, the range of which covers the first coordination shell (Bahar et al. 1997; Cui and Bahar, 2006; see also Discussion). From the basics of Statistical Mechanics, one can easily derive the following results (see Supplementary Material for details)
where
λk–1 is the reciprocal of the kth nonzero eigenvalue (the frequency square of the kth mode) solved for Γ. The slowest mode (the 1st mode, with the lowest frequency) that has the most dominant contribution to the entire fluctuation is along the eigenvector
Hence, B-factors can be predicted from GNM whereas the needed spring constant is obtained as scaling the predictions to match up with the experiment, namely the magnitude of B-factors, assuming internal atomic fluctuations fully account for structural uncertainties (Bahar et al. 1997; Cui and Bahar, 2006). The correlation between theories and experiments on the B-factors is found around 0.6 for a wide range of cutoffs and temperatures (Yang et al. 2006, supplementary material) and is 0.65 if crystal contacts are considered (Kundu et al. 2002).
CNM, an isotropic model extended from GNM, has reported a 0.74 correlation with B-factor profiles (Kondrashov et al. 2006) of 98 high resolution (< 1.0 Å) structures while employing a few modification schemes including crystal contacts (also reported previously by Kundu et al. 2002), residue contacts determined in atomic level while maintaining a N × N connectivity matrix (Γ) and enhanced force constant for backbone connections by a factor 10 (Kondrashov et al. 2006). More details can be seen in the Supplementary material.
ANM/Hinsen's CG-ENM
The ‘restoration’ of predicted fluctuations from 1-D (magnitude only) to 3-D came no later than 1998, pioneered by Hinsen (Hinsen, 1998). Hinsen's ENM (HENM) is carried out at the residual level, the potential adopts the same form as Tirion's except for the spring constant being in exponential decay with increasing residual pair separations. The decay corresponds to a weakened interaction between pairs far apart which simply reflects the physicochemical reality, although there is no specific reason why an exponential form has to be taken (Hinsen, 1998). The suggested form is
The parameter r0 is set at 3-7 Å so as to best reproduce the low frequency normal modes obtained with the AMBER force field (Hinsen, 1998; Hinsen et al. 1999); c is a scaling factor. The design eliminates the need for assigning a cutoff distance (or interchangeably in this article, cutoff) as in other ENM models. However, an updated version of
Following a different path of derivation, Atilgan obtained the same result that differs from Hinsen's only at the spring constant being set as constant for simplicity hence the need to assign a cutoff distance of interactions (Atilgan et al. 2001). ANM is basically the CG version of Tirion's ENM except in assuming uniform mass for each amino acid (or bead) as done in HENM. HENM and ANM basically solve an eigen-problem involving a 3N × 3N force constant matrix (the second derivatives of the potential, see below), the Hessian (
The form is derived from the second derivatives (with respect to the node displacements) of the potential. Here, xij, yij and zij are the components of
As for the efficiency, Hinsen's Hessian is less sparse than ANM's and therefore takes minimal advantage of the regular sparse matrix solver hence a slower computation than ANM, despite similar low-frequency modes being obtained in both (Cui and Bahar, 2006).
RTB/BNM
The collective motions seen in low-frequency normal modes often occur at the levels of residues, secondary structures, or even domains. It provides the physical motivation to describe such motions as rigid-body translations/rotations of blocks (RTB) of atoms (Fig. 2a), the mathematical treatment of which is the projection of the 3Na by 3Na atomistic Hessian into a small 6nB × 6nB block-matrix, where Na is the number of atoms and nB is the number of blocks chosen for the molecule in question (see below).

(
Although Bahar physically coarse-grained proteins, Sanejouand and co-workers were among the first to coarse-grain the protein at the mathematical level as early as 1994 by breaking up the protein into residue blocks (the building-block approach) while introducing rotation-translation basis into the atomic Hessian (Durand et al. 1994). With the eigen-problem solved at a reduced dimension, RTB makes the dynamic analyses of supramolecules computationally tractable, in the same spirit as other CG models. The analysis on a series of proteins of various sizes is made possible and demonstrates a good reproducibility of standard NMA results especially in the low-frequency spectrum. (Tama et al. 2000).
Atomistic Hessian herein,
of size 6nB × 6nB, usually 25 fold smaller in memory storage and therefore 125 fold faster in computation than
Here,
The Block Normal Mode (BNM) approach is basically the same as RTB, yet employs a better computational implementation such that the required atomic Hessian elements for constructing the ‘blocks’ are computed on the fly and the big
Note that approaches such as RTB/BNM are different from other CG-EN models on two aspects. RTB and BNM inevitably need the preliminary energy minimization, as the standard NMA, before building the atomic Hessian elements. Moreover, the harmonic potential they describe, despite being blocked, is less smoothened out than models such as GNM or ANM that have a physically coarse-grained elastic potential (Fig. 1a, see also Discussion).
Other EN Models such as backbone-enhanced elastic network model (BENM), β Gaussian Model (βGM), quantized elastic deformation model (QEDM), plastic network model (PNM), double-well elastic network model (DWNM) and models based on linear response theory, also to readers’ great interest, are introduced in the supplementary material. Their potentials and resulting properties of Hessians are summarized in Table 1.
Online Access of the Cg-En Models and NMA Results
Web services for GNM (Yang et al. 2005b, 2006), ANM (Eyal et al. 2006), NMA (Wako et al. 2004) and others (see a review by Xiong and Karimi 2007) are developed in recent years to facilitate a high-throughput analysis on conformational dynamics via ‘biologist-friendly’ interfaces.
Discussion
The section is composed to first analyze the basic assumptions of EN models, the nature of the normal modes that are derived from them and the nature of the experimental observations that are often compared with ENM-derived predictions and eventually answer the question—which EN model is the ‘best’?
The Essence of the Cutoffs
The cutoff distance (Rc) is usually set to take account the physical reality and save the computation cost on those negligible interactions for atom pairs far apart (Leach, 2001). For Tirion's ENM or ANM, Rc was first chosen to best reproduce the frequency spectrums of NMA or GNM respectively (Tirion, 1996; Atilgan et al. 2001) whereas in GNM, it was chosen to include the contacts within the first coordination shell defined by the Cα-based radial distribution function (Bahar et al. 1997). However, Yang has shown that a range of Rc from 7 to 15 Å simply renders statistically identical correlations with B-factors over 1250 nonhomologous proteins (Yang et al. 2006). The robust isotropic nature of time-average fluctuations was also reported in Eyal and Kondrashov's studies (Eyal et al. 2007; Kondrashov et al. 2007).
Taking the correlation with B-factors of Protein/DNA/RNA biocomplexes as a function of residue-residue contact distance (Rc), nucleotide-nucleotide contact distance (R), residue-nucleotide contact distance ((Rc + Rp)/2) and the number of beads (from 1 to 3) used to represent a nucleotide given one bead per protein residue, Bahar and coworkers found that the result was maximized at Rc = Rp = 7 Å given 3 beads per nucleotide which is known to be roughly 3 times heavier than an amino acid (Yang et al. 2006). The setting of 3-nodes-per- nucleotide (P, C4* in the sugar and C2 in the base) plus 1-node-per-residue also made nodes distributed more evenly within the shape of the molecule than other settings.
The use of cutoff distance in these models simply serves to measure the local packing density (Halle, 2002) which is the counts within a fixed volume (consequently a fixed cutoff distance). The concept herein has been widely used in classical/statistical mechanics for sampling particle properties at a coarse-grained level (Nitzan, 2006). Mixed cutoff schemes, as first attempted in the study above, do not sample such a density in equal volume while the packing density is known to have a dominant contribution to residue fluctuations (Halle, 2002; Cui and Bahar, 2006; Yang et al. 2006). Biased local density sampled leads to an unphysical Hessian that preserves no cutoff information, causing impaired predictions. Hence, as long as the employed cutoff renders a good representation of local features (not too small for nodes to ‘see’ only the backbone neighbors or too wide for all the nodes to be connected together), similar prediction results for isotropic data are faithfully obtained. We should also note that anisotropic vibrations are more sensitive to employed cutoffs hence models based on detailed potentials giving better predictions for ADPs than ANM does (Kondrashov et al. 2007).
Slow Modes Rather than Fast Modes are Robust
Nicolay and Sanejouand asked how many normal modes are needed for a given NMA-based model to describe the normal modes obtained from other protein models that use different potential and coarse-grained schemes. The results suggested that 5–6 Tirion's EN modes in the lowest frequencies are enough for the description of a few slow modes obtained with the all-atom CHARMM potential (Nicolay and Sanejouand, 2006). The invariant nature of a robust subspace spanned by 5 to 6 normal modes was again seen in the crosscheck over the other two CG-EN models, including ANM (Nicolay and Sanejouand, 2006). Moreover, low-frequency subspace from essential dynamics analysis is found to be spanned well by a few low-frequency normal modes (Rueda et al. 2007). In fact, similar slowest (1st) modes can be obtained through a hierarchy of coarse-grained (HCA) schemes for a given EN model (Doruker et al. 2002; Ming et al. 2002).
Proteins with a similar architecture encode similar conformational dynamics, as natural as one might expect. However, slow components are more robust against structural variations than the fast ones (Keskin et al. 2000; Cox et al. 2007). Quantitatively speaking, a 2.1 Å RMSD between two structures of the same protein, separately solved by X-ray and NMR, gives a correlation of 0.94 (statistical average) between their slowest-mode profiles that are derived from GNM (Yang et al. 2007). The insensitivity to minor structural changes is understood to stem from the collective nature of the low-frequency modes. The collective oscillation is a joint effect of many interacting pairs, summed up to approach a universal form that is governed by the central limit theorem, regardless of the details of pair positions or potentials (Tirion, 1996; Atilgan et al. 2001). Another interesting observation made by ANM combined with a structural perturbation method is that low modes are robust to sequence variations or in other words, insensitive to mutations (Zheng et al. 2006).
Magnitude Rather than Directions of Fluctuations is a Robust Feature
On the other hand, the fluctuation magnitude is better predicted than the direction of the motions. Kondrashov used five different CG-EN models including BNM using the CHARMM potential to examine their agreement with crystallographically characterized isotropic and anisotropic dynamics (Kondrashov et al. 2007). The result showed almost the same correlation between the predicted time average magnitude and the reported isotropic fluctuations for the five models, whereas the predictions on the reported directions of motions are shown to be model-dependent (Kondrashov et al. 2007). Bahar and coworkers confirmed the same observation in a systematic study on a collection of ADPs (see the DNM model in Theory) reported in 93 high-resolution PDB structures and found the sums of the diagonal elements (the magnitude) in the inverse Hessian to agree better with experiment than the off-diagonal elements (indicating the directions) (Eyal et al. 2007). In fact, Kondrashov and Eyal found experimentally reported ADPs are highly refinement package dependent (average anisotropy given by Refmac is 0.64 and is 0.51 by SHELX; Kondrashov et al. 2007) and greatly sensitive to the forms of crystal packing symmetry (substantial difference in ADPs reported for the same proteins packed in different space groups; Eyal et al. 2007). One should note that a model that is tuned to best predict the directions of motions does not necessarily best describe the magnitude of the motions (Kondrashov et al. 2007; Eyal et al. 2007), indicating strong experimental artifacts. Use of ANM to predict RMSDs of the 64 NMR ensembles also found better agreements in the magnitude (0.69) rather than the directions (0.62) (Yang et al. 2007). The better reproducibility in the magnitude rather than the directions is not only seen between experiment and theory but also between theoretical results (Cox et al. 2007).
Understanding Dynamics Hidden in the Electronc Cloud
In X-ray crystallography, the iso- and anisotropic B-factors are obtained via a fitting process to position the atoms that best represent the electron density distribution. They have been understood more as the structural uncertainty (or errors) rather than quantization of dynamics. The difficulty to fully count B-factors as dynamic quantities is that they contain strong contributions from the crystal packing. In the early 90's, Kidera and Gō have shown through the use of the standard NMA that the external contribution (58%) to the B-factors are actually larger than the internal ones (42%) in human lysozyme (Kidera and Gō 1992 a, b). As EN models describe internal fluctuations, only, how can a model like GNM score a good correlation with B-factors?
The reason is explained as follows. We have to note that GNM is a 1-D model that motions are carried out in the 1-D magnitude space with a rigid-body translational shift (the trivial mode led by a zero eigenvalue) for the entire molecule. B-factors can therefore be fitted as
As a result, the correlation between the profiles (as a function of residue index) of
Where
Understanding NMR Characterized Dynamics
NMR characterizes protein structure and dynamics in the solvated state. Predictions from ENMs have been in a good agreement with such NMR-characterized dynamics, namely the order parameters (Yang and Kay, 1996), derived from NMR relaxation data (Haliloglu and Bahar, 1999; Ming and Bruschweiler, 2006). Accordingly, Chen recently uses such quantity as a benchmark to rationally select ensembles from MD snapshots that best reproduce the order parameters (Chen et al. 2007). In addition, it is interesting to see that GNM has a 0.74 correlation with the RMSDs of NMR ensembles as opposed to a 0.59 correlation with the B-factors of their X-ray counterparts (same proteins alternatively solved by X-ray). Deleting the slowest GNM mode that contributes to the time-average fluctuations and then comparing with the same aforementioned quantities leaves the correlation with X-ray unchanged but dramatically decreases the correlation with NMR, indicating the differences in the spectrum of modes accessible in solution and in the crystal environment. Specifically, large amplitude motions sampled in solution are subdued in the crystalline environment of X-ray crystallography due to the restraints from crystal contacts (Kundu et al. 2002) and low temperatures (Yang et al. 2007).
Refined NMR conformers are obtained from simulated annealing runs and energy minimization (Brünger, 1991a, b) over the detailed potential surface defined by the target function (Schwieters et al. 2003) that comprise both the empirical force field and NMR restraint-derived penalty terms (Yang et al. 2007). Although more studies are needed for a clear understanding of the correlation between NMR and GNM, surprisingly, anharmonic procedures as such to populate the NMR conformers in distributed local wells can be approximated by GNM that uses simplified elastic potential. The statistical result suggests NMR ensembles should not be deemed solely as the range of ‘errors’ in structure determination but more as a set of conformations accessible to the molecule in question under the experimental conditions.
Open-to-close Transitions Being Better Predicted than Contrariwise
There have been studies showing protein open→close transitions (meaning that the open form of the structure is used by NMA- or MD-based models to generate low frequency normal or PCA modes in order to compare with experimentally identified structural transition vectors) are better predicted than their close→open counterparts in quite a few systems including adenylate kinase (Temiz et al. 2003; Miyashita et al. 2003; Maragakis and Karplus, 2005), citrate synthase (Hinsen, 1999), LAO binding protein (Tama and Sanejouand, 2001), hemoglobin T→R2 transition (Xu et al. 2003) and E. coli ABC Leu/Ile/Val transport system (Trakhanov et al. 2005). A systematic study over 10 structure pairs (open/close) further confirmed this intriguing tendency (Tama and Sanejouand, 2001). The statistics in average, when ANM is used, is 0.58 and 0.43 for open→close and close→open, respectively (Tama and Sanejouand, 2001). The trend does not seem altered when different models or potentials are used. For adenylate kinase, which undergoes large, functional conformational transition that is crucial for life-related signaling cascades when triggered by hormone or metabolite cues, the correlations between prediction and experiment for open→close transition when simplified or detailed potentials are used in the ENM are 0.62 and 0.53 respectively, while those for close→open are 0.38 and 0.37, respectively (Tama and Sanejouand, 2001).
The reason is explained in Figure 3. The concept of conformational selection (Ma et al. 1999; Dror and Bahar, 2006) states that protein binds its ligands/substrates at its preexisting equilibrium state and therefore a certain conformational state is ‘selected’ or being ‘locked up’. Hence, an unbound structure has natural tendency to deform along the slowest normal modes to a state that resembles the bound conformation before the ligand comes in and locks the very state. For structure in the bound state, new contacts (or bonds) are formed not only at the binding site but also throughout the distal domains. The newly defined architecture due to altered pair contacts gives a Hessian distinct from that of the open state. The ‘open’ structure is therefore hardly found along the smoothest path (at the lowest energy cost) of the narrowed energy well (see Fig. 3) of the ‘close’ structure. The disallowed returning journey back to ‘open’ is permitted again upon the ligand release/bond breakage or the second incoming chemical cues.

Conformational selection (Ma et al. 1999; Dror and Bahar, 2006) explains why open → close is easier predicted than close → open. Assuming only the protein takes the conformational change but ligand does not in either the bound or unbound state, the binary system, ligand + protein, evolves along the energy landscapes defined by (1) protein conformational change (with or without the contact of ligand) and (2) the binding energy ΔG, only. The conformational change is approximated harmonically by either atomic- or CG-ENM. “Close bound” state herein is referred to as ‘close state’ in the literature. Protein at the ‘open’ state access a close but unbound state (Dror and Bahar, 2006) along the smoothest deformational path (thin line), namely the slowest few normal modes. The protein in the disfavored “close unbound” state may further change the conformation a bit as being ‘induced’ by the ligand which then draws the whole binary system down to a new energy funnel at the big ΔG relief, in the end of the ligand docking. Since the architecture of protein is redefined by the newly formed contacts (Fig. 1 in Tama and Sanejouand, 2001), in either the “close unbound” or “close bound” (more so) state, the energy profiles (dash and solid lines, respectively) change their shape and curvature (mostly narrower) and the groups of atoms that undergo collective motions in the path open → close may not be identifiable again in the path close → open as NMA being performed on both of these close states. Not until the catalytic reaction on the substrate is complete or the ligand is released upon other chemical cues and in turn ‘pushes’ the structure back open, anharmonically, does the protein architecture resume its ‘open’ state again.
Which CG model is the Best?
Since the late 90's people regained interest in NMA-based models, due to aforementioned simplifications, the initial of ‘X'NM has had a decent coverage over the 26 alphabets. A natural question that arises is: which one is the best? To answer this, we shall first define what ‘good’ is? For a long time, ‘good’ has been acknowledged as reproducing (1) results derived from detailed, atomistic potentials (NMA or MD) and/or (2) experimental results (spectroscopic data, free energy measurements etc). Depending on the type of questions in study, (1) and (2) are not necessarily always the same thing (see below).
Great efforts have been expended on CG-models to reproduce the results obtained from detailed (force-field-based) potentials (Hinsen, 1998; Tama et al. 2000; Li and Cui, 2002). The coarse-grained schemes therein seem motivated from a purely computational point of view. Is there any physical insight gained from coarse-graining and its concomitant simplified potential used, besides the mathematical convenience?
Simplified and Detailed Potentials
Although slow normal modes derived from different potentials and molecular resolutions are found robust within a subspace spanned by 5-6 dimensions (Nicolay and Sanejouand, 2006), the correspondence between modes of different models is not on a one-to-one basis. Kondrashov examined five CG models and found that BNM/RTB (using detailed potential) gave lower mode-to-mode correspondences with the other three CG-EN models investigated in the study than those within CG-ENM themselves: 15-25% lower in the lowest 17 modes (Kondrashov et al. 2007: Fig. 3, statistical results from 83 proteins). In fact, ANM and BNM showed the largest differences in the study: ‘only’ 0.6 to 0.7 agreements were seen between them in the slowest three modes (Kondrashov et al. 2007). Tama and Sanejouand also demonstrated that the simple potential used in ANM actually outperformed the detailed one used in RTB in predicting the protein open→close conformational transitions for 4 out of 5 proteins (Tama and Sanejouand, 2001).
On the other hand, ANM and BNM show identical accuracy in reproducing isotropic displacements (B-factors) although BNM outperforms ANM in predicting ADPs (Kondrashov et al. 2007). Note that to predict B-factors or ADPs requires a summation of all the normal modes. A question that follows is why the sum of all the modes of ANM and BNM agree with B-factors equally well when they differ in their slow modes that should contribute the most to overall fluctuations?
Recently, Bahar and coworkers demonstrated that deleting the slowest mode of GNM does not deteriorate its theoretical agreement with crystallographic B-factors due to the slowest motions being restrained by crystal contacts at low temperature (Yang et al. 2007). A subsequent study along this line, for the same set of 64 proteins, has shown that consecutive deletions of the slowest 8th modes in ANM and > 140th modes in Tirion's model are needed before a reduced agreement with B-factors can be seen (unpublished data). This indicates that the lowest frequency components are not required for a good prediction of isotropic motions of molecules in the crystal although adding those components back barely (if any) decrease the correlations. This more or less explains why almost all the models give reasonable predictions on B-factors.
Coarse-grained and fine-grained ENMs
GNM, ANM and Tirion's ENM have different curvatures in their ‘slowest’ harmonic wells, the curvatures of which are simply captured by the second derivatives of the potentials in the Hessian(s) spanned by the slowest mode(s). However, they show nearly identical accuracy to reproduce B-factors (Eyal et al. 2007). The reason of that can be understood similarly as for simplified and detailed potentials. The study in influenza virus hemagglutinin A (Doruker et al. 2002) nonetheless shows that reduced representations of molecules produce similar shape of slow mode profiles. In fact, the slower the modes are, the more similar they are with each other across a hierarchical, reduced representation. Other evidence shows that the slowest 50 modes derived from fine-grained model (Tirion's) or from coarse-grained model (ANM) can drive the docking of high-resolution structures into the corresponding low-resolution electron-density maps (Tama et al. 2004; Delarue et al. 2004; Hinsen et al. 2005) equally well in the Normal Mode Refinement (Kidera and Gō, 1992a, b).
Harmonic Approximations of Potentials Used in ENMs
The observed difference between detailed-potential- and simplified-potential-derived normal modes is a natural result of harmonic approximations taken at energy minima with different curvatures. The difference remains even when the atomistic Hessian being projected into reduced subspace in the RTB/BNM.
So, which model is the best? For structures staying near their equilibrium states where the dynamics can be characterized by NMR or X-ray, almost all the models perform equally well. GNM and Tirion's ENM predict the size of RMSDs of NMR ensembles equally well (0.74) and slightly outperform ANM (0.68) (Yang et al. 2007) whereas GNM, ANM, Tirion's ENM, βGM and standard NMA predict isotropic B-factors (or the trace magnitude of the anisotropic fluctuations) equally well with a correlation from 0.55 to 0.59 when the crystal contacts are not taken into account (Yang et al. 2006; Eyal et al. 2007).
Large conformational transitions that span across multiple local wells or a hierarchy of energy wells are beyond the reach of harmonic approximations discussed herein. Atomistic- or CG-MD are better approaches to study such transitions (usually accompanied with partial unfolding; see Okazaki et al. 2006) although CG-ENM could still give reasonable predictions along their slowest motional path due to the similarity of the shape of hierarchical global potential envelopes and the approximations by simplified potentials (Fig. 1b; Hinsen, 1998; Tama and Sanejouand, 2001; Tama et al. 2004; Cui and Bahar, 2006), which exhibits a fractal character. More rigorous systematic studies are needed to examine how the difference in the slowest normal modes from different CG-EN models impacts the prediction accuracy in dynamic events at an extended time scale.
Note that slow modes obtained from Principle Component Analysis (PCA) on MD trajectories can well agree with the slow modes obtained from both standard NMA (Kidera et al. 1992b; Kitao et al. 1998) and CG-ENMs (Doruker et al., 2000; Rueda et al. 2007) as long as sufficient length of the simulation is carried out (Kitao et al. 1998). CG-MD models are subject to the sampling problems as much as seen in conventional atomistic-MD simulations but more capable of overcoming such problem given the advantage of much enhanced computational efficiency (see our back-to-back paper in this issue).
Limitation of CG-ENMs
As mentioned, NMA-based models, at fine or coarse-grained levels, are not as valid in handling large configurational changes in protein, which demand crossings of multiple energy barriers, as handling small changes, due to their harmonic approximations for energy minima at equilibrium. However, large conformational changes are generally predicted well along the slowest few normal modes for the aforementioned reasons (see end of the last section). On the other hand, coarse-graining inevitably has inherited problems. As in all the CG models, the dynamics that occur within the level of coarse-graining are not sampled; for instance, the bond vibrations or the side chain reorientations cannot be evaluated in residue-based CG-ENMs. The restoration from CG to full atomic details involves the reconstruction of the backbone atoms and then side chain atoms, which pays computationally. The development of methodology as such is nevertheless nicely addressed (Heath et al. 2007).
Closing Remarks
NMA-based methods, despite the limitation stated above, describe well the equilibrium motions. As for X-ray or NMR-characterized dynamics, the Tirion's or CG-EN models seem sufficient to cover the slowest end of such motions. The deletion of the slowest GNM mode does not hurt the correlation between predicted and experimental B-factors. In fact, the correlation continuously goes up (although moderately) in sequential deletion of the first 10 slowest modes in ANM before it decays back down (unpublished data). Use of simplified or detailed potentials do not change much (if any) of the agreement with experiment. On the other hand, the understanding of multi-barrier-crossing conformational changes that involve partial unfolding and/or induction/perturbation from ligand demands the study from more sophisticated methods such as conventional atomistic-MD, CG-MD or theories such as LRT.
Acknowledgement
LWY and CPC thank Drs Nobuhiro Gō, Ivet Bahar, Qiang Cui, Florence Tama, Dmitry Kondrashov and Eran Eyal for advancing our understanding on LRT, GNM/ANM, BNM/RTB, CNM/DNM and statistical studies of ADPs, respectively. Dr Ivet Bahar's cogent suggestions on manuscript organization are heartily appreciated. LWY and CPC thank the support of the Japan Society for the Promotion of Science and the Ministry of Education, Culture, Sports, Science and Technology of Japan respectively.
