Abstract
Abstract
The present article introduces a set of novel methods that facilitate the use of “natural moves” or arbitrary degrees of freedom that can give rise to collective rearrangements in the structure of biological macromolecules. While such “natural moves” may spoil the stereochemistry and even break the bonded chain at multiple locations, our new method restores the correct chain geometry by adjusting bond and torsion angles in an arbitrary defined molten zone. This is done by successive stages of partial closure that propagate the location of the chain break backwards along the chain. At the end of these stages, the size of the chain break is generally reduced so much that it can be repaired by adjusting the position of a single atom. Our chain closure method is efficient with a computational complexity of O(Nd), where Nd is the number of degrees of freedom used to repair the chain break. The new method facilitates the use of arbitrary degrees of freedom including the “natural” degrees of freedom inferred from analyzing experimental (X-ray crystallography and nuclear magnetic resonance [NMR]) structures of nucleic acids and proteins. In terms of its ability to generate large conformational moves and its effectiveness in locating low energy states, the new method is robust and computationally efficient.
1. Introduction
The choice of optimization algorithm has a significant effect on the quality of the generated conformations giving rise to a demand for fast and accurate conformational optimization algorithms with widespread applicability. Energy minimization methods such as the conjugate gradient (Hestenes and Stiefel, 1952) or the truncated Newton (Schlick and Overton, 1987) methods are well suited for finding a conformation that corresponds to a local minimum of the energy surface. Locating the global minimum conformation is generally better achieved by simulated annealing type methods (Kirkpatrick et al., 1983; Nayeem et al., 1991). In order to find a diverse set of local minimum conformations, either Monte Carlo minimization (MCM) (Li and Scheraga, 1987; Wales and Scheraga, 1999; Brooks et al., 2001) or stochastic tunneling (Barhen et al., 1997) is preferred. If the aim is to generate ensembles of conformations around each local minimum, then both molecular dynamics (Voter, 1997; Minary et al., 2003, 2004) and Monte Carlo (Metropolis et al., 1953; Swedensen and Wang, 1987; Tanner and Wong, 1987) based methods can be used. The robustness and performance of all the above algorithms can be significantly improved by methods that involve multiple replicas (Geyer, 1991; Kou et al., 2006; Minary and Levitt, 2006) of the system, smoothing transformation (Rahman and Tully, 2002a,b) of the energy surface and global reformulation of the classical partition function (Minary et al., 2008).
The choice of degrees of freedom is also critically important in global optimization but has received less attention. Two sets of degrees of freedom used from the earliest days of biomolecular modeling are the atomic Cartesian coordinate and the single-bond torsion angles. Torsion angles were used first on small peptides as a way to combat the many degrees of freedom of these systems (Gibson and Scheraga, 1967a,b). Soon afterwards, Cartesian coordinates were used in the first all-atom energy minimization of an entire protein (Levitt and Lifson, 1969). While useful for energy minimization, the large number of Cartesian degrees freedom (3 times the number of atoms) makes any conformational search practically intractable. Torsion angle coordinates are more economical with about 4 single-bond torsion angles per amino acid residue on average. Unfortunately, conventional Markov Chain Monte Carlo algorithms (Metropolis et al., 1953) have a very slim chance of accepting a torsional move in a long all-atom biomolecular chain. The problem, which is caused to the long lever arm moving atoms distant from the bond about which the rotation is applied, can be partially circumvented by local torsional moves applied to a segment of the molecule while the rest of the system is kept fixed. The most widespread method in this category is CONROT (Dodd et al., 1993), but there are several alternative approaches such as symmetric (Krishna and Theodoru, 1995) or analytical rebridging (Wu and Deem, 1999), window moves (Hoffmann and Knapp, 1996), and various modifications of the original CONROT method (Umschneider and Jorgensen, 2003). A common result of such local moves is that the polypeptide chain becomes broken; this then needs to be fixed by solving chain closure equations numerically at considerable computational cost. Recently, a new computationally more efficient constant bond length closure algorithm has been proposed by Sklenar et al. (2006); it works by adjusting the position of a common bridging atom and changing corresponding bond and torsion angles so that broken bond lengths are restored. These and former studies by Umschneider and Jorgensen (2003) showed that augmenting torsional with bond-angle space leads to a significant softening of the energy landscape.
In this article, we introduce a new chain breakage and closure method that allows us to use arbitrary degrees of freedom for conformational optimization of biomolecular structures. Thanks to its modest computational cost and its ability to handle arbitrary large chain breaks, our approach combines the beneficial features of the best currently used methods. In our approach, chain breaks caused by independent moves of arbitrary type (for both proteins and nucleic acids) and magnitude can still be restored by adjusting a sufficient number of torsion and bond angles. Furthermore, moves based on our new chain breakage-closure method can be incorporated into any advanced sampling/optimization protocol (Geyer, 1991; Kou et al., 2006; Minary and Levitt, 2006; Rahman and Tully, 2002a,b; Minary et al., 2008). In the present study, we have chosen to use conventional Metropolis Monte Carlo (Metropolis et al., 1953) as our conformational search tool. In this way, improvement in computational efficiency is exclusively a result of our new algorithm.
Our results show that our new stochastic approach is one of the simplest, cheapest, and at the same time, most robust of current closure-based algorithms. In particular, it can restore an order of magnitude larger chain breaks than other closure algorithms at comparable computational cost. Combining collective natural moves and our stochastic closure leads to very efficient conformational optimization protocols that locate conformational energy minima not easily found by other chain closure or torsional sampling algorithms.
2. Methods
2.1. Stochastic partial closure
The central part of our method is called Stochastic Partial Closure (SPC). It is better than the previously used Deterministic Full Closure (DFC) approach by Sklenar et al. (2006) in that it always provides a solution to the closure problem. More specifically, under certain conditions, DFC provides a favorable solution, but otherwise it provides no solution (Fig. 1a).

(
In the absence of the chain break, SPC done with σspc = 0 leaves the conformation unchanged, whereas SPC done with σspc > 0 may still produce a new position for atom 4. In further context, both SPC and DFC also refer to a unary operator that acts on the conformation of a single atom.
2.2. Recursive stochastic closure
Consider a segment of an atomic chain and the next chain atom called the “head.” Recursive Stochastic Closure (RSC) repairs a broken connection between the terminal atom of the segment and the fixed head atom by distributing a bond length preserving deformation along the segment. The segment region that participates in the deformation is called the molten zone whose atoms are numbered from 1 to Nm. Atom 1 is adjacent to the head atom. Succeeding atom Nm, the segment has at least two more “anchor” atoms that do not participate in the deformation.
When RSC is used with σspc = 0, it acts like a projection onto the surface of constant bond length. When RSC is used with σspc > 0, then the chain break is repaired (Fig. 1c) and bond and torsion angles are changed by randomly changing atomic positions along the molten zone. Thus, repeated applications of RSC will produce a series of different conformations for the molten zone. Our algorithm is flexible in that it allows the use of any kind of DFC following after a recursive sequence of an arbitrary number of SPC steps. In this article, we always use the cost-effective algorithm described in Sklenar et al. (2006) for DFC. Note that the value of n depends on the geometry of the system under consideration.
One of the arguments of subroutine “rsc” is the atomic index, i that monitors the internal progress of the RSC cycle. Prior to calling “rsc,” initial condition i = 1 is set so that the cycle starts with the first atom of the molten zone. The other argument, group MG holds the coordinates of molten zone, head, and anchor atoms. The referenced functions, “spc” and “dfc,” implement one step of SPC and DFC, respectively. Besides, both of these functions return the modified conformation of a single atom. Subroutine “rsc” is called recursively until the chain break is repaired.
Lemma 1
Given that Nd is the number of dependent degrees of freedom changed by an RSC cycle, the asymptotic cost of the chain closure problem is ½ cspc O(Nd), where cspc is the number of operations used in one SPC step.
2.3. Monte Carlo recursive stochastic closure
This program performs Nmc MCRSC steps and uses the following key functions and variables: (1) Function E[ … ] takes argument X and returns the energy score for the whole system. (2) Function dE[ … ] takes argument dX and returns the energy associated with the state of all bond and torsion angles that are used to solve chain breaks. (3) Variables Ep and Ea hold the energy score of proposed and accepted conformations. (4) Variables dEp and dEa hold the energy score returned by function dE[ … ] for proposed and accepted closure states. (5) In one-dimensional arrays sX and sXd, the state of X and Xd are saved. At the beginning of each MCRSC step all independent degrees of freedom are updated (line 4) by subroutine update_idof, which is defined in Section 7.1. This is followed by one cycle of RSC by subroutine general_rsc that updates all dependent degrees of freedom and repairs all chain breaks. This subroutine is presented in Section 7.2. The following, m − 1 RSC cycles drive an encapsulated Monte Carlo scheme (lines 7–15) guided by cheaply evaluated energy function dE[ … ]. This leads to conformational relaxation along dependent variables so that energetically favorable chain closures are found. Further conformational relaxation along dependent variables may be achieved by simulated annealing. Finally, the energy score of the new proposed conformation, X is evaluated (line 16) and X is accepted (lines 17–21) according to the Metropolis criterion. In theory, we could choose such a large move size along the independent variables so that some of the chain breaks may not be repaired due to the lack of a geometric solution (line 5). While not indicated explicitly, in this case the whole step is rejected.
Lemma 2
Consider a molecular assembly that has Nc chain breaks and adjacent molten zones. If Nd is the number of dependent degrees of freedom, the asymptotic cost of applying an RSC cycle to all the molten zones is ½ cspc O(Nd) + cdfc O(Nc), where cspc and cdfc are the costs of a single SPC and DFC steps, respectively.
Lemma 3
Consider the molecular assembly of Lemma 2 with N atoms. For any value of Nc and Nd, the asymptotic cost of applying an RSC cycle to all the molten zones has an upper bound (1.5 cspc + cdfc)O(N), where cspc and cdfc are the costs of single SPC and DFC steps.
Corollary 1
Assume that ceO(N) is the asymptotic cost of the total energy calculation in an N atom system. For any Nc and Nd, the asymptotic cost of repairing all chain breaks by RSC cycles does not exceed the asymptotic cost of the energy calculation. If both Nc and Nd are orders of magnitude smaller than N (e.g., the present applications), the total energy calculation will dominate the cost of an MCRSC step.
Our new method is similar to the MCM technique reviewed in Wales and Scheraga (1999) in which the Monte Carlo sampling is guided by the transformed energy function
where min refers to minimization along X. This transformation preserves all local minima and also leads to an energy surface
2.4. Using Monte Carlo recursive stochastic closure for nucleic acids
Figure 2 illustrates MCRSC for nucleic acid structure using the rigid-body motion of the base pairs as the natural variables. In Figure 2a, RSC repairs a chain break between two nucleotides by randomly adjusting the positions of atoms in the molten zone. The number of atoms in molten zone, Nm determines the number of SPC steps in each RSC cycle, n = Nm − 1, because one atom is always moved by DFC. In this study, each molten zone of all-atom nucleic acids is composed of three (Nm = 3) consecutive atoms P, O5′, and C5′. Therefore, in each RSC cycle, the number of SPC steps n = Nm − 1 = 2. Unless otherwise noted, randomness was introduced by setting σspc = 10−4 Å and we used five consecutive RSC cycles (m = 5) in each MCRSC step.

Illustration of how our Monte Carlo Recursive Stochastic Closure (MCRSC) algorithm works on a simple example of a dinucleotide. (
The ten independent degrees of freedom, Xi, of each nucleotide in a DNA double helix are depicted in Figure 2b. Here, six out of ten degrees of freedom are responsible for the global orientation of nucleotides and four provide the internal flexibility of the furanose ring. Unless explicitly noted, the move sizes (see Section 6.3) of independent natural degrees of freedom were set to σ = 2.5 × 10−5 Å for translational, σ = 10−5 radian for rotational and internal angle moves. In addition to breaking the chain along the P – O3′ bond, changes in the independent degrees of freedom may also break the furanose ring bond C4′ – O1′; this ring break is always restored by DFC.
2.5. Using Monte Carlo recursive stochastic closure for proteins
The molten zones are located along loops between structural segments. Here, we place the chain break within the residue that marks the starting point of molten zones which include three additional residues. Because we use two backbone atoms per residue in our protein representation (see Section 6.2), the molten zones consist of Nm = 2 × 3 + 1 = 7 atoms. Thus, the number of SPC steps per RSC cycle, n = Nm – 1 = 6. Randomness was modeled by σspc = 10−4 Å, and we used five RSC cycles (m = 5) in each MCRSC step.
Sets of independent degrees of freedom, Xi, for the relative orientation of segment pairs composed of two rigid β-sheets, two α-helices and an α-helix with a β-sheet are shown in Figure 3a–c. Here, the move sizes (see Section 6.3) of independent degrees of freedom were set to σ = 10−4 Å for translational and σ = 10−5 radian for rotational moves.

Natural degrees of freedom that are used on protein structure. (
3. Results
3.1. Robust handling of chain breakages in nucleic acids
We investigated how the proposed move size along independent degrees of freedom effects the acceptance rate of MCRSC steps. In doing this, translational and rotational degrees of freedom are treated separately because they are expressed in units of Ångstroms and radians, respectively. Figure 4 shows how the average acceptance rates depend on the move size of translational nucleotide degrees of freedom such as Shear, Stretch, Stagger, Shift, Slide & Rise. Clearly, MCRSC works with significantly larger move sizes than does MCDFC introduced by Sklenar et al., (2005). The improvement can be quantified by comparing move sizes that lead to an acceptance rate of 0.3, which is a standard value for practical applications. This gives an improvement of between 0.5 to 1 log10 units (3 to 10 fold) when MCRSC is used with m = 1 or m = 5 cycles of RSC, respectively. Thus, increasing the number of consecutive RSC cycles improves performance but this advantage is reduced for very large step sizes due to large fluctuations in the intermolecular energy. Because evaluating the intra-molecular energy terms of the all-atom force field (Cornell et al., 1995) is inexpensive both MCDFC and MCRSC steps have the same computational cost.

Showing the dependence of the Monte Carlo acceptance ratio on the log of the step size used to propagate the nucleotide base degrees of freedom (Fig. 2b) for the 12 base pair pappilomavirus DNA binding site. Curves are presented for Monte Carlo Deterministic Full Closure (DFC; red circles) and Monte Carlo Recursive Stochastic Closure (MCRSC) with one cycle (m = 1, blue circles) and five cycles (m = 5, blue triangles) of Recursive Stochastic Closure (RSC), respectively. The step size, which is the variance of the normal distribution for all random Monte Carlo steps, is expressed in Å for the independent nucleotide base degrees of freedom, Xi, which are “structural translations” such as Shear, Stretch, Stagger, Shift, Slide, and Rise. The dependent degrees of freedom, Xd, are the same as in Figure 2b. An optimal value of the acceptance rate (Pacc = 0.3) is marked by a dashed red line. Each point represents the average from a trajectory with 1,000,000 Monte Carlo iterations.
3.2. Loop torsion angle sampling in proteins
The exploration of protein tertiary assembly can be effectively performed by Loop Torsion Monte Carlo (LTMC) search (Minary et al., 2008a) which we successfully applied to study protein fold space. Loop torsion angle space, which controls the assembly of rigid segments, can also be explored by using MCRSC with independent degrees of freedom that directly alter the position of rigid segments such as α-helix, β-strand or β-sheet. The deterministic MCDFC method of Sklenar et al. 2006 used as for comparison in Figure 4, is not able to keep the rearranged structural segments connected and other closure based methods (Dodd et al., 1993; Umschneider and Jorgensen, 2003; Krishna and Theodoru, 1995; Wu and Deem, 1999; Hoffmann and Knapp, 1996) that works with multi-atomic molten zones are significantly more costly than MCRSC. Thus we compare MCRSC to LTMC.
Figure 5 shows the average energy of conformational minima found as a function of bond angle stiffness. As bond angles become very stiff, varying the orientation of structural segments is the same as varying the conformations of the loops between them so that the energies found by MCRSC approach those found by LTMC. With softer bond angles lower energies are reached, which suggests that LTMC may not be able to explore space as well as MCRSC.

The average of energies of all conformational minima found by MCRSC (blue) as a function of the bond angle force constant kb. All averages are calculated for twenty independent trajectories started from the crystal structure of a 55-residue protein (α + β fold class; SCOP id: d1div_2) having an α-helix and a small three-stranded β-sheet, and modeled by a previously used coarse-grained knowledge-based energy function (see Section 6.2). The independent degrees of freedom, Xi, that are reduced to model the relative orientation of the α-helix and the central strand of the β-sheet are Shear, Stretch, Stagger, Buckle, Propeller, and Opening (see Fig. 3c). The rigid segments are connected by a loop which accommodates the flexible molten zone with dependent degrees of freedom, Xd, depicted in Figure 3d. The average energy obtained by Loop Torsion Monte Carlo (LTMC) is shown by the red line.
3.3. Conformational optimization in nucleic acids
In Figure 6, we benchmark MCRSC and MCDFC algorithms according to their effectiveness in locating low energy DNA conformations; both the depth of energy minima and the number of Monte Carlo iterations needed to reach them can be used as a measure for performance. The figure shows the three sets of independent degrees of freedom used in this test and marks the location of chain breaks. Figure 6a shows that MCRSC reaches DNA conformations with lower energy than MCDFC. MCDFC works better with rigid body coordinates, which is consistent with earlier studies (Rohs et al., 2005). Figure 6b also shows that MCRSC converges most rapidly. Both tests of MCDFC found new low energy conformational states towards the end of the simulation run.

Analyzing energetic relaxation of chain breakage-closure Monte Carlo trajectories started from regular B-DNA conformation. Location of the chain break (dotted line) and the independent degrees of freedom, Xi (marked brown), for the three protocols used. In the first protocol (RSC-10, blue), MCRSC is employed on the closure problem of Figure 2b; the nucleotide conformation is changed by “natural” moves to give 10 independent degrees of freedom. In the second protocol (DFC-13, red), MCDFC is employed on P–O5′ bond closure; the nucleotide conformation is changed by “natural” moves with two additional main chain torsion angles and one additional main chain bond angle to give 13 independent degree of freedom. In the third protocol (DFC-14, green), MCDFC is employed on P–O5′ bond closure; the nucleotide conformations are changed by “physical” moves that include rigid body motion of the base to give 14 independent degrees of freedom. In all protocols, the dependent degrees of freedom, Xd, include torsion angle variables (magenta) that are depicted next to the bond around which the rotation is applied. Unless indicated otherwise, bond angles at all atoms along the chain also contribute to Xd. (
3.4. Conformational optimization in proteins
In Figure 7, we benchmark the LTMC and MCRSC algorithms for conformational optimization of protein structures. The figure shows the two sets of independent degrees of freedom used in this test; loop torsion angles and natural coordinates serve as independent degrees of freedom for LTMC and MCRSC, respectively. Figure 7a shows that LTMC and MCRSC trajectories are similar when stiff bond angles are used (kb2). When softer bond angles are used (kb1), MCRSC trajectories visit conformational minima with lower energy. Figures 7b shows that the number of MCRSC iterations that leads to convergence does not increase with softer bond angles. Figure 7d shows that MCRSC with soft bond angles not only finds lower energy states but explores the conformational space significantly better in that there are more structures in the low energy values.

Energy relaxation of the native structure of a 55- residue protein (α + β fold class; SCOP id: d1div_2) with an α-helix and a three-stranded β-sheet. Each amino acid residue is modeled using three centers: the alpha carbon, the carbonyl oxygen, and a side-chain atom that is closest to the center of mass of the side chain. All pairwise interactions are described by a statistical/knowledge-based energy (see Section 6.2). Simulations are performed using two different Monte Carlo protocols. In the first one, Monte Carlo Recursive Stochastic Closure (MCRSC; blue box on the upper right), six independent rigid-body degrees of freedom, Xi (shown in brown), describe the relative arrangement of the central β-strand and the α-helix. Recursive Stochastic Closure (RSC) works on the molten zone between magenta atoms and sets the value of the dependent variables, Xd (magenta). In the second one, LTMC (red box on the middle right), the relative arrangement of the two segments is described by six independent loop torsion angles (LT). As torsion angle moves in loops move the entire chain, no chain breakages occur. (
4. Discussion
4.1. Methods that facilitate natural degrees of freedom in nucleic acids
When comparing MCRSC to MCDFC, an established method that works with natural degrees of freedom in nucleic acids, we found that MCRSC works with chain breaks that are an order of magnitude larger. While its simplicity and low computational cost makes DFC a practical closure method for biological applications, our results show that its use largely depends on the break in the chain being small. We found that RSC is more robust while still maintaining the simplicity and low computational cost of DFC. Our result also indicates that consecutive RSC cycles increase the acceptance rate because they correct the stereochemistry that may not be corrected by the first RSC cycle. Therefore, additional RSC cycles help in that they produce an energetically favorable closure by removing bond and torsion angle strain of the atomic chain spanning the molten zone.
4.2. Robust algorithms for chain closure
Our results indicate that replacing the single atom molten zone used by DFC with the multi-atom molten zone used by RSC enhances robustness in restoring large chain breaks. In general, increasing the number of atoms in the molten zone or equivalently the number of degrees of freedom, Nd, that is used to restore chain breaks enhances the robustness of RSC. In addition, as it was demonstrated in Section 2.2, the computational complexity of RSC is O(Nd). This linear scaling in Nd enabled us to use up to Nd = 2 Nm + 5 = 19 (Nm = 7 was used for proteins) degrees of freedom for chain closure. In contrast, the computational complexity of current multi-atom closure algorithms is O(fNP(Nd)), where fNP refers to a non polynomial function. Thus, current algorithms are limited to a value of Nd = 6 and the degrees of freedom include either six torsion angles (Dodd et al., 1993) or three bonds and three torsion angles (Umscneider and Jorgensen, 2003). By defining six constraints, the complicated geometric problem of six equations with six unknowns could have from 2 to 12 numerical solutions (Dodd et al., 1993) with average dihedral angle variations of 40°. Jump between different solutions usually leads to a steric clash between the rigidly attached side chains and surrounding atoms (Hoffmann and Knapp, 1996). In contrast to these algorithms, solutions are iteratively improved by each RSC cycle that builds on and refines the solution found by the preceding cycle.
4.3. Methods that facilitate natural degrees of freedom in proteins
We investigated how MCRSC relates to LTMC which we have shown to efficiently explore protein conformations along natural degrees of freedom in applications including model peptides (Minary and Levitt, 2006) and structures representing all protein folds (Minary and Levitt, 2008). Our finding implies that as bond angles become stiff MCRSC and LTMC sample from the same space of arrangements of secondary structure segments. Our results also indicate that soft bond angles permit more variety in the orientation of structural segments so that MCRSC is able to discover energetically more favorable tertiary arrangements. This finding high-lights some limitations of LTMC that could be overcome by varying bond angles in addition to torsion angles. Unfortunately, this would introduce additional independent degrees of freedom, which would slow down the search of conformational space. By contrast, the independent degrees of freedom of MCRSC are strictly restricted to the orientation of segments. All loop conformations are changed by the dependent degrees of freedom so as to keep the segments connected; this does not slow down the search. When studying larger systems, simple variations in the loop torsional degrees of freedom lead directly to steric clashes that reduce the search efficiency. Again MCRSC is more successful because of local moves that vary the orientation of a segment but leave the rest of the molecule unchanged. Without bond angle variation there is a rough energy landscape that LTMC is unable to explore by simple Monte Carlo protocols as described in Metropolis et al. (1953). None of these limitations are present with MCRSC which explores the smooth loop torsional & bond angle energy surface and samples short and long protein chains with equal efficiency. Note, that the use of natural degrees of freedom for conformational exploration by the popular fragment based algorithms of Simons et al. (1997) is restricted to non-continuous sampling.
4.4. Conformational optimization of proteins and nucleic acids
In our study of conformational relaxation of DNA, we compared the two most cost-effective algorithms, MCRSC and MCDFC. Using the natural degrees of freedom, we found that relaxation driven by MCRSC leads to lower energies and converges faster than relaxation driven by MCDFC. In addition, the distributions of the energy imply that MCDFC gets easily trapped in various conformational basins with diverse energies. The superior performance of MCRSC is partly due to treating some of the independent degrees of freedom of MCDFC as dependent degrees of freedom that are adjusted to further accommodate natural moves. For example, in the present case the three variables are the two torsion angles (γ, ɛ) and the bond angle at O3′. Treating these variables as dependent in MCRSC not only facilitates natural moves but also improves convergence by reducing the number of independent variables. In accordance with previous studies by Rohs et al. (2005), we found that MCDFC performs well if the global orientation of individual nucleotides is varied by simple translational and rotational moves.
MCRSC was also benchmarked in relaxing the tertiary structure of our protein example. In accordance with our previous findings, MCRSC with stiff bond angles and LTMC relaxed the initial structure with the same efficiency. This implies that in average they converge to very similar conformational states. Removing the stiffness from bond angles opens a passage around torsional energy barriers so that MCRSC could discover new favorable tertiary arrangements. The large variation of their RMSD distance from the initial state implies that the new arrangements are not just energetically refined structures found by LTMC but ones that may represent new fold topologies.
4.5. Limitations of our work
Any conformational modeling study is as good as the force field that is used by the sampling or optimization algorithm. In this article, we focus on the description and testing of a novel algorithm rather than characterizing conformations we generate. While successful biological applications that use our algorithm may rely on a particular force field, testing force fields or reproducing biologically relevant findings is not in the scope of the current article. Benchmarking the performance of alternative multi-atomic closure methods has also not been done in the current study, because either they have not been shown to work with the present type of natural moves, or their computational complexity does not allow the use of a sufficient number of closure degrees of freedom.
Like MCM, MCRSC is designed to sample a transformed energy surface in which some of the transition energy barriers are removed. Thus, both MCM (Li and Scheraga, 1987; Wales and Scheraga, 1999) and MCRSC are not expected to generate canonically distributed conformational ensembles along all degrees of freedom of the original energy surface. Nevertheless, MCRSC may be used as a canonical sampling method along the independent degrees of freedom. Proof of this property is not in the scope of our current work, but it is similar to that used to show that the very successful MCM explores conformational space well.
5. Conclusion
This article has presented a novel approach to the chain closure problem. It differs from existing work in that it combines their beneficial features such as simplicity and cost effectiveness, but is more robust and generally applicable. Thus, large chain breaks that are caused by functionally relevant natural moves can still be restored. As a result, the new chain breakage-closure protocol has been demonstrated to explore conformational space more effectively then currently used protocols for both nucleic acids and proteins. Given its capabilities, use of MCRSC is expected to advance our understanding of a large variety of problems in structural biology and macromolecular biochemistry. Such problems include protein and RNA folding, small molecule docking, protein-protein, and protein-nucleic acid interactions and the refinement of experimental structures.
6. Materials
6.1. Nucleic acid structures and models
Initial DNA conformations are obtained from the idealized B-DNA form of the 12 base-pair pappilomavirus E2 protein DNA binding site (5′-ACCG
6.2. Protein structures and models
The protein studied here has 55 residues and represents one of the α + β fold classes from the SCOP-1.71 database introduced by Murzin et al. (1995). The structure of this particular protein was obtained through the ASTRAL database (Brenner et al., 2000) by following SCOP id: d1div_2. Built on previous work (Minary and Levitt, 2008) the protein was described by a three center per residue knowledge based model. Here, the three centers used for each residue are the Cα atom, the carbonyl oxygen and a single side chain atom used to represent the center of mass of the side chain. Any side chain atom used for a particular residue is taken as the atom that is most commonly closest to the actual center of mass of the side chain in known proteins. In this way the knowledge based energy function derived on atomic positions can be easily adopted. The current coarse grained potential has been extensively tested in Minary and Levitt, (2008) to reproduce near native state features of a large number of folds.
6.3. Monte Carlo move
In all Monte Carlo methods, new conformations are selected from an N dimensional normal distribution,
6.4. Software
Idealized B-DNA form was generated by the nucgen software package described in Bansal et al. (1995). All methods and algorithms described in this article are implemented in our MOSAICS software package (Minary, 2010), which is available on request from the corresponding author.
7. Appendix
7.1. Updating independent degrees of freedom, Xi
We can divide any molecular assembly into Ns arbitrary structural segments, S k , for k = 1, … , Ns. We chose a center group, Ck, comprising of three atoms in the center of each segment and group, SGk containing the coordinates of all the remaining atoms in S k . The independent degrees of freedom of the system, which can be internal to a segment or else global, are updated using the following subroutine:
First, the positions of each atom in SGk are resolved into internal coordinates (line 2) using subroutine resolve defined in Section 7.3. Next, all independent coordinates are updated (lines 3 and 4, see Section 6.3). Internal independent degrees of freedom, iXi may include bend and torsion angles whereas all information about the global independent degrees of freedom, gXi is encoded into the Cartesian position and orientation of each center group, Ci; updating the position and orientation of the center groups effectively updates gXi (line 4; Figure 3a,b). Finally, we calculate the Cartesian coordinates of all segments by subroutine rebuild defined in Section 7.3.
7.2. Setting dependent degrees of freedom, Xd
After all independent variables are updated (see Section 7.1), there may be chain breaks present in the structural assembly. Adjacent to the break we have a molten zone. Following Section 2.2, the atom across the break from the molten zone is known as the “head” atom; the two atoms on the other side of the molten zone are known as the “anchor” atoms. In general, we can assume that there are Nc chain break locations and adjacent molten zones, M l , l = 1, … , Nc. In MGl, we store coordinates of the molten zone plus the head and anchor atoms associated with M l . In order to maintain all connectivity (e.g., branching atoms, side chains) when the position of M l is changed, additional atoms need to be moved. For each M l , the coordinates of all these atoms are stored in BGl. The execution of one RSC cycle in each molten region is done by the following subroutine.
First atoms in BGl are resolved into internal coordinates (line 3). Next subroutine rsc (see Section 2.2) is executed (line 4) where the recursive iteration is started with the first atom. Finally, all atoms in BGl are rebuilt into Cartesian coordinates (line 5). The loop is going through all molten zones.
7.3. The missing subroutines
Given group G = {
For each atom function “base” returns the indices of three atoms whose Cartesian coordinates define the local reference frame used to express the atom position in internal variables. Therefore, ordering of atoms from 1 to Ng is done in a way such that the base atoms of an atom are expressed in Cartesian coordinates every time when the atom coordinates are transformed. Note that base atoms that are not in G are always assumed to have Cartesian positions while these transformations are performed. To obtain rotation matrix,
7.4. The missing proofs
Footnotes
Acknowledgments
This work was supported by the NIH grants GM-41455 and GM-63817, as well as Human Frontier Science Program (HFSP) award RGP0024/2008.
Disclosure Statement
No conflicting financial interests exist.
