… chains. In particular, χ1 of Asn, χ2 of Gln, and χ1 of His have been shown to be difficult to model on the basis of van der Waals interactions alone, because each has multiple rotamer states with similar packing density. Incorporating a hydrogen-bonding potential can help resolve this ambiguity (Word et al. 1999). Recently, Marshall and coworkers presented a computationally efficient formulation of a finite-difference Poisson-Boltzmann model that describes electrostatic interactions with one- and two-body terms (Marshall et al. 2005). The torsion energy can be described using either a molecular mechanics function or a pseudoenergy function of the form $-w \log p_i$, where $w$ is a constant and $p_i$ is the probability of rotamer $i$. The pseudoenergy is derived from known protein structures so that it favors the rotamers observed most often, and including this term has been shown to increase prediction accuracy (Bower et al. 1997; Liang and Grishin 2002). Finally, the desolvation energy is important for modeling hydrophobicity (Kono and Saven 2001; Liang and Grishin 2002), but there is currently no consistently used method for modeling this effect.
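The rotamer pseudoenergy $-w \log p_i$ can be illustrated with a short Python sketch. The rotamer labels, the frequencies, and the default weight `w = 1.0` are hypothetical placeholders chosen for illustration, not values taken from any published library; in practice $p_i$ would come from the rotamer library in use, and $w$ would be tuned against the other energy terms.

```python
import math


def rotamer_pseudoenergy(probabilities, w=1.0):
    """Return E_i = -w * log(p_i) for each rotamer probability p_i.

    `probabilities` maps a rotamer label to its observed frequency.
    `w` is the weighting constant; 1.0 is an arbitrary placeholder.
    """
    energies = {}
    for rotamer, p in probabilities.items():
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability for {rotamer!r} must be in (0, 1], got {p}")
        # Frequently observed rotamers (large p) receive low pseudoenergies.
        energies[rotamer] = -w * math.log(p)
    return energies


# Hypothetical rotamer frequencies for a single residue (illustrative only).
freqs = {"g+": 0.55, "t": 0.30, "g-": 0.15}
for name, e in rotamer_pseudoenergy(freqs).items():
    print(f"{name}: {e:.3f}")
```

Because the logarithm is monotonic, the ordering of pseudoenergies simply reverses the ordering of observed frequencies; the constant $w$ only sets how strongly this preference competes with the physical energy terms.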

Methods for Placing the Side Chains

Several algorithms have been developed to find the optimal combination of side-chain conformations on a given, fixed backbone using the energy functions described in the previous section. These include systematic searches for rotamers that avoid steric clashes (Dunbrack and Karplus 1993; Wilson et al. 1993; Bower et al. 1997), Monte Carlo search (Holm and Sander 1992; Vasquez 1995; Liang and Grishin 2002; Jain et al. 2006), genetic algorithms (Tuffery et al. 1991), neural networks (Hwang and Liao 1995; Kono and Doi 1996), mean field optimization (Koehl and Delarue 1994; Mendes et al. 1999; Kono and Saven 2001), dead-end elimination (Desmet et al. 1992; De Maeyer et al. 1997; Voigt et al. 2000; Looger and Hellinga 2001), and a graph-theory algorithm (Canutescu et al. 2003). The algorithms are not equally effective, and each has limitations. For example, dead-end elimination has been shown to find the optimal combination of side chains in a number of cases, but it may not be practical with a fine-grained rotamer library containing a much larger number of rotamer states. Similarly, Monte Carlo methods can readily incorporate multibody interactions, which may be important for modeling solvation, but they do not guarantee reaching the global energy minimum. Some of these algorithms are discussed in Chapter 16.
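To make the dead-end elimination idea concrete, the following is a minimal Python sketch of the original dead-end criterion applied to a toy problem with one-body (self) and two-body (pair) energies. A rotamer $r$ at position $i$ is discarded when some other rotamer $t$ at the same position satisfies $E(i_r) + \sum_j \min_s E(i_r, j_s) > E(i_t) + \sum_j \max_s E(i_t, j_s)$, because $r$ then cannot belong to the global minimum. The data layout (`self_e`, `pair_e` keyed by position pairs) and all energy values are hypothetical; production implementations use stronger criteria (e.g., Goldstein's) and far more efficient bookkeeping.

```python
from itertools import product


def pair_energy(pair_e, i, r, j, s):
    """Pairwise energy between rotamer r at position i and rotamer s at position j."""
    # Each position pair is stored once, keyed (lower index, higher index).
    if i < j:
        return pair_e[(i, j)][r][s]
    return pair_e[(j, i)][s][r]


def dee_pass(self_e, pair_e, alive):
    """One sweep of the original dead-end criterion; returns True if anything was removed."""
    n = len(self_e)
    changed = False
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in sorted(alive[i]):
            # Most favorable energy rotamer r could possibly achieve.
            best_r = self_e[i][r] + sum(
                min(pair_energy(pair_e, i, r, j, s) for s in alive[j]) for j in others)
            for t in sorted(alive[i]):
                if t == r:
                    continue
                # Least favorable energy competitor t could possibly have.
                worst_t = self_e[i][t] + sum(
                    max(pair_energy(pair_e, i, t, j, s) for s in alive[j]) for j in others)
                if best_r > worst_t:
                    alive[i].discard(r)  # r cannot be part of the global minimum
                    changed = True
                    break
    return changed


def run_dee(self_e, pair_e):
    """Repeat DEE sweeps until no further rotamers can be eliminated."""
    alive = [set(range(len(e))) for e in self_e]
    while dee_pass(self_e, pair_e, alive):
        pass
    return alive


def total_energy(self_e, pair_e, assignment):
    """Self plus pairwise energy of one full rotamer assignment."""
    n = len(assignment)
    e = sum(self_e[i][assignment[i]] for i in range(n))
    e += sum(pair_energy(pair_e, i, assignment[i], j, assignment[j])
             for i in range(n) for j in range(i + 1, n))
    return e


# Toy problem: two positions, two rotamers each (energies are made up).
self_e = [[0.0, 5.0], [0.0, 1.0]]
pair_e = {(0, 1): [[0.0, 0.5], [-1.0, 0.0]]}
print(run_dee(self_e, pair_e))

# Exhaustive check, feasible only for tiny problems like this one.
best = min(product(*[range(len(e)) for e in self_e]),
           key=lambda a: total_energy(self_e, pair_e, a))
print(best, total_energy(self_e, pair_e, best))
```

When every position is reduced to a single rotamer, that combination is the global minimum; otherwise the surviving rotamers still have to be searched, but over a much smaller space.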

Current Model Accuracy

In general, the choice of search algorithm does not seem to affect prediction accuracy as much as the choice of energy function or rotamer library. Model accuracy is typically evaluated in terms of the root mean square deviation (RMSD) of the side chains and the deviation of the side-chain torsion angles from the crystal structure; a torsion angle within 20° or 40° of the crystal structure is typically regarded as correct. Residues whose terminal groups have twofold rotational symmetry (Asp, Glu, Phe, and Tyr) are evaluated by considering both symmetry-equivalent conformations and choosing the one with the lower RMSD.
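The torsion-angle criterion can be sketched in Python as follows. Angle differences must be taken around the circle (350° and 10° differ by 20°, not 340°), and for the symmetric terminal groups of Asp (χ2), Glu (χ3), Phe (χ2), and Tyr (χ2) the deviation is taken as the smaller of the direct comparison and the comparison after a 180° flip. The function names, the three-letter residue codes, and the default 40° threshold are illustrative choices, not part of any particular evaluation package.

```python
# Terminal chi angle (1-based index) whose 180° flip gives a symmetry-equivalent side chain.
SYMMETRIC_CHI = {"ASP": 2, "GLU": 3, "PHE": 2, "TYR": 2}


def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def chi_deviation(resname, k, predicted, native):
    """Deviation of chi_k, allowing the 180° flip for symmetric terminal groups."""
    d = angular_difference(predicted, native)
    if SYMMETRIC_CHI.get(resname) == k:
        d = min(d, angular_difference(predicted + 180.0, native))
    return d


def chis_correct(resname, predicted, native, threshold=40.0):
    """True if every compared chi deviates from the native value by less than `threshold` degrees."""
    return all(
        chi_deviation(resname, k, p, n) < threshold
        for k, (p, n) in enumerate(zip(predicted, native), start=1)
    )


# A Phe ring rotated by ~180° about chi2 is still counted as correct.
print(chis_correct("PHE", [-65.0, 95.0], [-60.0, -88.0]))
```

Passing only the first one or two angles in `predicted` and `native` gives the χ1 and χ1+2 criteria, and `threshold=20.0` gives the stricter variant.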

When modeling side chains, it is useful to keep in mind the upper limit on accuracy imposed by the choice of rotamer library. Rotamer libraries are usually built so that a minimal number of rotamers (typically 100 to 200) covers as wide a population of crystal structures as possible, and such rotamers cannot adequately model the tightly packed side chains in protein cores (Shetty et al. 2003). For example, with the backbone-dependent rotamer library (Dunbrack and Cohen 1997), the best achievable average side-chain RMSD for 30 high-quality test proteins was 0.56 Å, and the best achievable percentages of correct prediction (i.e., within 40° of the crystal structure) for χ1 and for χ1 and χ2 combined (χ1+2) were 98.5% and 94.3%, respectively (Liang and Grishin 2002). The gap between these upper limits and the values actually predicted may be due mostly to side-chain contacts among buried residues. However, others have shown that a rotamer library with as many as 7560 distinct rotamers can achieve about 0.7 Å RMSD for buried residues, or about 94% and 89% correct for χ1 and χ1+2 (within 20° of native), respectively (Xiang and Honig 2001). A much larger library of 49,042 rotamers can lower the RMSD for buried residues further, to 0.21 Å (Peterson et al. 2004). These studies suggest that finer-grained rotamer libraries should be used for buried residues, and that different accuracy criteria should be applied to buried and exposed residues when evaluating models.
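One common way to make a library finer-grained is to add rotamers around each library rotamer by shifting its first few χ angles by multiples of their standard deviations. The Python sketch below assumes each rotamer is given as a tuple of χ angles plus one standard deviation per angle, and uses offsets of −1, 0, and +1 σ on χ1 and χ2; this data layout and these offsets are assumptions made for illustration, not the construction used in the studies cited above.

```python
from itertools import product


def wrap_angle(angle):
    """Map an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def expand_rotamers(rotamers, n_expand=2, offsets=(-1.0, 0.0, 1.0)):
    """Add rotamers around each library rotamer.

    rotamers: list of (chis, sigmas) pairs in degrees, one sigma per chi.
    The first `n_expand` chi angles are shifted by every combination of
    `offsets` (in units of that chi's sigma); later chis are kept unchanged.
    """
    expanded = []
    for chis, sigmas in rotamers:
        k = min(n_expand, len(chis))
        for steps in product(offsets, repeat=k):
            new = list(chis)
            for idx, step in enumerate(steps):
                new[idx] = wrap_angle(chis[idx] + step * sigmas[idx])
            expanded.append(tuple(new))
    return expanded


# Hypothetical library entry: chi1 = -65° (sigma 10°), chi2 = 175° (sigma 12°).
library = [((-65.0, 175.0), (10.0, 12.0))]
print(len(expand_rotamers(library)))
```

With three offsets on two χ angles, each library rotamer yields nine, so a library of 150 rotamers grows to 1350. Because the combinatorial cost rises quickly, such expansion is usually reserved for buried positions, where the accuracy gain is largest.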

Variation in the Magnitude of the Flexibility of Different Residue Types

Buried residues are modeled more accurately than exposed residues because they are surrounded by many atoms that sterically restrict their conformations, whereas exposed residues face fewer restrictions. The accuracy of modeled buried residues may have reached a plateau, since there has been little improvement in recent years (Eyal et al. 2004; Hartmann et al. 2007). Further work is needed, however, to improve the accuracy of modeled exposed residues. A comparison of multiple crystal structures of the same proteins revealed that residues exposed on the protein surface are flexible and that the magnitude of their flexibility depends on residue type (Zhao et al. 2001). In other words, the ambiguity of a modeled side-chain conformation depends on residue type, particularly for residues on the protein surface. For example, Ser was found to be the most flexible amino acid, followed in order by Lys, Glu, Gln, Arg, and Met. Given these distinct flexibilities, it may be appropriate to use residue type-specific angular thresholds when evaluating a model rather than a single threshold for all residue types (Zhao et al. 2001).


Modeling the main chain and side chains separately frequently leads to model structures with severe van der Waals overlaps. Introducing backbone flexibility can alleviate the problem to a degree, but even this approach has limitations. Especially in protein cores, simultaneous modeling of the main chain and side chains may be necessary to achieve the desired accuracy. A more detailed rotamer library may also be needed to model regions where atoms are tightly packed. For example, in one study (Shetty et al. 2003), a library about 40 times larger than typical libraries of 100 to 200 rotamers was used to model a buried loop ab initio without atomic clashes.

A Recommended Procedure

Based on the discussion in this chapter, the following procedure may be useful when modeling side chains. Starting from the backbone-independent library of Lovell and coworkers or the backbone-dependent library of Dunbrack and coworkers, first generate additional rotamers around the original ones for buried residues. Whether a site is buried or exposed can be roughly determined by counting the number of Cβ atoms within a certain radius of the side chain's center of mass (Kono and Saven 2001). Alternatively, an 8 Å probe sphere centered on the Cα atoms can be used to obtain a working definition of buried residues (Marshall and Mayo 2001). This information is very useful because it can be used to limit the amino-acid types allowed at each residue site, reducing the number of amino-acid sequences that need to be considered. The energy function should include, at a minimum, van der Waals interactions, and preferably additional terms for electrostatic interactions, hydrogen bonding, and torsion energy (or a pseudoenergy based on rotamer probability). The energy can then be minimized with any of a number of available algorithms, such as dead-end elimination, Monte Carlo, or mean field optimization, each of which is likely to yield side-chain conformations of reasonably low energy.
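The Cβ-counting approach to classifying sites as buried or exposed can be sketched in Python as below. The side-chain center is approximated by the unweighted centroid of its atom coordinates, and both the 10 Å radius and the cutoff of 20 neighboring Cβ atoms are hypothetical placeholders; the text does not specify the values used by Kono and Saven, so they would need to be calibrated before use. The caller is assumed to pass the Cβ coordinates of the other residues only.

```python
import math


def centroid(coords):
    """Unweighted mean of a list of (x, y, z) coordinates (approximates the center of mass)."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))


def count_cb_neighbors(sidechain_coords, other_cb_coords, radius=10.0):
    """Number of other residues' CB atoms within `radius` Å of the side-chain centroid."""
    center = centroid(sidechain_coords)
    return sum(1 for cb in other_cb_coords if math.dist(center, cb) <= radius)


def classify_site(sidechain_coords, other_cb_coords, radius=10.0, buried_cutoff=20):
    """Label a site 'buried' or 'exposed'; radius and cutoff are hypothetical defaults."""
    n = count_cb_neighbors(sidechain_coords, other_cb_coords, radius)
    return "buried" if n >= buried_cutoff else "exposed"


# Toy coordinates in Å (made up): a two-atom side chain and three neighboring CB atoms.
side_chain = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
neighbors = [(1.0, 0.0, 5.0), (1.0, 0.0, 11.0), (12.0, 0.0, 0.0)]
print(count_cb_neighbors(side_chain, neighbors), classify_site(side_chain, neighbors))
```

Sites labeled buried would then receive the expanded rotamer set, while exposed sites could keep the original library.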
