Backbone flexibility

The initial computational protein design methods were developed to accept a predetermined backbone structure as input and to keep it fixed throughout the design process. Fixing the backbone was necessary because it reduced the complexity of the search space by eliminating the backbone degrees of freedom, and thereby decreased computation time. In addition, the energy function did not need to discriminate energetically favorable backbones from unfavorable ones, so when it was used together with rotamer libraries, covalent interactions could be omitted from the energy function altogether. Although fixed-backbone design methods have been very successful in various applications (Kuhlman and Baker 2004; Lippow and Tidor 2007), they have severe limitations (Figure 18.1). One is their tendency to produce false negatives. Keeping the backbone fixed makes the energy landscape for potential sequences more rugged and less physical: certain residue combinations that would be compatible with slight changes in the backbone are rejected as sterically incompatible (Desjarlais and Handel 1999). Another problem arises when fixed-backbone methods are applied to a de novo backbone. Although a natural protein backbone is known to be the ground-state or near-ground-state conformation for its natural sequence, the designability of a de novo backbone structure is not known in advance. Selecting a designable backbone may require sampling many different local backbone conformations to find an energetically favorable structure into which some sequence(s) can fold, which necessitates explicit consideration of backbone flexibility (Harbury et al. 1998; Kuhlman et al. 2003).

These limitations of fixed-backbone protein design have been pointed out since the beginning of the field (Vasquez 1996). Many attempts have been made to incorporate backbone flexibility into computational protein design. The approaches either simplify the backbone representation, use an ensemble of different backbone structures and design each structure individually, sample multiple backbones by optimizing the backbone and the side chains simultaneously, or combine these strategies. The methods have met with different levels of success. This section describes various flexible-backbone protein design methods together with their advantages and disadvantages.

Parameterization of Structures

FIGURE 18.1 (see color insert following page 178) The fallacy of fixed-backbone protein design. (1) Mutation of residue 59 of 434 cro from Leu to Phe is disruptive and is not allowed by fixed-backbone design (Desjarlais and Handel 1999) (a). When the backbone is allowed to move, the clashes seen with the fixed backbone are alleviated (b). (2) Crystal structure of Top7, the first de novo design of a novel backbone fold. The design of Top7 showed that backbone flexibility is necessary to select a designable backbone. Clashes are shown as red disks. All figures were generated using PyMOL (DeLano 2002).

Algebraic parameterization of regular secondary structures or folds can introduce backbone flexibility while keeping the number of backbone degrees of freedom small. A parametric representation of supercoiled helices was used by Harbury and coworkers to design a de novo right-handed coiled-coil trimer and tetramer (Harbury et al. 1995; Harbury et al. 1998; Plecs et al. 2004). Although the crystal structures of the trimer and the tetramer agreed closely with the designs, the method was restrictive in that the parameterization was used to search for backbone coordinates that satisfied a predetermined amino-acid sequence. Su and Mayo also used secondary structure parameters for α/β proteins to redesign the core residues of the protein G β1 domain (Su and Mayo 1997; Ross et al. 2001). They did not see any drastic differences between the sequences designed on backbones that deviated from the parent backbone and concluded that the protein design method is robust enough to tolerate a significant amount of backbone perturbation. However, the nuclear magnetic resonance (NMR) structure of a design with a large translational perturbation of the helix along the sheet axis had a backbone that was closer to the parent backbone than to the designed one. A more recent study by Fu and coworkers used normal mode calculations for helices to parameterize the backbone of Bcl-xL (Fu et al. 2007). Building on a study showing that the backbone motion of a helix can be mostly captured by three low-energy modes, they generated multiple backbones using normal mode analysis. A self-consistent mean field method was used to prune the rotamer library, and sequences were then designed onto the backbones using a Monte Carlo procedure. With this protocol they designed peptides that bind to Bcl-xL with nanomolar affinity. The limitation of this method is that the normal modes used have to capture most of the structural variation; thus guidance from pre-existing structures might be necessary to produce designable backbones. Parameterization of structures is a simple way to reduce the number of degrees of freedom in backbone modeling, but it has general limitations: it does not allow explicit backbone flexibility at every position of the protein, and it may not be applicable to complex motifs with nonsymmetrical folds or to irregular structural changes.
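To make the idea of a parametric backbone concrete, the following is a minimal sketch of a Crick-style supercoiled-helix parameterization that places Cα atoms for a symmetric bundle from a handful of parameters. It is not the protocol of Harbury and coworkers; the function name `coiled_coil_ca`, the default values (superhelix radius, helix radius, per-residue rotations, rise), and the Cα-only representation are illustrative assumptions.

```python
import math

def coiled_coil_ca(n_res, n_chains=3, r0=6.5, r1=2.26,
                   omega0=math.radians(-3.6), omega1=math.radians(102.9),
                   rise=1.51):
    """Return C-alpha coordinates for a symmetric supercoiled-helix bundle.

    All defaults are illustrative assumptions, not values from the cited work:
      r0     superhelix radius (angstroms)
      r1     alpha-helix C-alpha radius (angstroms)
      omega0 superhelical rotation per residue (radians; sign sets handedness)
      omega1 helical rotation per residue in the superhelix frame (radians)
      rise   distance travelled along the curved helix axis per residue
    """
    # Pitch angle of the helix axis relative to the superhelix (z) axis.
    sin_a = r0 * omega0 / rise
    if abs(sin_a) >= 1.0:
        raise ValueError("r0 * omega0 must be smaller in magnitude than rise")
    cos_a = math.sqrt(1.0 - sin_a * sin_a)
    dz = rise * cos_a  # advance along z per residue

    chains = []
    for k in range(n_chains):
        offset = 2.0 * math.pi * k / n_chains  # rotational symmetry about z
        coords = []
        for i in range(n_res):
            phi = omega0 * i + offset  # position around the superhelix
            psi = omega1 * i           # position around the helix axis
            x = (r0 * math.cos(phi) + r1 * math.cos(psi) * math.cos(phi)
                 - r1 * cos_a * math.sin(psi) * math.sin(phi))
            y = (r0 * math.sin(phi) + r1 * math.cos(psi) * math.sin(phi)
                 + r1 * cos_a * math.sin(psi) * math.cos(phi))
            z = dz * i - r1 * sin_a * math.sin(psi)
            coords.append((x, y, z))
        chains.append(coords)
    return chains

trimer = coiled_coil_ca(28, n_chains=3)
```

Changing `r0`, `omega0`, or `n_chains` yields a family of related backbones from a few numbers, which is the appeal of the approach; the same compactness is why it does not extend easily to irregular or nonsymmetrical structures.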

Ensemble Approach

Various methods have been attempted that incorporate backbone flexibility into protein design by using an ensemble of structures with slight differences in the backbone. In the ensemble approach, starting backbone structures are generated from multiple x-ray structures, from different models in a single NMR structure, from Monte Carlo perturbation runs, or from multiple snapshots along a molecular dynamics simulation trajectory. Each individual structure in the ensemble is then designed under the fixed-backbone assumption. As the search algorithm, mean field methods have been applied in many designs that use the ensemble approach. The sequence information from the large number of designed structures makes it possible to calculate the probabilities and entropies of residues at each position in the context of all other residues and the backbones. Koehl and Delarue optimized the backbone and side chains simultaneously in loop designs of bovine pancreatic trypsin inhibitor by using the self-consistent mean field method with the ensemble approach (Koehl and Delarue 1995). Although the loop designs used a relatively small set of backbone conformations (10 or fewer) from short protein segments (five residues or fewer), it was suggested that the method could be applied to full-sequence protein design problems. Kono and Saven used an entropy-based statistical method analogous to the self-consistent mean field method and 21 model backbones from an NMR structure to sample the sequence space of Protein L (Kono and Saven 2001). Using 30 backbone structures generated by a Monte Carlo simulation, Kraemer-Pecore and coworkers simulated the full-sequence design of the WW domain (Kraemer-Pecore et al. 2003). In this study, each member of the ensemble was designed individually before an exhaustive search of all rotamers was carried out at each position with all other positions fixed. The energies from these runs were used to calculate the probability of each amino acid at each position. When the amino acids with the highest calculated probabilities were selected to produce a designed WW domain, the designed protein had the correct fold but was less stable than the native WW domain. Design of a large ensemble of backbone variants using a distributed computing network has also been explored (Larson et al. 2002; Larson et al. 2003). A large ensemble of 100 structural variants of the target structure was generated by a Monte Carlo procedure, and fixed-backbone protein design was conducted on each of the individual structures by the Genome@home distributed computing system. The results of such calculations were used for fold recognition in structural and functional genomics. A general restriction of the ensemble approach is that a limited number of backbone conformations must be specified in advance.
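The step from per-backbone energies to per-position amino-acid probabilities and entropies can be illustrated with a short sketch. The cited studies each use their own statistical formalism; the version below simply assumes Boltzmann weighting pooled over all backbones in the ensemble, and the function names, the `kT` value, and the input layout are hypothetical.

```python
import math
from collections import defaultdict

def site_probabilities(energies_per_backbone, kT=0.6):
    """Pool energies for ONE sequence position over an ensemble of backbones.

    energies_per_backbone: list with one dict per backbone, mapping
    amino-acid code -> energy of the best rotamer on that backbone.
    Boltzmann weighting and kT=0.6 are assumptions made for illustration.
    """
    # Shift by the global minimum so the exponentials stay well behaved.
    e_min = min(e for bb in energies_per_backbone for e in bb.values())
    weights = defaultdict(float)
    for bb in energies_per_backbone:
        for aa, energy in bb.items():
            weights[aa] += math.exp(-(energy - e_min) / kT)
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}

def site_entropy(probabilities):
    """Shannon entropy (in nats) of the amino-acid distribution at a site."""
    return -sum(p * math.log(p) for p in probabilities.values() if p > 0.0)

# Hypothetical energies for one position on three backbones (arbitrary units).
ensemble = [
    {"L": -2.0, "F": 1.5, "A": -0.5},
    {"L": -1.5, "F": -1.8, "A": -0.4},
    {"L": -1.9, "F": 0.2, "A": -0.6},
]
probs = site_probabilities(ensemble)
entropy = site_entropy(probs)
```

In this toy input, Phe is unfavorable on two backbones but favorable on the second one, so it retains a non-negligible probability that a single fixed backbone would have missed. A low entropy marks a position where the ensemble agrees on the residue; a high entropy marks a tolerant position.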

Simultaneous Optimization of Sequence and Structure

Methods that incorporate backbone flexibility by optimizing the backbone and side chains simultaneously might represent a more accurate relaxation of local structure because of the cross talk between the two components during optimization. The self-consistent mean field method used in the loop design mentioned previously, in which the mean fields of both the backbone and the side chains are considered simultaneously, is one example. Another approach is to use molecular dynamics to allow backbone flexibility synchronously with side-chain adjustments. However, during side-chain repacking simulations the protein structure can become trapped in a local minimum. Simulated annealing protocols are used to prevent this, but they pose problems of their own because explicit waters and the overall protein fold can be distorted. Riemann and Zacharias suggested a potential scaling molecular dynamics method to overcome these issues (Riemann and Zacharias 2005). Smooth rescaling of the potential during the simulation lowers energy barriers while minimizing distortion of the protein fold and the explicit waters. Starting from structures with arbitrarily perturbed buried side chains and backbone, the potential scaling molecular dynamics method gave better side-chain predictions than a fixed-backbone side-chain packing algorithm (SCWRL3.0). A third approach to simultaneous backbone and side-chain optimization is to iterate between backbone deformation and sequence optimization at all positions in the protein. This iterative strategy was adopted by Desjarlais and Handel, who used a genetic algorithm for backbone perturbation and sequence selection on a pool of backbone structures obtained by randomly altering phi and psi angles of the parent template (Desjarlais and Handel 1999). Typically, a final refinement step follows in which Monte Carlo is used for rotamer optimization and small backbone movements. Using this method, they designed a core variant of the 434 cro protein that carries a mutation known to be incompatible with fixed-backbone design and has a melting temperature slightly lower than that of the wild type. Unfortunately, stability measurements on various 434 cro and T4 lysozyme variants showed that the predictive power of the flexible-backbone method was worse than that of the fixed-backbone method. Interestingly, in a comparison of side-chain structure predictions, the more the backbone deviated from that of the wild type, the better the flexible-backbone prediction became relative to the fixed-backbone prediction. A similar result was obtained by Yin and coworkers, who compared experimental ΔΔG values with computed ΔΔG values for five proteins and their mutants using either flexible- or fixed-backbone prediction (Yin et al. 2007). In three of the five cases, the fixed-backbone prediction correlated better with experiment than the flexible-backbone prediction, although the flexible-backbone method gave slightly better predictions when the mutations were classified into different types. In general, there is a concern that flexible-backbone design methods produce more false positives because they are too permissive in predicting mutations. As a result, mutations calculated to be energetically favorable in a flexible-backbone design often turn out to be destabilizing.
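The alternation between backbone deformation and sequence selection can be sketched on a deliberately simple toy model. This is not the genetic algorithm of Desjarlais and Handel: the sketch swaps in Metropolis Monte Carlo for the backbone step and a greedy per-position choice for the sequence step, and the three-letter alphabet, the energy terms, and every constant are hypothetical.

```python
import math
import random

# Toy model (all values hypothetical): each position has one backbone
# coordinate theta; larger residues are more stabilizing but only fit
# when the backbone shifts away from the template value.
PREF = {"S": 0.0, "M": 0.5, "L": 1.0}       # preferred backbone coordinate
REWARD = {"S": 0.0, "M": -0.5, "L": -1.0}   # intrinsic stabilization
K_FIT, K_STRAIN = 2.0, 0.5

def site_energy(aa, theta, theta0):
    """Residue reward + steric misfit + strain for leaving the template."""
    return (REWARD[aa] + K_FIT * (theta - PREF[aa]) ** 2
            + K_STRAIN * (theta - theta0) ** 2)

def total_energy(seq, thetas, template):
    return sum(site_energy(a, t, t0) for a, t, t0 in zip(seq, thetas, template))

def design_sequence(thetas, template):
    """Fixed-backbone step: pick the best residue at every position."""
    return [min(PREF, key=lambda aa: site_energy(aa, t, t0))
            for t, t0 in zip(thetas, template)]

def perturb_backbone(seq, thetas, template, rng, steps=200, step=0.1, kT=0.05):
    """Fixed-sequence step: Metropolis Monte Carlo on the backbone."""
    thetas = list(thetas)
    energy = total_energy(seq, thetas, template)
    for _ in range(steps):
        i = rng.randrange(len(thetas))
        old = thetas[i]
        thetas[i] = old + rng.uniform(-step, step)
        new_energy = total_energy(seq, thetas, template)
        if new_energy <= energy or rng.random() < math.exp(-(new_energy - energy) / kT):
            energy = new_energy
        else:
            thetas[i] = old  # reject the move
    return thetas

def iterative_design(template, cycles=10, seed=0):
    """Alternate backbone and sequence steps; return the best state seen."""
    rng = random.Random(seed)
    thetas = list(template)
    seq = design_sequence(thetas, template)
    best = (total_energy(seq, thetas, template), seq, thetas)
    for _ in range(cycles):
        thetas = perturb_backbone(seq, thetas, template, rng)
        seq = design_sequence(thetas, template)
        energy = total_energy(seq, thetas, template)
        if energy < best[0]:
            best = (energy, seq, thetas)
    return best

template = [0.0] * 5
fixed_seq = design_sequence(template, template)  # fixed-backbone baseline
best_energy, best_seq, best_thetas = iterative_design(template)
```

On the unperturbed template the large residue `L` is rejected outright, which is the toy analogue of a fixed-backbone false negative. Whether the iterative run reaches it depends on the sampling, which mirrors the dependence on conformational sampling discussed in the text.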

Another example of iterative backbone and side-chain design is the method used in the design of a novel protein fold (Kuhlman et al. 2003). Because the designed protein adopts a fold not previously observed, it is considered a more rigorous test of flexible-backbone design. First, a pool of backbone models was created from fragments in the Protein Data Bank (PDB) that had the secondary structure of interest. Each of these backbones was used for design. The backbone was then optimized for the resulting sequence using a Monte Carlo minimization procedure. Iterations of sequence and backbone optimization produced a novel α/β fold protein that was stable and folded into the topology of interest with atomic accuracy. An interesting observation from this study is that the designs based on the initial backbone structures had worse energies than natural proteins, which supports the view that not all backbones are designable and that optimizing the backbone for the sequence is a critical step in the flexible-backbone design procedure. The ensemble approach, in contrast, does not optimize the backbone and sequence simultaneously; the backbone of the design is determined before sequence design. The same iterative flexible-backbone protocol was used to design a novel 10-residue loop on the tenascin protein (Hu et al. 2007). A detailed study of this iterative method, carried out on native protein structures and sequences (Saunders and Baker 2005), showed that constructing a designable backbone requires sampling a larger structural space around the starting structure. This was achieved by using a high-temperature Monte Carlo melting procedure, torsional minimization, and systematic substitution of the fragments that cause the greatest disruption in the global structure. However, when this modified iterative flexible-backbone design method was used, the energy did not discriminate between the correct backbone configuration and the incorrect one. A similar difficulty was reported by Desjarlais and Handel, who observed no significant correlation when predicting the stability of proteins using the Amber/OPLS potential to represent the backbone (Desjarlais and Handel 1999). Although the current energy function may be sufficient to design a backbone, it clearly needs improvement for more reliable backbone selection (Bradley et al. 2005). Recent fold prediction studies also show that good models cannot be selected by considering energy alone (Bradley and Baker 2006). In that study, clustering based on structural similarity was more robust than ranking by energy. In the loop design of tenascin, designs were filtered not only on energy but also on the solvent-accessible surface area pack score and the number of unsatisfied hydrogen bonds (Hu et al. 2007). In the study by Fung and coworkers, root mean squared deviation (RMSD) from the template structure was used to rank structures (Fung et al. 2008). These results show that energy functions that can accurately distinguish a favorable backbone from an unfavorable one still need to be developed as the field moves from fixed- toward flexible-backbone design. In the meantime, other ways of ranking structures may be helpful, since the energy function is not yet accurate enough for some applications. An interesting conclusion from the Saunders and Baker study (Saunders and Baker 2005) is that the sequence space occupied by iterative flexible-backbone designs overlaps more with that of the natural homologs than does the sequence space obtained with the ensemble approach. This suggests that the iterative approach does not lead to the increase in false positives that is often seen with flexible-backbone design.
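Ranking candidate backbones by RMSD from a template, as an alternative to ranking by energy, can be sketched as follows. This is a minimal illustration rather than the procedure of Fung and coworkers: it assumes the models are already superimposed on the template (no optimal-superposition step is performed), and the function names and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Coordinate RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the two structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def rank_by_rmsd(template, models):
    """Return model names sorted from closest to farthest from the template."""
    return sorted(models, key=lambda name: rmsd(template, models[name]))

# Hypothetical three-atom template and two candidate models.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
models = {
    "model_far": [(0.0, 2.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0)],
    "model_near": [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)],
}
ranking = rank_by_rmsd(template, models)
```

A structural criterion of this kind can be combined with energy and other filters instead of replacing them, in the spirit of the multi-criteria filtering described for the tenascin loop design.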

The Future of Flexible-Backbone Protein Design

Although the successful flexible-backbone design results described in the previous section reflect how far the field has progressed, there is still room for improvement, as can be seen from the inconsistent sequence recovery results and the modest correlation between computed and experimental stabilities of known proteins. The search space of flexible-backbone design is still considered too large to cover completely, and extensive conformational sampling of structurally diverse populations is critical for success. Although conformational sampling has led to many successful results, proper sampling of conformational space remains the primary bottleneck for accurate structure prediction (Bradley et al. 2005). This is probably the case for flexible-backbone protein design as well, since conformational sampling does not guarantee identification of an optimal solution over the combined backbone and sequence search space. The line that separates inverse folding from structure prediction has become blurred in recent years. Structure prediction protocols are commonly used in protein design and vice versa, and there are also examples in which structure prediction is used to evaluate the design result (Bradley et al. 2004; Hu et al. 2007).

Despite many significant achievements in incorporating backbone flexibility into protein design, myriad challenges lie ahead. The enormous complexity of the sequence and structural space and the limited accuracy of current energy functions are among the grand challenges, and resolving them will require novel approaches. One way to reduce complexity might be to allow not random but "realistic" backbone movement, such as applying the "backrub" motions observed in natural proteins to simulate backbone flexibility (Davis et al. 2006). A precise energy representation of the backbone is not necessary to describe the backrub motion, as it is characterized by low energy and yields only favorable backbone conformations. It is also local, with no motion beyond two residues, which is appropriate for protein design, since the goal is not to sample a completely different fold but to allow small deviations from the parent backbone that accommodate sequences and conformations that would be incompatible with a fixed backbone (Smith and Kortemme 2008). Novel algorithms for flexible-backbone design that restrict the complexity are emerging, such as dead-end elimination (DEE) for flexible backbones (Georgiev and Donald 2007). Georgiev and Donald suggest reducing the complexity of the problem by allowing flexibility in both the backbone and the side chains while using DEE algorithms to prune rotamers that are not part of the global minimum energy conformation (GMEC) in the context of a flexible backbone and minimized rotamers. Fung and coworkers have developed an algorithm that identifies an optimal backbone via a continuum template and NMR structure refinement (Fung et al. 2007; Fung et al. 2008). Continuous values of Cα-Cα distances and dihedral angles within preset bounds were considered, and with this method they showed that the sequences of the designed β-defensins recapitulate the sequences found in nature to a large degree. However, the energy function used for side-chain selection is based on the Cα-Cα distance and not on explicit consideration of side-chain rotamers. Accuracy in the prediction of structure, energetics, and design is important, especially when backbone flexibility is incorporated. In-depth study of the successful results found in the literature, as well as of the various failures that go unreported, is therefore a necessary step toward developing a reliable and accurate flexible-backbone protein design method.
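The geometric core of a backrub-style move, a rigid rotation about the axis joining two flanking Cα atoms, can be sketched with Rodrigues' rotation formula. This is a simplified illustration under stated assumptions: only Cα atoms are represented, only the single residue between the pivots is moved, and the compensating peptide rotations and full atom set of a real backrub move are omitted. The function names are hypothetical.

```python
import math

def rotate_about_axis(point, axis_start, axis_end, angle):
    """Rotate `point` by `angle` (radians) about the line through
    axis_start and axis_end, using Rodrigues' rotation formula."""
    axis = [axis_end[i] - axis_start[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0.0:
        raise ValueError("axis endpoints must be distinct")
    k = [c / norm for c in axis]
    v = [point[i] - axis_start[i] for i in range(3)]
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    rotated = [v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
               for i in range(3)]
    return tuple(axis_start[i] + rotated[i] for i in range(3))

def backrub_move(ca_coords, i, angle):
    """Return a copy of the C-alpha trace with residue i rotated about the
    axis from C-alpha (i-1) to C-alpha (i+1). All other atoms stay fixed."""
    if i < 1 or i > len(ca_coords) - 2:
        raise ValueError("residue i needs a neighbor on both sides")
    moved = list(ca_coords)
    moved[i] = rotate_about_axis(ca_coords[i], ca_coords[i - 1],
                                 ca_coords[i + 1], angle)
    return moved

# Hypothetical three-residue trace; rotate the middle residue by 30 degrees.
trace = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0)]
new_trace = backrub_move(trace, 1, math.radians(30.0))
```

Because the rotation axis passes through both pivot atoms, the distances from the moved atom to its two neighbors are unchanged, which is why such a move stays local and leaves the rest of the backbone in place.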
