Predicting Native Protein Core Sequences

With few exceptions, the native sequence of a protein can be assumed to be among the most stable sequences for a given fold. Evolution is driven by factors besides thermostability, and sequence variation between species demonstrates that a given fold is compatible with many sequence solutions. Even so, the ability of a design algorithm to include the native sequence in its solution set is an excellent first test of its accuracy. If the native sequence does not score favorably, then it is likely that the force field used in the design has missed an important consideration for the system and must be revised.

One of the first efforts at computationally predicting alternate protein core sequences explicitly considered the need for native-like sequences in the solution set (Ponder and Richards 1987). Their model consisted of a template with fixed main-chain and β-carbon atoms, and side-chain conformations drawn from a rotamer library. The rotamer library (discussed in Chapter 13) was derived from the existing protein structure database. Protein sequences compatible with the target structure were identified by applying two filters: a van der Waals (vdW) check and a core volume calculation. A hard-sphere model for vdW repulsions excluded structures with clashes in the core, while a side-chain volume constraint required that the new side chains occupy a volume similar to that of the wild type. The latter constraint reflects the experimental observation that cavities in the core are typically destabilizing (Eriksson et al. 1992). These filters eliminated many obviously incorrect sequences, but the resulting set was still too large to investigate experimentally.
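The two filters described above can be illustrated with a minimal sketch. This is not the Ponder and Richards implementation: the function names, the single atomic radius of 1.7 Å, the 10% volume tolerance, the toy coordinates, and the residue volumes are all assumptions chosen for illustration. Each core position offers a few candidate rotamers, and a sequence is kept only if its total side-chain volume is close to the wild type and at least one rotamer combination is free of hard-sphere clashes.

```python
from itertools import product
from math import dist

# Approximate residue volumes in cubic angstroms (illustrative values only).
VOLUME = {"ALA": 88.6, "VAL": 140.0, "LEU": 166.7, "ILE": 166.7, "PHE": 189.9}


def clashes(atoms_a, atoms_b, radius=1.7):
    """Hard-sphere test: True if any atom pair is closer than two radii."""
    min_dist = 2 * radius
    return any(dist(a, b) < min_dist for a in atoms_a for b in atoms_b)


def volume_ok(sequence, wild_type, max_fraction=0.1):
    """True if total volume is within max_fraction of the wild-type total."""
    target = sum(VOLUME[r] for r in wild_type)
    total = sum(VOLUME[r] for r in sequence)
    return abs(total - target) <= max_fraction * target


def compatible_sequences(rotamers, wild_type, max_fraction=0.1):
    """Enumerate rotamer combinations and return sequences passing both filters.

    rotamers: one list per core position of (residue_name, atom_coords) pairs.
    """
    found = set()
    for combo in product(*rotamers):
        seq = tuple(name for name, _ in combo)
        if not volume_ok(seq, wild_type, max_fraction):
            continue  # rejected by the core volume filter
        pairs = ((i, j) for i in range(len(combo)) for j in range(i + 1, len(combo)))
        if any(clashes(combo[i][1], combo[j][1]) for i, j in pairs):
            continue  # rejected by the hard-sphere vdW filter
        found.add(seq)
    return sorted(found)


# Toy two-position core with made-up side-chain atom coordinates (angstroms).
toy_rotamers = [
    [("ALA", [(0.0, 0.0, 0.0)]), ("PHE", [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)])],
    [("LEU", [(8.0, 0.0, 0.0)]), ("PHE", [(8.0, 0.0, 0.0), (5.0, 0.0, 0.0)])],
]
result = compatible_sequences(toy_rotamers, wild_type=("PHE", "LEU"))
print(result)
```

In this toy case the two alanine-containing sequences fail the volume filter, and the Phe/Phe pair fails the clash filter, so only the wild-type pair survives. Exhaustive enumeration as written scales exponentially with the number of positions, which is one reason the surviving set in a real core remains large.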

Though modern protein design algorithms do not explicitly bias their results toward native sequences, it is still important that the native sequence and similar sequences are scored favorably, even if the algorithm rarely recapitulates the native sequence as the lowest energy configuration.
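One way to apply this check is to ask where the native sequence falls among the scored sequences. The sketch below assumes a hypothetical dictionary mapping each sequence to an energy, with lower values more favorable. The function name and the example energies are invented for illustration and do not come from any particular design program.

```python
def native_rank(scores, native):
    """Return (rank, gap) for the native sequence.

    rank: 1-based position of the native when sorted by ascending energy.
    gap:  native energy minus the lowest energy in the set.
    """
    if native not in scores:
        raise KeyError(f"native sequence {native!r} was not scored")
    native_energy = scores[native]
    best_energy = min(scores.values())
    rank = 1 + sum(1 for energy in scores.values() if energy < native_energy)
    return rank, native_energy - best_energy


# Made-up energies (arbitrary units) for four candidate core sequences.
example_scores = {"FLIV": -12.0, "FLLV": -12.5, "ALAV": -7.1, "FFIV": -11.2}
rank, gap = native_rank(example_scores, native="FLIV")
print(rank, gap)
```

A native sequence that ranks near the top with a small gap is consistent with a reasonable force field, while a poor rank suggests that some contribution is missing or mis-weighted.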
