## Physics-based Force Fields


One of the most important elements of any computational protein sequence design strategy is the ability to measure the "compatibility" of a sequence with a given target protein structure, a measure that combines stability and specificity.

Compatibility is usually measured as the free energy difference between the target structure and an unfolded conformation. We start this section by defining the free energy of a protein, and then show how it applies to sequence design. The rest of the section discusses various methods of computing the free energy.

### Stability of Protein Sequences: Application to Sequence Design

The stability of a protein sequence P is a thermodynamic quantity. It is measured as the difference ΔG(P) in free energy between its native state, N, and an unfolded state, U:

$$
\Delta G(P) = G_N(P) - G_U(P) \tag{12.1}
$$

Note that P here refers to the "solvated" protein; that is, it accounts for the protein and its surrounding solvent and ionic atmosphere. G refers to the Gibbs free energy of the system, defined as

$$
G = U - TS + pV
$$

The three terms are the internal energy U of the protein; its entropy S, which is a measure of disorder, times the temperature T of the system; and the product pV, where p and V are the pressure and volume of the system, respectively. Since the pressure and the volume can be considered constant for proteins in solution, the pV term cancels in the difference and Equation 12.1 can be rewritten as

$$
\Delta G(P) = \Delta U(P) - T\,\Delta S(P)
$$
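As a small worked illustration of the relation ΔG(P) = ΔU(P) − TΔS(P), the sketch below evaluates it for made-up numbers. The function name `stability`, the units (kcal/mol and kcal/(mol·K)), and all numerical values are assumptions chosen for illustration; none of them come from the text.

```python
def stability(delta_u: float, delta_s: float, temperature: float) -> float:
    """Return dG = dU - T*dS.

    delta_u     : internal energy change on folding, kcal/mol (assumed unit)
    delta_s     : entropy change on folding, kcal/(mol*K) (assumed unit)
    temperature : absolute temperature in kelvin
    """
    return delta_u - temperature * delta_s


# Hypothetical values: folding lowers the internal energy (dU < 0)
# but also lowers the entropy (dS < 0), so the two terms compete.
dg_room = stability(delta_u=-60.0, delta_s=-0.17, temperature=298.15)
dg_hot = stability(delta_u=-60.0, delta_s=-0.17, temperature=360.0)

print(f"dG at 298.15 K: {dg_room:.2f} kcal/mol")
print(f"dG at 360.00 K: {dg_hot:.2f} kcal/mol")
```

With these illustrative inputs, ΔG is negative at room temperature and becomes positive at the higher temperature, because the entropic term −TΔS grows with T.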

A native-like, "stable" sequence has a minimum (negative) ΔG(P): ideally this is reached when ΔU(P) is minimum and ΔS(P) is maximum. However, ΔU(P) is minimal when the native state has many stabilizing, nonlocal contacts, which requires an organized structure, whereas the entropy is maximal when the structure has a low level of organization. Stability therefore is reached through a compromise between these two effects. To estimate the free energy of a protein, we need to compute its internal energy U, and sample its conformational space to measure the entropy S. This sampling is usually performed through simulations.
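To make the link between sampling and entropy concrete, here is a minimal sketch that treats a finite list of conformational energies as a complete Boltzmann ensemble and derives U, S, and the free energy U − TS from it. This is a textbook statistical-mechanics simplification, not the procedure used by any particular design program: real simulations sample a continuous space and never enumerate it fully. The function name `ensemble_thermodynamics` and the energy unit (kcal/mol) are assumptions.

```python
import math

# Molar gas constant in kcal/(mol*K); plays the role of Boltzmann's constant
# when energies are expressed per mole.
K_B = 0.0019872041


def ensemble_thermodynamics(energies, temperature):
    """Return (U, S, G) for a discrete set of conformations.

    energies    : iterable of conformational energies, kcal/mol (assumed unit)
    temperature : absolute temperature in kelvin
    """
    energies = list(energies)
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    probs = [w / z for w in weights]

    # Internal energy: Boltzmann-weighted average energy.
    u = sum(p * e for p, e in zip(probs, energies))
    # Entropy: Gibbs formula S = -k * sum(p ln p); skip states with p == 0.
    s = -K_B * sum(p * math.log(p) for p in probs if p > 0.0)
    # Free energy of the ensemble (pV term neglected).
    g = u - temperature * s
    return u, s, g


# Hypothetical ensembles: one dominant low-energy state versus
# many equally likely higher-energy states.
u_ordered, s_ordered, g_ordered = ensemble_thermodynamics([-10.0], 300.0)
u_disordered, s_disordered, g_disordered = ensemble_thermodynamics([-5.0] * 1000, 300.0)
```

A single-state ensemble has zero entropy, while the ensemble of 1000 degenerate states has entropy k·ln(1000); this is the organized-versus-disordered trade-off described above, in its simplest form.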

A typical computational protein sequence design experiment starts from a known protein structure template N and tests the "compatibility" of many sequences with this template, searching for sequences that are both stable (positive design) and specific (negative design) to the structure N. Two putative sequences P0 and P1 for N are compared based on their stability, as defined by Equation 12.1 (Figure 12.1):

$$
\Delta\Delta G = \Delta G(P_1) - \Delta G(P_0) = \left[ G_N(P_1) - G_U(P_1) \right] - \left[ G_N(P_0) - G_U(P_0) \right]
$$

[Figure 12.1 (image not recovered). Surviving labels: G_N(P1), Unfolded, Folded.]
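The comparison of two candidate sequences can be sketched in a few lines: compute each sequence's stability as the free energy of its native state minus that of its unfolded state, then take the difference between the two sequences. The class `SequenceEnergies`, the function `delta_delta_g`, and the free energy values are hypothetical; in practice the free energies would come from a force field and a sampling procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceEnergies:
    """Free energies (kcal/mol, assumed unit) of one sequence on template N."""
    name: str
    g_native: float    # G_N(P): free energy in the native (folded) state
    g_unfolded: float  # G_U(P): free energy in the unfolded state

    @property
    def delta_g(self) -> float:
        # Stability: native minus unfolded, negative when folding is favorable.
        return self.g_native - self.g_unfolded


def delta_delta_g(reference: SequenceEnergies, candidate: SequenceEnergies) -> float:
    """Stability of the candidate relative to the reference sequence."""
    return candidate.delta_g - reference.delta_g


# Hypothetical values for two putative sequences.
p0 = SequenceEnergies("P0", g_native=-120.0, g_unfolded=-112.0)
p1 = SequenceEnergies("P1", g_native=-131.0, g_unfolded=-118.0)

ddg = delta_delta_g(p0, p1)
more_stable = p1 if ddg < 0 else p0
print(f"ddG(P0 -> P1) = {ddg:.1f} kcal/mol; more stable: {more_stable.name}")
```

Note that the unfolded-state term matters: a sequence with a lower native-state free energy is not necessarily more stable if its unfolded state is lowered by a similar amount.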