FIGURE 4.2 Codon bias in diversification using mixtures of nucleotides. The bar graphs indicate the number of codons encoding each amino-acid residue and the stop codon (*) when the diversified region is encoded by NNN (A) vs. NNS or NNK (B).

[Plot: logarithmic vertical axis from 1.E+00 to 1.E+20; horizontal axis: # Diversified Codons]

FIGURE 4.3 Number of possible codon combinations encoded by NNS or NNK increases exponentially with the number of codons that are diversified. The boxes show typical ranges in library size that can be sampled by plate-based high-throughput (HT) screening, microbial display, and in vitro display.

illustrated in Figure 4.3, a typical plate-based, high-throughput screening assay can thoroughly sample a library with no more than two to three NNS- or NNK-diversified positions; phage, bacterial, or yeast display, a library with no more than five to seven diversified positions; and mRNA and ribosome display, a library with no more than eight to nine diversified positions. Once these thresholds are exceeded, only a small proportion of the possible combinations of residues allowed by the diversification scheme is sampled, and it becomes highly unlikely that the globally best combination of residues in the diversified positions will be tested and identified.
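The exponential growth behind these thresholds can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming 32 codons per NNS or NNK position; the function name `codon_combinations` is introduced here for illustration only.

```python
def codon_combinations(n_positions: int, codons_per_position: int = 32) -> int:
    """Number of distinct DNA sequences when each of n positions
    carries one NNS or NNK codon (32 possible codons per position)."""
    return codons_per_position ** n_positions


# Upper ends of the ranges quoted in the text: plate-based screening (3),
# microbial display (7), and mRNA or ribosome display (9).
for n in (3, 7, 9):
    print(f"{n} diversified codons: {codon_combinations(n):.2e} combinations")
```

Each additional diversified codon multiplies the theoretical library size by 32, which is why one or two extra positions can move a library from one screening format to the next.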

The sampling problem associated with site-saturation libraries made with mixtures of nucleotides can be mitigated in several different ways. The simplest approach is to limit the number of residues diversified in any single library, and to assume that many mutations discovered in separate libraries with different diversified residues will be additive. An extreme example of this approach is a library created by scanning mutagenesis (described in the next section), which diversifies only one position at a time. A more aggressive approach, the combinatorial active-site saturation test (CAST) (Reetz et al. 2005), was first applied to amino-acid residues found in enzyme active sites. CASTing uses available structural information on the starting enzyme to identify pairs of residues that are close in space, and are thus presumed to interact or to have synergistic effects on enzyme function. A separate site-saturation library is made for each pair of residues, yielding 32^2 = 1024 unique sequences per library, which are then oversampled by screening 3000 randomly picked transformants. This concept can be extrapolated to display-based selection, for example, by constructing sets of libraries with six diversified positions per library for phage display, or with eight diversified positions per library for mRNA display. An example of this approach is walk-through mutagenesis, which was used to affinity-mature antibodies by constructing site-saturation libraries of one complementarity determining region (CDR) at a time (Barbas et al. 1994). Another major problem of site-saturation libraries that use degenerate oligonucleotides with NNS or NNK codons is the introduction of unwanted stop codons into the diversified region, because each diversified codon has a 1 in 32 chance of encoding a translational stop. This problem, too, can be mitigated by limiting the number of diversified residues.
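Two numbers in this paragraph lend themselves to a quick calculation: how completely 3000 picked transformants cover a 1024-member library, and how the 1 in 32 stop codon accumulates with the number of diversified positions. The sketch below assumes that every codon combination is equally represented and that clones are picked independently; real libraries are less uniform, so the results are best read as upper estimates. The function names are illustrative.

```python
def expected_coverage(library_size: int, clones_screened: int) -> float:
    """Expected fraction of distinct sequences seen at least once,
    assuming uniform, independent sampling with replacement."""
    return 1.0 - (1.0 - 1.0 / library_size) ** clones_screened


def stop_free_fraction(n_positions: int, stops: int = 1, codons: int = 32) -> float:
    """Fraction of clones with no stop codon in any diversified position
    (NNS and NNK each allow one stop codon out of 32)."""
    return ((codons - stops) / codons) ** n_positions


# Library of two NNK positions (32^2 = 1024 sequences), 3000 clones picked.
print(f"coverage: {expected_coverage(32 ** 2, 3000):.3f}")

# Stop-free clones for 2, 6, and 8 diversified positions.
for n in (2, 6, 8):
    print(f"{n} positions: {stop_free_fraction(n):.3f} of clones are stop-free")
```

Under these assumptions, roughly threefold oversampling covers about 95% of the 1024 sequences, and the stop-free fraction falls steadily as positions are added.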

An alternative to limiting the number of diversified positions is to limit the depth of diversification at each position to fewer than 20 amino acids. The subset of amino-acid residues chosen in this approach depends on information available on the system being engineered, and on the limitations of the method used to encode that subset. An elegant example of such an approach is the use of restricted-alphabet libraries in antibody engineering (Fellouse et al. 2004; Fellouse et al. 2005), which takes advantage of the fact that the two amino-acid residues found most commonly at the interface between antibodies and antigens, tyrosine and serine, are encoded by a single degenerate codon. The theoretical complexity of binary tyrosine/serine antibody libraries, where each of n diversified positions encodes two different amino acids, grows as 2^n rather than 32^n, allowing the efficient sampling by phage display of libraries with more than 20 diversified positions. Restricted-alphabet libraries can also use a limited codon set that encodes chemically diverse amino-acid residues (Reetz et al. 2008), or they can be informed by structure- or homology-based protein design. The relative advantages of thorough sampling versus a highly diverse sequence space depend on the specific system and design used, and there are published examples in which limiting the alphabet size yielded selected antibodies with lower affinity (Munoz and Deem 2008). Due to the restrictions of the genetic code, design-based restricted-alphabet libraries that use nucleotide mixtures require a constant compromise between including extra residues that are not part of the design and excluding some of the desired residues (Mena and Daugherty 2005).
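The effect of a degenerate codon on the encoded alphabet can be checked by expanding it against the standard genetic code. The sketch below does this for NNN, NNK, and NNS, and for TMT (M = A or C), which is used here as an illustration of a single degenerate codon that encodes only tyrosine and serine; the text does not name the codon used in the cited work. Only the IUPAC symbols needed for these four codons are included, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# Subset of the IUPAC nucleotide codes needed for the codons below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "AC", "K": "GT", "S": "CG", "N": "ACGT"}

# Standard genetic code, with bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def expand(degenerate_codon: str) -> list:
    """List every concrete codon covered by a degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


def encoded_residues(degenerate_codon: str) -> Counter:
    """Count how many codons encode each residue ('*' marks a stop)."""
    return Counter(CODON_TABLE[c] for c in expand(degenerate_codon))


for scheme in ("NNN", "NNK", "NNS", "TMT"):
    counts = encoded_residues(scheme)
    residues = sorted(r for r in counts if r != "*")
    print(scheme, len(expand(scheme)), "codons,",
          len(residues), "amino acids,", counts["*"], "stop codon(s)")

# Theoretical complexity for n = 20 diversified positions.
n = 20
print(f"binary alphabet: {2 ** n:.2e}; NNK or NNS: {32 ** n:.2e}")
```

The same expansion makes the compromise described at the end of the paragraph concrete: for any candidate degenerate codon, `encoded_residues` shows which undesired residues come along and which desired ones are missing.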

Despite their suboptimal sampling of sequence space, the preceding methods, which are based on oligonucleotides synthesized using mixtures of nucleotides, remain the most popular approaches to constructing libraries with site-directed diversity, primarily due to their technical simplicity and relatively low reagent cost. Most commercial suppliers of synthetic oligonucleotides also sell affordable oligonucleotides diversified in this manner. However, the last decade has seen the development of several new methods that allow fine control over the exact sequence of diversified regions, including the identity and proportion of specific codons allowed at each diversified position in synthetic oligonucleotides, the exclusion of translational stops, and error reduction. Whereas these new methods allow tighter control over library composition and quality, they are technically demanding, less readily available commercially, and more expensive. These methods are described in detail in the next few paragraphs.

Oligonucleotides that contain defined mixtures of codons unrestricted by the genetic code can be synthesized using defined mixtures of trinucleotide phosphoramidite codons (Virnekas et al. 1994; Kayushin et al. 1996; Yanez et al. 2004), split-and-mix strategies (Glaser et al. 1992; Lahr et al. 1999), or enzymatic ligation of defined trinucleotides (Van den Brulle et al. 2008; Xiong et al. 2008). These methods can be used to introduce into the diversified position a mixture of codons for all 20 amino-acid residues or for a smaller set that follows specific rules, such as those derived by computational protein design. Amino-acid residues inconsistent with library design are avoided, and library sampling is greatly improved.

A further improvement in user control over library sequences has been made possible by recent advances in parallel oligonucleotide synthesis (Singh-Gasson et al. 1999; Pirrung 2002; Cleary et al. 2004; Zhou et al. 2004), which allow the simultaneous small-scale synthesis of 1000 to 100,000 oligonucleotides, each with a defined sequence. Libraries built using such complex pools of defined-sequence oligonucleotides as the source of diversity (Cleary et al. 2004; Richmond et al. 2004; Tian et al. 2004) allow control not only over the specific amino-acid residues allowed at each diversified position, but also over which residues are found close to each other in primary sequence. This in turn makes it possible to control many protein properties that are defined by primary oligopeptide sequence, such as net charge, average hydrophobicity, and the presence of protease cleavage sites, deamidation sites, N-glycosylation sites, and predicted T-cell epitopes. Many structure- and homology-based design constraints can also be incorporated into libraries using this method. Library diversity restricted to oligopeptides compatible with protein design yields a higher density of clones with predicted favorable properties, leading to a vast improvement in physical sampling of the theoretical sequence space of interest. Table 4.1 illustrates this improvement for the case of an enzyme-engineering problem, by comparing the physical library size required to represent a particular protein design using different site-directed library-construction methods (S. M. Lippow, S. Basu, K. Prather, and T. S. Moon, unpublished data).
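Because each oligonucleotide in such a pool has a defined sequence, primary-sequence constraints can be applied as a filter before synthesis. The following is a minimal sketch of that idea, not the procedure used in the cited work. It screens candidate peptides for the N-glycosylation sequon N-X-S/T (X is any residue except proline) and for a simplified net charge that counts only Lys and Arg as positive and Asp and Glu as negative, ignoring histidine and the termini. The candidate peptides, the charge cutoff, and the function names are hypothetical.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
SEQUON = re.compile(r"N[^P][ST]")


def has_n_glycosylation_sequon(peptide: str) -> bool:
    """True if the peptide contains at least one N-X-S/T sequon."""
    return SEQUON.search(peptide) is not None


def net_charge(peptide: str) -> int:
    """Simplified net charge: (Lys + Arg) minus (Asp + Glu)."""
    positive = sum(peptide.count(r) for r in "KR")
    negative = sum(peptide.count(r) for r in "DE")
    return positive - negative


def passes_filters(peptide: str, max_abs_charge: int = 1) -> bool:
    """Keep peptides without a sequon and with a net charge near neutral."""
    return (not has_n_glycosylation_sequon(peptide)
            and abs(net_charge(peptide)) <= max_abs_charge)


# Hypothetical candidate sequences for one diversified region.
candidates = ["AYSNGTWYS", "AYSNPTWYS", "KKRSYSYRK", "DYSKSYSYA"]
print([p for p in candidates if passes_filters(p)])
```

Filters of this kind shrink the pool to sequences compatible with the design, which is the source of the sampling improvement described above.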

An additional advantage of libraries assembled from complex mixtures of defined-sequence oligonucleotides is that they are the only source of library diversity compatible with error correction during the assembly process, reducing the impact of mutations introduced during oligonucleotide synthesis. The method used, error correction by consensus filtering (Figure 4.4) (Carr et al. 2004), requires that every oligonucleotide used to assemble the diversified gene be synthesized in both its forward- and reverse-complementary form, then allowed to anneal. Given the random nature of errors introduced during synthesis, it is highly unlikely that the same error will occur in both the forward and the reverse strand encoding a particular variant; thus, a double-stranded fragment containing an oligonucleotide with an error is almost certain to contain a mismatch between the forward and the reverse strand. MutS, a protein that binds preferentially to mismatches, small insertions, and small deletions in double-stranded DNA, is then used to remove such mismatched fragments from the mixture, greatly improving the content of wild-type and designed-diversity sequences. For oligonucleotides encoding constant regions in the library or diversified positions with a small number of changes, each oligonucleotide pair can be error-corrected in a separate reaction. For oligonucleotides encoding regions of high diversity, which require the use of a mixture of hundreds or thousands of oligonucleotides, the forward- and the reverse-complementary forms of each sequence

TABLE 4.1A Top 48 Predicted Sequences Spanning Residues 324-334 in Galactose Oxidase

Amino-Acid Position
