RNA Peptidation

While the evolutionary advantage of switching from ribozymes to enzymes could be straightforward, the switch had to be implemented in compliance with two basic requirements. First, the formation of proteins had to strictly obey instructions from the RNA genes, so that the wisdom already accumulated in the RNA sequences through evolution would not be wasted. The system could not afford to invent the genes twice. It followed that a dictionary, or code, had to be developed to translate RNA language into protein language. There were alternative approaches to the process. The coding units, or codons, might contain overlapping or non-overlapping nucleotides, and each codon might comprise one, two, three or more nucleotides. Likely most if not all of these alternatives were explored by the ribo-organisms and those found wanting were eliminated, leaving the non-overlapping triplet code at the end as an optimal balance, accommodating enough amino acid variety without being overly cumbersome. Secondly, planning ahead is a trait that does not surface in biological systems until the vertebrate stage. Accordingly, every step of genetic code development had to be accompanied by some immediate advantage for the system. One development mechanism that meets both requirements is the two-stage RNA Peptidation mechanism16 (Fig. 14.3).
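The trade-off between codon length and amino acid variety can be made concrete with a small count. The sketch below tabulates how many distinct non-overlapping codons of a given length exist over four nucleotides. The function names, and the target of 21 meanings (20 amino acids plus one termination signal), are assumptions chosen for illustration, not figures taken from the argument above.

```python
NUCLEOTIDES = "ACGU"


def codon_capacity(length: int) -> int:
    """Number of distinct non-overlapping codons of the given length."""
    return len(NUCLEOTIDES) ** length


def shortest_sufficient_length(meanings: int) -> int:
    """Smallest codon length whose capacity covers `meanings` distinct signals."""
    length = 1
    while codon_capacity(length) < meanings:
        length += 1
    return length


if __name__ == "__main__":
    # Assumption for illustration: 20 amino acids plus one termination signal.
    required = 21
    for n in range(1, 5):
        print(n, codon_capacity(n), codon_capacity(n) >= required)
    print("shortest sufficient length:", shortest_sufficient_length(required))
```

Under these assumptions, doublets fall short of the target and quadruplets give far more codons than needed, which fits the description of the triplet as a balance between variety and bulk.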

The starting point (left, Fig. 14.3) was a ribozyme segment unassisted by any postreplication modification. Later, modifications would be recruited to add extra sidechains to the ribozyme. Since amino acids were available in the prebiotic environment, they would be included in some of the added modifications. The incorporation of aminoacyl- or peptidyl-sidechains into the RNAs finds support in the range of peptide-containing nucleotide-type molecules that are utilized by organisms today (Table 14.1) and the discoveries of ribozymic aminoacylation of RNA17-19 and amino acid and peptide activation of ribozyme.20-21 A ribozymic nonribosomal peptide synthetase (NRPS) system, the enzymic version of which is used nowadays to synthesize peptide antibiotics,22 could be employed to energize peptide-bond formation on RNA. The RNA nucleotide sequence directed the sequence in which amino acids were incorporated into a peptide on the RNA, matching different amino acids to cognate three-nucleotide patches, to yield a first-stage peptide-enhanced ribozyme (middle, Fig. 14.3). In flavoproteins, the flavin cofactor is usually firmly bound to the enzyme; in some instances, e.g., D-amino acid oxidase, it may also dissociate reversibly from the enzyme. Likewise, peptides and amino acids may be covalently attached to the RNA as in the case of other posttranscriptional modifications, or they may bind to the RNA reversibly.23 With covalent attachment, unintended binding of a peptide factor to other ribozymes could be prevented, and construction of a constellation of multiple sidechains around a catalytic site, as in the case of most enzyme active sites, would be facilitated.

Even short peptides, e.g., 14 amino acids in length, are known to be endowed with catalytic activity, and 10⁶-fold fewer peptide sequences than RNA sequences need to be searched to obtain an effective catalyst.8 Thus the catalytic versatility of the peptide sidechains on the RNA very soon and unmistakably manifested itself. Thereupon they were detached from the RNA to function on their own as primitive enzymes, replacing the ribozymes. In this second, triplet-coding stage, the cognate three-nucleotide RNA patches that directed amino acid incorporation acted as triplet codons (right, Fig. 14.3). Different adaptor RNAs would become aminoacylated with different amino acids, leading to a family of precursors to modern aminoacyl-tRNAs.3 These aminoacylatable adaptor RNAs could be simple stem-loop minihelices,24,25 or elaborate aminoacylating ribozymes.3 Still later, when DNA took over from RNA as genes, the RNA molecules became confined to their present-day roles associated with protein synthesis: as rRNA, tRNA, mRNA and ribozymes catalyzing ribosomal peptide bond formation and structural RNA processing.

Figure 14.2. Universal genetic code. Codons for different biosynthetic amino acid families are color coded. Pyrrolysine (Pyl) only has part use of UAG and selenocysteine (Sec) part use of UGA. The Asp-family biosynthetic pathways are shown on the right.

Figure 14.3. Development of triplet coding by the RNA Peptidation mechanism.

Figure 14.4. Growth of B. subtilis strains on Trp and 4-, 5- or 6-fluoro-Trp measured by [33P]-phosphate incorporation into colonies on agar.61 Panels: QB928, LC33, HR15, LC88.

Table 14.1. Compounds containing peptide or amino acid moieties linked to nucleotidyl or organic bases16

Compound: UDP-N-acetylmuramyl-pentapeptide; Coenzyme A; Folic acid; Factor 420; Carbon dioxide reduction factor; Adenylated protein; ADP-ribosylated protein; Viral RNA-protein

Base-Amino Acid Link: Flavinoid-linked (Glu)2; Flavinoid-linked (Glu)2; N- and C-glycosides; 5'-Protein-pUUAAAACAG- for polio virus; Purine-6-carbamoyl-threonyl-amido group

Function: Bacterial cell wall synthesis; Bacterial cell wall synthesis; Acyl transfer; One-carbon transfer; One-carbon transfer; One-carbon transfer; Antibiotic; Protein regulation; Protein regulation; Primer in RNA synthesis; Transfer RNA; Intermediate in protein synthesis

(iii) Code Expansion

Once the genetic code was established, the sidechain imperative that drove the formation of polypeptide surrogates began to press for increased amino acid variety in the genetic code beyond the Phase 1 amino acids available from the prebiotic environment. The only avenue in this direction was the development of novel biosynthetic pathways within the cell to produce the Phase 2 amino acids for the code26-28 (Table 1.1). These Phase 2 amino acids supplied the proteins with sidechains bearing phenyl, indole, imidazole, sulfhydryl, amide and cationic groups. The catalytic adeptness of imidazole, for instance, is such that a simple Ser-His dipeptide is capable of catalyzing DNA, protein and ester cleavages.29

As Box 14.1 shows, all alphabetic languages require a competent collection of letters to represent the human voice. Once competence is attained, the alphabet freezes. Likewise, after trying out novel Phase 2 amino acids, selecting those best suited and finding them satisfactory, the genetic code froze for the next three billion years. The prefreeze code expansion was in effect a search for sidechain excellence for the proteins. Whenever a novel amino acid was added to the code, it brought with it new variety but also noise. For example, when Gln was introduced into the code by pretran synthesis (Fig. 1.4) and took over the erstwhile CAA-CAG codons from Glu, the code benefited from the gain of a new amide amino acid, but replacing all CAA-CAG encoded Glu residues with Gln created considerable noise. Just as one hesitates to talk loudly in a library but not in a market place, noise is more acceptable against a noisy background than against a quiet one. Likewise, the noise created by a novel amino acid carried less evolutionary penalty when background translation error/noise was higher. Thus low error/noise determined when code expansion adding novel amino acids was to cease.28 Because the translation error rate depended on the competence of proteins in the translation machinery, only excellence of performance by all of these proteins could bring about a low enough error rate to freeze the code. Therefore code evolution would never cease until excellence was achieved in the encoded amino acid ensemble. Accordingly, the accomplishments of protein molecules are not an accident or good fortune but the consequence of rigorous control of code expansion by error-feedback.

Box 14.1. Evolution of Alphabets

In human languages, the letters of an alphabet represent different sounds of the human voice. Because the human voice is capable of making 40 different basic sounds, one needs up to 40 letters in an effective alphabet. Since a combination of letters can sometimes be used instead of a new letter to represent a sound, e.g., 'sh', 'ch', 'oe' and so on, different alphabets can vary in the number of their constituent letters:

Cyrillic 33
English 26
Hebrew 22
Hungarian runic (archaic) 39
International Phonetic 40
Latin 23

Two characteristics of alphabet evolution are particularly relevant to the protein alphabet. First, the number of letters has to be adequate, for too few letters will under-represent the human voice, ending up with grunts rather than language. Secondly, once an alphabet is established, it tends to stay frozen. An alphabet is a cultural heritage, and any attempt to change its letters, e.g., bringing the number of letters in the English alphabet to equal that of the Cyrillic alphabet, or vice versa, will run into considerable resistance.

By the same token, the number and balance of amino acids in the protein alphabet must be adequate to support a high level of performance by the proteins. For example, a protein consisting of only the three amino acids Gly, Ala and Val can generate an enormous amount of information content based on the formula Imax = N log2(M), where N is the number of residues and M the number of different amino acids, by making N huge, but the information generated is not very useful. No matter how long the protein is or how much its sequence is permuted, such a protein will be severely limited in the kinds of enzymic reactions it can catalyze. It lacks the sidechain variety required for excellence. Since the protein alphabet is also a cellular heritage as the foundation of the inherited information in the proteome, it tends to stay frozen as well. Changes to this alphabet will introduce a great deal of nonsensical noise into the protein sequences, and therefore elicit substantial resistance.
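The formula quoted in Box 14.1, Imax = N log2(M), can be evaluated directly to see why a small alphabet can always match a larger one in raw information content simply by lengthening the chain. The sketch below is illustrative only: the function names and the example length of 100 residues are assumptions, while M = 3 (Gly, Ala, Val) comes from the box and M = 20 is the number of encoded amino acids.

```python
import math


def max_information_bits(length: int, alphabet_size: int) -> float:
    """I_max = N * log2(M) for N residues drawn from M kinds of letters."""
    return length * math.log2(alphabet_size)


def length_for_bits(bits: float, alphabet_size: int) -> int:
    """Shortest length N whose I_max reaches at least `bits`."""
    return math.ceil(bits / math.log2(alphabet_size))


if __name__ == "__main__":
    # Assumed example: a 100-residue protein built from all 20 amino acids.
    target = max_information_bits(100, 20)
    print(f"20-letter alphabet, N=100: {target:.1f} bits")
    # Residues needed to reach the same I_max with only Gly, Ala and Val.
    print("3-letter alphabet needs N =", length_for_bits(target, 3))
```

The calculation only counts bits. It says nothing about the chemical usefulness of the sidechains, which is the point the box makes about the Gly-Ala-Val protein.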

The sidechain imperative has been one of the most powerful and insatiable driving forces in prebiotic/early biotic evolution. Starting from the RNA World, it has introduced postreplication modifications, peptidyl ribozymes, the Phase 1 genetic code, the Phase 2 genetic code, last-minute insertion of selenocysteine (Sec) and pyrrolysine (Pyl) into the half-frozen code (Fig. 14.2) and, even after the code became totally frozen, proliferation of posttranslational modifications (PTM) to add many more amino acid sidechains to proteins (Table 14.2). The PTM glycosylations alone include sugars O-linked to Ser or Thr, or N-linked to Asn, in linear or branched chains and comprising combinations of mannose, galactose, fucose, acetylgalactosamine, acetylglucosamine and sialic acid.

(iv) Code Evolution

The genetic code is a fascinating admixture of the rational, in its orderly allocation of the majority of 4-codon boxes to one amino acid or equally to two amino acids, and the seemingly irrational, such as the split codon domains of Ser. At one time this strange structure of the code and its universality were thought to be the outcome of a 'frozen accident'.31 Since then at least three evolutionary mechanisms shaping the code have been identified.

Error Minimization

A feature of the code is that physically similar amino acids tend to occupy codons that are neighbors in the code, neighbors meaning any two codons that differ by only a single base. For example, the CUU-CUC-CUA Leu codons, the AUU-AUC-AUA Ile codons and the GUU-GUC-GUA Val codons are neighbors, and these amino acids all have bulky hydrophobic side chains. As a result, misreading CUU as AUU or GUU generates only minor disturbance of protein structure. Codon allocations that serve to reduce the impact of errors could be favored in evolution, especially in primordial eons when replication, transcription and translation mistakes were frequent. This mode of error minimization holds not only for Leu-Ile-Val, but also for others such as Ser-Thr, Phe-Tyr, Asp-Glu and Lys-Arg.32-36
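The neighbor relation described above is easy to enumerate. The following minimal sketch builds the standard genetic code table and lists the single-base neighbors of a codon together with the amino acids they encode. The names `CODE` and `neighbors` are illustrative choices, amino acids are written in one-letter symbols (L = Leu, I = Ile, V = Val, F = Phe), and `*` marks a termination codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons ordered UUU, UUC, UUA, UUG, UCU, ...
# ('*' marks a termination codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def neighbors(codon: str) -> list[str]:
    """All codons that differ from `codon` at exactly one base position."""
    return [
        codon[:i] + base + codon[i + 1:]
        for i in range(3)
        for base in BASES
        if base != codon[i]
    ]


if __name__ == "__main__":
    # First-position misreadings of the Leu codon CUU.
    for n in neighbors("CUU")[:3]:
        print(n, CODE[n])
    # Amino acids reachable from CUU by any single-base change.
    print(sorted({CODE[n] for n in neighbors("CUU")}))
```

For CUU, the first-position neighbors include AUU (Ile) and GUU (Val), the two misreadings named in the text.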

However, there are three areas in the code, Non-Ops I-III, where violation of error minimization is self-evident:

1. Non-Op I: The physicochemical difference between any two amino acids can be calculated by the Grantham chemical-difference formula combining the three parameters of composition, polarity and volume. Among the 190 possible pairs formed from the 20 encoded amino acids, Cys-Trp ranks as the least-alike pair: the chemical distance of 215 between Cys and Trp is many times the chemical distance of 5 between Leu and Ile.37 Yet the Cys UGU-UGC codons share the same UGN box as the Trp UGG codon.

2. Non-Op II: The ultimate in error minimization is the zero error incurred when a codon is misread for its synonymous codon. Accordingly, it is expected that the first step toward error minimization is to maximize the placement of synonymous codons in neighboring positions. The majority of codons comply with this. However, the Ser AGU-AGC codons are not neighbors to any of the Ser UCN codons, in a clear departure from error minimization.

3. Non-Op III: The frequency of total codon usage is 2.48% for Met and 1.39% for Cys in aerobic Escherichia coli K12, and 1.66% for Met and 1.31% for Cys in anaerobic Methanopyrus kandleri (species closest to LUCA: see Section 15.2). Yet Met is allocated only one codon while Cys receives two. Such disproportionate codon allocations suggest non-optimization of the code with respect to any physical guidelines.
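Two of the observations in the list above can be checked with a few lines of arithmetic. The sketch below counts the 190 amino acid pairs, measures the smallest number of base changes separating the Ser AGU-AGC codons from the Ser UCN codons (Non-Op II), and divides the quoted usage frequencies by the number of codons allocated (Non-Op III). The helper names and the usage-per-codon ratio are illustrative assumptions. The percentages are those quoted in the text.

```python
from math import comb


def hamming(a: str, b: str) -> int:
    """Number of base positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(group_a: list[str], group_b: list[str]) -> int:
    """Smallest number of base changes between any codon pair of two groups."""
    return min(hamming(a, b) for a in group_a for b in group_b)


SER_UCN = ["UCU", "UCC", "UCA", "UCG"]
SER_AGY = ["AGU", "AGC"]

# Total codon usage (%) as quoted in the text, and codons allocated.
USAGE_PERCENT = {
    "E. coli K12": {"Met": 2.48, "Cys": 1.39},
    "M. kandleri": {"Met": 1.66, "Cys": 1.31},
}
CODON_COUNT = {"Met": 1, "Cys": 2}


def usage_per_codon(species: str) -> dict[str, float]:
    """Usage frequency divided by the number of codons for each amino acid."""
    return {aa: pct / CODON_COUNT[aa] for aa, pct in USAGE_PERCENT[species].items()}


if __name__ == "__main__":
    print("amino acid pairs:", comb(20, 2))
    print("min base changes, AGY vs UCN:", min_distance(SER_AGY, SER_UCN))
    for species in USAGE_PERCENT:
        print(species, usage_per_codon(species))
```

A minimum distance of two base changes confirms that no AGU-AGC codon is a single-base neighbor of any UCN codon.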

It is now known from the coevolution theory (Section 14.3) that Non-Ops I-III do not represent an abandonment or lapse of error minimization. The UGN box used to belong to Ser, forming a contiguous domain with the Ser UCN and AGY codons. However, when Ser produced Cys and Trp through biosynthesis, it ceded its UGN codons to Cys, Trp and a termination signal. Still later, Sec, another Ser-derived amino acid, managed to acquire a share of UGA, thereby joining its Ser-derived siblings Cys and Trp in the same box. This ceding of the Ser UGN codons to its biosynthetic products interrupted the contiguity of the Ser codon domain and caused both Non-Op I and Non-Op II. The single codons given to Trp and Met suggest that these amino acids arrived late in code expansion. Accordingly, the ceding of Ser UGN codons was evidently completed not too long prior to the rise of LUCA and the freezing of the code, leaving inadequate time for evolution to repair Non-Op I and Non-Op II toward error reduction. Likewise, the late arrivals of Met and Trp imply that there was little time to fine-tune the number of codons they received, hence Non-Op III. For the same reason, there also might not have been time for other late changes in the code to be optimally adjusted for error reduction before the freeze.

Besides insufficiency of time for adjustment, error minimization is limited by the fact that every codon has six neighbors, three per base position, even counting only the first two bases. Improvement in error reduction with respect to some of the six neighboring positions could bring deterioration with respect to others. Overall, the extent of error minimization achieved in the code is about 40-45%,38,39 and it contributes a 10⁻⁶ selection factor toward the emergence of a unique code.34

Stereochemical Interaction

When an aminoacyl-tRNA compound is positioned on the ribosome for peptide bond formation, the amino acid attached to its 3'-terminus might be too far away from the codon and anticodon for direct physical interactions with them. Direct interactions could be more easily effected, however, in the case of a primitive tRNA minihelix25 or where the codon or anticodon sequence is present in the tRNA acceptor stem.40 As well, the amino acid and its anticodon on the tRNA might both bind to the active site of the aaRS, thereupon interacting with one another either directly on the aaRS or indirectly through the aaRS. For example, a hydrophobic aaRS active site might preferentially bind a hydrophobic amino acid together with a tRNA possessing a hydrophobic anticodon, thereby promoting a general correlation between hydrophobic amino acids and hydrophobic anticodons.5 So both direct and indirect stereochemical interactions could be significant in bringing about the experimentally observable correlations between the hydrophobicities of amino acids and their codons/anticodons.32,33,41,42

RNA aptamers capable of binding Arg, Val, Ile, Tyr, Phe, His, Trp or Leu have been found to contain in their amino acid binding pockets a cognate codon or anticodon triplet for the bound amino acid.43,44 These findings add to the suggestion from hydrophobicity correlations that amino acid-codon/anticodon interactions played a significant role in deciding codon assignments.41-47 This might be especially the case during the initial codon partition between the Phase 1 amino acids, before either translational errors or coevolution with amino acids became important guides. The Stereochemical Interaction mechanism contributes a 0.04%, or 4 × 10⁻⁴, selection factor toward the emergence of a unique code.43

Amino Acid Biosynthesis

Inspection of the codon locations for various biosynthetic amino acid families reveals correlations between amino acid biosynthesis and codon assignments.48,49 The mechanisms by which such correlations came to be established in the course of genetic code evolution are examined in the next section.
