Transfer RNA

Because the search for LUCA based on protein paralogs is troubled by artifacts, sequence information has to be sought from other biopolymers. DNA and ribosomal RNAs are not useful in this regard because cells contain no DNA or rRNA paralogs. This leaves only the tRNAs. A long-standing limitation of tRNA sequences is that they are short, containing only about 75 bases, some of which are only semi-variant, so the amount of sequence information in any single tRNA sequence is small. This limitation may be overcome, however, by analyzing the entire tRNAomes of species. Since the genome of a free living organism contains more than thirty tRNA genes, together these genes furnish over two thousand base residues.

(i) Alloacceptor Distances

Based on the coevolution theory of the genetic code (Section 14.3), during the development of the code some of the tRNAs belonging to precursor amino acids were transferred along with their anticodons to product amino acids. That being the case, the kinships between the original tRNAs and their transferred copies could leave behind detectable sequence similarities between same-species tRNAs with different amino acid acceptor specificities, which may be designated as alloacceptor tRNAs, in distinction from isoacceptor tRNAs that accept the same amino acid. Comparisons of alloacceptors disclose many tRNA pairs exhibiting a high degree of sequence homology, especially among species in the Archaea domain on the universal tRNA phylogenetic tree. Strikingly, the tRNAPhe and tRNATyr of the archaeon Aeropyrum pernix (Ape; see species abbreviations in Fig. 15.1) differ from one another at only four base positions. In comparison, there are distinctly more differences between the same pair from either the bacterium Eco or the eukaryote Ecu (Fig. 15.2). These findings suggest that tRNAPhe and tRNATyr are paralogs derived from a primordial gene duplication. Just as sisters are closer genetically than first cousins, who are in turn closer than second cousins and so on, these two paralogous tRNAs gradually diverge more and more from each other with the progress of evolution. On this basis Ape evidently has evolved much less than Ecu or Eco from the root of life.11

In the cell, an aminoacyl-tRNA synthetase (aaRS) must recognize its cognate tRNA accurately and charge only that tRNA with its amino acid substrate. If it charges a noncognate tRNA by mistake, a wrong amino acid will be incorporated into proteins. This recognition process depends on identification by the aaRS of special nucleotide residues on the cognate tRNA called identity elements, which are not found on the noncognate tRNAs. Crystal structures of aaRS-tRNA complexes show the aaRS and its cognate tRNA making close contacts at these positions, and usually more than three identity elements are required for accurate recognition. Accordingly, tRNAPhe and tRNATyr, from the moment they became paralogs to separately accept Phe and Tyr, would always need to maintain a difference of three or more bases between them as differentiating identity elements. This suggests that the tRNAPhe and tRNATyr of Ape, with only a 4-base difference, have likely undergone little change for an estimated 3.6 billion years (Fig. 15.2). Such sequence ultra-conservatism is unheard of among proteins.

From vertebrate evolution, it is known that the earliest vertebrates were a group of jawless fishes (Agnatha). Many of them are now extinct, but others such as the lamprey and hagfish are still alive and well. From the phylogenetic viewpoint, it may be said that the lamprey is closer than the rabbit to the ancestral vertebrate. This by no means implies that the lamprey is a more ancient organism than the rabbit. In fact they are both modern animals and both their lineages are traceable back to the same vertebrate beginning in the late Cambrian period close to 500 million years ago. However, because the lamprey lineage has stayed close to the habitat of the ancestral vertebrate, it has experienced relatively little need to develop, for example, extensive anatomical and physiological modifications. In contrast, the rabbit lineage went on land, changed to air breathing and started to run around on four legs. Not surprisingly the rabbit looks and behaves unlike the early Agnatha. Therefore, for the purpose of understanding the respiratory physiology or brain structure of the earliest vertebrates, the lamprey makes a far better model than the rabbit. By the same token, judging by tRNA sequences, the Ape (archaeal, not primate) lineage, for whatever reasons, has undergone much less molecular evolution than Eco or Ecu. It follows that Ape is a closer model for LUCA than Eco or Ecu.

*J. Tze-Fei Wong, Applied Genomics Center, Fok Ying Tung Graduate School and Department of Biochemistry, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China. Email: [email protected]

Figure 15.1. Universal tRNA phylogenetic tree with the Dallo distances of various species shown in thermal scale.11,8 Dashed line shows formation of Tma-proximal LBACA from ancient relative of Ape. Dotted lines show formation of Pfa-proximal LECA from endosymbiosis between Fac-related archaeal host and Rpr-related bacterium. Species names: ARCHAEA. Crenarchaeota: Ape Aeropyrum pernix, Neq Nanoarchaeum equitans, Pae Pyrobaculum aerophilum, Sso Sulfolobus solfataricus, Sto Sulfolobus tokodaii; Euryarchaeota: Afu Archaeoglobus fulgidus, Fac Ferroplasma acidarmanus, Hal Halobacterium NRC-1, Mja Methanococcus jannaschii, Mka Methanopyrus kandleri, Mac Methanosarcina acetivorans, Mba Methanosarcina barkeri, Mma Methanosarcina mazei, Mth Methanothermobacter thermautotrophicum, Neq Nanoarchaeum equitans, Pab Pyrococcus abyssi, Pfu Pyrococcus furiosus, Pho Pyrococcus horikoshii, Tac Thermoplasma acidophilum, Tvo Thermoplasma volcanium; BACTERIA. Aae Aquifex aeolicus, Tma Thermotoga maritima, Dra Deinococcus radiodurans, Ctr Chlamydia trachomatis, Bbu Borrelia burgdorferi, Tpa Treponema pallidum, Blo Bifidobacterium longum, Mtu Mycobacterium tuberculosis, Sco Streptomyces coelicolor, Ana Anabaena sp, Syn Synechocystis 6803, Tel Thermosynechococcus elongatus, Bsu Bacillus subtilis, Cac Clostridium acetobutylicum, Lla Lactococcus lactis, Lin Listeria innocua, Mpn Mycoplasma pneumoniae, Spn Streptococcus pneumoniae, Tte Thermoanaerobacter tengcongensis, Atu Agrobacterium tumefaciens, Ccr Caulobacter crescentus, Rpr Rickettsia prowazekii, Nme Neisseria meningitidis, Rso Ralstonia solanacearum, Bap Buchnera aphidicola, Eco Escherichia coli, Hin Haemophilus influenzae, Psa Pseudomonas aeruginosa, Sty Salmonella typhi, Vch Vibrio cholerae, Xca Xanthomonas campestris, Xfa Xylella fastidiosa, Cje Campylobacter jejuni, Hpy Helicobacter pylori; EUKARYA. Ath Arabidopsis thaliana, Cel Caenorhabditis elegans, Dme Drosophila melanogaster, Ecu Encephalitozoon cuniculi, Gth Guillardia theta, Hsa Homo sapiens, Pfa Plasmodium falciparum, Sce Saccharomyces cerevisiae, Spo Schizosaccharomyces pombe.

There are twenty kinds of alloacceptor tRNAs in the genome of any free living organism, and for any genome one may determine the average pairwise genetic distance over its 190 alloacceptor pairs to obtain its alloacceptor distance Dallo. This varies from the lowest value of 0.351 for Methanopyrus kandleri (Mka) and the second-lowest of 0.402 for Ape, up to 0.760 for humans (Hsa) and the peak value of 0.839 for Sce (namely yeast). These results indicate that the various Mka tRNAs are tightly clustered in sequence space with a high level of resemblance between them. In contrast, the tRNA sequences in Sce are far more dispersed. These results may be interpreted based on the cluster dispersion model of tRNA evolution (Fig. 15.3), where the tRNAs were initially closely packed in sequence space at the center P of the diagram, but evolved outward away from one another into unoccupied sequence space. In the sky, the cosmic Big Bang scatters stars from a point source, with the result that the distances between stars continually increase with time. In the tRNA dispersion brought about by mutations, the distance between any two tRNA lineages mostly increases with time, e.g., for tRNAs A and B, but occasionally it may also decrease, e.g., for C and D. For any genome, Dallo is a measure of its evolved distance from P: the slower evolving species would have smaller Dallo and the faster ones larger Dallo.
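The averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not specify which genetic distance measure underlies the published Dallo values, so the sketch substitutes a simple p-distance (fraction of differing aligned positions), assumes one pre-aligned representative sequence per amino acid, and uses hypothetical function names (`p_distance`, `alloacceptor_distance`) and made-up toy sequences.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ.

    Assumption: a plain p-distance stands in for the (unspecified)
    genetic distance used for the published values.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def alloacceptor_distance(trnas: dict[str, str]) -> float:
    """Mean pairwise distance over all alloacceptor pairs.

    `trnas` maps an amino acid to one aligned tRNA sequence; with
    twenty amino acids this averages over the 190 pairs.
    """
    pairs = list(combinations(sorted(trnas), 2))
    if not pairs:
        raise ValueError("need at least two alloacceptor tRNAs")
    return sum(p_distance(trnas[a], trnas[b]) for a, b in pairs) / len(pairs)


# Toy data (invented, not real tRNA sequences): three short aligned strings.
toy = {"Phe": "GCCGAAAU", "Tyr": "GCCGAUAU", "Val": "GGCGAUAC"}
print(alloacceptor_distance(toy))  # 0.25
```

With twenty alloacceptors the inner loop visits 20 x 19 / 2 = 190 pairs, matching the count given in the text. A genome whose tRNAs are tightly clustered in sequence space yields a small value, and a dispersed one yields a large value.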

On this basis, LUCA would be located closest to the least evolved genomes, identifiable by their minimal Dallo distances. Figure 15.1 shows the Dallo values of various genomes in thermal scale on the tRNA tree. The genomes with low Dallo are not scattered all over the tree but centered at the deep-branching Archaea, which suggests that Dallo is not an erratic and therefore useless parameter but a well behaved one. Since Mka in the Euryarchaeota and Ape in the Crenarchaeota, the two divisions of the Archaea domain, are the two genomes with the lowest Dallo, LUCA is located between the branches leading to Mka and Ape in these two divisions.11 Moreover, because the elongator tRNAMet and initiator tRNAMet accept the same amino acid but for different functions, they are not treated as alloacceptors for calculating Dallo. When the distance between these two tRNAs is estimated for the various genomes, the minimum elongator-initiator distance is again displayed by Mka.12 Accordingly the Dallo distances and the elongator-initiator tRNAMet distances contribute Lines 1 and 2 to the array of evidence for locating LUCA proximal to Mka (Table 15.1). Line 1 is further supported by constraint analysis, which shows archaeal tRNAs to be the ancestral group relative to viral, eukaryotic and bacterial tRNAs.13

The universal tRNA tree confirms the three-domain structure of the SSU rRNA tree (Fig. 2.3), the deduction of which by Woese constitutes an outstanding accomplishment in molecular evolution.14,15 However, the tRNA tree shows the Gram-positives to be deeper-branching than the cyanobacteria in the Bacteria domain. In this regard it differs from the early SSU rRNA tree,16 which shows the cyanobacteria to be deeper-branching than the Gram-positives, but agrees with the recent SSU rRNA tree17 and SSU/LSU rRNA tree18 that have reversed this branching order of the early SSU rRNA tree. Because SSU rRNA is devoid of paralogs and therefore cannot by itself supply a basis for finding the root of life, the SSU rRNA tree was rooted in the Bacteria domain instead of the Archaea domain15 based on preliminary paralogous rootings of elongation factor and ATPase employing in each case only a single archaeal species, which is grossly inadequate in view of the pitfalls of this type of rooting. Thus the intra-archaeal location of LUCA on the tRNA tree (Fig. 15.1) departs only from these preliminary paralogous protein rootings and not from the SSU rRNA tree itself in this regard.

Table 15.1. Lines of evidence locating LUCA close to Methanopyrus8

Type of Evidence
Alloacceptor tRNA distances
Initiator-elongator tRNAMet distances
Anticodon usages
Aminoacyl-tRNA synthetase distances
Archaeal root of ValRS
Lack of GlnRS in Mka
Lack of AsnRS in Mka
Lack of CysRS in Mka
Lack of cytochromes in Mka
Early Euryarchaea-Crenarchaea separation
Mka as deep-branching archaeon
Primitivity of methanogenesis
Primitivity of anaerobiosis
Primitivity of hyperthermophily
Primitivity of barophily
Primitivity of acidophily
Use of CO2 as electron acceptor
Advantage of hydrothermal vents
Minimalist regulations

Earlier, tRNA sequence space analysis by statistical geometry suggested that the process of translation started from a distribution of RNA molecules comprising GC-rich sequences less than 100 nucleotides in length.19 This suggestion is consistent with the 72.5% GC content of the tRNAs of Mka, which is close to LUCA. Figure 15.4 shows the GC-rich consensal Mka tRNA as the ancestral tRNA archetype. Even though the majority of Mka tRNAs, and accordingly also the consensal Mka tRNA, do not possess a long variable arm located 3' to the anticodon stem, it is found that the variable arm is ancient in origin and likely to have been present among LUCA tRNAs.13 Prior to the development of translation, sequences similar to the tRNA archetype might have played other roles such as the formation of peptidyl-ribozymes (Section 14.2), 3'-aminoacylatable structures on RNA,20 or service as 'genomic tag' replication-initiation sites at the 3'-terminus of linear RNA genomes.21 The encoding of primitive tRNA genes is discussed in Section 13.5.
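A GC content figure such as the 72.5% quoted above can be computed from a set of tRNA sequences roughly as follows. This is a minimal sketch, not the author's procedure: the text does not say whether the figure is pooled over all bases or averaged per tRNA, so pooling is an assumption here, and the function names `gc_content` and `pooled_gc_content` are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/U/T bases of one sequence."""
    bases = [b for b in seq.upper() if b in "ACGUT"]  # ignore gaps and other symbols
    if not bases:
        raise ValueError("sequence contains no recognizable bases")
    return sum(b in "GC" for b in bases) / len(bases)


def pooled_gc_content(seqs: list[str]) -> float:
    """GC fraction over all bases of all sequences taken together.

    Assumption: pooling across the whole tRNAome, rather than
    averaging the per-tRNA values.
    """
    return gc_content("".join(seqs))


# Toy strings (invented): 4 of 6 bases are G or C in the first call.
print(round(gc_content("GGCCAU"), 3))
print(pooled_gc_content(["GGCC", "AAUU"]))  # 0.5
```

Pooled and per-tRNA averages coincide only when all sequences have the same length, which is one reason the choice is flagged as an assumption.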

(ii) Anticodon Usages

Figure 15.3. Cluster-dispersion model of tRNA evolution. Branchings are gene duplications which generate either new isoacceptors (black lines) or alloacceptors (red lines). The representative numbers of tRNA genes in the tRNAome at different stages are shown in circles.11

The collection of tRNA genes in a genome, determined from the complete genomic sequence, reveals the nature of the anticodons employed by the species. These species-specific anticodon usages provide a wealth of interesting information on how genetic coding is implemented in different species. In the genetic code there are thirteen standard 4-codon boxes where the four codons in the box are allocated either all to the same amino acid, e.g., the family box of GUN for Val, or equally to two amino acids, e.g., AAU-AAC for Asn and AAA-AAG for Lys. In most bacterial and eukaryotic species these thirteen boxes are read by varying combinations of anticodons (Fig. 15.5). For example, Sce employs three anticodons bearing a 3'G, U or C to read the UUN, CAN, AAN, GAN, AGN and GGN boxes; three anticodons bearing a 3'A, U or C to read the GUN, UCN and ACN boxes; two anticodons bearing a 3'G or U to read the CUN box; two anticodons bearing a 3'A or U to read the CCN and GCN boxes; and two anticodons bearing a 3'A or C to read the CGN box. In contrast, no archaeon uses more than two combinations. The overall results underline a sharp divide between the complex, mainly multiple-combination anticodon usages of Bacteria and Eukarya on the one hand and the simple, mainly single-combination anticodon usages of Archaea on the other.22 Among the 34 bacterial species, 2 each use five or more combinations, 7 use four combinations, 19 use three combinations, 5 use two combinations and only Tma uses a single GNN+UNN+CNN combination. Among the 7 free living eukaryotic species, 1 uses five combinations, 1 uses four combinations, 3 use three combinations and 2 use two combinations. Among the 18 free living archaeal species, 13 each use a single GNN+UNN+CNN three-anticodon combination. Mka, befitting its proximity to LUCA, displays even greater simplicity in its use of a single GNN+UNN two-anticodon combination. Mja, Mth and Tvo use a transitional mixture of these two kinds of combinations. Fac oddly uses GNN+UNN+CNN for 12 of its 13 standard boxes, but ANN+UNN+CNN for its CUN family box for Leu.
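The way a genome's anticodon set is reduced to combination labels such as GNN+UNN+CNN, and the number of distinct combinations then counted, can be sketched as below. The sketch makes several assumptions that are not in the text: anticodons are given as three-letter strings whose first letter is the base that varies in the XNN notation, the ordering of parts within a label (A, G, U, C) is chosen only to reproduce the labels quoted above, and the names `box_combination` and `genome_combinations` as well as the toy genome are hypothetical.

```python
from collections import Counter

# Assumed ordering of the variable base inside a label, chosen so that the
# output matches the labels quoted in the text (e.g. "GNN+UNN+CNN").
LABEL_ORDER = "AGUC"


def box_combination(anticodons: list[str]) -> str:
    """Build a label such as 'GNN+UNN+CNN' for the anticodons reading one box."""
    firsts = {ac[0].upper().replace("T", "U") for ac in anticodons}
    unknown = firsts - set(LABEL_ORDER)
    if unknown:
        raise ValueError(f"unexpected base(s): {sorted(unknown)}")
    return "+".join(f"{b}NN" for b in LABEL_ORDER if b in firsts)


def genome_combinations(boxes: dict[str, list[str]]) -> Counter:
    """Count how many boxes use each combination in one genome."""
    return Counter(box_combination(acs) for acs in boxes.values())


# Toy genome (invented, three boxes only), patterned on the Fac description:
# most boxes GNN+UNN+CNN, but the CUN box read by ANN+UNN+CNN.
toy_genome = {
    "GUN": ["GAC", "UAC", "CAC"],
    "GCN": ["GGC", "UGC", "CGC"],
    "CUN": ["AAG", "UAG", "CAG"],
}
usage = genome_combinations(toy_genome)
print(usage)       # Counter({'GNN+UNN+CNN': 2, 'ANN+UNN+CNN': 1})
print(len(usage))  # 2 distinct combinations
```

The length of the resulting counter is the quantity tallied per species in the paragraph above: 1 for a single-combination genome, 2 for a Fac-like genome, and so on.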

Figure 15.4. Consensal Methanopyrus tRNA.

Insofar as evolution typically moves from the simple toward the complex, the simplicity of the archaeal anticodon usages compared to the complex bacterial and eukaryotic usages favors greater primitivity of Archaea (Line 3, Table 15.1). As well, the suggestion has been made based on ribosome morphology that an 'Eocyte' domain should be split off from the Archaea domain.23 However, the use of a single GNN+UNN+CNN combination by all four free living crenarchaeons and the great majority of euryarchaeons points strongly to the unity of the Archaea domain.

While Archaea is phylogenetically distinct from Eukarya and Bacteria, hitherto few unique archaeal characteristics shared by neither of the other domains have been identified besides the archaeal use of ether-lipids versus the bacterial and eukaryotic use of ester-lipids. In this regard, the simple versus complex anticodon usages represent a rare divide between Archaea and Bacteria-Eukarya with respect to a fundamental molecular biological characteristic. What is the evolutionary significance of this divide? One plausible answer suggested by Knud Nierhaus (personal communication) relates to the ribosomal elongation factor LepA. This factor, one of the most highly conserved proteins, is present in all bacteria and in nearly all eukaryotes, but not in Archaea. It enables back-translocation during translation, prevents ribosome stalling and enhances ribosomal tolerance to changes in ionic concentrations.24 Thus Bacteria, by having LepA, might be more tolerant of internal-milieu variations than Archaea and therefore better equipped to adapt to wide ranging ecologies. To cope with internal-milieu variations, the base-pairing strengths of codon-anticodon pairs might also have to be fine-tuned, thereby accounting for the multiple-combination anticodon usages of Bacteria and Eukarya. Based on this possibility, the formation of the Bacteria domain from Archaea could be the consequence of adaptive advances such as LepA, enabling the Bacteria to enter new ecological niches, including the human body, far more easily than Archaea. This would help to explain the broad ecological distribution of Bacteria compared to the relative confinement of Archaea to extreme environments, and why there are so many human infections caused by bacteria and so few, if any, by archaeons.

Figure 15.5. Anticodon usages.22 (*the Tac usage pattern is shared by the following archaeons: Ape, Pae, Sso, Sto, Afu, Hal, Mac, Mba, Mma, Pab, Pfu and Pho).
