Proteins

Based on fossils, palaeontology has unearthed a magnificent panorama of organisms that populated planet Earth during the fossil-bearing periods starting in the Cambrian period. Further back into the Precambrian 600 million years ago, the Ediacara fauna discovered near Adelaide in Australia, for example, comprise shell-less specimens of jellyfish and segmented worms. Still further back in time, fossil microorganisms can be recognized microscopically from their relatively uniform imprints inside rocks, and molecular evolution may be traced through protein and nucleic acid sequences. Proteins such as cytochrome c, histones and hemoglobin have yielded invaluable information on biological evolution. However, for tracing events back to LUCA times three billion years ago, proteins tend to be too fast evolving and burdened by horizontal gene transfers (HGT) and other perturbations. For example, the tree of the RNA polymerase β and β′ subunits of Bacteria positions Aquifex away from the root of the clade, but moves it close to the root upon removal of two Mycoplasmas from the tree.25 Likewise, the tree of a combined-protein set shows either spirochaetes or thermophiles as the earliest bacterial group depending on whether some species are excluded from the tree.26

Two approaches may be called upon to reduce the impact of artifacts in extracting phylogenetic information from protein sequences. First, sequence homology of paralogous proteins can be estimated through pairwise comparisons without invoking tree construction. Second, a large number of species can be included in tree construction to maximize detection of invalidities. Both are illustrated in the following sections.

(i) aaRS Distances

The BLASTP algorithm may be used to estimate the genetic distance between proteins.27 Its application to the 190 pairs of the twenty kinds of aaRS within any genome on the tRNA tree generates 190 bitscores. Whenever a bitscore greater than 60 is observed between two aaRS within any genome, the two aaRS might be regarded as potential paralogs derived from gene duplication. On the basis of this criterion, 10 out of the 190 aaRS pairs are potentially paralogous.12
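The screening criterion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the bitscores are assumed to have been obtained already from BLASTP runs, and of the three example scores only the two taken from Table 15.2 are real; the AlaRS-HisRS value is a made-up below-threshold placeholder.

```python
from itertools import combinations

# The twenty kinds of aaRS, one per standard amino acid.
AARS = [f"{aa}RS" for aa in (
    "Ala Arg Asn Asp Cys Gln Glu Gly His Ile "
    "Leu Lys Met Phe Pro Ser Thr Trp Tyr Val").split()]

BITSCORE_CUTOFF = 60.0  # threshold stated in the text


def all_pairs(names):
    """Return every unordered pair of aaRS names (20 choose 2 = 190)."""
    return list(combinations(names, 2))


def potential_paralogs(max_bitscores, cutoff=BITSCORE_CUTOFF):
    """Keep pairs whose maximum bitscore exceeds the cutoff, highest first.

    max_bitscores maps (aaRS_a, aaRS_b) -> highest bitscore seen in any genome.
    """
    hits = [(pair, score) for pair, score in max_bitscores.items() if score > cutoff]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Two values from Table 15.2 plus one HYPOTHETICAL below-threshold pair.
example = {
    ("ValRS", "IleRS"): 506.5,   # Table 15.2
    ("ThrRS", "GlyRS"): 66.2,    # Table 15.2
    ("AlaRS", "HisRS"): 30.0,    # hypothetical placeholder, not from the source
}
hits = potential_paralogs(example)
```

Here `len(all_pairs(AARS))` reproduces the 190 pairwise comparisons, and `hits` retains only the two pairs that clear the cutoff.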

Among the genomes from the tRNA tree, the ValRS-IleRS pair reaches the highest maximum bitscore of 506.5 in Mka (Table 15.2), qualifying readily as potentially paralogous. Nine other pairs also achieve a maximum bitscore higher than 60 in one of the genomes, upward from the 66.2 shown by the ThrRS-GlyRS pair. The top bitscore is not always found in the same genome, even though Mka turns out to be top scorer in five out of the ten pairs.

Table 15.2. Potentially paralogous pairs of aminoacyl-tRNA synthetases12

aaRS pair        Max bitscore   Top score    2nd highest   3rd highest
ValRS-IleRS      506.5          Mka          Mth           Mja
ValRS-LeuRS      232.3          Tma          Hal           Mka
LeuRS-IleRS      202.6          Pab          Pfu           Pho
MetRS-LeuRS      94.4           Tte          Cac           Bsu
ThrRS-ProRS      91.7           Mka          Pab           Sco
MetRS-ValRS      82.4           Cac          Lla           Tte
IleRS-MetRS      75.1           Mka          Lla           Aae
TrpRS-TyrRS      75.1           Mka          Sso           Sto
SerRS-ProRS      67.4           Lla          Spn           Tte
ThrRS-GlyRS      66.2           Mka          Pho           Pfu
Average (QARS)                  Mka 138.5    Mth 119.3     Mja 115.2

Averaging the ten bitscores achieved by any genome gives its QARS quotient, which measures how closely its aaRS paralogs still resemble one another today. The QARS of the various genomes are shown on a thermal scale on the tRNA tree in Figure 15.6. The three highest QARS of 138.5, 119.3 and 115.2 for Mka, Mth and Mja, respectively, far exceed, for example, the 88.2 for Bsu, 60.4 for Eco and 40.9 for Hsa, demonstrating the existence of a clear-cut gradient among the species. Mka is indicated by its highest QARS score to be the slowest evolver in aaRS genotypes on the tree in Figure 15.6 (Line 4, Table 15.1), just as it is indicated by its lowest Dallo score to be the slowest evolver in tRNA genotypes on the tree in Figure 15.1. However, there are significant differences between the two trees relating to some of the other species. Mth and Mja are quite fast evolving in tRNA genotypes but slow evolving in aaRS genotypes. On the other hand, the Crenarchaea are slow evolving in tRNA but quite fast evolving in aaRS.
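The QARS quotient is simply an arithmetic mean, which the following Python sketch makes explicit. The function names are hypothetical. The per-pair bitscores of individual genomes are not listed in the text, so the ranking step below works only from the six QARS values quoted above.

```python
from statistics import mean


def qars(bitscores_by_pair):
    """Mean bitscore over the potentially paralogous aaRS pairs of one genome."""
    if not bitscores_by_pair:
        raise ValueError("at least one aaRS pair bitscore is required")
    return mean(bitscores_by_pair.values())


def rank_by_qars(qars_by_genome):
    """Order genomes from highest QARS (slowest aaRS evolver) to lowest."""
    return sorted(qars_by_genome, key=qars_by_genome.get, reverse=True)


# QARS values quoted in the text for six genomes.
reported = {"Bsu": 88.2, "Mka": 138.5, "Hsa": 40.9,
            "Mth": 119.3, "Eco": 60.4, "Mja": 115.2}
ranking = rank_by_qars(reported)
```

With the quoted values, `ranking` places Mka, Mth and Mja ahead of Bsu, Eco and Hsa, which is the gradient described in the text.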

(ii) Archaeal Root of ValRS Tree

Since the ValRS-IleRS pair displays the highest maximum bitscore among all the paralogous aaRS pairs, these two protein sequences could be suitable for paralogous rooting provided that perturbations are not excessive. In order to detect such perturbations, the ValRS and IleRS sequences from the large number of genomes on the tRNA tree are employed to construct the paralog tree in Figure 15.7. The IleRS sequences from Archaea, Bacteria and Eukarya are each split into separate groupings, with the middle cluster on the tree comprising sequences from all three domains. As a result, rooting the IleRS tree using the ValRS sequences as outgroup would lack validity. In comparison, the ValRS sequences are more orderly and largely divided into domain-specific clusters, even though the bacterial sequences of Blo and Rpr are mislocated in the archaeal cluster. Therefore rooting the ValRS tree using the IleRS sequences as outgroup is comparatively more valid. In this regard, the link from the IleRS outgroup joins the ValRS tree on the red line linking the archaeal ValRS to their junction (point J) with the ValRS from Bacteria and Eukarya, which roots the ValRS tree in the Archaea (Line 5, Table 15.1).8

(iii) Missing Genes

In present-day organisms, the Phase 2 amino acids Gln, Asn and Cys are incorporated into proteins via GlnRS, AsnRS and CysRS in some species, but via pretran synthesis from Glu-tRNA, Asp-tRNA and O-phospho-Ser-tRNA, respectively, in other species.

In genetic code evolution, the use of pretran synthesis predated the use of aaRS (Section 14.3). Accordingly, using pretran synthesis instead of GlnRS, AsnRS or CysRS to incorporate Gln, Asn or Cys into proteins is a primordial trait, and slow-evolving species that have not evolved far from LUCA may be expected to be deficient in one or more of these three aaRS. The fact that the genes for these three aaRS are all missing from the Mka genome adds further evidence to Mka's closeness to LUCA (Lines 6-8, Table 15.1).

Cytochromes are heme-containing proteins that participate widely in electron transport in mitochondria, chloroplasts, sulfate-reducing organisms and even the anaerobic methanogenic Methanosarcina. It is therefore a surprise that cytochrome genes are missing from the genomes of Mka, Mth, Mja, Pfu, Pab and Pho, which are clustered together on the tRNA tree. Since Mka, Pfu, Pab and Pho display some of the lowest Dallo distances and Mka, Mth and Mja display the three highest QARS quotients, this cytochrome-less group represents an ancient group of six genomes that are ultra-conservative in molecular evolution. Since Mka, Mth and Mja employ H2 and CO2 to make methane, whereas Pfu, Pab and Pho produce H2 and CO2 metabolically, the cytochrome deficiency of the group evidently does not stem from metabolic similarity, but from their closeness to a cytochrome-less LUCA. In this light, the cytochrome deficiency of Mka contributes Line 9 to Table 15.1.8

(iv) Ancestral Proteins

The composite phylogenetic tree built from 32 proteins, which yields a timescale for protein evolution, points to the Euryarchaea-Crenarchaea separation as the most ancient biological event, to Mka as the deepest branching archaeon, and to the advent of methanogenesis as far back as 3.8-4.1 Gya28 (Lines 10-12, Table 15.1).

The accumulation of atmospheric oxygen from oxygenic photosynthesis brought about extensive proteome adaptations in most organisms. Among the free-living organisms, Tma, Mka and Mth exhibit the least post-oxygen proteome adaptations, confirming that Mka is an ultra-conservative organism that has retained the anaerobic character of LUCA.29 This is also suggested by the amino acid composition of reconstructed ancestral proteins (Line 13).30 Reconstructed ancestral proteins in combination with the structure of the genetic code further support LUCA being a hyperthermophile, a barophile living in ocean depths and an acidophile, much like Mka today.31-33 The correlation between hyperthermophily and deep-branching primitivity applies to both Archaea and Bacteria.34 Moreover, the thermostabilities of ancestral elongation factors resurrected by cloning and expression reveal that ancestral organisms far back in time lived at elevated temperatures.35,36 These findings, by throwing light on properties common to Mka and LUCA, contribute Lines 14-16 to Table 15.1.
