Catalytic Propensity of Amino Acids and Organization of the Genetic Code

The catalytic propensities of amino acids (Fig. 2), compiled from the catalytic sites of known enzymes, are taken from Bartlett et al. (2002), who argued that the sample is representative.

Amino acids with the highest values gather in column A and (with smaller values) column G (Fig. 3). Is this pattern due to chance, or is it significant?

Lists of catalytic residues were obtained from the Catalytic Site Atlas of EMBL (Porter et al., 2004). Only literature-based entries pertaining to amino acids were used; residues inferred from sequence homology and non-amino acid residues, such as metal ions and cofactors, were left out of our analysis. In total, there were 5845 catalytic residues. The distribution of amino acids among the catalytic residues is markedly different from the frequencies of amino acids found in peptides (Bartlett et al., 2002). We performed a randomization test as follows. We took the biosynthetically restricted random set (Fig. 4) as defined by Freeland et al. (2000). This set rests on the potential importance of the co-evolution theory of the genetic code (Wong, 1975), according to which code assignment was influenced by the biosynthetic kinship of amino acids, and on the observation that amino acids belonging to the same biosynthetic family tend to share the same first codon letter (i.e. they are in the same row of the table; Taylor and Coates, 1989).

Alternative tables of the genetic code were generated according to Freeland et al. (2000), limiting the number of possible alternatives to 6.48 × 10^6, compared to the 20! ≈ 2.43 × 10^18 totally random codes. Each of the 6.48 × 10^6 code tables was analysed according to the following procedure. First, the list of amino acids is ordered according to catalytic frequency in active sites. The place they occupy in

Fig. 2 Catalytic propensity of amino acids in catalytic sites of known enzymes. (From Bartlett et al., 2002.)

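Middle letter: U | C | A | G (third letters in parentheses)
First letter U: Phe (U, C), Leu (A, G) | Ser | Tyr (U, C), Stop (A, G) | Cys (U, C), Stop (A), Trp (G)
First letter C: Leu | Pro | His (U, C), Gln (A, G) | Arg
First letter A: Ile (U, C, A), Met (G) | Thr | Asn (U, C), Lys (A, G) | Ser (U, C), Arg (A, G)
First letter G: Val | Ala | Asp (U, C), Glu (A, G) | Gly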

Fig. 3 Catalytic propensities and β-turn propensities superimposed on the genetic code. Only the highest values are shown (very high catalytic propensity: red, moderately high catalytic propensity: pink, highest turn propensities: green frame)

Fig. 4 The set of possible codes constrained by biosynthetic kinship (Freeland et al., 2000). In a randomized code any amino acid from set A can occupy any single position in the table, but only from A1 to A5. There are 6.48 × 10^6 possible alternative codes

A = {Phe, Ser, Tyr, Cys, Trp}; B = {Leu, Pro, His, Gln, Arg}; C = {Ile, Met, Thr, Asn, Lys}; D = {Val, Ala, Asp, Glu, Gly}

this list is assigned to them as a rank. In case of a tie, the average of the tied ranks is assigned to each amino acid having the same value. With regard to catalytic frequencies, only serine (or Phe, Tyr, Cys, or Trp in alternative codes) appears twice in the list, because it occupies two columns, and both entries therefore have the same catalytic frequency. The sums of the ranks of the amino acids present in the same column of the genetic code were squared, and then summed. This procedure is identical to the calculation employed in the Kruskal-Wallis test (Zar, 1998), a non-parametric test of differences between multiple groups. It is similar to one-factor ANOVA, except that normality of the data is not required. The pattern that amino acids segregate according to columns of the genetic code (in this order) is statistically significant (p = 0.0107) (Fig. 5), in agreement with the cluster analysis of catalytic propensities (Fig. 6).
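The ranking and randomization procedure described above can be sketched in Python as follows. This is a minimal sketch under several assumptions, not the authors' implementation. The column memberships follow the standard genetic code. Dividing each squared rank sum by the column size follows the usual Kruskal-Wallis form and is an assumption, since the text only says that the rank sums were squared and summed. Alternative codes are sampled at random by permuting amino acids within each set of Fig. 4, instead of enumerating the 6.48 × 10^6 codes of Freeland et al. (2000). Counting random codes whose statistic is at least as large as the canonical one is likewise an assumption about the direction of the test. All function names are hypothetical, and the propensity values are made-up placeholders, not the data of Bartlett et al. (2002).

```python
import random

# Amino acids per middle-letter column of the standard code (Ser occupies two columns).
CANONICAL_COLUMNS = {
    "U": ["Phe", "Leu", "Ile", "Met", "Val"],
    "C": ["Ser", "Pro", "Thr", "Ala"],
    "A": ["Tyr", "His", "Gln", "Asn", "Lys", "Asp", "Glu"],
    "G": ["Cys", "Trp", "Arg", "Ser", "Gly"],
}

# Biosynthetic sets A, B, C, D as listed under Fig. 4.
SETS = [
    ["Phe", "Ser", "Tyr", "Cys", "Trp"],
    ["Leu", "Pro", "His", "Gln", "Arg"],
    ["Ile", "Met", "Thr", "Asn", "Lys"],
    ["Val", "Ala", "Asp", "Glu", "Gly"],
]

# Made-up placeholder values for illustration only (NOT the published propensities).
PLACEHOLDER_PROPENSITY = {
    "His": 18.0, "Asp": 15.0, "Arg": 11.0, "Glu": 11.0, "Lys": 9.0,
    "Cys": 6.0, "Tyr": 6.0, "Ser": 5.0, "Asn": 4.0, "Thr": 4.0,
    "Gln": 2.0, "Gly": 2.0, "Trp": 2.0, "Phe": 1.5, "Leu": 1.0,
    "Ala": 1.0, "Met": 0.5, "Ile": 0.5, "Pro": 0.5, "Val": 0.5,
}


def average_ranks(values):
    """Return 1-based ranks; tied values all receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def column_statistic(columns, propensity):
    """Sum over columns of (rank sum)^2 / column size (assumed Kruskal-Wallis form)."""
    entries = [(col, aa) for col, members in columns.items() for aa in members]
    ranks = average_ranks([propensity[aa] for _, aa in entries])
    rank_sums = {col: 0.0 for col in columns}
    for (col, _), rank in zip(entries, ranks):
        rank_sums[col] += rank
    return sum(rank_sums[col] ** 2 / len(columns[col]) for col in columns)


def random_code(rng):
    """Permute amino acids within each biosynthetic set; table positions stay fixed."""
    swap = {}
    for members in SETS:
        shuffled = members[:]
        rng.shuffle(shuffled)
        swap.update(zip(members, shuffled))
    return {col: [swap[aa] for aa in members]
            for col, members in CANONICAL_COLUMNS.items()}


def randomization_p(propensity, n_samples=10000, seed=0):
    """Fraction of sampled codes whose statistic is >= that of the canonical code."""
    rng = random.Random(seed)
    observed = column_statistic(CANONICAL_COLUMNS, propensity)
    hits = sum(column_statistic(random_code(rng), propensity) >= observed
               for _ in range(n_samples))
    return observed, hits / n_samples


if __name__ == "__main__":
    stat, p = randomization_p(PLACEHOLDER_PROPENSITY, n_samples=2000)
    print(f"statistic = {stat:.1f}, estimated p = {p:.4f}")
```

Because the sum of all ranks is fixed, the statistic does not depend on whether the list is ranked in ascending or descending order. Since the propensities here are placeholders and the codes are sampled rather than enumerated, the printed p-value is not expected to reproduce the value reported in the text.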

We performed a similar test on the propensities for β-turns (Prevelige and Fasman, 1989), taken from the EMBOSS programme (Rice et al., 2000), for β-sheets (Muñoz and Serrano, 1994) and for α-helices (Muñoz and Serrano, 1994). The values for β-turns (p = 0.0059) and β-sheets (p = 0.028) show significant columnar organization at the level of single columns.

Fig. 5 panels (plots not reproduced): Activity count; β-sheet propensity; β-turn propensity. Horizontal axis in each panel: Sum of Rank Squares.

Fig. 5 Randomization test for the columnar organization of three amino acid properties in the genetic code. Thin pole indicates the position of the canonical genetic code. See text for further explanation


Fig. 6 Cluster analysis of amino acid propensities. (a) Catalytic propensity, (b) α-helix and β-sheet building propensities (taken together), (c) β-turn forming propensity. The hierarchical clustering was based on a Euclidean distance calculated between amino acid properties, and was carried out with the complete linkage clustering algorithm (Hartigan, 1975) as implemented in the R package, version 2.2.0 (R Development Core Team, 2004)
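The clustering method named in the caption of Fig. 6 can be illustrated with a short pure-Python sketch of agglomerative clustering with Euclidean distance and complete linkage. It is a stand-in for illustration only and is not the R routine used for the figure. The function names and the three-point example in the usage note are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points):
    """Agglomerative clustering with complete linkage.

    points: dict mapping a label to a tuple of property values.
    Returns the merges in order as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [frozenset([name]) for name in points]
    merges = []

    def cluster_distance(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(euclidean(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

For example, with the hypothetical one-dimensional input `{"a": (0.0,), "b": (1.0,), "c": (5.0,)}`, the first merge joins a and b at distance 1.0, and the second joins c to that pair at distance 5.0, the distance from c to the farthest member a.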


It is clear that the chemically most 'exciting' amino acids are the catalytically most important ones (Fig. 7), and that the central purine bases play an exclusive role in their coding.

Our suggestion is that the introduction of the first amino acids into the genetic code coincided with the order of decreasing catalytic importance. This presumes that in the ribo-organisms the highly catalytic amino acids were available either in

Fig. 7 panel legends (diagrams not reproduced): Central nucleotide; Rank in catalytic frequency. Chemical sets shown: Aliphatic, Tiny, Small, Aromatic, Hydrophobic, Polar, Positive, Charged.

Fig. 7 Venn diagrams of amino acids (chemical sets from Taylor, 1986). (a) Distribution of amino acids based on the middle letter of the genetic code, (b) distribution of amino acids according to catalytic frequency ranks and chemical properties


the medium or as a result of internal synthesis. As noted by Wong and Bronskill (1979), ideas about amino acid availability in the 'primordial soup' are inadequate when one considers the origin of the genetic code. Indeed, if the RNA world was metabolically complex (which seems likely: Benner et al., 1989), then a protracted period of co-evolution of ribozymes, membranes, and metabolism is likely to have taken place (Szathmary, 2007). Nevertheless, it is important that, with the exception of lysine and arginine, all catalytically important amino acids seem to have at least some prebiotic plausibility (Miller, 1986), including histidine (Shen et al., 1990). Lysine has two different, complicated biosynthetic routes in modern organisms (Berg et al., 2003), so for the time being it is safer to assume that it is a very late invention. We propose that its role and position in an interim genetic code could have been taken by arginine (see section on protein appearance). We believe that arginine goes back to the RNA world, a view supported by its recognition by RNA aptamers with codonic binding sites (Knight and Landweber, 2000).

As discussed above, the ancient charging enzymes are assumed to have been ribozymes. Specific aptamers for the amino acids Arg, Ile, Tyr, Gln, Phe, His, Trp, and Leu have been selected to date. According to the 'escaped triplet theory', triplets overrepresented in aptamer binding sites for amino acids became part of the modern genetic code (Yarus et al., 2005). It is noteworthy in this regard that in vitro generated RNA aptamers contain, to a statistically significant degree, anticodonic and, to a lesser degree, codonic binding sites for these amino acids (Caporaso et al., 2005). Although aptamers for Asp and Glu have not yet been selected, it is likely that divalent metal ions could neutralize the repulsion between RNA and these negatively charged amino acids, so that such aptamers will ultimately be selected successfully (R. Knight, personal communication, email, 2007).

Finally, it is important to mention that RNA molecules can charge amino acids either in cis (Illangasekare et al., 1995) or in trans (Lee et al., 2000). Even the phosphate anhydride activation of amino acids can be carried out by RNA (Kumar and Yarus, 2001).
