Towards the Appearance of Proteins

One can imagine two ways to build up proteins: (i) to start with a more or less structural role for otherwise 'boring' oligo/polypeptides, later complemented by slowly emerging catalytic potential; or (ii) to introduce catalytically highly promising amino acids, later complemented by structural supports that would ultimately fold without the help of RNA. We prefer the second alternative since, as explained above, it offers a straightforward route to the appearance of coding before translation, and it provides a substantial and immediate selective advantage in the RNA world.

But it is a good question to ask how single amino acids could have grown into polypeptides in this scenario. Presumably, dipeptides would appear first, kept in place after formation of the peptide bond between two adjacent CCH molecules (cf. Szathmary, 1999). This would result in a dipeptide bound to one of the adaptors, an intermediate that is identical to Wong's (1991) peptidyl-tRNA; the other adaptor would be liberated. Further growth can be envisaged under selection for improved enzymatic activity, but then the burning question arises: what would keep the growing polypeptide in a stable/useful conformation? One possibility is that it binds to the RNA 'scaffold' of the ribozyme, as mentioned previously (Soding and Lupas, 2003). Another would be the build-up of the smallest possible foldable structures, which are the β-turns stabilized by short β-sheets (Lesk, 2001). Knowing the opportunistic nature of evolution, we would not exclude either possibility. Following a suggestion by Orgel (1977), Jurka and Smith (1987) argued that the first β-turns were encoded by RRN codons, which include, with the exception of His, all the catalytically most important amino acids! But the second most important group of amino acids for the β-turns is the YRN group, which includes His, and the two add up to NRN, the last two columns of the code (Fig. 3). The dendrogram in Fig. 6c strongly supports this idea. Amino acids with the NRN pattern are also mediocre α-helix and β-sheet builders (Fig. 6b), so experimentation in that direction was not totally excluded either. Data (Prevelige and Fasman, 1989) show that proline is the third strongest loop builder, which makes it the only real exception to the NRN rule (Ser also has AGY codons).
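The codon-class bookkeeping in this paragraph can be checked mechanically. The following is a minimal Python sketch, assuming the standard genetic code and the IUPAC degenerate symbols (R = A or G, Y = U or C, N = any base). The helper name `amino_acids` and the compact table encoding are illustrative choices introduced here, not part of the cited analyses.

```python
from itertools import product

# Standard genetic code in the usual compact form: codons are enumerated
# with bases in the order U, C, A, G and the third position varying fastest.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC symbols used in the text (assumption: only these are needed).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "AG", "Y": "UC", "N": "UCAG"}


def amino_acids(pattern):
    """Return one-letter amino acids encoded by codons matching `pattern`.

    Stop codons ('*') are excluded from the result.
    """
    codons = ("".join(c) for c in product(*(IUPAC[s] for s in pattern)))
    return {CODE[c] for c in codons} - {"*"}


rrn = amino_acids("RRN")
yrn = amino_acids("YRN")
nrn = amino_acids("NRN")

print("RRN:", sorted(rrn))
print("YRN:", sorted(yrn))
print("NRN:", sorted(nrn))
```

Under these assumptions, RRN yields Asn, Lys, Ser, Arg, Asp, Glu and Gly; His appears only in the YRN set; and Pro is absent from NRN, whereas Ser is present through its AGY codons.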

It is known that one can make enzymes with fewer than 20 amino acids. Walter et al. (2005) managed to evolve a chorismate mutase built of only nine amino acids: Arg, Asp, Glu, Asn, Lys, Phe, Ile, Leu, and Met. Given the redundancies Asp/Glu and Ile/Leu, the enzyme could probably be simplified even further. Remarkably, its active site (Fig. 9) is built of RRN codon-type amino acids only! Adding amino acids of the NUN codon-type allows the set to build proper α-helices; amino acids with NCN codons are not required at all. Evolution of this enzyme is in rather good agreement with the theoretical estimate that the minimum number of amino acids needed to fold a protein is around ten (Fan and Wang, 2003).

Fig. 9 Proposed active site of a chorismate mutase built of just nine amino acids. (Walter et al. 2005)
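The statement that the nine-residue alphabet needs no NCN-type amino acids can be verified against the standard genetic code. The sketch below is illustrative only: it assumes the standard code, and the names `REDUCED_ALPHABET` and `second_position_bases` are hypothetical helpers introduced here. It lists, for each of the nine amino acids, which bases occur at the second codon position.

```python
from itertools import product

# Standard genetic code, bases ordered U, C, A, G, third position fastest.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Arg, Asp, Glu, Asn, Lys, Phe, Ile, Leu, Met (one-letter codes).
REDUCED_ALPHABET = "RDENKFILM"


def second_position_bases(aa):
    """Return the set of bases found at codon position 2 for amino acid `aa`."""
    return {codon[1] for codon, residue in CODE.items() if residue == aa}


classes = {aa: second_position_bases(aa) for aa in REDUCED_ALPHABET}

for aa, bases in classes.items():
    # A or G at position 2 means NRN-type; U means NUN-type; C means NCN-type.
    print(aa, sorted(bases))
```

With the standard code, Arg, Asp, Glu, Asn and Lys have only A or G at the second position (NRN), Phe, Ile, Leu and Met have only U (NUN), and none of the nine has C.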

The next amino acids to appear, further stabilizing the β-turns with β-sheets, were those with NYN codons (Jurka and Smith, 1987). The proposal by Di Giulio (1996) that the evolution of the genetic code was driven by the need to form β-sheets is in our view secondary to the primary evolution driven by catalytic propensity and β-turns, but independently important for building the scaffolds of the catalytic structures. Selection for structures in this order gives α-helices for free, since by then the code is virtually complete. We suggest that the multiplicity of catalytically boring amino acids in the genetic code is explained by selection for fine-tuning of the 3D structure of the scaffolds to optimize the geometric arrangements of the active sites.

This scenario is supported by the correlation coefficients between the amino acid properties (Table 1).

There is a weak negative correlation between catalytic and α-helix propensity, so the latter structures cannot arise based on amino acids selected for catalysis; it is easiest to go for the β-turns. From these, one can go by evolution (amino acid vocabulary extension) in the direction of either the β-sheets (slightly favoured) or the α-helices, but presumably not in both.

We performed a network analysis of the BLOSUM amino acid substitution matrix (Henikoff and Henikoff, 1992) to see along which lines amino acid vocabulary extension or replacement would most likely have proceeded (Fig. 10).

The most common substitutions occur within the (Lys, Arg), (Ile, Val), and (Phe, Tyr, Trp) sets. The first dyad is clearly a catalytically very important one, and this is another reason why we suggest that Arg replaced Lys before LUCA. Note also the remarkable role of His in these plots. The catalytically most important amino acid, His, builds the bridge via the Tyr-His-(Asn, Gln) link between the catalytically unimportant and important (internally well connected) clusters (Fig. 10c,d)! We have specific RNA aptamers (Caporaso et al., 2005) for the whole bridge (none for Asn, but the bridge is still functional via Gln), so we suggest that it was built in the RNA world.

Table 1 Correlation between pairs of amino acid properties related to the construction of active sites and scaffolds

Trait pair            Correlation coefficient
Activity, α-helix     -0.15049
Activity, β-sheet      0.31893
Activity, β-turn       0.38407
α-helix, β-sheet       0.42399
α-helix, β-turn        0.64948
β-sheet, β-turn        0.67883

Fig. 10 Connectivity of the amino acid substitution network based on the BLOSUM 62 matrix (Henikoff and Henikoff, 1992). Colour codes of amino acids refer to those of Fig. 7a (a, c, e) and Fig. 7b (b, d, f). Substitution data have been transformed: we added 6 to the original values given by Henikoff and Henikoff (1992). For clarity, loops on the main diagonal are not shown. We illustrate the network with different lower thresholds (minimal frequency values) for the transformed substitution data: 9 in a and b, 7 in c and d, and 6 in e and f (9 is the strongest value, i.e. it was equal to 3 in the original data matrix). Note that the network is undirected. Drawn with UCINET (Borgatti et al., 2002)
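The transformation and thresholding described in the caption of Fig. 10 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the UCINET procedure itself: the function name `threshold_network` is hypothetical, and the small score dictionary `TOY_SCORES` holds placeholder values chosen for illustration that should be replaced with the full substitution matrix.

```python
def threshold_network(scores, shift=6, threshold=9):
    """Build an undirected edge set from pairwise substitution scores.

    Each score is shifted by `shift` (the caption adds 6), and a pair is
    kept as an edge when the shifted value is at least `threshold`.
    Diagonal entries (self-substitutions) are skipped, as in the figure.
    """
    edges = set()
    for (a, b), score in scores.items():
        if a != b and score + shift >= threshold:
            # frozenset makes the edge unordered, i.e. the network undirected.
            edges.add(frozenset((a, b)))
    return edges


# Placeholder scores for illustration only (assumption, not the real matrix).
TOY_SCORES = {
    ("I", "V"): 3, ("F", "Y"): 3, ("K", "R"): 2, ("H", "Y"): 2,
    ("H", "N"): 1, ("H", "Q"): 0, ("A", "A"): 4,
}

strict = threshold_network(TOY_SCORES, threshold=9)   # as in panels a, b
medium = threshold_network(TOY_SCORES, threshold=7)   # as in panels c, d
loose = threshold_network(TOY_SCORES, threshold=6)    # as in panels e, f

print(len(strict), len(medium), len(loose))
```

Lowering the threshold can only add edges, so each network in the sequence contains the previous one; this is why bridging residues become visible only in the less strict panels.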
