Deciphering the Code

The first step after the discovery of mRNA (1956-1961) was to elucidate the code by which the amino acid sequences of proteins are written in the nucleotide sequences of mRNA and, correspondingly, in the nucleotide sequence of one of the two DNA chains (see Gamow, Rich & Ycas, 1956). Even before the discovery of mRNA, theoretical considerations had led to the assumption that each amino acid must be coded by a combination of at least three nucleotides. Indeed, proteins are composed of 20 kinds of natural amino acids (Fig. 2.1), whereas nucleic acids contain only 4 types of nucleotide residues; the nitrogenous bases of nucleic acids are adenine (A), guanine (G), cytosine (C), and either uracil (U) in RNA or thymine (T) in DNA. Clearly, one nucleotide could not code for one amino acid (4 vs. 20). Two nucleotides give 16 dinucleotide combinations, or doublets, a number still insufficient to code for 20 amino acids. Thus, the minimal number of nucleotide residues in a combination coding for one amino acid had to be three; in other words, amino acids most probably had to be coded by nucleotide triplets. The number of possible triplets is 64, more than enough to code for 20 amino acids.
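The counting argument can be checked with a few lines of Python. This is only an illustrative sketch; the names BASES, N_AMINO_ACIDS and codewords are chosen here for the example and do not come from the text.

```python
from itertools import product

BASES = "ACGU"        # the four RNA bases named in the text
N_AMINO_ACIDS = 20    # number of natural amino acids to be encoded


def codewords(length):
    """Return every possible combination of `length` bases."""
    return ["".join(p) for p in product(BASES, repeat=length)]


for length in (1, 2, 3):
    n = len(codewords(length))
    enough = n >= N_AMINO_ACIDS
    print(f"word length {length}: {n} combinations, enough for 20 amino acids: {enough}")
```

Only words of length three (4 x 4 x 4 = 64) reach the required 20, which is the reasoning behind the triplet hypothesis.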

There were two possible explanations for the excess of triplets: either only 20 triplets are "meaningful", i.e. code for one amino acid or another, while the other 44 are nonsense triplets, or an amino acid may be coded by more than one triplet, in which case the code would be degenerate.

Furthermore, the triplet code could be overlapping, with a given nucleotide being part of three strongly overlapping or two less strongly overlapping coding triplets; alternatively, it could be nonoverlapping, with independent coding triplets lying adjacent to each other in the template nucleic acid or even being separated by noncoding nucleotides. The observation that point mutations (i.e. changes of a single nucleotide in the nucleic acid molecule) usually lead to a change of only one amino acid in the corresponding protein provided evidence against the idea of an overlapping code. Moreover, an overlapping code would inevitably restrict the possible neighbors of a given amino acid residue, a restriction that has never been observed in actual protein sequences. Therefore a nonoverlapping code appeared more likely.
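The point-mutation argument can be illustrated with a short Python sketch that counts how many coding triplets are altered by a single base change under each reading scheme. The sequence and the function names are hypothetical and serve only as an example; a step of 1 models the strongly overlapping code, a step of 2 the less overlapping one, and a step of 3 the nonoverlapping code.

```python
def read_triplets(seq, step):
    """Split seq into triplets whose start positions are `step` bases apart."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


def changed_triplets(seq, pos, new_base, step):
    """Count the triplets that differ after replacing the base at index pos."""
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    before = read_triplets(seq, step)
    after = read_triplets(mutated, step)
    return sum(a != b for a, b in zip(before, after))


seq = "AUGGCCAAGUUC"   # arbitrary example sequence, not taken from the text
for step in (1, 2, 3):
    n = changed_triplets(seq, pos=4, new_base="A", step=step)
    print(f"step {step}: {n} triplet(s) changed by one point mutation")
```

For this example the single substitution alters three triplets with a step of 1, two with a step of 2, and one with a step of 3. Only the nonoverlapping scheme matches the observation that a point mutation usually changes a single amino acid.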

Finally, it had to be demonstrated whether the coding triplets were separated by noncoding residues, or commas, or whether they were read along the chain without any punctuation; in other words, whether the code was comma-free or not. The comma-free case leads to the problem of the reading frame of the template nucleic acid: only a strict triplet-by-triplet readout from a fixed point on the polynucleotide chain could result in an unambiguous amino acid sequence.
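The reading-frame problem can be made concrete with a minimal Python sketch. The sequence is an arbitrary example and reading_frames is a name introduced here for illustration; the function simply reads the same chain triplet by triplet from each of the three possible starting points.

```python
def reading_frames(seq):
    """Return the triplets read from each of the three possible start offsets."""
    return {
        offset: [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        for offset in range(3)
    }


seq = "AUGGCCAAGUUC"   # arbitrary example sequence, not taken from the text
for offset, triplets in reading_frames(seq).items():
    print(f"start at position {offset}: {' '.join(triplets)}")
```

The three starting points yield three entirely different series of triplets from the same chain, so without commas an unambiguous readout requires a fixed starting point.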

The classic experiments of Crick, Brenner and associates, published at the end of 1961, established that the code is triplet, degenerate, nonoverlapping, and comma-free. In these experiments, numerous mutants were obtained in gene B of the rII region of bacteriophage T4 using chemical agents that produce either insertions or deletions of a single nucleotide residue during DNA replication; proflavine and other acridine dyes were used for this purpose. Nucleotide insertions or deletions close to the beginning of the gene resulted in a loss of gene expression. By recombining different mutant phages in Escherichia coli cells, phenotypic revertants showing normal gene expression were obtained. An analysis of the revertants demonstrated that gene expression was restored if a deletion was located near an insertion, or vice versa. Gene expression could also be restored if two additional insertions (or deletions) were introduced near the initial insertion (or, respectively, deletion). The following conclusions were drawn: (1) Insertion or deletion of a single nucleotide at the beginning of the coding region results in the loss of all the coding potential of the corresponding gene rather than in a simple point mutation; the inactivation could be the result of a shift of the reading frame. (2) A deletion or insertion located close to the initial insertion or deletion, respectively, restores the coding potential of the sequence because the original reading frame is restored. (3) Three closely located insertions (or three deletions), but not two, also restore the initial coding potential of the nucleotide sequence. From the results of these experiments, it follows that the code is triplet, and that triplets are read sequentially without commas from a strictly fixed point in the same frame.
These experiments also provided additional evidence that the code is degenerate: if many of the 64 possible triplets were nonsense ones, it was highly probable that at least one nonsense triplet appeared in the region between the insertion and deletion or between the three insertions where the readout occurs with a shift of frame; this would lead to an interruption of the polypeptide chain synthesis.
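The logic of the frameshift experiments can be imitated with a short Python sketch. The repeated CAT sequence is a conventional illustration, not the actual rII sequence, and the helper names insert, delete and triplets are assumptions made for this example.

```python
def triplets(seq):
    """Read seq triplet by triplet from its first base, dropping any remainder."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def insert(seq, pos, base):
    """Return seq with one base inserted before index pos."""
    return seq[:pos] + base + seq[pos:]


def delete(seq, pos):
    """Return seq with the base at index pos removed."""
    return seq[:pos] + seq[pos + 1:]


wild_type = "CAT" * 6                       # illustrative message, not real rII DNA

one_insertion = insert(wild_type, 4, "G")   # frame shifted from here onward
with_deletion = delete(one_insertion, 8)    # nearby deletion restores the frame
two_insertions = insert(one_insertion, 7, "G")    # frame still shifted
three_insertions = insert(two_insertions, 10, "G")  # frame restored

for name, seq in [
    ("wild type", wild_type),
    ("one insertion", one_insertion),
    ("insertion + deletion", with_deletion),
    ("two insertions", two_insertions),
    ("three insertions", three_insertions),
]:
    print(f"{name:22s} {' '.join(triplets(seq))}")
```

In the printed rows, the single and double insertion mutants read shifted triplets all the way to the end, whereas the insertion plus deletion and the triple insertion return to CAT after a short altered stretch. The degeneracy argument can be quantified in the same spirit: under the simplifying assumption that the shifted triplets behave like independent random draws, and that 44 of the 64 triplets are nonsense, the chance that a stretch of five shifted triplets contains no nonsense triplet would be (20/64)^5, or about 0.3%.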

Deciphering the nucleotide triplets also began in 1961 when Nirenberg and Matthaei discovered the coding properties of synthetic polyribonucleotides in cell-free translation systems. The possibility of