[Figure 1. Panel (a) blocks: Encoding & Transmission; Transmission Channel; Reception & Decoding; Information Sink. Panel (b) blocks: DNA: Stored Information; mRNA: Encoded Information; mRNA + Translation: Transmission and Decoding; Proteins: Information Sink.]

Fig. 1 (a) Generic communication process; (b) simplified information flux for protein synthesis

... and sinks pertain to the same world. Going into the details of the coding strategy, and using an analogy with language, we can say that words in the mRNA world are composed of three letters picked from an alphabet of four letters, corresponding to the four possible bases along the mRNA chain: Uracil, Cytosine, Adenine, and Guanine (U, C, A, G). Such a sequence of three letters (an elementary word) is called a codon. There are 64 possible codons, namely all the words of length 3 that can be formed from four letters in any order, repetitions included: 4 × 4 × 4 = 64. On the amino acid side, by contrast, there are only 20 different amino acids (see also the interesting paper by Hayes (1998) on the history of the discovery of the standard genetic code and the different original hypotheses about its organization). Because 64 different codons encode 20 different amino acids plus the termination signal, some amino acids are necessarily coded by more than one codon. This fact leads us to the redundancy and degeneracy properties of the code, which are described in detail below. Language shows the same kind of redundancy, because some words are synonyms. Some elementary means of error correction are connected to redundancy and to the concept of distance between words in human languages (see Chapter 18, this volume). The genetic code also has synonymous codons, but here the synonyms are different words in one language (mRNA codons) that have the same meaning in the other language (amino acids), as listed in the bilingual dictionary (the genetic code).
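The counting argument above can be checked with a short script. The following is a minimal sketch, not part of the original text: it enumerates the 4 × 4 × 4 codons, pairs them with the standard genetic code written in one-letter amino acid symbols (with `*` standing for the termination signal), and counts how many codons encode each amino acid. The names `BASES`, `AMINO_ACIDS`, `genetic_code`, `degeneracy`, and `classes` are illustrative choices, and the compact table string is assumed to follow the conventional U, C, A, G ordering of the standard code.

```python
from collections import Counter
from itertools import product

# Alphabet of the mRNA "language": four bases.
BASES = "UCAG"

# Standard genetic code in one-letter amino acid symbols, listed in the
# order UUU, UUC, UUA, UUG, UCU, ... (third base varies fastest).
# '*' marks a termination (stop) codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# All words of length 3 over the four-letter alphabet: 4 * 4 * 4 = 64.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# The "bilingual dictionary": codon -> amino acid (or stop).
genetic_code = dict(zip(codons, AMINO_ACIDS))

# Degeneracy: number of synonymous codons per amino acid (and for stop).
degeneracy = Counter(genetic_code.values())

# How many amino acids fall in each degeneracy class, excluding stop.
classes = Counter(n for aa, n in degeneracy.items() if aa != "*")

print(len(codons))              # number of codons
print(len(degeneracy) - 1)      # number of amino acids (stop excluded)
print(sorted(classes.items()))  # (degeneracy, number of amino acids)
```

Because 64 codons map onto only 21 meanings (20 amino acids plus termination), the counts in `degeneracy` cannot all equal 1, which is the pigeonhole argument made in the text.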

In this contribution we study the genetic code from the point of view of communication and coding theory. Having established that the genetic code is truly a code in the sense assigned to the word in communication theory, we attempt to discover the internal mathematical organization of the code, if any. Any internal structure found in the genetic code needs to be related to the function of the mechanisms that the code mediates, that is, mainly protein coding. However, if highly structured patterns are found, as is the case here, such an organization of the genetic information may carry a deeper meaning: it may help to reveal the basic mechanisms for storing and using genetic information in other stages of the genetic machinery, such as replication and transcription. Some aspects of the organizational structure of the genetic code were clear soon after its discovery. However, complete descriptions of this order are still missing, and their meaning remains unclear.

From a fundamental point of view, we need to search for the structural organization of the code at a mathematical level. Thus, we look for mathematical structures that describe, in a closed and complete way, the known properties of the genetic code, mainly the two levels of degeneracy distribution. This kind of study can help to clarify the origin of the code and the function of the biological processes mediated by it. The existence of a strong mathematical organization of the code is demonstrated here, and we conjecture that this organization implies a coding strategy exploited by suitable non-random error detection/correction mechanisms. We also conjecture that such mechanisms may operate on principles borrowed from the theory of dynamical systems and, in particular, on the description of their associated symbolic dynamics.
