Yuri Rumer (1966) was the first to see an element of information hidden in the genetic code. Rumer gathered the triplets of the solid series into one set and those of the broken series into the other, as shown in Fig. 4a. Both sets contain the same number of triplets, 32 each. This halving makes it possible to map the triplets onto one another in a one-to-one manner. The mapping is realized by Rumer's unique transformation T ↔ G, C ↔ A or, in another notation, TCAG → GACT. The transformation maps each triplet of the degeneracy IV set onto a certain triplet of the degeneracy III, II, and I set, and vice versa. As an example, consider the GGT triplet in the top left-hand corner. Rumer's transformation maps this triplet onto the TTG triplet, which resides in the degeneracy III, II, and I set.
It is easy to see that there are two more transformations of the same type: T ↔ C, G ↔ A and T ↔ A, C ↔ G or, in the other notation, TCAG → CTGA and TCAG → AGTC. Each of them accomplishes half of what Rumer's transformation accomplishes alone: it carries only half of the triplets of one set over to the opposite set. Because of this, they are referred to hereinafter as 50% transformations. Although each of these three transformations can act independently or can even be absent, together they constitute an ordered assembly (shCherbak, 1989a). One can arbitrarily substitute another degeneracy for the actual one to simulate hypothetical evolutionary changes in the genetic code. In the overwhelming majority of cases, such substitutions destroy that assembly.
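The action of the three transformations can be checked with a short sketch. It assumes, from the standard code table rather than from the text above, that the degeneracy IV set consists of the eight doublets TC, CT, CC, CG, AC, GT, GC, and GG, each followed by any third base. The names `FAMILY_ROOTS`, `TRANSFORMATIONS`, and `crossing` are hypothetical and introduced only for illustration.

```python
from itertools import product

BASES = "TCAG"

# Assumption: the eight doublets whose four triplets all encode one amino acid
# (degeneracy IV); taken from the standard code table, not listed in the text.
FAMILY_ROOTS = {"TC", "CT", "CC", "CG", "AC", "GT", "GC", "GG"}

TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]
SET_IV = {t for t in TRIPLETS if t[:2] in FAMILY_ROOTS}
SET_OTHER = set(TRIPLETS) - SET_IV  # degeneracy III, II, and I

# Each transformation is a base-by-base substitution, e.g. TCAG -> GACT.
TRANSFORMATIONS = {
    "rumer": str.maketrans("TCAG", "GACT"),  # T<->G, C<->A
    "tc_ga": str.maketrans("TCAG", "CTGA"),  # T<->C, G<->A
    "ta_cg": str.maketrans("TCAG", "AGTC"),  # T<->A, C<->G
}

def crossing(table):
    """Return the degeneracy IV triplets that land in the opposite set."""
    return {t for t in SET_IV if t.translate(table) in SET_OTHER}

print("GGT".translate(TRANSFORMATIONS["rumer"]))  # TTG

for name, table in TRANSFORMATIONS.items():
    # Fraction of the 32 degeneracy IV triplets mapped onto the other set.
    print(name, len(crossing(table)) / len(SET_IV))
# rumer 1.0
# tc_ga 0.5
# ta_cg 0.5
```

Under these assumptions, Rumer's transformation carries all 32 triplets across, while each of the other two carries 16. In this sketch the two halves do not overlap, so together they cover the whole set.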
Fig. 4 Rumer's bisection and transformation of the genetic code, and Hasegawa's and Miyata's antisymmetrical correlation of the degeneracy and amino acid nucleon numbers. (a) Rumer halved the 64 life-size triplets into two sets: one of degeneracy IV and one of degeneracy III, II, and I. Each set maps its own 32 triplets onto the triplets of the opposite set either by Rumer's unique transformation or by the two other 50% transformations. (b) The amino acid categorization by degeneracy. The amino acids are gathered correspondingly into two sets: one of degeneracy IV and one of degeneracy III, II, and I. The degeneracy of these sets is denoted by a line of variable thickness. Generally, the smallest amino acids are concentrated in the set of highest degeneracy, IV, whereas the mid-size and large amino acids occupy the degeneracy III, II, and I set. The 20 amino acids are shown as free, neutrally charged molecules. Each amino acid has two specific component parts: the standard block and the individual side chain. They are covalently bound within the whole amino acid molecule. In the present research, an imaginary cross-cut virtually separates the standard blocks from the side chains of these 20 amino acids. This virtual act reveals the new arithmetical order in the genetic code. The figure shows, with glycine (Gly) as an example, the standard block common to nineteen amino acids. The standard block of the amino acid proline (Pro) is the only exception. The arrow with a hydrogen atom represents an imaginary borrowing that standardizes the proline block. The bisected genetic code is represented by the E. octocarinatus code version; the revealed regularities also hold for the universal genetic code.
7 Hasegawa's and Miyata's Nucleons
The amino acids and syntactic signs follow the bisected triplets and gather correspondingly into two sets of their own, as shown in Fig. 4b. However, Rumer set the amino acid sets aside. Indeed, a mixture of diverse chemical structures together with the syntactic signs Start and Stop is unsuitable for a general arithmetical approach. To apply arithmetic, one should substitute some common units for that mixture.
The relevance of amino acid mass to codon distribution was recognized soon after the code was deciphered and remains an area of active research interest (e.g. Schutzenberger et al., 1969; Di Giulio, 1989; Taylor and Coates, 1989; Chiusano et al., 2000; Downes and Richardson, 2002). Generally, in the genetic code, the greater the degeneracy, the smaller the amino acid. This rather rough correlation leaves room for speculation that, in the evolutionary history of the genetic code, most of these small amino acids (owing to their prevalence in number) captured the biggest series, those of degeneracy IV. As shown in Section 12, this correlation is nothing but an external representation of the new arithmetical order in the genetic code. Nevertheless, Hasegawa and Miyata (1980) supplied the correlation with an integer-valued parameter, the nucleon number. They noted once again that the degeneracy and the nucleon number of the 20 amino acids correlate antisymmetrically. Following Hasegawa and Miyata, we use the nucleon as the arithmetical unit inside the genetic code.
"Nucleon" is the common name for two nuclear particles - a positive charged proton and an uncharged neutron. The most common and stable isotopes are taken to calculate the nucleon numbers of the 20 amino acids. For instance, the nucleon number of the glycine and tryptophan side chain equals 1 and 130, respectively. The only exception from the general structure of amino acids is provided by proline. It holds its own side chain with two bonds and has one less hydrogen inside the standard block. However, an imaginary borrowing of one nucleon from the proline side chain in favor of its block brings the block nucleon number to the standard 73 + 1 = 74, whereas the side chain nucleon number becomes 42 - 1 = 41. The syntactic sign Start is associated with the amino acid methionine whose nucleon number is 75. The nucleon number of the syntactic sign Stop, which has no associated amino acids, is designated as zero. Note that this assertion has introduced zero into the genetic code arithmetic. Both imaginary acts - the cross-cut and the borrowing - are an artificial obstacle insurmountable to natural events preventing them from establishing the new arithmetical order in the genetic code. Such virtualization of the genetic code acts regularly in what follows.
We have thus chosen common arithmetical units and zero in place of the mixture of chemistry and linguistics. Finally, we should specify particular sets for these units. It is reasonable to assume that the genetic code as a whole is the best set for initial research.